What is an angiosperm? Part 1: The difference between cladistic and phylogenetic classification

The “age of angiosperms” is still a matter of debate. But little discussion revolves around a more fundamental question: what is an angiosperm? From a modern-day perspective, the answer is trivial: a flowering plant. But when it comes to dated trees and phylogenetics, we run into semantics and incompatible philosophical frameworks, because then it is no longer necessarily about producing a flower, but about pure concepts.
In the last post, I provided some critical comments on a ‘review’ paper published by foremost palaeobotanists (Herendeen et al. 2017) in Nature Plants (a closed-access online journal by Springer Nature with obviously relaxed peer review), which triggered a fiery ‘reply’ by Wang (2017), published in a pseudo-scientific, non-reviewed open-access journal that he edits (Natural Science by SCIRP). Herendeen et al. preferred a cladistic node-based definition of the flowering plants: any descendant of the ‘most recent common ancestor’ (MRCA) of all living angiosperms is an angiosperm (they later refine it to a polyphyletic definition by excluding all descendants of this MRCA that cannot be assigned to any of the modern lineages). Wang, on the other hand, seemed to follow a cladistic branch-based definition: all members of the clade including the modern-day angiosperms are angiosperms. The strange thing here is that all authors are palaeobotanists with little knowledge of, or love for, molecular-dating approaches, yet they still use the conceptual framework of molecular dating and phylogenetics. The alternative would have been a phylogenetic classification following Hennig's concept of monophyla: groups of inclusive common origin, linked to synapomorphies (uniquely shared, derived traits).

Phylogenetic and cladistic classifications

It is important to distinguish between Hennig's phylogenetic systematics and cladistics (see also Felsenstein 2001, 2004; and this post on clades, cladograms, cladistics and networks). Hence, a short recapitulation.

Phylogenetic classification was formalised by Hennig (1950). Nearly a century earlier, Haeckel (1866) had coined the term monophyletisch (monophyletic) for a group of organisms sharing a common origin, as opposed to polyphyletisch (polyphyletic): unnatural, artificial groups not sharing a common origin (well into the 20th century, many classifications were simply based on form and ignored the evolutionary idea that one organism evolves from another, as do not a few modern-day phylogenetic studies). In principle, phylogenetic classification is a natural consequence of accepting evolution, an idea that goes back to Darwin and Wallace: we only want to classify groups that are part of the same evolutionary lineage.
Whereas Haeckel and the first phylogenetic classifications (e.g. Pojárkova 1933 for Acer, maples; Schwarz 1936 for Quercus, oaks) were quite flexible (and a bit nebulous) about how to define and recognise a common origin, Hennig set up a rule set and added a new category: paraphyletisch (paraphyletic).
Hennig (1950, 1982; Hennig & Schlee 1978) distinguished between

monophyla in a strict sense, groups of an inclusive common origin – all descendants of the common ancestor – ideally defined by uniquely shared, derived traits (synapomorphies); the only valid basis for defining a taxon;
paraphyla, groups of an exclusive common origin – some but not all descendants of the common ancestor – that may share primitive traits retained from their common ancestor, so-called ‘symplesiomorphies’;
polyphyla, artificial groups with multiple origins.

Since a paraphylum is monophyletic under the earlier, Haeckelian definition, Ashlock (1971) proposed calling Hennig's monophyla ‘holophyletic’ to avoid confusion (which makes semantic sense given the meanings of mono- and holo-, but has not been widely adopted). When working with fossils and aiming for holistic classifications (e.g. Bomfleur et al. 2017), it can be very handy to keep monophyletic as a collective term for paraphyletic and holophyletic, as it may be impossible to discern between the latter two based on the available data. A fossil group may be a precursor, hence paraphyletic, or an extinct sister lineage, hence reciprocally holophyletic.
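For readers who think in code, the three categories and Haeckel's collective term can be written down as a minimal Python sketch. It assumes an idealised, hypothetical tree in which every node is a named taxon (fossil ancestors in lower case, modern taxa in upper case), as in the idealised examples of this post where the fossils equal the MRCAs; the taxon names and functions are made up for illustration. A group is treated as holophyletic if it contains its MRCA and all of its descendants, paraphyletic if it contains the MRCA but not all descendants, and polyphyletic if the MRCA lies outside the group.

```python
from typing import Dict, List, Optional, Set

# Hypothetical idealised tree as child -> parent links; every node is a taxon.
# Lower case: fossil ancestors; upper case: modern taxa.
PARENT: Dict[str, Optional[str]] = {
    "a": None,  # oldest fossil, the root
    "B": "a",
    "c": "a",   # younger fossil, the MRCA of C and D
    "C": "c",
    "D": "c",
}


def ancestors(taxon: str) -> List[str]:
    """The taxon itself followed by all its ancestors, ordered rootwards."""
    path: List[str] = []
    node: Optional[str] = taxon
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def mrca(group: Set[str]) -> str:
    """Most recent common ancestor; a member can be the ancestor itself."""
    members = sorted(group)
    shared = set(ancestors(members[0]))
    for taxon in members[1:]:
        shared &= set(ancestors(taxon))
    # Walking rootwards, the first shared node is the most recent one.
    return next(node for node in ancestors(members[0]) if node in shared)


def descendants(taxon: str) -> Set[str]:
    """The taxon plus everything descending from it."""
    return {t for t in PARENT if taxon in ancestors(t)}


def classify_group(group: Set[str]) -> str:
    origin = mrca(group)
    if origin not in group:
        return "polyphyletic"  # the common ancestor lies outside the group
    if group == descendants(origin):
        return "holophyletic"  # inclusive: all descendants
    return "paraphyletic"      # exclusive: some, but not all, descendants


def is_monophyletic_sensu_haeckel(group: Set[str]) -> bool:
    """Haeckel's monophyly covers both Hennig's holophyla and paraphyla."""
    return classify_group(group) != "polyphyletic"


for group in ({"c", "C", "D"}, {"a", "B"}, {"B", "C"}):
    print(sorted(group), classify_group(group))
```

With real data, of course, we know neither the true tree nor which fossils are ancestors; the sketch only spells out the definitions.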

From phylogeny (here: the true evolutionary tree) to Hennig's phylogenetic classification.
Each lineage (here = clade) has at least one synapomorphy, and the fossils (lower-case letters) equal the MRCAs of the modern taxa (upper-case letters). Taxonomic classes (ranks) are named after their first representative.

Theoretically perfect (although we can only test for holophyly, not for paraphyly), Hennig's phylogenetic classification has a fundamental shortcoming when it comes to application: the number of ranks can easily become very large, or the systematic groups too large to make any sense. And it doesn't work well with Linné's binominal code for naming organisms, in particular when fossils are to receive binominals, too.
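The rank problem can be put into numbers with a small sketch. If every holophylum gets its own rank, the number of ranks needed equals the deepest nesting of clades in the tree. The nested-tuple tree format and the helper names below are hypothetical; the point is that a ladder-like (pectinate) tree of n terminals already requires n - 1 nested ranks, whereas a balanced tree of the same size needs far fewer.

```python
from typing import List, Union

# A tree is a terminal name (str) or a tuple of subtrees (hypothetical format).
Tree = Union[str, tuple]


def ranks_needed(tree: Tree) -> int:
    """Nested ranks required if every clade (holophylum) gets its own rank."""
    if isinstance(tree, str):
        return 0  # a terminal is already named at the lowest rank
    return 1 + max(ranks_needed(child) for child in tree)


def ladder(names: List[str]) -> Tree:
    """Fully pectinate tree: (A, (B, (C, ... (G, H))))."""
    if len(names) == 1:
        return names[0]
    return (names[0], ladder(names[1:]))


balanced = ((("A", "B"), ("C", "D")), (("E", "F"), ("G", "H")))
pectinate = ladder(["A", "B", "C", "D", "E", "F", "G", "H"])
print("balanced:", ranks_needed(balanced), "pectinate:", ranks_needed(pectinate))
```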

Two more simple examples.

In the examples above, it would be practical to give the paraphyla a name: in both examples, the X-aceae share a common origin and can be unambiguously diagnosed with respect to the holophyla, but they are not inclusive.

Let's assume all terminals (modern-day taxa: B, C, E, F, G) in the above examples are genera. Using the Hennigian rule set, all fossils that are the actual ancestors of two or more genera would need to remain nameless, or we would need to redefine the modern genera. In practice, we would assign each fossil its own genus, as they all differ from the modern-day genera by lacking a synapomorphy or the genus' autapomorphy. But since the fossil genera are ancestors of modern genera, they are paraphyletic by definition. Under certain conditions, we would even be tempted to use the same genus name for a fossil and a modern taxon (because they are visibly very similar), with the consequence that a genus (or higher taxon) that is holophyletic today becomes paraphyletic. [Cladists have occasionally argued that all fossil taxa must represent extinct sister taxa to bypass this problem. But come on: how likely is it that we never find the ancestors of the organisms dominating today (and in the past), and only their less successful, hence extinct, sister lineages?]

Inevitability of paraphyla when aiming at fully inclusive classifications.
No matter whether we start with a molecular tree and modern-day relationships (a top-down classification) or include all available evidence from the historical record (when, what, where; a bottom-up classification), we have to deal with paraphyletic (morpho)taxa. The modern-day molecular data will make us realise that the X-aceae are paraphyletic, collecting all X-ales that are not part of the holophyletic A-/B-aceae and F-aceae/-oideae. The fossil record indicates that the morphospace characterising E-oideae and E-aceae, including old (d) and oldest X-ales (x), corresponds to a paraphyletic group.

Noting the impracticability of Hennig's system, Mayr & Bock (2002) argued, in a somewhat convoluted way, that we should also accept and name (diagnose) paraphyla, in what they labelled an “evolutionary classification” (an ill-advised move), to distinguish it from Hennig's “phylogenetic classification”. Both classifications are based on the evolution-derived (phylogenetic) principle of assuming a common origin. Furthermore, we can only define paraphyla worth naming via the recognition of holophyla (see the examples above), so it is mainly a semantic modification of Hennig's concept. I would rather call it a Haeckelian phylogenetic classification, as it names groups that Haeckel would have recognised as monophyletic.

For cladistic classification, we have two principal options once we have inferred a tree (!): node-based or branch-based. At this point, neontological practice diverges. Dating papers always talk about the “angiosperm stem age” and the “angiosperm crown age”. The stem age is defined by the point at which the angiosperm lineage diverged, its point of origin; the crown age is the age of the MRCA of all extant (living) angiosperms. So they implicitly define angiosperms branch-based. But in systematic botany (and palaeobotany, officially, but see Herendeen et al. 2017), we define the angiosperms essentially node-based, assuming that the MRCA was also the first organism showing all characteristics of an angiosperm.
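Given a dated tree, both ages follow directly from these definitions. The sketch below assumes a hypothetical time tree stored as child-to-parent links with node ages in million years (Ma); taxa and ages are invented for illustration and are not estimates from any study. The crown age is read off the MRCA of the extant members; the stem age is the age of the node where that lineage split from its sister lineage, i.e. the parent of the crown node.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical dated tree: node -> (parent, age in Ma). All ages are invented.
TREE: Dict[str, Tuple[Optional[str], float]] = {
    "root": (None, 300.0),       # split of the ingroup from its sister lineage
    "sister": ("root", 0.0),     # extant sister lineage (tip, age 0)
    "crown": ("root", 140.0),    # MRCA of all extant ingroup members
    "inner": ("crown", 120.0),
    "lineage1": ("crown", 0.0),
    "lineage2": ("inner", 0.0),
    "lineage3": ("inner", 0.0),
}
INGROUP: List[str] = ["lineage1", "lineage2", "lineage3"]


def path_to_root(node: str) -> List[str]:
    """The node followed by its ancestors, ordered rootwards."""
    path: List[str] = []
    current: Optional[str] = node
    while current is not None:
        path.append(current)
        current = TREE[current][0]
    return path


def mrca(taxa: List[str]) -> str:
    shared = set(path_to_root(taxa[0]))
    for taxon in taxa[1:]:
        shared &= set(path_to_root(taxon))
    # Walking rootwards, the first shared node is the most recent one.
    return next(node for node in path_to_root(taxa[0]) if node in shared)


def crown_age(extant: List[str]) -> float:
    """Age of the MRCA of the extant members."""
    return TREE[mrca(extant)][1]


def stem_age(extant: List[str]) -> float:
    """Age of the split from the sister lineage, i.e. the crown node's parent."""
    parent = TREE[mrca(extant)][0]
    if parent is None:
        raise ValueError("the crown node is the root; no stem age on this tree")
    return TREE[parent][1]


print(f"crown age: {crown_age(INGROUP)} Ma, stem age: {stem_age(INGROUP)} Ma")
```

The interval between the two ages is the stem branch: a fossil sitting on it belongs to the group under a branch-based definition, but not under a node-based one.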

For our simple examples above, cladistic classification (node- or branch-based) will recognise the same taxa as (Hennigian) phylogenetic classification.

Cladistic classification systems for the three examples above.
Upper row: node-based; lower row: branch-based.

Since our data are free of homoplasy, all inferred clades represent holophyletic groups. And since all fossils in the example equal the MRCAs, there is no difference between the node- and branch-based definitions. The only price we pay is that our classification can be imprecise, because some lineages lack the number of diagnostic traits needed to resolve all holophyla. Total-evidence analysis, which uses molecular data to resolve modern-day relationships and a morphological partition for placing the fossils, is a double-edged sword. Keep in mind that missing data are treated as 'N' (= A or C or G or T) in a phylogenetic analysis. As a consequence, a total-evidence tree tends to place a fossil as sister to a clade comprising only the modern taxa, its descendants, a clade that is non-inclusive (paraphyletic) in a historical context. A node-based classification derived from modern-day data will therefore inevitably include paraphyletic groups.

Total-evidence approaches will not work for node-based cladistic classifications.
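Why missing data leave a fossil free to attach anywhere in the molecular partition can be checked with a small Fitch-parsimony sketch. The missing molecular data of the fossil 'f' are coded as 'N', i.e. the full state set {A, C, G, T}; the sequences and taxa are hypothetical. Because intersecting the full set with any state set is never empty, the fossil adds no steps wherever it is placed, so its position is decided by the morphological partition alone. And in a standard bifurcating tree, where every sampled taxon is a tip, it can only end up as sister to its descendants, never as their ancestor.

```python
from typing import Dict, Set, Tuple, Union

# A tree is a taxon name (str) or a pair of subtrees.
Tree = Union[str, tuple]

# Hypothetical aligned sequences; the fossil has no molecular data.
SEQS: Dict[str, str] = {
    "B": "ACGT",
    "C": "ACGA",
    "D": "TCGA",
    "f": "NNNN",  # fossil: missing data coded as N = A or C or G or T
}
STATES: Dict[str, Set[str]] = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"}, "N": set("ACGT"),
}


def fitch(tree: Tree, site: int) -> Tuple[Set[str], int]:
    """Fitch parsimony for one site: returns (state set, number of steps)."""
    if isinstance(tree, str):
        return STATES[SEQS[tree][site]], 0
    left, right = tree
    l_set, l_steps = fitch(left, site)
    r_set, r_steps = fitch(right, site)
    shared = l_set & r_set
    if shared:
        return shared, l_steps + r_steps          # no change needed
    return l_set | r_set, l_steps + r_steps + 1   # one extra step


def length(tree: Tree) -> int:
    """Total parsimony length summed over all sites."""
    n_sites = len(next(iter(SEQS.values())))
    return sum(fitch(tree, i)[1] for i in range(n_sites))


without_fossil = (("C", "D"), "B")
placements = [
    ((("f", "C"), "D"), "B"),   # fossil sister to C
    (("f", ("C", "D")), "B"),   # fossil sister to its putative descendants C + D
    (("C", "D"), ("f", "B")),   # fossil sister to B
]
print(length(without_fossil), [length(t) for t in placements])
```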

In the case of real-world data, cladistic classifications that rely on an inferred tree are generally problematic in palaeontology, because

fossils provide – for the most part – only information about morphology;
most morphological traits have evolved in parallel or convergently, i.e. more than once in the same (monophyletic [sensu Haeckel]) or in different lineages (polyphyletic);
hence, clades in trees are often poorly supported, can have equally valid alternatives, or are misleading.

I have illustrated this in a series of posts for David Morrison's The Genealogical World of Phylogenetic Networks, on seed plants, dinosaurs, mosasaurs and insects, or using simple theoretical examples.

And this is the optimal case. In the worst case, our fossils do not provide enough scorable characters to include them in any explicit phylogenetic analysis at all. No tree, no cladistic classification.
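The homoplasy problem listed above can be illustrated with the classic pairwise compatibility (four-gamete) test for binary characters: two characters can both change only once on the same tree only if the taxa do not show all four combinations of their states (00, 01, 10, 11). The matrix below is hypothetical: characters 0 and 1 group P with Q, whereas character 2, a trait assumed to have evolved convergently in Q and R, groups Q with R. It is incompatible with both, so at least one character must be homoplastic on any tree we infer.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical binary morphological matrix (taxon -> character states).
MATRIX: Dict[str, List[int]] = {
    "P": [1, 1, 0],
    "Q": [1, 1, 1],
    "R": [0, 0, 1],
    "S": [0, 0, 0],
}


def compatible(i: int, j: int) -> bool:
    """Four-gamete test: characters i and j can both evolve only once on a
    single tree unless all four state combinations occur among the taxa."""
    combos = {(states[i], states[j]) for states in MATRIX.values()}
    return len(combos) < 4


def conflicts() -> List[Tuple[int, int]]:
    """All pairs of characters that cannot both be free of homoplasy."""
    n_chars = len(next(iter(MATRIX.values())))
    return [
        (i, j)
        for i, j in combinations(range(n_chars), 2)
        if not compatible(i, j)
    ]


print(conflicts())
```

With real fossil matrices, many characters conflict in this way, which is why the resulting clades are often poorly supported or have equally good alternatives.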

What an angiosperm is depends on the classification system

When we apply the four classification systems (node-based and branch-based cladistic, Hennigian and Haeckelian phylogenetic) to the angiosperm problem, we end up with three possible but different definitions of angiosperms: two cladistic ones (node-based, branch-based) and one phylogenetic one.

Cladistic node-based: all descendants of the hypothetical MRCA of all living angiosperms as defined by the 'angiosperm crown node'
Cladistic branch-based: all members of the angiosperm clade
Phylogenetic: all plants that (originally) evolved the full set of angiosperm synapomorphies (we should not be overly picky here: from molecular phylogenies we know that some lineages lost or modified those traits later on).

The three options to define angiosperms beyond the present day.

The terminal diamonds and triangles reflect the known fossil record (following Earle 2010 for gymnosperms, and Stevens 2001 onwards for angiosperms, since neither Herendeen et al. nor Wang did their job and compiled a list of earliest records). The time-scaled tree is based on the chronogram of Magallón et al. (2015); it is a "meta-calibrated" (nearly fully constrained) tree fully in line with Herendeen et al.'s philosophy of an Early Cretaceous angiosperm crown age. Note that, lacking compensating constraints, the divergence ages in the outgroup/sister group (gymnosperms) are much too young, and the topology is biased by long-branch attraction between Gnetidae and angiosperms. For the angiosperms, Magallón et al.'s tree agrees with Stevens' tree, which is based on various literature. Regarding the angiosperm synapomorphies (ASA), I simply assumed a young age to fall in line with the current mainstream (to be discussed in Part 2).
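The three definitions can be contrasted on a toy tree. This sketch assumes a hypothetical tree with invented taxon names in which every node is a taxon: `crown` is the MRCA of all living members, `s1` and `s2` are extinct stem taxa below it, `x1` and `x2` extinct side lineages, and the set `COMPLETE` marks the taxa scored (by assumption) as having the full set of diagnostic synapomorphies, first appearing in `s2`. The node-based definition takes everything descending from the crown node, the branch-based one everything on the ingroup side of the split from the living sister group, and the phylogenetic one everything descending from the first taxon with the full set of synapomorphies.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy tree (child -> parent); every node is a taxon.
PARENT: Dict[str, Optional[str]] = {
    "root": None,      # split between the living sister group and the ingroup
    "sister": "root",  # living sister group
    "s1": "root",      # extinct stem taxon, lacking the full trait set
    "x1": "s1",        # extinct side lineage branching off early
    "s2": "s1",        # first taxon with the full set of synapomorphies (assumed)
    "x2": "s2",        # extinct side lineage with the full set
    "crown": "s2",     # MRCA of all living ingroup taxa
    "A": "crown",      # living
    "B": "crown",      # living
}
# Taxa scored as having the complete set of diagnostic traits (assumed).
COMPLETE: Set[str] = {"s2", "x2", "crown", "A", "B"}
LIVING: List[str] = ["A", "B"]


def ancestors(taxon: str) -> List[str]:
    """The taxon followed by its ancestors, ordered rootwards."""
    path: List[str] = []
    node: Optional[str] = taxon
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def mrca(taxa: List[str]) -> str:
    shared = set(ancestors(taxa[0]))
    for taxon in taxa[1:]:
        shared &= set(ancestors(taxon))
    # Walking rootwards, the first shared node is the most recent one.
    return next(node for node in ancestors(taxa[0]) if node in shared)


def descendants(taxon: str) -> Set[str]:
    return {t for t in PARENT if taxon in ancestors(t)}


def node_based(living: List[str]) -> Set[str]:
    """All descendants of the MRCA of the living members (crown group)."""
    return descendants(mrca(living))


def branch_based(living: List[str], living_sister: List[str]) -> Set[str]:
    """Everything on the ingroup side of the split from the living sister group."""
    split = mrca(living + living_sister)
    node = mrca(living)
    while PARENT[node] != split:  # climb from the crown node to the split
        node = PARENT[node]
    return descendants(node)


def phylogenetic(living: List[str], complete: Set[str]) -> Set[str]:
    """All descendants of the first taxon with the full trait set, including any
    that later lost or modified traits. Assumes the crown node has the full set."""
    node = mrca(living)
    parent = PARENT[node]
    while parent is not None and parent in complete:
        node, parent = parent, PARENT[parent]
    return descendants(node)


for name, members in [
    ("node-based", node_based(LIVING)),
    ("branch-based", branch_based(LIVING, ["sister"])),
    ("phylogenetic", phylogenetic(LIVING, COMPLETE)),
]:
    print(f"{name}: {sorted(members)}")
```

Where the full set of synapomorphies arises relative to the crown node is exactly what the fossil record has to tell us; the sketch simply assumes it.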

The only difference between Hennigian and Haeckelian phylogenetic classification (the “evolutionary classification” of Mayr & Bock) is the rank of the taxon collecting the angiosperm precursors (and potential extinct sister lineages of angiosperms). In a Hennigian classification, it needs to be the next-higher rank, i.e. the Magnoliopsida, since only holophyla should be named; in a Haeckelian one, we can simply define a paraphyletic class (of the same rank as the angiosperms, the Magnoliidae; Stevens 2001 onwards) that includes all plants that share the same origin as the angiosperms and are part of the same evolutionary lineage, the Magnoliopsida, but have not (yet) evolved the full set of angiosperm synapomorphies. Once the last diverging sister lineage of angiosperms is defined (see last post), it can be accommodated by additional paraphyletic taxa in a Haeckelian classification, and by adding further ranks in a Hennigian one.

The two flavours of phylogenetic classification

Now that the principles and basic options are laid out, we can discuss them (see Part 2 of this post) and will find that only a Haeckelian phylogenetic classification and definition of angiosperms is an option (not surprising, I know). At this point, branch-based cladistic classifications will be highly misleading, node-based cladistic ones impossible due to the FUZ (Farris’ Uncertainty Zone), and Hennigian phylogenetic ones impractical because of the undetermined extent of the HAZ (Hennig’s Ambiguity Zone).

Cited literature
Ashlock PD. 1971. Monophyly and associated terms. Systematic Zoology 20:63–69.
Bomfleur B, Grimm GW, McLoughlin S. 2017. The fossil Osmundales (Royal Ferns)—a phylogenetic network analysis, revised taxonomy, and evolutionary classification of anatomically preserved trunks and rhizomes. PeerJ 5:e3433. https://peerj.com/articles/3433/
Coiro M, Chomicki G, Doyle JA. 2017. Experimental signal dissection and method sensitivity analyses reaffirm the potential of fossils and morphology in the resolution of seed plant phylogeny. bioRxiv DOI:10.1101/134262 http://biorxiv.org/content/early/2017/06/07/134262
Doyle JA, Endress PK. 2010. Integrating Early Cretaceous fossils into the phylogeny of living angiosperms: Magnoliidae and eudicots. Journal of Systematics and Evolution 48:1–35.
Doyle JA, Endress PK. 2014. Integrating Early Cretaceous fossils into the phylogeny of living angiosperms: ANITA lines and relatives of Chloranthaceae. International Journal of Plant Sciences 175:555–600.
Earle CJ. 2010. The Gymnosperm Database. Available at http://www.conifers.org/
Felsenstein J. 2001. The troubled growth of statistical phylogenetics. Systematic Biology 50:465–467. https://doi.org/10.1080/10635150119297
Felsenstein J. 2004. Inferring phylogenies. Sunderland, MA, U.S.A.: Sinauer Associates Inc.
Grimm GW. 2017a. Morphology-based neighbour-net of seed plants: quick exploratory data analysis of the matrix of Rothwell & Stockey (2016). figshare. https://doi.org/10.6084/m9.figshare.5143732.v1
Grimm GW. 2017b. Should we infer trees on tree-unlikely matrices? In: Morrison DA, editor. The Genealogical World of Phylogenetic Networks. http://phylonetworks.blogspot.fr/2017/07/should-we-try-to-infer-trees-on.html
Haeckel E. 1866. Generelle Morphologie der Organismen. Berlin: Georg Reimer. https://books.google.fr/books?id=dthOAAAAMAAJ&hl=de&pg=PR2#v=onepage&q&f=false
Hennig W. 1950. Grundzüge einer Theorie der phylogenetischen Systematik. Berlin: Dt. Zentralverlag.
Hennig W. 1982. Phylogenetische Systematik. Berlin, Hamburg: Verlag Paul Parey.
Hennig W, Schlee D. 1978. Abriß der phylogenetischen Systematik. Stuttgarter Beiträge zur Naturkunde, Ser A 319:1–11.
Herendeen PS, Friis EM, Pedersen KR, Crane PR. 2017. Palaeobotanical redux: revisiting the age of the angiosperms. Nature Plants 3, article no. 17015. dx.doi.org/10.1038/nplants.2017.15
Magallón S, Gómez-Acevedo S, Sánchez-Reyes LL, Hernández-Hernández T. 2015. A metacalibrated time-tree documents the early rise of flowering plant phylogenetic diversity. New Phytologist 207:437–453.
Mathews S. 2009. Phylogenetic relationships among seed plants: Persistent questions and the limits of molecular data. American Journal of Botany 96:228–236.
Mayr E, Bock WJ. 2002. Classifications and other ordering systems. Journal of Zoological Systematics and Evolutionary Research 40:169–194.
Pojárkova AI. 1933. Botanico-geographical survey of the maples of the USSR in connection with the history of the whole genus Acer L. Acta Inst Bot Acad Sci USSR, Ser 1 1:225–374.
Rothwell GW, Stockey RA. 2016. Phylogenetic diversification of Early Cretaceous seed plants: The compound seed cone of Doylea tetrahedrasperma. American Journal of Botany 103:923–937.
Schwarz O. 1936. Entwurf zu einem natürlichen System der Cupuliferen und der Gattung Quercus L. Notizblatt des Botanischen Garten und Museum, Berlin-Dahlem Bd. 13 Nr. 116:1–22.
Stevens PF. 2001 onwards. Angiosperm Phylogeny Website. Version 8, June 2007 [and more or less continuously updated since]. Available at http://www.mobot.org/MOBOT/research/APweb/
Wang X. 2017. A biased, misleading review on early angiosperms. Natural Science 9:399–405. https://doi.org/10.4236/ns.2017.912037