Cladistics vs Phylogenetics: What's the difference?

While working on a two-part post for the Genealogical World of Phylogenetic Networks (A bit of heresy: networks for matrices used in Cladistics studies), I stumbled upon a thread on ResearchGate, where someone asked this. I browsed through the answers, and felt obliged to answer as well.

[The following is a fairly literal copy of my answer on RG, graphics are added]

Cladistics is about clades, defined as subtrees in rooted trees. There's a nice chapter in Joe Felsenstein's 2004 book, Inferring Phylogenies, on this; also pointing out why we actually should clearly distinguish between clades, a subtree in a rooted graph, and monophyla, an interpretative concept. It goes back to Farris (1983) and not Hennig (1950).

An inferred tree: all but one subtree can be diagnosed by form features. Members of genus Oval are part of the all-rounded subtree, but the olive Oval is sister to a subtree comprising Donut and the purple Oval.

By rooting the tree using the outgroup, the subtrees become clades and we can put names to the clades, either branch-based (bold internodes) or node-based (dots representing the hypothetical 'most-recent common ancestor', MRCA). Only under the assumption that the inferred tree reflects the true evolutionary tree is such a cladistic classification a phylogenetic classification.
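
To make the "clade = subtree in a rooted tree" definition concrete, here is a minimal Python sketch. The nested-tuple encoding, the function name `clades` and the toy taxa are assumptions chosen for illustration; they are not taken from the figures.

```python
def clades(tree):
    """List the tip set of every internal node of a rooted tree.

    The tree is a nested tuple (hypothetical encoding): a string is a tip,
    a tuple is an internal node holding its child subtrees.
    """
    found = []

    def tips_below(node):
        if isinstance(node, str):          # a tip
            return frozenset([node])
        members = frozenset().union(*(tips_below(child) for child in node))
        found.append(members)              # every internal node defines one clade
        return members

    tips_below(tree)
    return found


# Toy rooted tree: the outgroup is sister to everything else.
rooted = ("Outgroup", ("A", ("B", "C")))
for clade in clades(rooted):
    print(sorted(clade))
```

Each printed tip set is one clade that could receive a name. Whether it is also a monophylum depends on whether the rooted tree reflects the true tree, which the code cannot know.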

Hennig "just" provided a new (indeed better, because it can be tested) concept for monophyly in the framework of his "Kladistik" (which differs in quite a few respects from what later became "Cladistics").

The problem Hennig tried (half succeeded, half failed) to solve: evolution of two reciprocally ("mutually") monophyletic lineages, Roundish (all descendants of Rounded) and Pointish (all descendants of Pointed), in time and morphospace. Note each cladogenesis, i.e. dichotomous split, is accompanied by a unique change in form or colour. But whereas forms only evolved once, i.e. are or were(!) synapomorphies, colours also evolved in parallel (lush blue octagon) or independently in the Roundish and Pointish lineages (olives).

Realising that form is more important than colour, we can set up an intuitive phylogenetic classification: all groups go back to a defined common ancestor, i.e. are monophyletic (in a pre-Hennigian sense). Hennig noted the difference between groups of inclusive common origin, his monophyla (green; Ashlock, 1971, proposed the term 'holophyla' to avoid confusion), and those of exclusive common origin, which he termed paraphyla (red). Since paraphyla are impossible to define without recognising monophyla, he suggested avoiding them at all costs. For example, Roundish is a monophyletic group defined by a smooth, rounded outline. It includes one sub-monophylum, Donutia, whose members are defined by a uniquely derived donut shape, a synapomorphy (found in all descendants of the first donut). But the other two shapes collect only part of the descendants of the first oval (Ovalia), also the ancestor of all Roundish, and of the first turned-over oval (Obovalia).
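
The inclusive versus exclusive distinction can be checked mechanically on a rooted tree. The following is only a sketch: the nested-tuple encoding, the function names and the toy topology (olive Oval sister to Donut plus purple Oval, with a Star as outgroup) are assumptions for illustration. A group is inclusive if the smallest clade containing it holds no other tips.

```python
def all_clades(tree):
    """Tip sets of every node (tips included) of a rooted nested-tuple tree."""
    found = []

    def walk(node):
        if isinstance(node, str):
            members = frozenset([node])
        else:
            members = frozenset().union(*(walk(child) for child in node))
        found.append(members)
        return members

    walk(tree)
    return found


def left_out(tree, group):
    """Tips that share the group's most recent common ancestor but are not in the group."""
    group = frozenset(group)
    smallest = min((c for c in all_clades(tree) if group <= c), key=len)
    return smallest - group


# Toy topology (an assumption): olive Oval sister to (Donut + purple Oval).
tree = ("Star", ("olive Oval", ("Donut", "purple Oval")))

# Nothing is left out: a group of inclusive common origin (monophylum).
print(left_out(tree, {"olive Oval", "Donut", "purple Oval"}))
# Donut is left out: the two Ovals alone are not a monophylum.
print(left_out(tree, {"olive Oval", "purple Oval"}))
```

An empty result means the group matches a clade of the tree. A non-empty result lists the descendants of the common ancestor that the group excludes.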

A proper Hennigian phylogenetic classification, recognising three monophyla instead of the paraphyla. It also illustrates what Hennig had to ignore and many cladists still do: change over time (also called evolution). The Pinkoids are all descendants of the first pink oval. Prior to the evolution of the monophyletic Donutia, pink was the synapomorphy of the Pinkoids. Today, due to the evolution of darker shades, it is a symplesiomorphy: an ancestral, primitive trait shared by some Pinkoids, but not by the Donutia. For their sister lineage, the Orangeoids, we have no synapomorphy at all, a dead end for Hennig. Nevertheless, we can characterise the monophyla viz clades (this is still a rooted tree) by their unique combination of traits: the Orangeoids are rounded, but not pinkish; the Turqoids are greenish stars. This is where cladistics sets in: the tree topology is inferred from all traits, no matter whether they represent modern or ancient synapomorphies.

Last year, we had two posts on this (on the Genealogical World of Phylogenetic Networks).

PS: Statements [see Victor Orrico's answer on RG] that NJ trees are "phenetic" are wrong (a common error): the NJ algorithm produces phylogenetic trees fulfilling either the minimum evolution (ME) or least-squares (LS) optimality criterion (depending on how it is set up). The algorithms for UPGMA and NJ are both clustering algorithms (so "phenetic", if you want), but for NJ it has been shown that it succeeds in finding a good estimate of the ME or LS tree (which UPGMA does only by accident). NJ is just a shortcut to find an ME- or LS-optimised phylogenetic tree from a distance matrix (again, see e.g. Felsenstein, 2004, Inferring Phylogenies). A perfect matrix, where each cladogenesis is represented by at least two subsequent synapomorphies, will result in a perfect distance matrix, and the ME or LS tree inferred from this matrix will be the true tree, identical to the single most-parsimonious tree (MPT) inferred from the character matrix. If convergences outcompete synapomorphies, the MPT will have clades that are not monophyletic, as will (to a lesser degree, it seems) the ME or LS tree, whereas compatibility and probabilistic methods can handle this to some degree.
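
For readers who have never looked inside NJ, here is a compact sketch of the textbook Q-criterion formulation in plain Python. The function name, the dict-of-dicts input and the five-taxon toy tree are assumptions made for this illustration, and the code is not tuned for real data sets. The distances are the path lengths on a hypothetical true tree, i.e. a "perfect" (additive) distance matrix.

```python
from itertools import combinations


def neighbor_joining(dist):
    """Neighbour-joining on a symmetric dict-of-dicts distance matrix.

    Returns (node, neighbour, branch_length) edges of the unrooted tree.
    Every node is labelled by the frozenset of tips merged into it.
    """
    # Re-key the matrix by tip sets so that merged nodes stay readable.
    d = {frozenset([a]): {frozenset([b]): v for b, v in row.items() if b != a}
         for a, row in dist.items()}
    edges = []
    while len(d) > 2:
        n = len(d)
        r = {x: sum(d[x].values()) for x in d}          # row sums
        # Q-criterion: join the pair minimising (n - 2) * d(i, j) - r_i - r_j.
        i, j = min(combinations(d, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        u = i | j                                       # new internal node
        len_i = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        edges += [(i, u, len_i), (j, u, d[i][j] - len_i)]
        # Distances from the new node to all remaining nodes.
        row_u = {k: (d[i][k] + d[j][k] - d[i][j]) / 2 for k in d if k not in (i, j)}
        for k, v in row_u.items():
            del d[k][i], d[k][j]
            d[k][u] = v
        del d[i], d[j]
        d[u] = row_u
    x, y = d                                            # two nodes remain
    edges.append((x, y, d[x][y]))
    return edges


# Path lengths on a hypothetical true tree ((A:2, B:3):2, C:4, (D:1, E:5):3).
pairs = {"AB": 5, "AC": 8, "AD": 8, "AE": 12, "BC": 9,
         "BD": 9, "BE": 13, "CD": 8, "CE": 12, "DE": 6}
dist = {tip: {} for tip in "ABCDE"}
for (a, b), value in pairs.items():
    dist[a][b] = dist[b][a] = value

for node, neighbour, length in neighbor_joining(dist):
    print(sorted(node), "-", sorted(neighbour), length)
```

On this additive matrix, the sketch joins D with E and A with B and returns the terminal branch lengths used to generate the distances. Nothing in the procedure groups taxa by overall similarity alone, which is the point made above.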

Phylogenetics is about phylogeny, evolutionary pathways, and goes back to Darwin and Wallace's age. The first phylogenetic trees were published in the 19th century, one of the earliest at my alma mater, the University of Tübingen, by Franz-Martin Hilgendorf (who also published possibly the first phylogenetic network). Haeckel did a lot to advocate phylogenetic trees (and also coined "monophyly", if I remember correctly). Regarding the first phylogenetic trees, including a definition of what a phylogenetic tree is, see this post by David Morrison.
[Side-remark: A phylogenetic tree is a tree depicting ancestor-descendant relationships, which, ironically, no cladogram, the still commonly seen rooted trees without branch lengths, can; and phylograms, rooted trees with branch lengths, only indirectly by zero-length terminal branches.]

Left, Hilgendorf's 1866 phylogenetic tree depicting ancestor-descendant relationships (monophyletic groups coloured); right, a cladogram depicting most of the monophyla, but no ancestor-descendant relationships.

I did a quick search, and found this nice set of lecture slides giving a quite comprehensive introduction to "evolutionary (phylogenetic) trees" and three of the methods to infer them: "Parsimony; Distance matrix based; Maximum likelihood" [link to PDF].

Cladistics is hence a (quite restricted) subset of phylogenetics (not synonymous with Hennig's "Kladistik").

So, to be on the safe side, always go for phylogenetics.

An optimal (dated) reconstruction for our example including only tip-taxa. For the modern-day taxa, the inferred tree equals the true tree (assuming perfectly clear, tree-like, molecular data). Fossil taxa placed based on morphology. Using this result, we can label the clades ...

... some of which fulfil Hennig's monophyly (green), others are (inevitably) paraphyletic (orange). Or even diphyletic (red): because of its colour, which is only found in two of the taxa, the extinct Fivestar is placed as sister to the extant Fourstar, although it represents an extinct side lineage of all modern Staroids. To escape this branching error, we would need to feed the analysis (constrain it) with the (phylogenetic) information (informed assumption) that 5-star-morphologies and turquoise colour are primitive ("plesiomorphic") within the Staroids, and predate the divergence of the modern lineages. Only by going back to Hennig's philosophical framework can we decide which clades to keep (the likely monophyletic ones) and which to drop (the probably not monophyletic ones) to evolve a cladistic classification into a phylogenetic (here: Hennigian) one.

And cladistics is largely irrelevant these days. Not a few are aware (openly or shyly) that clades in rooted trees often correspond to monophyla, i.e. groups of inclusive common origin, but do not necessarily do so. Incomplete lineage sorting is cladistics' greatest foe. Just take the many cases where different genomes tell different stories: the nuclear, mitochondrial and/or plastid trees may have different highly supported clades, but there can only be one monophylum (or two overlapping ones, in case of hybridisation), which we try to infer based e.g. on the coalescent tree (which is a special form of coalescent network).

Or think of a misplaced root or ingroup-outgroup long-branch attraction, which can easily turn a grade into a clade and vice versa. Parsimony trees in particular can be severely misleading (see e.g. this recent paper by Scotland RW, Steel M. 2015. Circumstances in which parsimony but not compatibility will be provably misleading. Systematic Biology 64:492–504).

Ingroup-outgroup long-branch attraction. The outgroup flips around the ingroup tree: the splits remain the same, but all monophyla (green boxes) become grades (more in Clades, cladograms, ... on GWoN).
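
The effect of moving the root can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the adjacency-list encoding, the function names and the five-taxon ingroup tree are hypothetical, and "rooting on an edge" stands in for the branch to which the outgroup attaches.

```python
def rooted_clades(adj, root_edge):
    """Tip sets below every internal node after rooting on root_edge."""
    found = set()

    def below(node, parent):
        children = [n for n in adj[node] if n != parent]
        if not children:                         # a tip
            return frozenset([node])
        tips = frozenset().union(*(below(child, node) for child in children))
        found.add(tips)
        return tips

    a, b = root_edge
    below(a, b)
    below(b, a)
    return found


def splits(clade_sets, all_tips):
    """Non-trivial bipartitions implied by a set of clades."""
    return {frozenset([c, all_tips - c]) for c in clade_sets
            if 1 < len(c) < len(all_tips) - 1}


# Hypothetical unrooted ingroup tree: (A, B) - C - (D, E).
adj = {"A": ["n1"], "B": ["n1"], "C": ["n2"], "D": ["n3"], "E": ["n3"],
       "n1": ["A", "B", "n2"], "n2": ["n1", "C", "n3"], "n3": ["n2", "D", "E"]}
tips = frozenset("ABCDE")

root_at_A = rooted_clades(adj, ("A", "n1"))   # outgroup attaches to the branch of A
root_at_E = rooted_clades(adj, ("E", "n3"))   # outgroup attaches to the branch of E

# The splits are identical under both rootings ...
print(splits(root_at_A, tips) == splits(root_at_E, tips))
# ... but D + E is a clade only under the first one.
print(frozenset("DE") in root_at_A, frozenset("DE") in root_at_E)
```

With the root on the branch of A, D and E form a clade. With the root on the branch of E, the same two taxa are part of a grade, although not a single split of the unrooted tree has changed.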

Plus, there are many evolutionary/biological processes that inflict reticulation, i.e. ancestor-descendant relationships that cannot be modelled by a tree at all. A phylogenetic tree is just a special phylogenetic network, i.e. a phylogenetic network without reticulation.
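
In graph terms, "without reticulation" means that no node has more than one parent. A tiny sketch, assuming a rooted network stored as a child-to-parents mapping (the encoding, the function name and the hybrid example are hypothetical):

```python
def reticulations(parents):
    """Nodes with more than one parent in a rooted network (child -> parents mapping)."""
    return {child for child, ps in parents.items() if len(set(ps)) > 1}


# Hypothetical example: H is a hybrid of the lineages leading to X and Y.
tree = {"X": ["root"], "Y": ["root"]}
network = {"X": ["root"], "Y": ["root"], "H": ["X", "Y"]}

print(reticulations(tree))      # no reticulation: this network is a tree
print(reticulations(network))   # H has two parents
```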

A notable exception is classification. Cladistic classification, putting names to clades in inferred trees (under the implicit assumption that all clades represent monophyla fide Hennig), is still the holy goal.

Still, we often bend the rules and use (more general) phylogenetic classification concepts. Oaks are an example: the first multigene trees placed them in two separate, well-supported clades, but no-one was bold enough to divide this (most likely monophyletic) genus into two genera fitting the two clades in the trees, or to include the chestnuts etc. in the oaks. We formalised the two oak clades last year as subgenera (paywalled final version; free Pre-Print with one major change: Ponticae and Virentes accepted as additional sections in final version); the new infrageneric classification of oaks is hence a cladistic one based on nuclear oligo-gene and phylogenomic trees. But we are confident that it is also a phylogenetic one: our subgenera and sections are not only clades, but also monophyla (today and back into the past).

Cladistic or Hennig-phylogenetic classification (e.g. PhyloCode, using 'clade' as synonym for 'monophylum') is, however, impractical (to impossible, see e.g. Brummitt 2002, How to chop up a tree) when extended to fossils. We summarised the different concepts (those used in reality) in Fig. 8 of our 2017 Osmundales paper (open access). Naming (likely) paraphyla, or groups that may be para- or monophyletic, is inevitable. Ancestral forms and groups need names, too (there is no mention of fossil/ancestral taxa in the PhyloCode).

Why is there so much confusion?

Apparently, many still hang on to parts of the 80s intellectual cladist package as summarised by Joe Felsenstein (this list can be found in his 2001 piece for Systematic Biology, open access). So it is not rare to get odd (and wrong) comments from (anonymous) reviewers (I got them quite often, since I usually used networks for phylogenetics and frequently had to deal with fundamentally "a-cladistic" data).

Quoted from Felsenstein (Syst. Biol., 2001, p. 466):
"The cladists of that era had accepted a number of points as an intellectual package. At one point in the mid-1980s I tried to summarize the package and came up with these points, in order of importance
  • Use Hennig’s terminology—autapomorphy, symplesiomorphy, and so forth—rather than terms like ancestral or derived. [still very common] 
  • Classify cladistically; use only monophyletic groups. [still the official standard] 
  • Do biogeography by vicariance (pace Hennig). Use only computer programs written by leaders in the Hennig Society, all others are fundamentally flawed. [rarely openly stated, but I experienced this during review still in the zeroes] 
  • Use only parsimony methods. Compatibility methods are evil. Do not weight characters. [has become rarer, but is often still frowned upon, even by those who then use TNT's post-inference character weighting option to increase branch-support]
  • Be hostile to molecular data. Consider your methods to be hypothetico-deductive. Fossils are to be treated the same as living species. [this is still standard, and beyond cladistics] 
  • Parasites always have exactly the same phylogenies as their hosts. It is important to go around saying that one cannot infer ancestor–descendant relationships. [this is a wide-spread belief, partly out of necessity: tree-inference programmes do not allow placing ancestors on the nodes or internal branches, all OTUs have to be tip taxa] 
  • It is important to go around saying that species are individuals, not classes. [many still think species are the only "natural" biological unit, fundamentally different from e.g. genera; which everyone who has worked with data from more than one individual per species knows to be nonsense]
  • Be sceptical of the reality of the species as nonoperational. [see above] 
  • History: William of Ockham told Popper to tell Hennig to use parsimony." [still a belief, especially in palaeontology]
