Res.I.P. – an unprofessional science (and other things) blog: When dating is futile

With most-versatile programmes like BEAST at hand, everyone can do a molecular dating. Which makes it even more important that editors and peers check the priors and reasoning behind it. And they should always make sure a palaeobotanist (or at least a palaeogeographer) is one of the peers.

In German, we call it a Binsenweisheit. Any node dating can only be as good as its age priors.

Consequently, any paper publishing a chronogram should have been looked through by a palaeontologist or, in general, somebody with some insight into the fossil record of the studied group and lineage.

Reality is that, at least in plant biogeographic and molecular dating studies, few seem to bother involving palaeobotanists and the like in doing or reviewing the molecular dating. In contrast, neo-nigh-experts (NNEs) are so universally competent that they may review your palaeo/molecular paper, but you would never be asked to review theirs. Even if they take your fossils as (not so fitting) age constraints. As a consequence, the NNE's co-authors end up constraining the age of a clade reconstructed to be purely Austral(as)ian with your Eocene pollen from the other side of the world (invoking ancient, extremely long-distance dispersal, or did it just fall through a crack and tunnel through the hollow earth?!). Just a couple of years after the NNE told you he doesn't believe that your systematic assessment is correct anyway, and that pollen are useless in general (without ever having bothered to look at one, obviously).

Admittedly, the opposite is also true (Age of angiosperms). Aged (not well-aged like good red wine) palaeo-nigh-experts (PNEs) tend to discard any molecular dating as “methodological flim-flam”. Also, PNEs still get into high-flying journals (a recent example published in Science) with spectacular hypotheses based on very little data and ignoring not only all molecular data and phylogenies of the last 20 years but also a good deal of the fossil record.

There's a large grey zone between being famous and infamous in the soft natural sciences such as botany, palaeobotany and phylogenetics.

Clocks and rocks

In the clock-and-rock controversy, you often hear "molecular divergence estimates are too young". This is technically an air-punch, because a node dating can only give minimum estimates: when we do node dating, we expect that the real divergence happened before. The fossil is assumed to be younger than the node it constrains.

For the above graph, we assume the actual genetic branch-lengths in the molecular tree reflect 1:1 the actual time passed, i.e. a strict molecular clock. In reality, they don't, and we leave it to the algorithm to judge how much happened within a million years in the different parts of the tree.

Having no other fossil to correct our estimates, our root (stem) age estimate will be some 10 Ma too young, but the estimate for the MRCA of FGH, the sister clade of A, will be quite ok (a few Ma too young).

Tip: When you report results from node dating, avoid writing "diverged in" and instead use "were diverged by", "latest in", "by/in the x-cene, y lineages were present". Unless you are sure the fossils you used as age constraints are temporally very close to the MRCA you constrained.

Now imagine we had only b as a fossil of the lineage leading to A: our estimates would be much too young, because we constrain a node that is actually 20 Ma old to be just > 5 Ma.

Too young fossils lead to much too young estimates. And no model can save you.
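To make the scaling effect tangible, here is a minimal sketch under a strict clock. The 20 Ma node and the 5 Ma fossil are taken from the example above; the node heights, the "true" rate, the 18 Ma fossil and all names are hypothetical. The fossil age is treated as the age of the calibrated node, which is a deliberate simplification of what a dating program does with a minimum-age prior.

```python
# Toy strict-clock node dating: a single calibration fixes the rate for the
# whole tree. Heights, rate and the 18 Ma fossil are hypothetical values.

# Node heights in substitutions per site (distance from the node to the tips).
node_height = {"root": 0.040, "stem_A": 0.020, "crown_FGH": 0.012}

TRUE_RATE = 0.001  # substitutions per site per Ma (assumed "true" clock rate)


def date_nodes(heights, calibrated_node, fossil_age_ma):
    """Equate the fossil age with the age of the calibrated node and
    rescale every other node with the clock rate this implies."""
    rate = heights[calibrated_node] / fossil_age_ma
    return {node: h / rate for node, h in heights.items()}


true_ages = {node: h / TRUE_RATE for node, h in node_height.items()}
good = date_nodes(node_height, "stem_A", fossil_age_ma=18.0)  # fossil close to the node
poor = date_nodes(node_height, "stem_A", fossil_age_ma=5.0)   # much too young fossil

for node in node_height:
    print(f"{node:10s} true {true_ages[node]:5.1f} Ma | "
          f"good fossil {good[node]:5.1f} Ma | young fossil {poor[node]:5.1f} Ma")
```

With the 5 Ma fossil on a node that is really 20 Ma old, every age in the tree shrinks by the same factor of four. A relaxed clock can redistribute rates among branches, but it has no information to undo that offset.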

Estimating, e.g., a minimum crown age for a genus of 15 Ma (mid-Miocene, a common estimate in dating literature) when all main intra-generic lineages have a fossil record going back at least 45–35 Ma (late Eocene, early Oligocene) may not be technically wrong (min. 15 Ma < 45–35 Ma), but still is a very poor estimate. Especially since, when we do dating, we usually take our estimates as-is, and not as "being a minimum estimate, it may also have happened 20–30 myrs earlier".

The reason that many molecular clocks give (much) too young estimates is not because molecular dating is generally flawed but that the daters (had to) use(d) poor data, misleading topologies, and spiced the lot with too young or even wrongly placed fossils (sometimes because the original literature misplaced the fossil lacking a systematic-phylogenetic molecular framework or got the age wrong).

There are also practical problems.

It's bloody hard to find palaeobotanists that have the necessary overview who are still alive, willing to co-operate and can afford (time-wise) to engage in the process. Palaeobotanists usually don't have scores of master and Ph.D. students working for them. When they join a molecular dating study, they will have plenty of work with it (usually more than whoever is doing the dating analysis), and their other work will have to wait.
A short, blind dip into the palaeobotanical literature has a high probability of picking the wrong fossil. And when taking estimates of others to start from, you just propagate inaccuracy, adding to imprecision.
Dive deep into the literature, and you can get lost in the abyss. A large part of the palaeobotanical literature is hazardously outdated, using long-rejected systematic-phylogenetic concepts.

On the other hand, editors and the usual peers know as little as you about the fossil record, so why bother getting proper age priors to constrain your dendrogram? Publishing another (meaningless) Miocene crown age will face no opposition during review.

Many oak chronograms published over the years fit the pattern, including the most recent chronogram in Li et al. (2019), published in the peer-reviewed journal Frontiers in Plant Science and reviewed by three peers (Frontiers practises the no-blind review). None of whom can be too familiar with the oak data used (Why you never should do a single-species plastid analysis of oaks), not to mention the fossil record of oaks or the general pitfalls of node dating. While the paper is generally of quite good quality, their dating is for the bin.

No need to dig for the fossil record of oaks

It has not been completely revised (and cannot replace recruiting a palaeobotanist) but in oaks, it's not that difficult to get a set of by-the-book age priors. My Vienna hosts (thanks again to the FWF for financing my Ehrenrunde) published a few (all open access) papers explicitly dealing with and/or summarising the oak and Fagaceae fossil record: Grímsson et al., Plant Syst. Evol. 2015, Grana, 2016, and Acta Palaeobot., 2016. Last, but not least, there's a short paragraph for every section (= monophyletic evolutionary lineage in the case of oaks) in Denk et al. (2017), a book chapter available also as pre-print. Just because it's low impact and palaeobotany doesn't mean it's useless and can be ignored.

Li et al.'s (otherwise nice, so give it a read) study deals with one of the three East Asian species of sect. Cerris, one of the three main lineages within subgenus Cerris, the Eurasian oaks. The oldest fossil of sect. Cerris is, quite fittingly, from the Oligocene of north-eastern Asia (mentioned but not used by Li et al.). Sect. Cerris is monophyletic (in a general-Hennigian, not tree-inference-based Farrisian sense; see e.g. Morrison 2017 for the difference); its sister lineage is sect. Ilex. The East Asian fossil may be a crown- or stem-Cerris. Given its age and the low differentiation of modern-day Cerris species across Eurasia, it is probably stem. Thus, it informs a minimum age for the most-recent common ancestor of sect. Cerris and sect. Ilex.

Pretty fitting sequence: oldest fossils of subgenus Cerris, the Eurasian ('Old World') oak clade. Full lines – conservative/by-the-book uses as age priors; dotted lines – more experimental alternatives. Stippled lines – the oldest castanoid fossil has variably been used to constrain the stem age of the Castanea-Castanopsis clade, or deeper nodes down to the MRCA of Fagaceae and all Fagales except Nothofagaceae.

Straightforward. If you use nuclear data.

According to all plastid trees so far, the (plastid) sister lineages of sect. Cerris are only several East Asian Ilex oaks. By the book, when using plastids the Cerris-fossil now can only inform the minimum age of the MRCA of Cerris and one or a few East Asian Ilex oaks.
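A small sketch of why the same fossil ends up on a different node. The two toy topologies and the tip names (Ilex_EA_1, Ilex_EA_2) are simplified assumptions for illustration, not the published trees. The helper returns the tips descending from the stem node of a group, i.e. the node a by-the-book stem fossil would constrain.

```python
# Toy topologies as nested tuples; tip names and groupings are simplified
# assumptions for illustration only.
NUCLEAR = (("Cerris", ("Ilex_EA_1", "Ilex_EA_2")), "Cyclobalanopsis")
PLASTID = (("Cerris", "Ilex_EA_1"), ("Cyclobalanopsis", "Ilex_EA_2"))


def tips(tree):
    """Return the set of tip labels below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(tips(child) for child in tree))


def stem_clade(tree, group):
    """Tips descending from the stem node of `group`: the smallest clade
    containing `group` plus at least one further tip (its sister)."""
    group = frozenset(group)
    if not group or not group < tips(tree):
        raise ValueError("group must be a non-empty proper subset of the tips")
    for child in tree:
        if group < tips(child):  # a smaller clade still holds group + sister
            return stem_clade(child, group)
    return tips(tree)


for name, tree in (("nuclear", NUCLEAR), ("plastid", PLASTID)):
    constrained = sorted(stem_clade(tree, {"Cerris"}))
    print(f"{name}: stem-Cerris fossil constrains the MRCA of {constrained}")
```

In the nuclear toy tree the fossil dates the split between Cerris and all of Ilex; in the plastid toy tree the same fossil can only date the split between Cerris and a single Ilex placeholder.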


Same fossils, different, because chloroplast-based, topology.

Obviously, our fossil set doesn't fit so well anymore when we use the plastid genealogy (in oaks, plastid signatures are largely decoupled from species processes, today and in the past). The Ilex plastids sistered with either Cerris or Cyclobalanopsis are highly similar (and for simplicity I left out the lost Ilex lineage), hence, using by-the-book constraints, we imply a near-zero fixation rate in those lineages. Age-distribution-wise, it's a valid assumption that the south-bound Ilex oaks just captured the Cyclobalanopsis plastid (alternative b clearly outcompetes by-the-book alternative a). For the oldest Ilex fossil there are two equally valid alternatives, depending on whether we regard it as evidence of starting crown-group radiation within subgenus Cerris, or as the deepest (but not first) representative of this lineage. And we now have three alternatives where to place the oldest Castaneoideae non-microfossil (castanoid pollen goes further back):

It represents a, back then, (also) North American, lineage of castanoids from which the modern Eurasian genera derive (alternative i).
It represents a radiation pre-dating the split of the (Eurasian) plastid gene pool, an extinct side lineage or stem-taxon of the Eurasian core Fagaceae (alternative ii)
It is the first representative of the core Fagaceae in general or an extinct side lineage to the whole modern lot, hence, informs an Eocene stem age of the entire clade (alternative iii, the most conservative and least-hypothetical estimate).
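The alternatives above multiply quickly. A minimal sketch that enumerates them, assuming (for illustration only) that the three choices can be combined freely; the labels merely paraphrase the text and the variable names are hypothetical.

```python
from itertools import product

# Labels paraphrase the alternatives discussed above. Treating the three
# choices as freely combinable is an assumption made for this sketch.
choices = {
    "Cyclobalanopsis-type Ilex plastids": [
        "a: by-the-book constraint",
        "b: south-bound Ilex captured the Cyclobalanopsis plastid",
    ],
    "oldest Ilex fossil": [
        "starting crown-group radiation within subgenus Cerris",
        "deepest (but not first) representative of the lineage",
    ],
    "oldest castaneoid megafossil": [
        "i: American lineage ancestral to the Eurasian genera",
        "ii: stem taxon of the Eurasian core Fagaceae",
        "iii: stem of the entire core Fagaceae",
    ],
}

# One dating scenario per combination of placements.
scenarios = [dict(zip(choices, combo)) for combo in product(*choices.values())]

print(len(scenarios), "constraint scenarios to run and compare")
for fossil, placement in scenarios[0].items():
    print(f"  {fossil}: {placement}")
```

Each scenario would be one separate dating run with its own set of node constraints.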

Not that trivial anymore, is it?

A poor secondary dating constraint and a wrong root

Li et al. note that their dating gives estimates (the crown age of their species is around, surprise, surprise, the common Miocene 15 Ma mark) substantially younger than the oldest fossil record of section Cerris (citing Denk et al. 2017). But instead of adding the data from the relevant Ilex oaks to their data set (e.g. from gene banks, complete plastome data are available for two of the putative Cerris sisters) and using the oldest fossil to inform the minimum age of the MRCA of Ilex-sisters and Cerris (by-the-book approach), they use what is called a secondary dating constraint: a (too young, naturally) estimate from another study.

Rule 1: If you have a fossil and the data to infer a tree, never use secondary dating constraints! Any reviewer should write that down and keep it in mind for the next time (s)he reviews a paper with a chronogram.

For their secondary constraint they followed a nuclear-based (!) analysis focussing on sect. Cyclobalanopsis by Deng et al. (2018) “... to constrain the crown age of Quercus (including section Cerris and section Quercus) to 35.89 Ma (normal prior, SD = 2 Ma)”.

Now, when we consider the known plastid genealogy, the MRCA of sect. Quercus (subgenus Quercus) and Cerris (subgenus Cerris) is,

indeed, the MRCA of all oaks,
but, also, the MRCA of oaks and other genera of the core Fagaceae.

Age priors used by Li et al., plotted on the plastid genealogy. Instead of using older ingroup fossils, the authors opted for a too young secondary dating constraint.

Clearly, 35.89 (± 2 Ma) is too young for this MRCA (and overly precise). Five oak sections have older fossils; see also the dating tests in Hubert et al. (2014) and the discussion of published Fagaceae age estimates in Grímsson et al. (2016, open access). A preliminary-rough but sensible dating using the actual oldest records of each lineage can be found in Hipp et al. (in press, pre-print online since end of March).

Li et al.'s other constraint, a direct age prior, already reveals as much: the “earliest unequivocal megafossil of subfamily Castaneoideae of Fagaceae from the Paleocene/ Eocene boundary (Crepet and Nixon,1989) [was] used to set the root age to 53.50 Ma (normal prior, SD = 3 Ma)”. By the way, the paper states the fossil is from the Paleocene-Eocene boundary, which, according to contemporary ICS charts, would be 56 Ma (in molecular dating, it is very common to just copy and paste while ignoring the evolution of the ICS chart). But more importantly, Crepet & Nixon's age estimate was wrong: re-investigations have shown the Tennessee sediments are late early Eocene (≥49 Ma; Blanchard et al. 2016).
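To see what the two normal priors quoted from Li et al. actually allow, here is a minimal sketch using only the Python standard library. The prior means and SDs, the 56 Ma boundary age and the ≥49 Ma revised age come from the text; the helper name central_interval is hypothetical, and how BEAST combines such priors with the tree prior is not modelled here.

```python
from statistics import NormalDist

# Priors as quoted from Li et al. (mean, SD in Ma).
secondary = NormalDist(mu=35.89, sigma=2.0)  # crown age of Quercus, secondary constraint
root = NormalDist(mu=53.50, sigma=3.0)       # root age, tied to Crepet & Nixon's fossil


def central_interval(prior, mass=0.95):
    """Return the central interval holding `mass` of the prior density."""
    tail = (1.0 - mass) / 2.0
    return prior.inv_cdf(tail), prior.inv_cdf(1.0 - tail)


for name, prior in (("secondary constraint", secondary), ("root prior", root)):
    low, high = central_interval(prior)
    print(f"{name}: 95% of the prior mass between {low:.2f} and {high:.2f} Ma")

# A fossil gives a minimum age, yet a symmetric prior centred on the fossil
# age puts half of its mass on the younger side of that age.
print(f"root prior mass younger than 53.50 Ma: {root.cdf(53.50):.2f}")
print(f"root prior mass younger than 49 Ma (revised age): {root.cdf(49.0):.3f}")
print(f"root prior mass older than 56 Ma (current boundary): {1 - root.cdf(56.0):.3f}")
```

The secondary constraint effectively caps the crown age of the oaks at about 40 Ma, and the root prior treats the fossil's age as a best guess for the node rather than as a minimum. An offset prior whose density starts at the fossil age would match the minimum-age logic better.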

Crepet & Nixon's fossil (using the wrong age) is a classic in dating literature, here used to constrain the root age of their tree, the split between chestnuts and oaks: an old, possibly the oldest, Castaneoideae fossil that has been frequently re-used in Fagaceae/Fagales dating studies, usually to inform an Eocene root (stem) age of the Castanea-Castanopsis clade.

Side note: Despite being uncritically used in many dating papers, using Crepet & Nixon's fossil is not unproblematic, because the Castaneoideae are paraphyletic to the oaks, a fact Kevin Nixon, PNE, oak-NNE, ignores till today (see the above mentioned Science paper), and nuclear and plastid data disagree heavily about the intergeneric relationships. The "cladistic analysis" provided by Crepet & Nixon is outdated and probably flawed (back then, and still today, most PNEs refused to publish their matrices). Its possible uses include constraining a minimum stem age (conservatively by the book) or crown age (experimentally, because the fossil may evidence starting crown-group radiation) of the core Fagaceae, which include the Eurasian Castanea-Castanopsis, the Malesian/S.E. Asian Lithocarpus, the North American relicts Chrysolepis and Notholithocarpus, and the oaks. The same applies to the new, older fossil from Patagonia, Castanopsis rothwellii (see Ockham's Razor applied but not used...; Why we may want to map trait evolution on networks [Part 1] [Part 2]).

Which, on the background of the actual plastid evolution, may (see side note above) inform a node higher up (later, i.e. younger) than the one linked to the too young secondary dating constraint.

Li et al. avoided having to deal with all of this by

reducing the sample to a set excluding all topological issues (leaving out all available data on any other member of subgenus Cerris except for their focal species)
and rooting their tree under the assumption that the plastids follow the nuclear phylogeny, which they don't!

Li et al.'s dendrogram (their fig. 7), the basis for the discussion of intra-species differentiation in Q. chenii.

You always get a Miocene crown age when using wrong assumptions and incomplete sampling (see also this 2019 comment by Qian to a Nature paper, naturally published in another journal).

Rule 2 (editors, reviewers, readers): When the dating reports a Miocene diversification, double-check the age priors. And the data matrix.

Using clashing constraints, older constraints for younger nodes, or fixing a too deep or misleading node by omitting disturbing lineages/roots is nothing particularly unique to Li et al. but common practice in molecular dating. The same can be found e.g. in all all-Fagales datings (Sauquet et al. 2012; Xiang et al. 2014; Xing et al. 2014; Larson-Johnson 2016). Studies that came up with fancy analyses and results, were partly published in very prestigious journals, and all relied on highly problematic data (partly not curated, Xing et al., Xiang et al., and very old mediocre sequence data, Sauquet et al.; gene samples with substantial internal signal conflict, Sauquet et al., Xing et al.; or simply lacking any proper signal for the purpose of the study, Xiang et al., Larson-Johnson).

The impossibility to date intra-(or inter-)species differentiation using plastid data

Now, even with methodologically correct priors and a properly rooted tree, Li et al.'s dating approach could not have provided any sensible estimates for intra-species (or inter-species) differentiation because we have, just by using data from Q. chenii, no idea what is species-restricted and what is part of the larger pattern involving all three sibling species. [Added 14/10: Yao Li just sent me a message on RG, they can fill this gap using a simply gorgeous data set including all three species. In professional science, we often are forced to publish bit-by-bit to keep the 'Rubel rolling', as we say in German.]

A full haplotype network for the most variable plastid marker of East Asian Cerris (Why you should never do a single-species plastid analysis of oaks).

“The dating analyses of major cpDNA clades suggested that Q. chenii may have experienced the earliest divergence during the early Miocene (node E, 16.70 Ma, 95% HPD: 10.10–23.99 Ma, Figure 7)” should actually read: "... that the shared East Asian Cerris plastid gene pool started to diverge ..."

The not-only-chenii-haplotypes may be older than the species itself when shared with Q. acutissima and Q. variabilis. Pre-speciation differentiation patterns and intra-specific evolutionary trajectories may surface when the sorting of plastid haplotypes differs between species, especially when occurring in the same area. The inferred "Q. chenii crown age" of 24–10 Ma has little to do with the species but points towards geographic break-down of the plastid LCA, the last common ancestor, of all three modern species and their precursors. Molecular clocks are fickle little things when evolution is not a series of dichotomous cladogenetic events (which it never is, the closer we get to the coal-face of evolution; and for the deep-down it only works because all the mess that happened back then has been sorted out by today).

Little problems when dating speciation events using data only inherited from one parent.

Bigger problems when dating speciation events. Left, unilateral (asymmetrical) introgression; right, population size effects, no reticulation involved, no incomplete lineage sorting.

There's no good option for dating but one can play around

The deep incongruence – a plastid genealogy clashing with the nuclear + morphological phylogenetic synopsis (leading to the updated oak classification in Denk et al. 2017) and a likely monophyletic sect. Ilex scattered and not forming a clade – makes it tricky to use any fossil as node constraint for a plastid genealogy in oaks. Using mutation rates from literature doesn't help either: oak plastomes evolve slowly. Hence, you find no dated plastid tree in our papers. It's not that we didn't look into it. We just realised it is futile and only gives house-numbers with the data at hand.

But molecular dating still sells, and many, usually data-ignorant, reviewers (NNEs and others) ask for them when you submit a paper about evolutionary and/or biogeographic patterns. So, how can we do something that not just adds more Miocene house-numbers?

The only valid dating option would have been to fool and test around. First,

include data from Q. acutissima and variabilis in order to have placeholders for each major haplotype in the East Asian Cerris;
and from the known direct plastid sister lineages of sect. Cerris, most importantly Q. baronii (complete plastome available). It's unfortunate that there is no complete plastome data on any of the western Eurasian species, but according to our 2016 tree (and the all-inclusive trnH-psbA 2018 haplotype network), they appear, plastid-wise, to be more distant relatives than the East Asian Ilex with Cerris-similar plastid signatures, and related to the western Eurasian/Himalayan group of sect. Ilex (WAHEA-type in Simeone et al. 2016).
Add (at least) two representatives from the Cyclobalanopsis-Ilex plastid lineage.
Finally, root the tree with the two white oaks. We have known since Manos et al. (2008) at the latest that both subgenera (back then called "New World" and "Old World clade") are reciprocally monophyletic but that the plastids of the Eurasian core Fagaceae (Quercus subgenus Cerris, Castanea-Castanopsis, Lithocarpus) are more closely related to each other and different from those in their relatives predominantly (Quercus subgenus Quercus) or exclusively found in the New World (Chrysolepis, Notholithocarpus).

Then start testing ...

... use the oldest unequivocal sect. Cerris fossil (which is from East Asia, conveniently) to inform the stem age of the plastid lineage including the members of sect. Cerris under the assumption that the first Cerris did have a Cerris plastome.
... use the oldest Eurasian oak (Ilex-type, primitive pollen)/ chestnut/ Castanopsis fossil to inform the age of the MRCA of the Eurasian core Fagaceae, a clade that includes sect. Cerris but not sect. Quercus and the rest of subgenus Quercus. Because we know plastids do not follow strictly phylogenetic lineages but are geographically controlled, it makes no sense to use a fossil from the New World (e.g. the recently published oldest "Castanopsis" from Patagonia) to constrain an Old World plastid clade; even if the American castaneoids are crown-group Castanopsis, they represent the American lineage, and thus, carried the New World plastid.
... (optional but optimal) hire a palaeobotanist (why not pay for the service) to compile and revisit the fossil record of Cerris in East Asia to get additional constraints for nodes in the tree to test (or be able to try a fossilised-birth-death dating, it's just better even if you have limited genetic data due to the lack of resources: e.g. Renner et al. 2016).

And as a final step: map the results on actual palaeoglobes (like Robert Scotese's, see Easter Egg 2019 post) rather than repeating out-dated commonplaces about Miocene-Himalaya-uplift triggering speciation (again see Qian 2019), or, in this case, intra-"specific" variation.

There be oaks. And mountains. Before the Miocene

The use of dating plastid trees of oaks is hard to assess (unless you can compare them to dated nuclear trees). Cerris plastid haplotypes are obviously not only very similar to each other but also to those of some members of sect. Ilex. One would have to report and critically discuss the inferred near-zero mutation rates. Furthermore, the estimates would probably not be much less diffuse than the ones shown in Li et al.'s chronogram (note the extent of the highest-probability density bars). Lastly, there may be quite a lot of evolutionary scenarios to test, even when using only the oldest fossils of each genus and oak section.

Some more dating scenarios testing possible evolutionary hypotheses regarding what the oldest fossils represent and what plastid they carried.

Cited papers

Blanchard J, Wang H, Dilcher DL. 2016. Fruits, seeds and flowers from the Bovay and Bolden clay pits (early Eocene Tallahatta Formation, Claiborne Group), northern Mississippi, USA. Palaeontologia Electronica: 19.3.51A.
Crepet WL, Nixon KC. 1989. Earliest megafossil evidence of Fagaceae: phylogenetic and biogeographic implications. American Journal of Botany 76:842–855.
Deng M, Jiang XL, Hipp AL, Manos PS, Hahn M. 2018. Phylogeny and biogeography of East Asian evergreen oaks (Quercus section Cyclobalanopsis; Fagaceae): insights into the Cenozoic history of evergreen broad-leaved forests in subtropical Asia. Molecular Phylogenetics and Evolution 119:170–181.
Denk T, Grimm GW, Manos PS, Deng M, Hipp AL. 2017. An updated infrageneric classification of the oaks: review of previous taxonomic schemes and synthesis of evolutionary patterns. In: Gil-Pelegrín E, Peguero-Pina JJ, and Sancho-Knapik D, eds. Oaks Physiological Ecology. Cham: Springer, p. 13–38. Pre-print (open access) at bioRxiv [major change: Ponticae and Virentes accepted as additional sections in final version]
Grímsson F, Zetter R, Grimm GW, Krarup Pedersen G, Pedersen AK, Denk T. 2015. Fagaceae pollen from the early Cenozoic of West Greenland: revisiting Engler's and Chaney's Arcto-Tertiary hypotheses. Plant Systematics and Evolution 301:809–832 (open access).
Grímsson F, Grimm GW, Meller B, Bouchal JM, Zetter R. 2016. Combined LM and SEM study of the Middle Miocene (Sarmatian) palynoflora from the Lavanttal Basin, Austria. Grana 55:101–163 (open access).
Grímsson F, Grimm GW, Zetter R, Denk T. 2016. Cretaceous and Paleogene Fagaceae from North America and Greenland: evidence for a Late Cretaceous split between Fagus and the remaining Fagaceae. Acta Palaeobotanica 56:247–305 (open access).
Larson-Johnson K. 2016. Phylogenetic investigation of the complex evolutionary history of dispersal mode and diversification rates across living and fossil Fagales. New Phytologist 209:418–435.
Li Y, Zhang X, Fang Y. 2019. Landscape features and climatic forces shape the genetic structure and evolutionary history of an oak species (Quercus chenii) in East China. Frontiers in Plant Science doi:10.3389/fpls.2019.01060.
Qian H. 2019. Biases in assessing the evolutionary history of the angiosperm flora in China. Journal of Biogeography DOI:10.1111/jbi.13530.
Sauquet H, Ho SY, Gandolfo MA, Jordan GJ, Wilf P, Cantrill DJ, Bayly MJ, Bromham L, Brown GK, Carpenter RJ, Lee DM, Murphy DJ, Sniderman JM, Udovicic F. 2012. Testing the impact of calibration on molecular divergence times using a fossil-rich group: the case of Nothofagus (Fagales). Systematic Biology 61:289–313.
Xiang X-G, Wang W, Li R-Q, Lin L, Liu Y, Zhou Z-K, Li Z-Y, Chen Z-D. 2014. Large-scale phylogenetic analyses reveal fagalean diversification promoted by the interplay of diaspores and environments in the Paleogene. Perspectives in Plant Ecology, Evolution and Systematics 16:101–110.
Xing Y, Onstein RE, Carter RJ, Stadler T, Linder HP. 2014. Fossils and large molecular phylogeny show that the evolution of species richness, generic diversity, and turnover rates are disconnected. Evolution 68:2821–2832.