The very principle of node dating using fossils
It has occasionally been pointed out that many molecular clocks are too young. This critique is often true but overlooks what node dating does. The oldest (known) representative of a lineage (e.g. a genus or family) is used to constrain the minimum possible stem (root) age of that lineage. Since we (typically) do not know how close this oldest (and recognisable) representative is to the actual first member of the lineage, i.e. the common ancestor (CA) of the lineage (the lineage's root), it can only inform a lower bound (Fig. 1). Thus, node-dating-based estimates can be expected to underestimate the true ages in most cases and should always be regarded and treated as minima. The closer the fossil(s) used as constraint(s) are to the actual lineage CA (root node), the less the minimum age estimates will underestimate.
Using A’, the oldest known fossil of lineage A, as the minimum age constraint for the MRCA of the modern species A, C, and D, one may infer a very young, underestimated divergence age for the sister species C and D or for the genus’ stem age. But the estimates (e.g. C and D diverged before 2.5 Ma, the genus diverged at the latest at 20 Ma) are not wrong: the real divergence ages (10 Ma, 40 Ma) are simply older than the estimates.
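The arithmetic behind "not wrong, just too young" can be spelled out in a few lines. The Python sketch below only restates the example numbers from the text (2.5 Ma vs 10 Ma for C and D, 20 Ma vs 40 Ma for the genus); in a real study the true ages are unknown, so the hypothetical `shortfall` function and the comparison against a known true age are purely illustrative.

```python
# Ages in Ma; (node-dating minimum estimate, true divergence age) from the example in the text.
EXAMPLES = {
    "C-D split": (2.5, 10.0),
    "genus stem": (20.0, 40.0),
}


def shortfall(minimum_estimate, true_age):
    """How much older the true node is than the node-dating minimum (Ma).

    A minimum estimate older than the true age would be a genuinely wrong
    estimate (e.g. from a misassigned fossil), so that case raises an error.
    """
    if minimum_estimate > true_age:
        raise ValueError("minimum estimate is older than the true age")
    return true_age - minimum_estimate


for node, (estimate, true_age) in EXAMPLES.items():
    gap = shortfall(estimate, true_age)
    # Report the gap both in Ma and as a fraction of the true age.
    print(f"{node}: >= {estimate} Ma, true {true_age} Ma, "
          f"underestimated by {gap} Ma ({gap / true_age:.0%})")
```

Both estimates are consistent minima; the C and D split is underestimated by 7.5 Ma (75 %) and the genus stem by 20 Ma (50 %).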
New palaeontological evidence may show that lineages A and CD were already present at nearly 20 Ma; accordingly, our node-dating minimum estimates would underestimate the true ages only slightly. Since many dating studies rely on deep-node constraints assigned conservatively (regarding taxonomy), and not infrequently taken from earlier dating studies, it is not overly surprising that estimates towards the tips can be severe underestimates.
Fig. 2 Same tree, but with older fossils that are closer to the nodes they can constrain, which automatically leads to less underestimated minimum divergence ages.
Sensible node dating requires direct input/control by palaeontologists
Fact is: only palaeontological experts (may) have the overview to decide which fossil is a good candidate for a node-dating constraint. Any (node-)dating study stands or falls with the quality of the known fossil record. To get substantially better dating estimates, one primarily needs a better-understood fossil record. Nothing else. Relatively young fossils (A’, B, S1 in Figs 1 and 2) must result in (severely) underestimated (“too young”) divergence age estimates.
Wrongly assigned or misinterpreted fossils may, naturally, result in fundamentally wrong estimates (Fig. 3).
Examples of odd, meaningless or even methodologically flawed dating estimates, related to a lack of proper age constraints or the use of poorly understood molecular data (problematic topologies), can be found throughout the literature, in low- to high-impact journals. The most intriguing example for me was a paper by Larson-Johnson (2016) on rate shifts in Fagales, published last year in New Phytologist, a mid-high-tier, very prestigious journal. Apparently, the paper was waved through with the blessing of high-profile scientists (see Acknowledgments) but without any proper peer review. The node dating is fundamentally flawed, partly because the author uses (the wrong) fossils as absolute age constraints. Overall, it seems the constraints were simply filtered from the comprehensive supplement of a not much better paper on Fagales (regarding its purported dating results), published in 2012 in Systematic Biology, a high-tier journal that is also confidentially peer-reviewed (Sauquet et al. 2012; the methodology is fine, but the data and tested scenarios are not). If you are interested in more details, see Supplemental Files S2 and S3 (included in this archive) of Grímsson et al. (2016), and our discussion there.
Furthermore, the palaeontologist may have a good idea whether a fossil represents a stem or a crown taxon of a lineage. This can help to get more useful estimates, even if it means bending the rules of node dating. The reason lies in the somewhat different definitions of stem and crown in Hennig’s phylogenetic classification and in cladistics as used in molecular dating (Fig. 4).
… used by Sauquet et al. (2012), methodologically correctly, as a much too young minimum stem (root) age constraint for the lineage leading to the beech trees, Fagus. Regarding its morphology and the fossil record in general, the fossil is more likely to inform a sensible crown age of the (modern) genus (Denk & Grimm 2009; Grímsson et al. 2016; Renner et al. 2016).
Lineage sampling and topological ambiguity
Something often overlooked in node-dating studies, and in their interpretation, is lineage sampling and the bias introduced by the topology used. Let’s assume that we have no data on C and D in our tree, only A-type (or A-like) fossils (Fig. 5). There is no genus crown anymore, just a single taxon. Following the node-dating rules, A* or S1 can only inform a minimum stem (root) age of the genus (represented by A), which will be too young. As soon as we add C or D, another species of the genus, to the dataset, A* informs the stem (root) age of lineage A, i.e. one node up, and can provide us with quite a sensible age prior (fix point; Fig. 3). However, if our inferred tree resolves C, but not D, as sister of A, we will assign A* to an artificial, too young node.
Fig. 6 A partly wrong tree (only the genus' root has been shifted): A* now constrains the MRCA of A and C, but no longer that of all modern species A, C, and D as before.
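Which node a fossil ends up constraining depends entirely on which taxa are sampled and how the tree is resolved. The Python sketch below makes this explicit under simplifying assumptions: trees are stored as hypothetical child-to-parent maps, S stands for a hypothetical sister lineage of the genus, the internal node labels n1 to n3 are arbitrary, and A* is treated as a stem member of lineage A, so it constrains the node where A splits from its sister (the parent of A).

```python
# Hypothetical trees as child -> parent maps. Tips: A, C, D (the genus), S (sister lineage).
# Internal node labels n1, n2, n3 are arbitrary names chosen for this sketch.
FULL = {"A": "n2", "C": "n3", "D": "n3", "n3": "n2", "n2": "n1", "S": "n1"}
A_ONLY = {"A": "n1", "S": "n1"}  # C and D not sampled (Fig. 5 situation)
C_SISTER_TO_A = {"A": "n3", "C": "n3", "n3": "n2", "D": "n2", "n2": "n1", "S": "n1"}  # Fig. 6-like


def path_to_root(tree, node):
    """Return the node followed by all its ancestors, up to the root."""
    path = [node]
    while node in tree:
        node = tree[node]
        path.append(node)
    return path


def mrca(tree, taxa):
    """Most recent common ancestor (crown node) of the sampled taxa."""
    shared = path_to_root(tree, taxa[0])
    for taxon in taxa[1:]:
        ancestors = set(path_to_root(tree, taxon))
        # Keep the tip-to-root order, so the first shared node is the MRCA.
        shared = [n for n in shared if n in ancestors]
    return shared[0]


def stem_node(tree, taxa):
    """Node where the lineage formed by `taxa` splits from its sister (parent of its crown)."""
    return tree[mrca(tree, taxa)]


for name, tree in [("full", FULL), ("A only", A_ONLY), ("C sister to A", C_SISTER_TO_A)]:
    print(name, "-> A* constrains", stem_node(tree, ["A"]))
```

With full sampling, the stem node of A coincides with the MRCA of A, C and D (n2); with A alone it moves one node deeper to the genus' stem (n1); and in the tree with C as sister of A it lands on the MRCA of A and C (n3), the artificial, too young node of Fig. 6.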
A few real-world examples illustrating these issues.
Liu et al. (2014) argued that an Eocene Alnus fossil is a member of one of the subgenera (subgenus Alnus) and concluded that the Alnus subgenera were already established by that time, i.e. the crown-group radiation had started, in contrast to our dating estimates, which inferred a crown age 15 Ma younger (Grimm & Renner 2013). But in fact there is no conflict, as our data set (harvested from gene banks) only included members of subgenus Alnus, as it turned out later. Hence, our crown age refers only to subgenus Alnus, not to the genus Alnus itself, and stands unchallenged.
The Fagales rate shifts estimated by Larson-Johnson (2016) are not the only estimates based on a pretty wrong topology, in particular when it comes to intra-family relationships (poor selection of genetic data). Similar problems can be found in the all-Fagales studies of Xing et al. (2014) and Xiang et al. (2014). Although all three studies partly use the same fossils to constrain node ages (fossils also included by Sauquet et al. 2012), they place them at different nodes (see Fig. S2-1 in the supplement to Grímsson et al. 2016). Hence, the same fossils are used to inform the ages of (substantially) different MRCAs, but all in line with node-dating procedure! Sauquet et al. and Larson-Johnson eliminated some of the topological ambiguity inherent in their data sets by leaving out a couple of genera. In doing so, they assigned fossil-based age constraints to nodes that are actually too deep in their tree, similar to what I depict in Fig. 5.
If you must do node dating…
The fossilized birth-death dating (FBD; Heath, Huelsenbeck & Stadler 2014) is clearly superior (in principle and in practice), and if you have really good morphological data, total-evidence dating (Ronquist et al. 2012) may be an option (not for plants, I'm afraid). Nonetheless, classic node dating may be inevitable in many cases. So, here’s some advice on what to do (and what peers should ask when judging a node-dating paper):
- Contact a palaeontologist who has worked on the group from which you select your dating constraints and who has an overview of its fossil record. (If you’re lucky, you get a full fossil record that allows you to do FBD.)
- If nobody is available (many modern groups are still poorly studied regarding their fossil records, and palaeontological knowledge is somewhat dying out; at least the number of palaeobotanists and positions is steadily declining), do a proper literature search (i.e. check the original palaeontological literature, not other dating papers) and put together a table listing the fossil records. Provide that table as a supplement (open data), so others can build on it (always a nice gesture for future research).
- Given the uncertainty about the actual positions of fossils in the phylogeny (is it an A*, a B or an S1?), one should play around with different constraints. Rather than using all fossils to constrain many nodes in your chronogram at once, run several chronograms, each using just a single constraint, and compare the results. If the different fossil constraints end up with similar estimates, they can’t be too wrong. This also serves as a test to see which fossil doesn’t fit at all (maybe because it has been misinterpreted, maybe because the molecular-inferred topology has some issues). But if you use them all up in one single run, you have no control over their biases.
- Same for topologies. The era of Big Data, next-generation sequencing and (beauti)fully resolved, unambiguous phylogenies may be dawning, but most current data sets used for dating are still oligogene data. They can include (substantial) internal conflict (e.g. the data used in the all-Fagales studies mentioned above), may support more than one topological alternative, and may eventually result in a Bayesian highest posterior probability (PP) tree with some odd branching patterns (watch for low PP, but also for branches with high PP and low bootstrap support). Make sure you have a proper tree, or several of them, and fix the topologies for the dating analysis rather than leaving it all to the BEAST.
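Two of the checks suggested in the list above, comparing single-constraint runs and screening clade support before fixing a topology, can be sketched in Python. This is only a minimal illustration under explicit assumptions: the ages would come from separate dating runs (one fossil constraint each) for the same focal node, the 25 % tolerance around the median and the PP ≥ 0.95 / bootstrap ≥ 70 thresholds are arbitrary choices, and all function names, clade labels and numbers are hypothetical.

```python
from statistics import median


def flag_discordant_constraints(estimates, tolerance=0.25):
    """Return the single-constraint runs whose age for one focal node (Ma)
    deviates from the median of all runs by more than `tolerance` (relative)."""
    centre = median(estimates.values())
    return sorted(
        name for name, age in estimates.items()
        if abs(age - centre) / centre > tolerance
    )


def flag_suspect_clades(support, pp_min=0.95, bs_min=70):
    """Map clade -> reason, for clades with low posterior probability (PP)
    or with high PP but low bootstrap support (BS, in percent)."""
    flagged = {}
    for clade, (pp, bs) in support.items():
        if pp < pp_min:
            flagged[clade] = "low PP"
        elif bs < bs_min:
            flagged[clade] = "high PP, low bootstrap"
    return flagged


# Hypothetical focal-node ages (Ma) from three runs, each using one fossil constraint.
runs = {"A*": 38.0, "B": 35.5, "S1": 12.0}
print(flag_discordant_constraints(runs))  # ['S1']

# Hypothetical (PP, bootstrap %) values for clades of one candidate tree.
support = {"clade_1": (0.99, 45), "clade_2": (1.00, 96), "clade_3": (0.81, 60)}
print(flag_suspect_clades(support))
```

A flag is a prompt to re-examine the fossil's assignment or the topology, not proof of error. Since every single-constraint estimate is itself a minimum, an outlier on the young side may simply rely on a fossil far from its node, while one on the old side may be either the most informative constraint or a misplaced one.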
And always remember: whatever the result, they are all just minimum estimates. But, when thoughtfully done, they may be close to the actual point of divergence.
