The weekly newsletter pointed me to a fresh article in my favourite journal, PeerJ, dealing with a group that I studied some time ago: the Betulaceae.
Yang et al. (2019) assembled three new Betulaceae plastomes. Like all Fagales, well, most extra-tropical common trees, the alders (Alnus), birches (Betula), hazels (Corylus), and hornbeams (Carpinus) and their close relatives (Ostrya and Ostryopsis) are severely understudied, so every new plastome adds to the common knowledge and is a step forward worth a publication. And they did (as far as I can tell) a nice job in describing the plastomes and their basic properties.
However, the authors felt compelled to enrich their paper with a mixture of false representation of earlier studies and completely useless reconstructions.
A long time ago (molecular-systematically speaking)
In 2005, Forest et al. published (in a low-impact journal; a lot of ground-breaking research can be found in journals with poor impact factors) an interesting paper (still, nearly 15 years later) on intra- and intergeneric relationships in Betulaceae. In contrast to the all-Fagales tree published a year earlier (Li et al. 2004) by leading figures in the angiosperm phylogeny business, which placed Alnus as sister to the remainder, followed by Betula, they found that the traditional two subfamilies (Alnus + Betula → Betuloideae vs. Coryloideae: Corylus, Ostryopsis, Ostrya and Carpinus) are supported by the divergence patterns in nuclear non-coding gene regions, the much-sequenced (in the old days) ITS region and the (much less sequenced, but usually much more divergent) 5S intergenic spacer.
The Li et al. (2004) paper struck a nerve of mine right after I found it, and so I re-analysed their data [Never published draft-PDF] to find a couple of interesting (to me) things, such as that Alnus is likely misplaced in the tree (and with unambiguous support) due to a data artefact, and that the highly conserved nuclear gene included (the 18S rDNA) doesn't really match the plastid gene regions (three genes from high- to medium-conserved, one low-divergent non-coding region), while the only mitochondrial gene (added to make it an analysis covering all three genomes) hardly provided any useful signal.
Later, and more by accident than planned, this led to our re-analysis of all data on Betulaceae (Grimm & Renner 2013, also submitted to a low-impact journal; it was a fun, pretty harmless study, nothing world-changing, yet it still sat for some time with the editors and reviewers). We showed that a plastid marker set, too, can fall in line with Forest et al.'s hypothesis (regarding the inter-generic and main intra-generic relationships; the ML tree misplaced the sisterclades-informed root, but support for the ingroup/outgroup split was low; the dated Bayesian tree got it right) and did some dating experiments.
Yang et al. (2019) did, of course, something much better, having assembled three more complete plastomes:
Previous phylogenetic conclusions on the family Betulaceae were based on either morphological characters or traditional single loci, which may indicate some limitations.
Or, to put it more boldly:
Nevertheless, all the above taxonomic and phylogenetic conclusions [our 2013 and Forest et al.'s paper are cited, together with long outdated, pre-2000, studies, from the dawn of the molecular era] are inferred from unreliable and dynamic morphological features or DNA fragments with limited polymorphic information loci (e.g., rbcL, matK, and ITS), which may inevitably bias the phylogenetic reference (Philippe et al., 2011)
Here's a visual example of how "unreliable" morphological features are, the size of their balls, sorry, nuts.
|Seed size in Betulaceae or phylogeny for dummies (tree shows the currently accepted view, see e.g. Stevens' Angiosperm Phylogeny Website, as usual, best place to start for those that have no idea about Betulaceae). Seed images are from here (lacks Ostryopsis, whose nuts fit with those of its siblings, Carpinus and Ostrya)|
And as proof, they come up with this tree based on ...
... chloroplast genome[s] contain[ing] rich polymorphism information, which is very suitable for phylogenetic studies.
|Yang et al.'s tree based on complete plastome sequence data.|
Quite a step forward compared to our (2013) preferred tree, using only "DNA fragments" with "limited polymorphic information" ...
... or the one by Forest et al. (2005), using nuclear DNA "fragments", that, likewise, "inevitably bias the phylogenetic inference".
Yes, it is the same tree!
Just prune them down to the species Yang et al. could include. Well, there are some, not unimportant, differences towards the tips of ours and Forest et al.'s tree, which would be worth exploring using complete plastomes. For instance, nuclear data supports a reciprocal monophyly of Ostrya (arrows) and Carpinus, while plastid data infers an Ostrya grade. Just keep this in mind for later.
The much-debated (not really) question highlighted by the authors, that we don't know 100% where to put Ostryopsis (actually, we do, see the trees above; it was long included in Ostrya; only the signal from the genes is not that clear and some may get it wrong), remains unresolved.
The phylogenetic position of Ostryopsis davidiana was controversial among different datasets.
Apparently still is:
Had the paper been properly reviewed, the reviewers would have suggested toning down the novelty of the tree and just stating the obvious: so far (the species sample is very limited), the complete plastome data confirms earlier studies using "DNA fragments with limited polymorphic information loci" and the accepted view [Wikipedia/APW; both with no reference, yet, to the Yang et al. paper] on inter-generic relationships. And that the complete plastomes point to intra-plastome signal issues that could be interesting to explore further.
This would have been more than enough for a nice, new paper — the main benefit of complete plastomes is that we can dig for regions with sufficient divergence where the often-advertised standard barcode markers fail miserably, i.e. at the species level. But the authors didn't stop there. And the editor and reviewers didn't bother to stop them (don't blame them, they may have had no idea about the data, methods or organism). They added – because it's so easy to do these days – a dated tree and an ancestral area reconstruction.
The dated tree
Both Forest et al. and we did a dating (in our case, the dating was the main point of the paper, see chronogram above, but don't miss the tables and supplements). The authors' discussion is swift and doesn't even mention these earlier results (easy to do, just make a table). This makes sense because the super-duper data fails to provide anything really new on this front. The main result is (as stated in the abstract):
The divergence time between subfamily Coryloideae and Betuloideae was about 70.49 Mya, and all six extant genera were inferred to have diverged fully by the middle Oligocene.
|Yang et al.'s chronogram. The bubble-1 and -2 indicate the nodes that were constrained (fixed using a two-decimal value and a narrow normal distribution).|
Some time ago, I wrote a post on the most common errors of (node) dating, which includes some the authors made as well and the reviewers were shy to point out. In the Discussion sections, we read:
Our molecular dating analysis supported Betulaceae to be originated at the end of Cretaceous (∼70.49 Mya).
Origin would be the stem age, but they mean, of course, the crown age (as stated in the abstract). Also, this is not a result but an assumption of the analysis:
Two [in reality: one] fossil constraints were used for calibration: (1) the crown age of the family Betulaceae was set to 69.95 Mya (SD = 2.0) and assigned a normal distribution (Xiang et al., 2014)
In other words, the age was fixed to c. 70 Ma. The reference to Xiang et al. is funny, because this ambitious paper has some massive data issues resulting in some branching artefacts, which affect the dating (problems regarding age estimates for the Fagaceae, another family of Fagales included in the Xiang et al. and other all-Fagales studies, are discussed in detail in the Supplement File S2 to Grímsson et al. 2016b; low-impact journal but charge-free open access).
Also, No. 1 is no "fossil constraint" but a secondary dating constraint (a classic error in quick data- and method-naïve datings; everyone can use the programs but doing it right is a different thing, which is why I always left that part to people familiar with it): Xiang et al. used two fossil constraints in the Betulaceae neighbourhood, one for the MRCA of the Coryloideae [Corylus johnsonii, Middle Eocene, ≥48 = No. 2 of Yang et al., who, however, fixed it to "48 Mya (SD = 0.5)"] and one for the MRCA of the entire clade comprising the Betulaceae and its sister clades, the monogeneric Ticodendraceae and the Casuarinaceae, based on an older Casuarinaceae fossil [Gymnostoma antiquum, Late Paleocene ≥55.8, treated as stem fossil].
To sum up, their advertised main dating result is re-using the result of Xiang et al. (2014) for the same node but having extrapolated it somehow from 64.4 (59.4–72.5) Ma (Xiang et al., 2014, table 1) to "69.95 Mya (SD = 2.0)".
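To see how tight this "constraint" is, here is a minimal sketch using only the numbers quoted above. It assumes the prior is a plain normal distribution with mean 69.95 and SD 2.0, and it treats Xiang et al.'s 64.4 Ma simply as their point estimate; the variable names are mine, not from either paper.

```python
from statistics import NormalDist

# Prior used by Yang et al. for the Betulaceae crown node (values quoted above).
prior = NormalDist(mu=69.95, sigma=2.0)

# Central 95% interval of that normal prior.
lower, upper = prior.inv_cdf(0.025), prior.inv_cdf(0.975)

# Estimate reported by Xiang et al. (2014, table 1) for the same node.
xiang_point, xiang_low, xiang_high = 64.4, 59.4, 72.5

print(f"95% prior interval: {lower:.2f}-{upper:.2f} Ma")
print(f"Xiang et al. point estimate inside prior interval: {lower <= xiang_point <= upper}")
print(f"Prior probability of an age <= {xiang_point} Ma: {prior.cdf(xiang_point):.4f}")
```

The prior puts 95% of its mass between roughly 66 and 74 Ma, so the very estimate it is said to be based on falls outside it.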
- Rule (for authors and reviewers): Always look up the original literature when selecting/checking fossils to constrain node ages (I know, it's boring because it's written usually by dusty palaeobotanists).
- Rule: Never use secondary dating constraints unless you have no fossil record at all.
- MRCA Alnus-Betula: our estimate (preferred tree using mediocre DNA fragments) 60–38 Ma vs. Yang et al. using very suitable plastomes: "∼61.76 Mya, 95% HPD = 49.77–70.97 Mya"
- MRCA Coryloideae: 39–22 Ma vs. a constrained (node 2) "47.93 Mya, 95% HPD = 46.95–48.91 Mya"
- MRCA Ostryopsis-Ostrya-Carpinus: 34–18 Ma vs. "44.63 Mya (95% HPD=40.11–47.93 Mya)"
- MRCA Ostrya-Carpinus: 21–10 Ma vs. "26.73 Mya (95% HPD = 15.09–39.44 Mya)"
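The comparison in this list can be checked mechanically. The following is a small sketch, not part of either study: it takes the ranges exactly as listed above (ours from 2013, and the 95% HPD intervals quoted from Yang et al.) and reports where the two intervals overlap. The function name and data layout are my own.

```python
# Age ranges in Ma as (older, younger), copied from the list above.
# First tuple: Grimm & Renner (2013); second: 95% HPD quoted from Yang et al. (2019).
estimates = {
    "Alnus-Betula": ((60.0, 38.0), (70.97, 49.77)),
    "Coryloideae": ((39.0, 22.0), (48.91, 46.95)),
    "Ostryopsis-Ostrya-Carpinus": ((34.0, 18.0), (47.93, 40.11)),
    "Ostrya-Carpinus": ((21.0, 10.0), (39.44, 15.09)),
}


def overlap(a, b):
    """Return the shared (older, younger) range of two age intervals, or None."""
    older = min(a[0], b[0])
    younger = max(a[1], b[1])
    return (older, younger) if older >= younger else None


results = {node: overlap(ours, yang) for node, (ours, yang) in estimates.items()}
for node, shared in results.items():
    print(node, "->", shared if shared else "no overlap")
```

Only the Alnus-Betula and Ostrya-Carpinus intervals overlap; for the constrained Coryloideae node and the node right above it, the two sets of estimates do not overlap at all.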
Having constrained the crown age to be around 70 Ma, they naturally got ~10 Ma older estimates than we got, because we constrained the root (stem) age to be at least 71 Ma and, thus, inferred a crown age of min. 63–43 Ma. Now, fossil-wise, we can be sure Betulaceae were around in the late Cretaceous but there is as yet no conclusive evidence for any of the modern genera. And one thing is obvious from the tree alone (without any explicit dating): alders and birches split right after the initial radiation (hence, the failure of plastid genes to get it right: too little signal accumulated in the short time). In a rich temperate (intra-)montane Eocene flora, the famous 50 Ma old McAbee Flora, we have both of them (and pretty much the earliest ones, too) but only undifferentiated Coryloideae. According to Yang et al., Corylus, the hazelnut (biggest nuts), had already evolved and spanned, like all the others, the entire Northern Hemisphere. Older clocks are not uncommon in general (the angiosperm datings are famous for them), sometimes wrong but more often just misinterpreted (node dating provides us with minimum estimates, i.e. the real divergence happened earlier). But they are very uncommon so close to the constrained node(s). Then it's either a bad constraint (informing the wrong node) or unfortunate data (misleading/misestimated branch-length patterns).
The MRCA on the bottom of the list, the MRCA of Ostrya-Carpinus, brings us to the next bit of unfiltered bad science waved through by either very well-meaning or simply incompetent reviewers (after a short interlude).
Due to the lack of leaves (unrepresentative species sample), Yang et al. cannot estimate the minimum age of the MRCA of Ostrya and Carpinus, but only of the MRCA of East Asian Ostrya and Carpinus. The first diverging plastid lineage in this clade is, however, the North American Ostrya: the genus doesn't form a clade in a plastid tree, but a grade. Much in contrast to nuclear data. A summary (data used in our 2013 paper):
Two things to note here: the concatenated nuclear data, here summarised by genus-consensus sequences (hence, few data gaps), are much more divergent than the concatenated and consensed plastid data covering many more base pairs and gene regions. This classic plastid data prefers a Betuloideae grade, a data/branching artefact that can be overcome by changing to a dating framework (Grimm & Renner 2013) or, apparently, using complete plastomes (Yang et al. 2019). When we combine both, this conflicting signal overprints the one in the nuclear partitions (Li et al. 2004) unless you stick to the most conservative bits (matK, rbcL, trnL/LF region, and atpB-rbcL spacer; Xiang et al.'s Betulaceae subtree, with ambiguous support). A point for Yang et al., but effectively missing from the paper because they didn't bother to read Forest et al.'s (or our) paper before they trash-binned them.
And, the plastids fail to resolve Ostrya as a clade, but the nuclear data does. The latter is not a "bias" due to the use of uninformative "DNA fragments", but a genuinely hard signal.
The 5.8S rDNA is one of the most conserved (and lineage-coherent) nuclear gene regions known and its mutations are partly structurally linked to those in the equally conserved 5' part of the 25S rDNA. They are extremely diagnostic and conserved within genera (in total, Fagales differ at 15 of the 156 basepairs and in the sequence of a 4-nt long terminal loop). In Betulaceae, the Betuloideae share one exclusive variant, as do the Coryloideae, and both differ by a single mutation in the small terminal loop (pos. 131–136 in the mature rRNA), with Ostryopsis (note its long terminal branch in the trees at the beginning) showing a further derivation.
This may strike you as an odd thing to note (come on, single-base mutation, really?), but many more of these lineage-consistent patterns can be seen in both relatively conserved as well as very divergent parts of the sequenced nuclear spacers. Evolution dices, these spacers are non-coding and occur in thousands of copies, possibly even scattered across more than one locus; so, pattern consistency is a direct sign of common ancestry. (This exercise can be extended to all Fagales, including the pretty unambiguous reconstruction of the ancestral 5.8S rDNAs; it is a very slowly evolving gene region that only plays a bit in its 5' tail, the first four nucleotides.)
Dividing Ostrya, as favoured by plastid only trees (the Americans are missing in Yang et al.'s data), would have meant the nuclear, non-coding regions of the disjunct parts would have evolved in parallel, or did not evolve at all after the N. American Ostrya split off and before Carpinus evolved. Such local incongruences between nuclear and plastid genealogies are not surprising: Plastids in plants reflect only the maternal heritage. Seeds come from the mother trees and carry the mother's haplotype, and when the seeds are big (as in the Coryloideae, Fagaceae or the Nothofagaceae), they don't travel far, and we easily end up with plastid phylogenies that are (partly) incongruent to nuclear phylogenies and to some degree decoupled from speciation processes.
In the authors' own words, condoned by the expert reviewers:
Betulaceae are suggested to have originated in the late Cretaceous (∼70 Mya) in central China of East Asia (Christenhusz & Byng, 2016; Soltis et al., 2011). Due to the proximity of the Tethys Sea, this region at that time may have belonged to the Mediterranean climate which covered parts of present-day Xinjiang and Tibet until the early Tertiary period.
Funny, because by far most Betulaceae are today found in perhumid climates of the Northern Hemisphere and the oldest modern ones are found at high palaeolatitudes (and -altitudes) like McAbee (the Eocene lowlands were hot subtropical to tropical up to mid-high latitudes).
|Number of Betulaceae species (circle size) in main geographic regions and in relation to Köppen climate zones (picture and data can be found in SI 4 to Grimm & Renner 2013)|
Due to the limited representative species and outgroup used in our analysis, ancestral area reconstruction does not designate an exact origin region. However, we can confirm that ancestors of extant Betulaceae species were once extensively distributed in Laurasia that covered the present-day Asia, Europe, and North America, from which some species have dispersed into Central America, South America, and North Africa through different island chains.
I like it when scientists show balls!
Even when they are profoundly wrong. When looking at the actual ancestral area analysis, it's obvious that this paper has not been reviewed by anyone with the slightest idea about biogeography or Betulaceae.
"Limited representative species" is a euphemism. For most genera, plastomes of only (two) Chinese species have been sequenced so far (plastome sequencing is still costly, but not science; you just need the money, machines, and operators, something China has plenty of but hard to get e.g. in Sweden). But all genera (the endemic Ostryopsis used to be part of Ostrya) have intercontinental disjunctions or span across more than one continent. Even some species do, the most extreme being the arctic Betula nana, a hardy weed.
|Distribution of Betula nana. Source: Den virtuella floran (you can change to English).|
And following the very bad example seen in this (confidentially) peer-reviewed paper (allegedly), they did of course not code the actual provenance of their material/ covered species, but just took the distribution of the entire genus. With the result that already the tips were highly ambiguous and the reconstruction for the deeper nodes completely uninformed (we refrained from an explicit biogeographic analysis in 2013, realising the limitation of the available data and species sample).
Or, in the authors' own words:
However, the origin area of the six extant genera was unclear because of the insufficient species sampling, and uncertainty of its sister group in previous studies [which they resolved using complete plastome sequences, no?]. In spite of this, we identified [???; see above graph] three major distribution areas: East Asia (A), Europe (B), and North America (C) which were speculated to break away and drift from the old Laurasia in the Paleozoic (∼57–23 Mya)
Surprising result for genera which mostly occur in North America and across Eurasia. (Also, the "Paleozoic" ended about 250 million years ago; they probably meant 'Paleogene'; one finds such slips in peer-reviewed neobotanical papers with increasing frequency; again: it's easy to avoid – just print out the ICS stratigraphic chart.)
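For those without a printer, the chart lookup can be sketched in a few lines. The boundary ages below are rounded values as I read them off the ICS chart, so treat them as assumptions and check the current chart; the function and table names are made up for this illustration.

```python
# Rounded boundary ages in Ma as (older, younger); assumed from the ICS chart.
ERAS = {
    "Paleozoic": (538.8, 251.9),
    "Mesozoic": (251.9, 66.0),
    "Cenozoic": (66.0, 0.0),
}
CENOZOIC_PERIODS = {
    "Paleogene": (66.0, 23.03),
    "Neogene": (23.03, 2.58),
    "Quaternary": (2.58, 0.0),
}


def unit_of(age_ma, units):
    """Return the name of the unit whose time span contains age_ma, else None."""
    for name, (older, younger) in units.items():
        if younger <= age_ma < older:
            return name
    return None


for age in (57, 23):
    print(age, "Ma:", unit_of(age, ERAS), "/", unit_of(age, CENOZOIC_PERIODS))
```

Both ends of the quoted "∼57–23 Mya" range fall in the Cenozoic, and the older end sits squarely in the Paleogene.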
What happened? ... is a valid question, but something for a less science-oriented post. The (pretty toothless) reviews can be viewed in the review process documentation. PeerJ, in contrast to many other journals, is dedicated to peer review transparency and lets the authors decide whether to release these documents or not. And Yang et al. did (and probably shouldn't have, because we look into something we don't really want to see).
What they should have done (authors, editor and reviewers)
Just publish the plastomes, for God's sake! What happened to one step at a time? To publish a nice paper, the reviewers and editor should have told the authors to just drop the shot-in-the-dark dating and the just awful ancestral area reconstruction. A sad fact is that the established systematic botanical circles in the West have long neglected common trees growing around us (not exotic enough, but data-wise also not trivial; Li et al.'s 2004 analysis is still the basis of Fagales systematics, and the data are pretty crappy), so sequencing a new plastome, which only requires money and workforce, is a perfect niche for well-equipped Chinese researchers to do something the West will hardly do (or can, lacking said money and workforce).
Dating is tricky! Yang et al.'s data set could be interesting for dating on its own, probably even enough to make a paper. But it needs to be properly done and discussed. The taxon sample is small, which has the benefit that you can quickly run series of analyses and tests. The downside is, you can only focus on generic differentiation. Rather than using an odd secondary dating constraint (and another that is worn-out already), the authors could have assembled the entire fossil record from literature to try a 'fossilized birth-death' dating (it would not be the first for Fagales that produced some really new estimates, and we only had two "DNA fragments"). Digging through literature for fossils can be tedious and tricky, so why not ask a palaeontologist to do it – maybe just do the analysis and leave everything else to the one who knows the organism and its past record. Alternatively, the authors could have followed our 2013 example and compared results obtained using different (single) node age fossil constraints such as the first fossils of each genus (placed on the right node). Even without a palaeobotanist in the boat and being short of time, one could have found this recent open access paper by my former colleagues in Vienna, Grímsson et al. (2016a), where you have a little section on the (global) fossil record for all Betulaceae found in the Lavanttal in Austria: Alnus, Betula, Carpinus, Corylus and Ostrya (yep, we have them all in Europe in the lush Miocene; even Ostryopsis may be there but we wouldn't know, likely too similar to Ostrya). I'm sure just copying this and feeding it to the machine will get you some easy-peasy paper (not my cup of tea, but I have the luxury of not having to publish).
And no matter which dating is done: when you have the result, why not just map it on some palaeoglobes? Like the ones provided by Christopher Scotese (all on ResearchGate) or Ron Blakey (new homepage; his wife insisted on making sure a bit of money comes out of it: now it's 100$ for the first map, and 45$ for the second; should be no problem for a group that can afford to assemble complete plastomes).
|Comparing the known fossil record of Loranthaceae (Grímsson et al., 2017, also PeerJ) to a more recent, of course, much better dating (same results) and explicit biogeographic analysis (with results as informative as those of Yang et al.) by Loranthaceae nigh-experts, on the background of the past earth.|
Palaeogeography needs proper sampling! You can't infer biogeographic scenarios with a handful of Chinese samples, not in general, not in Zelkova, and for sure not in such a complex group like the Betulaceae. A group where you have two genera (Betuloideae) that include ecological opportunists (including invaders), have (many) species in all ecotones and climate zones (even those that can hardly support a tree at all) and seeds that can travel hundreds if not thousands of miles just with the wind (and are even dispersed by snow). The larger nuts of the Coryloideae and the huge hazelnuts (Corylus) obviously trigger a more coherent geographic sorting, but also result in incomplete (plastid) lineage sorting: the "non-monophyletic" Ostrya in plastid trees. Thus, one needs material covering all geographic provenances and major lineages within a genus only to make a first step.
Ideally, one would always complement high-resolution plastid data (maternally inherited) with nuclear data (biparentally inherited; there are no physical barriers for Betulaceae pollen). Genome sequencing requires fresh material, but if the purpose is e.g. to "confirm that ancestors of extant Betulaceae species were once extensively distributed in Laurasia that covered the present-day Asia, Europe, and North America, from which some species have dispersed into Central America, South America, and North Africa through different island chains" (just check out Scotese's or Blakey's maps for the corresponding time slices), this material needs to be from the geographically relevant and representative species and not from local microspecies occurring, e.g., at a single mountain in south-eastern China (with an anti-Mediterranean climate).
Also, it would not be a bad idea to have more than a single individual covered per widespread species. Because, so far, there is not a single extra-tropical common tree where the available complete plastomes are fully representative for their genera, let alone single species. At least in the few cases where more than one individual has been studied, such as in oaks, see e.g. this Frontiers in Plant Science paper and my comment to it (there seems to be a general shortage of competent reviewers for complete Fagales plastome papers), and the South American 'false beech': see the fantastic work by Acosta & Premoli (2010) and Premoli et al. (2012) despite just using "DNA fragments"; still waiting for the Aussies and Kiwis to check this in their Nothofagaceae genera; maybe somebody down-under can get the material and send it to China to process? But thanks to the complete plastomes, including Yang et al.'s work, it should now be very easy to devise marker sets/NGS strategies to illuminate geographic differentiation patterns within and across the Betulaceae genera (for Chinese having the money to do so, provided somebody sends them material from the rest of the World).
Postscriptum: In the unlikely case that a Chinese lab owner reads this (Google is still blocked in China, and Blogger is a Google service) while searching for the next neglected genus to sequence. Even though western phylogeneticists seem to avoid common trees like a nasty disease, there are many extremely knowledgeable field botanists, forest scientists and taxonomists who can help with and provide interesting material from the wild (most of the interesting, well, crazy, birch species are on the territories of the former Soviet Union, why not engage in some comrade nostalgia?). And even some palaeobotanists are still around. Those that I happen to meet and work with (western Eurasia, mostly) were very nice cooperation partners. They also don't have a lot of alternatives, because they (for various reasons) have grown weary of their nearby molecular colleagues. Why not try to invite one or the other and tempt them into a collaboration? Chinese resources combined with classic European expertise is likely to be a killer match.
- Acosta MC, Premoli AC. 2010. Evidence of chloroplast capture in South American Nothofagus (subgenus Nothofagus, Nothofagaceae). Molecular Phylogenetics and Evolution 54:235–242.
- Delsuc F, Brinkmann H, Philippe H. 2005. Phylogenomics and the reconstruction of the tree of life. Nature Reviews Genetics 6:361–375.
- Forest F, Savolainen V, Chase MW, Lupia R, Bruneau A, Crane PR. 2005. Teasing apart molecular- versus fossil-based error estimates when dating phylogenetic trees: a case study in the birch family (Betulaceae). Systematic Botany 30:118–133.
- Grimm GW, Renner SS. 2013. Harvesting GenBank for a Betulaceae supermatrix, and a new chronogram for the family. Botanical Journal of the Linnean Society 172:465–477. [PDF]
- Grímsson F, Grimm GW, Meller B, Bouchal JM, Zetter R. 2016a. Combined LM and SEM study of the Middle Miocene (Sarmatian) palynoflora from the Lavanttal Basin, Austria: Part IV. Magnoliophyta 2 – Fagales to Rosales. Grana 55:101–163. [Open Access]
- Grímsson F, Grimm GW, Zetter R, Denk T. 2016b. Cretaceous and Paleogene Fagaceae from North America and Greenland: evidence for a Late Cretaceous split between Fagus and the remaining Fagaceae. Acta Palaeobotanica 56:247–305. [Open Access]
- Li R-Q, Chen Z-D, Lu A-M, Soltis DE, Soltis PS, Manos PS. 2004. Phylogenetic relationships in Fagales based on DNA sequences from three genomes. International Journal of Plant Sciences 165:311–324.
- Premoli AC, Mathiasen P, Acosta MC, Ramos VA. 2012. Phylogeographically concordant chloroplast DNA divergence in sympatric Nothofagus s.s. How deep can it be? New Phytologist 193:261–275.
- Yang Z, Wang G, Ma Q, Ma W, Liang L, Zhao T. 2019. The complete chloroplast genomes of three Betulaceae species: implications for molecular phylogeny and historical biogeography. PeerJ 7:e6320. [Open Access]