The paper (Wanntorp et al. 2014) stumbled for quite some time through the Forest of Reviews. The first journal we tried (J. Biogeogr.) was uninterested, mainly because we did not include an explicit biogeographic inference and a molecular dating. Both would have amounted to chasing ghosts and generating house-numbers under the given circumstances. Nevertheless, since chasing ghosts using house-numbers is not uncommon in (plant) biogeography, the reviewers pointed it out as a deficit important enough to reject the paper (among other things, naturally). So we added those, and the next journal (BMC Evol. Biol.) allowed us two rounds but finally turned down the paper because, bottom line, the added biogeographic inference and molecular dating didn't suit the second journal's reviewers. So our brain-and-pain child was sent into the Forest a third time, finally finding a journal, Taxon, that would publish the study. With a catch. Because they went “too far” for the journal's readership, the editor advised us to drop all my nice network figures (for non-scientists: editorial advice in confidential peer review means do it or find another journal). These were figures and reconstructions that a reviewer judging our paper before had considered “a nice touch”, and a reason that editor allowed a revision. In protest, I was about to drop out, too. I did not, and still regret not having made that point. I had invested too much time, work, and passion into this piece (like the first author); I couldn't. But when they tried this a second time several years later, with a paper on another asclepiad, Cynanchum (Khanum et al. 2016), I withdrew from the co-authorship. This had the positive effect that my co-authors were allowed to publish our bootstrap consensus networks. A sacrifice worth it (compare both papers).
The background: complex genetic differentiation patterns
The genetic data of my first author (comprising the nuclear-encoded 5'-ETS, ITS1 and ITS2 spacers of the 35S rDNA, and the plastid intergenic trnH-psbA and trnT-trnL spacers) were complex, which is why she ended up with me as a co-author. But within the complexity was structure, just waiting to be distilled. Several main clades emerged, supported by nuclear and plastid gene regions, and one could place them in a quite sensible geographic framework (by hand and eye, not really by inference). Because of the signal issues, a full analysis was done: single-gene trees, trees for nuclear vs. plastid data, a combined analysis, and an analysis with a restricted taxon set and outgroup. Bootstrap support consensus networks highlighted the signal issues and identified rogues (the taxa jumping across the tree) and the occasional local incongruences. The only figure that passed final editorial scrutiny was a tanglegram.
And a (quite ordinary) tree, naturally, based on the combined data and with some critical, poorly supported branches. The electronic supplement at the journal's homepage just includes the voucher information and a tabulation of the figure above. An archive (1.6 MB) including the tree inference and bootstrapping results, but not the primary data (first author's call, not mine), can be found here.
Why bootstrap consensus networks should be obligatory
In the case of complex genetic data, trees may be
- biased, showing a topology that prefers some branches over better supported, competing alternatives
- incomprehensive: because they show only the support for the branches in the tree, one cannot assess whether low support is due to a conflicting alternative or to weak but unambiguous signal
Bootstrap consensus networks are rarely seen in the phylogenetic literature and may be quite alien to reviewers, editors and, later, readers. Hence, the figure above was preceded by a figure that included portions of the (full) bootstrap consensus networks at the relevant branches in the combined tree.
The bootstrap consensus networks can identify how rogues induce topological ambiguity. Moreover, they show whether low branch support (e.g. BS < 50) relates to
- only a lack of decisive signal (all alternatives with very low to diminishing support), or
- conflicting signal (two or three competing, equally valid alternatives sharing ample, if low, support; usually the case here when it comes to inter-lineage relationships).
- just two-thirds of the data (segregating sites) supporting this split, the remainder being uninformative, or
- one-third of the data supporting a conflicting split?
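The distinction between lacking and conflicting signal can be illustrated with a minimal Python sketch. It is not the workflow used for the paper (the networks there were generated with SplitsTree from RAxML bootstrap replicates); it only assumes that each bootstrap replicate tree has already been reduced to its set of splits (bipartitions). The function names, the five taxa A to E, and the ten toy replicates are hypothetical.

```python
from collections import Counter

def normalize(side, taxa):
    """Represent a split by the side that does not contain the reference taxon."""
    side = frozenset(side)
    ref = min(taxa)  # arbitrary but fixed reference taxon
    return frozenset(taxa) - side if ref in side else side

def split_support(replicates, taxa):
    """Percentage of bootstrap replicates in which each split occurs."""
    counts = Counter()
    for tree_splits in replicates:
        for split in {normalize(side, taxa) for side in tree_splits}:
            counts[split] += 1
    return {split: 100.0 * c / len(replicates) for split, c in counts.items()}

def compatible(a, b, taxa):
    """Two splits can occur in the same tree if one of the four intersections is empty."""
    everything = frozenset(taxa)
    return any(not (x & y) for x in (a, everything - a) for y in (b, everything - b))

def competitors(focal, support, taxa):
    """Splits that conflict with the focal split, sorted by decreasing support."""
    focal = normalize(focal, taxa)
    rivals = [(s, v) for s, v in support.items() if not compatible(focal, s, taxa)]
    return sorted(rivals, key=lambda item: -item[1])

# Hypothetical toy data: ten replicates, each listed by its non-trivial splits.
taxa = ["A", "B", "C", "D", "E"]
replicates = (
    [[{"A", "B"}, {"D", "E"}]] * 4
    + [[{"A", "C"}, {"D", "E"}]] * 4
    + [[{"B", "C"}, {"D", "E"}]] * 2
)

support = split_support(replicates, taxa)
focal_support = support[normalize({"A", "B"}, taxa)]
rivals = competitors({"A", "B"}, support, taxa)

print("support for AB | CDE:", focal_support)
for split, value in rivals:
    print("conflicting split", sorted(split), "support:", value)
```

In this toy case the split AB | CDE has low support because two conflicting alternatives take up the remaining replicates. A tree would show only the value of the one split it happens to include, while a consensus network shows the competing splits as well.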
Why use median networks, a population genetic method, for interspecies relationships
The level of intra-lineage divergence in each gene region was quite low in Hoya. Furthermore, I tabulated the sequences (the tables were included in the original supplement, but not in the final "edited" version) and checked their mutation patterns. These patterns revealed another level of ambiguous signal: ancestral sequence variants coexisting with obviously derived ones.
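Such a tabulation is easy to reproduce. The following Python sketch is only an illustration, not the script or spreadsheet used for the paper: it lists the variable alignment columns and groups accessions by their states at those columns. The accession names and sequences are hypothetical, and gaps, "?" and "N" are treated as missing data, which is an assumption.

```python
MISSING = set("-?N")  # assumption: these symbols carry no information

def variable_sites(alignment):
    """Return (1-based position, column pattern) for every variable column."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("sequences must be aligned to the same length")
    table = []
    for i in range(length):
        column = "".join(alignment[name][i].upper() for name in names)
        if len(set(column) - MISSING) > 1:  # more than one real state
            table.append((i + 1, column))
    return table

def variants(alignment, table):
    """Group accessions by their states at the variable sites."""
    groups = {}
    for name, seq in alignment.items():
        key = "".join(seq[pos - 1].upper() for pos, _ in table)
        groups.setdefault(key, []).append(name)
    return groups

# Hypothetical toy alignment
alignment = {
    "acc1": "ACGTAC",
    "acc2": "ACGTAT",
    "acc3": "ATGTAT",
    "acc4": "ATG-AT",
}

table = variable_sites(alignment)
groups = variants(alignment, table)

for pos, pattern in table:
    print(pos, pattern)
for variant, members in groups.items():
    print(variant, members)
```

Read column by column, such a table shows which accessions share which mutations, and hence which variants differ from a common type by a single, unique change.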
Lack of divergence can be a problem for probabilistic tree inferences, since the likelihood surface of the tree space (the magical plane that decides whether one tree is better than another) is quite flat, more like a lake than a mountain chain. Under these conditions Bayesian inference falters (posterior probabilities, PP << 1.00) and ML tree inferences struggle, but parsimony can work. In the case of our data, the problem with parsimony tree-building was the amount of stochastic signal, its complexity. And all tree inferences fall short in the face of actual ancestor-descendant relationships: an ancestor would need to be placed at an internal node, not at a tip [first phylogenetic trees, stacking neighbour-nets, why networks are inevitable, also in cladistics].
Countering both issues, median networks are a brilliant tool:
- They include all parsimony trees, the parsimonious solutions that can explain the data.
- In contrast to ordinary parsimony trees, which treat all taxa as tips, median networks allow placing a taxon at an internal node, the median.
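The idea of a median can be sketched in a few lines of Python. This is a deliberately simplified illustration of the median operation on three aligned sequences (site-wise majority), not the median-joining algorithm implemented in the Network software. The function name and the sequences are hypothetical.

```python
from collections import Counter

def median_sequence(a, b, c):
    """Site-wise majority of three aligned sequences.

    Returns None if any site shows three different states, because a simple
    majority median is then undefined.
    """
    if not (len(a) == len(b) == len(c)):
        raise ValueError("sequences must be aligned to the same length")
    median = []
    for states in zip(a, b, c):
        state, count = Counter(states).most_common(1)[0]
        if count < 2:
            return None
        median.append(state)
    return "".join(median)

# Case 1: the median equals a sampled sequence, which therefore sits at the
# internal node connecting the other two.
sampled = ["ACGTA", "ACGTT", "GCGTA"]
m1 = median_sequence(*sampled)
print(m1, "sampled" if m1 in sampled else "inferred")

# Case 2: the median is not among the sampled sequences and would be added
# as an inferred (unsampled) node.
others = ["TCGTA", "ACGTT", "ACCTA"]
m2 = median_sequence(*others)
print(m2, "sampled" if m2 in others else "inferred")
```

In the first case a tree would still force the sampled median onto a tip with a zero-length branch; in the network it takes the internal position directly.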
Noticing the resolution issues visible from the single-gene bootstrap consensus networks (towards the tips of the trees) and the quite obvious mutation patterns emerging from the data tabulation, I decided to use median-joining networks as a vehicle to
- make a call about ancestry and derivedness of sequence variants, and
- see how retention of ancestral variants in one or several gene regions affects the tree inference and explains conflicting bootstrap support patterns.
Although not welcomed by the editors, the network-based findings (largely undocumented so far) still play a role in the published paper. They were the main basis for the hypotheses discussed.
Confidential peer review hinders out-of-the-box thinking
Median-joining networks are parsimony-based graphs, and having to omit them did not lack irony: one of the recurrent reviewer critiques was that we did not include a parsimony analysis to back up our maximum likelihood-based analysis framework. Thanks to the Impermeable Fog shrouding the Forest of Reviewers, peers (and, more rarely, editors) judging the quality of a paper are free to act like imbeciles (on a case-to-case basis). Only authors and editors read their reports. The authors are not only bound by confidentiality but also have to be friendly and compliant so as not to lose the chance to publish their paper. They face editors who may think very highly of their (anonymous to the authors) catholic (all-encompassing and obviously infallible) peers, and who not rarely lack competence in the same fields. Naturally, neither the editors of Taxon, a systematic-botanical journal, nor our final reviewers can be considered experts in phylogenetic inference and the comprehensive analysis of non-trivial data patterns. Editing a low- or mid-tier journal can be tedious, and the editor may just be happy to have found anyone to report on the submitted work (which can be difficult enough, in particular for drafts that have been roaming the Forest of Reviews for quite some time). So that person should be kept happy for the next time(s) his or her services are needed. But since the review process is confidential, the reader (or any third party) cannot know about the fights and fiddles surrounding the publication of a paper. And thanks to Taxon's editorial wisdom, which shields its readers from reconstructions going "too far", the more generally interested reader may find it hard to understand where the ideas of the authors come from, because the main reconstructions were stripped from the main paper (and supplement) during review. Like in this case.
Open data and graphics (free to re-use)
The lost figures and supplementary data tables have been uploaded to figshare (including the primary data files) and can be referenced as:
- Grimm GW. 2017. Over-the-edge tables and reconstructions linked to the slimmed-down paper of Wanntorp et al. (2014), published in Taxon. figshare https://doi.org/10.6084/m9.figshare.5688181
Further links and references
- Grimm GW. 2017. Using consensus networks to understand poor roots. The Genealogical World of Phylogenetic Networks (ed. by D. Morrison). http://phylonetworks.blogspot.fr/2017/12/using-consensus-networks-to-understand.html
- Khanum R, Surveswaran S, Meve U, Liede-Schumann S. 2016. Cynanchum (Apocynaceae: Asclepiadoideae): A pantropical Asclepiadoid genus revisited. Taxon 65:467–486. Official link; electronic supplement archive @ palaeogrimm.org/data (may differ slightly from the finally accepted version).
- Morrison D. 2014. Some things you probably don't know about the bootstrap. The Genealogical World of Phylogenetic Networks. http://phylonetworks.blogspot.fr/2014/04/some-things-you-probably-dont-know.html
- Wanntorp L, Grudinski M, Forster PI, Muellner-Riehl AN, Grimm GW. 2014. Wax plants (Hoya, Apocynaceae) evolution: epiphytism drives successful radiation. Taxon 63:89–102. Official link; electronic supplement archive @ palaeogrimm.org/data (1.6 MB); PDF with the original set of figures (including full legends).
- Network 5 – free network software by Fluxus to handle data and compute median, median-joining and reduced median networks: http://www.fluxus-engineering.com/sharenet.htm
- RAxML 8 – the current standard of maximum likelihood analysis, ideal to run a full set of analyses (tree inference + bootstrapping using a batch/shell file) on small data sets like this one (also standalone; no need for a processor grid)
- SplitsTree 4 – the free and classic software by Huson and collaborators to generate consensus networks and prepare them for publication