
How to interpret bootstrap values?

A search led me to a question on ResearchGate (RG) posted five years ago: How can I interpret bootstrap values on phylogenetic trees built with maximum likelihood? Quite a bunch of people answered it but, to my mind, only provided the easy answers, not the critical ones.

When you read phylogenetic literature, lots of papers have the very same deficit. The researchers put more or less (or a lot of) effort into compiling their data and, except for those publishing in Cladistics (2016's #ParsimonyGate uproar) and similarly outdated journals, try to use up-to-date phylogenetic methods.

Well. Tree-inference methods.

And then try to establish support for the found tree, just to write something along these lines:
Clade X received high support, however, clade Y was only moderately supported. The position of clade Z is unresolved.
Reconstruction-wise often trivial, and text-wise pointless: why do we have to write in Results what is (or should be) obvious from the corresponding figure showing the tree and the branch support? [But leave it out, and the Mighty Beasts populating the Forest of Review will get a heart attack!]

And not satisfying from an evolutionary perspective. Imagine if Darwin and Wallace, the forefathers of evolution, had discarded everything as "unresolved" just because it was not very obvious from the data they assembled or could access. What distinguishes researchers (in German: Forscher; forsch is a probably unrelated but fitting attribute) from (many) scientists (in German: Wissenschaftler) is that the former dig where it is dark and try to glimpse into the unknown, while the latter make a career studying what is already known (or obvious). A phylogeneticist, a particular breed of scientist, can publish a sentence like "position of clade Z is unresolved"; it's true and risk-free (with the Mighty Beasts). A researcher will try to find out why it is unresolved (and be eaten by the Mighty Beasts).

Translations provided by LEO for the two German words Forscher and Wissenschaftler. Note the small intersection; how much of a Forscher or Wissenschaftler are you?

A common measure to estimate support for the tree is non-parametric bootstrapping, a true classic introduced by Joe Felsenstein [Wikipedia/Institution page] in 1985. Over a decade ago, I got in touch with Alexandros Stamatakis [GoogleScholar], the creator of RAxML, one of the (now) most common programmes to infer trees and establish branch(!) support (we rarely do "node support" in phylogenetic tree inference, despite what one can read in so many papers), including non-parametric bootstrapping. He was in Lausanne back then; I visited him, and I remember we talked about what bootstrap support is and what it shows. His (still valid) answer was: every biologist, well, the few that ever give it a thought, seems to have a unique opinion on that.

If no-one really has a concept of what it is, how can we interpret it?

Luckily, biology is one of the softest natural sciences, and we often interpret stuff that we either don't understand at all or only know superficially. When it comes to bootstrap support, the scientific standard is: everything with BS > 70 is "well-supported", and BS between 50 and 70 can be "moderately supported". Quite a few people provided this as an answer, and the questioner was happy with it. Which is, of course, complete nonsense: why should a BS of 49 be much different from 51, and why is 71 "well-supported" but 69 only "moderately"? It all depends on what the other 29%–51+% of the bootstrap replicates show. An example.

First, the standard tree depiction (Please stop using cladograms!):

The Fagales subtree (the plants closely or remotely related to beech, chestnut, walnut, alder, birch, oak etc.) as seen on Stevens' nice Angiosperm Phylogeny Website (great information resource aside from its tree-thinking)
One branch, denoted by the asterisk, has moderate support. The data reality, however, looks more like this:

The moderate support (blue 'edge bundle', this would be a branch in a potential tree) masks that part of the data unanimously prefers a different alternative (red edge bundle, a branch in an alternative tree). More on this can be found here.

The situation is not "moderately resolved" or "unresolved", it's clear we have two alternatives to deal with.

So, here's my RG answer (with a few graphics and more links)
[Vanity side-note: I already got a mail pointing me to a new RG "achievement" that I reached a new "milestone" to tweet because it got five recommendations, funny how tiny "mile-"stones can be...]


It may be a bit late, but I just accidentally stumbled upon this thread, and have to note that most (all) answers above are profoundly influenced by tree-thinking and common practice, rather than addressing the very nature of ambiguous bootstrap (BS) support, i.e. why we get BS < 100 for a certain branch. Also, the question is still a good one, given the many errors in peer-reviewed phylogenetic papers across all biological disciplines (some of which surface in the comments so far).

A good starting read for the BS-unaware is this 2014 post by David Morrison on the Genealogical World of Phylogenetic Networks:


For those looking for the hard stuff, Felsenstein's (2004) book is still a good read (it's one of the few scientific books I ever bought, and it was worth it):

Felsenstein J. 2004. Inferring Phylogenies. Sunderland, MA, U.S.A.: Sinauer Associates Inc. ISBN 978-0-87893-177-4

The quality of a book on phylogenetics is obvious from the cover. Those showing a cladogram are for the bin, evolution has more than one dimension!

Starting with a common error in some responses: We never estimate "node support" with currently used tree-inference programmes, but always branch support! Only by rooting the tree, a post-inference graphical modification, e.g. using what we consider to be the outgroup, can we interpret the estimated branch support as support for one of the two nodes terminating each inferred branch (the so-called "internode"), namely the root-distal one. All commonly used tree-inference programmes optimise unrooted trees.

How branch support becomes "node support", just graphics, no mathematics. The BS analysis gave BS = 99 for the purple internode, i.e. the taxon bipartition A + B | C + D occurred in 99% of all bootstrap pseudo-replicate trees. When we root the tree with D, we interpret this result as unambiguous support for a sister relationship of A and B.

We bootstrap the matrix and infer a tree based on this pseudo-replicate of our entire data; we repeat this process and then count how often a certain phylogenetic split, a taxon bipartition, occurs in the pseudo-replicate tree sample. Note that a phylogenetic tree is a one-dimensional graph put together by a series of such taxon bipartitions, and the standard BS approach is to optimise a tree, and then map the support from the BS sample on that tree. RAxML, for instance, also has an option to plot branch support values against each other, and to map different tree samples on trees not inferred simultaneously, e.g. you can read in a Bayesian tree sample and map the values on the ML tree; or read in the highest-probability Bayesian tree and map the ML-BS support on that tree.
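The resample-and-count procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how RAxML or any other programme implements it: the function names are made up for this example, the alignment is a plain dictionary of equally long sequences, and the tree-inference step is left out entirely (each replicate tree is simply given as a list of taxon sets, one side of each internal branch).

```python
import random
from collections import Counter


def bootstrap_columns(alignment, rng):
    """Draw one pseudo-replicate: resample alignment columns with replacement."""
    n_sites = len(next(iter(alignment.values())))
    picks = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[i] for i in picks)
            for taxon, seq in alignment.items()}


def canonical_split(side, taxa):
    """Name a bipartition the same way no matter which of its two sides is given."""
    side = frozenset(side)
    other = frozenset(taxa) - side
    return min(side, other, key=lambda s: (len(s), sorted(s)))


def split_support(replicate_trees, taxa):
    """Percentage of replicate trees that contain each non-trivial split."""
    counts = Counter()
    for tree in replicate_trees:
        splits = {canonical_split(side, taxa) for side in tree}
        # Splits that only cut off a single leaf occur in every tree; skip them.
        counts.update(s for s in splits if len(s) > 1)
    n_trees = len(replicate_trees)
    return {s: 100.0 * c / n_trees for s, c in counts.items()}


# Hypothetical four-taxon sample: 99 replicate trees show A + B | C + D,
# one shows A + C | B + D (given here by its other side, B + D).
taxa = "ABCD"
replicates = [[{"A", "B"}]] * 99 + [[{"B", "D"}]]
support = split_support(replicates, taxa)
```

Note that nothing in this bookkeeping knows about a root: the support belongs to the bipartition, i.e. the branch, and only becomes "node support" once we decide where to root the tree.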


Second, BS support values or 'bootstrap percentages' are not probabilities; they measure the robustness of character support. This distinguishes them from Bayesian-estimated posterior probabilities (and may be the reason we write BS = 50 but PP = 0.5). However, depending on the stringency (coherence) of the signal in a matrix, BS support may converge to the actual probability. For instance, if A is indeed sister of B, i.e. both derive from a unique common ancestor, but only 70% of the segregating (variable) sites support A as sister of B, showing a split A + B | C + D, and the rest don't resolve anything, the BS can be ~ 70, but the PP for the split will be 1.00.

The (in)difference of optimality criteria and branch support measures, and in comparison to character support (from Networks and bootstraps as tree support criteria).


Third, and most importantly, you have to check why BS < 100. Let's say a branch in our tree supporting A as sister of B has BS = 70, the widely used threshold between moderate and good BS support. The taxon bipartition we search for in the BS sample is:

A + B | all others

With perfect data, a BS support of 70 for this split means that 70% of the variable sites support A as sister of B (and assuming the root is outside A + B). In reality (see example above), the number of segregating sites supporting the split may be higher or substantially lower (it also depends on how we infer the BS replicate trees for our sample).

Now the crucial question is: What do the other 30% show?
  • They either do not resolve this particular bipartition at all (A and/or B are part of a soft polytomy) or produce a great variety of random bipartitions (e.g. A but not B sister to C, D, E, F, G, H ...), all of which have frequencies converging to 0 in the BS pseudo-replicate sample – this means: the sister relationship of A and B is supported by a not perfect (somewhat faint) but coherent signal in the matrix.
  • They consistently support a conflicting, alternative topology that places A as sister to C, which accordingly receives BS <= 30 – this means: part of your data prefers A as sister to B, but the other significant part prefers A as sister to C. You have internal signal conflict! And your tree only shows a part of the possible truth.
The lost 33%. Whether a BS = 66 is sufficient or not to say that D and E are sisters (assuming the root is outside this subtree) depends only on whether there is a single alternative or many random alternatives to it.
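To tell the two situations apart, one can list every split in the bootstrap sample that contradicts the branch of interest, together with its frequency. The following Python sketch does just that; the function names and the toy support values are invented for illustration, and the support table is assumed to map each split (one side of the bipartition, as a frozenset of taxon names) to its bootstrap percentage.

```python
def compatible(split_a, split_b, taxa):
    """Two splits fit on the same tree if at least one pair of sides shares no taxon."""
    a1, a2 = frozenset(split_a), frozenset(split_b)
    b1, b2 = frozenset(taxa) - a1, frozenset(taxa) - a2
    return any(not (x & y) for x in (a1, b1) for y in (a2, b2))


def conflicting_alternatives(focal, support, taxa):
    """All splits in the sample that contradict the focal split, strongest first."""
    rivals = [(split, bs) for split, bs in support.items()
              if not compatible(focal, split, taxa)]
    return sorted(rivals, key=lambda item: -item[1])


taxa = "ABCDEF"
focal = frozenset("AB")  # the branch with BS = 70

# Hypothetical sample 1: one coherent competitor takes almost all of the rest.
conflict = {frozenset("AB"): 70.0, frozenset("AC"): 28.0, frozenset("AD"): 2.0}

# Hypothetical sample 2: the rest is scattered over many rare alternatives.
noise = {frozenset("AB"): 70.0, frozenset("AC"): 8.0, frozenset("AD"): 8.0,
         frozenset("AE"): 7.0, frozenset("AF"): 7.0}

for name, sample in (("conflict", conflict), ("noise", noise)):
    print(name, conflicting_alternatives(focal, sample, taxa))
```

In both samples the tree would show BS = 70 for A + B; only the list of rivals reveals whether there is one well-supported alternative (A + C) or merely diffuse noise.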

Internal signal conflict, also called tree-incompatible signal, can have various reasons. For instance (some of them occur in conjunction):
  • 70% of your segregating sites are from maternally inherited genomes (plastome, mitochondriome), which are more strongly affected by genetic drift and biogeography; 30% are from the biparentally inherited nucleome.
  • Incomplete lineage sorting: part of your genes/data show different aspects of the true tree (or the coalescent).
  • Combination of data with strongly differing evolutionary rates: fast-evolving traits or genes may get the leaves right, but will be increasingly wrong towards the roots (saturation effects, branching artefacts); slow-evolving, conserved patterns can better resolve deep relationships but will not provide any support (or wrong support, when crucial data are missing due to sequencing gaps; ML is less vulnerable to this than MP or distance-based approaches) for the tip branches.
  • All processes of evolutionary reticulation (hybridisation, introgression, lateral gene transfer etc.) can express themselves in split BS support patterns. An exception is complete takeover by unilateral introgression; in that case you will get unambiguous support for a relationship showing only half the truth, e.g. finding the wrong genetic signature in a morphologically unambiguous individual (we had this a couple of times in plants).
  • In the special application of palaeontology (morphological data sets including or exclusively compiled for extinct organisms): actual ancestor-descendant relationships or, in general, the overall level of primitiveness/derivedness. For instance: if A is the ancestor of B and C, and the matrix perfectly reflects this situation, then the two equally correct and wrong alternatives (A + B) + C and (A + C) + B will converge to BS = 50. In reality, the more primitive A is and the more derived B and C are, the more a third alternative, A + (B + C), will also take its share. The faintness of coherent signal and actual ancestor-descendant relationships (old vs. young fossils vs. modern-day relatives) are the reason why palaeontological phylogenetic studies never get high levels of BS support (and if they do, it's either trivial relationships, when A is identical to B but different from anything else, BS ~ 100, or branching artefacts such as long-branch attraction).
So if you have BS << 100, you need to explore the cause for this! Which is very easy to do using bootstrap consensus networks. See e.g. this post:


You just read your tree sample into SplitsTree (detailed walk-through in the post); for the R-affine, Klaus Schliep's PHANGORN library for R now includes this option, too, along with some other handy functions to transfer information between trees and networks; see this open-access paper (there are vignettes introducing the new functions):

Schliep K, Potts AJ, Morrison DA, Grimm GW. 2017. Intertwining phylogenetic trees and networks. Methods in Ecology and Evolution DOI:10.1111/2041-210X.12760.
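For readers who want to see the idea behind a bootstrap consensus network before opening SplitsTree or PHANGORN: the network simply keeps every split that occurs in at least a chosen share of the replicate trees, whether or not these splits fit on a single tree. The Python sketch below shows only this selection step plus a check for which of the retained splits contradict each other. It is not the code of either programme; the function names, the 20% cut-off and the support values are assumptions made for this example.

```python
from itertools import combinations


def network_splits(support, threshold):
    """Keep every split whose bootstrap percentage reaches the threshold."""
    return {split: bs for split, bs in support.items() if bs >= threshold}


def incompatible_pairs(splits, taxa):
    """Pairs of retained splits that cannot be drawn as branches of one tree."""
    everything = frozenset(taxa)
    pairs = []
    for s1, s2 in combinations(sorted(splits, key=sorted), 2):
        sides = [(x, y) for x in (s1, everything - s1)
                 for y in (s2, everything - s2)]
        # Incompatible if all four intersections of sides are non-empty.
        if all(x & y for x, y in sides):
            pairs.append((s1, s2))
    return pairs


taxa = "ABCDEF"
# Hypothetical bootstrap percentages for four splits.
support = {frozenset("AB"): 70.0, frozenset("AC"): 28.0,
           frozenset("DE"): 95.0, frozenset("AD"): 2.0}

kept = network_splits(support, threshold=20.0)
boxes = incompatible_pairs(kept, taxa)
```

If `boxes` is empty, the retained splits form a tree; every pair it contains would show up as a box-like structure in the network, i.e. two competing edge bundles like the blue and red ones in the Fagales example.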

A lot of examples for why we should stop ignoring the reason a BS support becomes ambiguous can be found at the Genealogical World of Phylogenetic Networks, e.g. the posts tagged with the labels Consensus network and EDA (exploratory data analysis). [Side-note addition: You can of course also take nearly every one of my phylogenetic papers; I quickly stopped showing only trees.]

BS consensus network for squids showing the reason for BS < 100 seen in the originally published tree and a potential data bias (the Octopodida core clade, a split triggered by nigh-on different 12S rDNA sequences)
BS consensus network for the loranth subtree of the Santalales, considered to be resolved by the nigh-expert on the group (Using consensus networks to understand poor roots and Trivial but illogical – reconstructing the biogeographic history of the Loranthaceae).

PS Note that internal data conflict is much more readily reflected in the BS support, whereas the Bayesian PP will easily tilt towards one alternative (because this is what the MCMCMC is designed for: finding the tree that best explains all the data). There's a much-overlooked, brilliant paper on split support values and how BS and PP behave when the data prefer more than one tree.

Zander RH. 2004. Minimal values of reliability of Bootstrap and Jackknife proportions, Decay index, and Bayesian posterior probability. PhyloInformatics 2:1-13. See also: Two papers you may want to read before inferring trees from morphological (or other) data


Post-scriptum: the question brought me to another interesting question, What's the difference between neighbor joining, maximum likelihood, maximum parsimony, and Bayesian inference?, which I couldn't help answering, too.

2 comments:

  1. Rob Lanfear has a nice blog entry re interpreting bootstrap support in light of additional statistics generated by IQTree: http://www.robertlanfear.com/blog/files/concordance_factors.html

    1. Thanks for sharing the link, a long read but worth it. Concordance statistics are a nice way to put classic branch support values into perspective. It is indeed important to realise that not everything with a BS = 100 or PP = 1.0 is equally supported by actual data; and I imagine they are as underused as networks are.

      However, when digging for gene incongruence, going via a consensus network of the gene trees is a simple option, too. SplitsTree can summarise tree samples using different measures, including averaging or summarising branch lengths in the individual trees (see also the cited Schliep et al. 2017 for transferring information between trees and networks). Because in the end, we not only want to know how many genes/data patterns in our multigene data support a certain branch, but also what alternatives are supported by the remaining genes/data patterns that are not captured by the combined tree. Towards that end, the TC/IC values of Salichos, Stamatakis & Rokas 2014 (Open Access) may give more clues where to look, as they take into account all non-trivial bipartitions.
      No matter what measure you use (networks, concordance/incongruence statistics implemented in IQTree or RAxML), we should always look a bit behind the branch support, including well-supported branches.

      PS Comments accept basic html code, for embedding a link it'd be:
      [a href="LINKURL"]LINKTEXT[/a] (replace [] by <>)
