Res.I.P. – an unprofessional science (and other things) blog: how-to-analyse


ML, MP, NJ – what's the difference?

In 2015, somebody on ResearchGate posted the following question: “What's the difference between neighbor joining, maximum likelihood, maximum parsimony, and Bayesian inference?” Many answered; a few answers were OK, but a lot repeated common misconceptions. A necessary correction and some further remarks.

Scientia-ex-machina: explicit biogeographic inferences and the phylogenomic age

The nice thing about huge datasets is that they can give quick results, often trivial to interpret. In phylogenomics: a fully resolved, unambiguously supported phylogenetic tree. The not-so-nice thing is that downstream analyses using these fully resolved trees, such as ancestral area analyses, may be utter nonsense because the experimental set-up was fundamentally flawed from the start. A post-review of Areces-Berazain et al. (2021), including the results from Li et al. (2019) and Yu et al. (2022).

A fully resolved, and perfectly misleading, species tree

The ultimate promise of phylogenomics is to get a fully resolved species tree: a tree in which the individuals sort neatly by species, and in which all branches, especially the deepest ones, have high or, better, unambiguous support. A look behind one such tree: Jiang et al.'s (2021) Tree of Beeches.

How to pimp up a palaeobotanical monograph

I got a request via ResearchGate to give feedback on a recently published palaeobotanical monograph. I'd love to, but I can't, really; I'm simply not qualified. But I can give some tips on how to enrich a description of a palaeoflora to put palaeobotany in a better light.

Inferring a ML tree with 12000 (or more) virus genomes

Somebody on the RAxML group posted this as a bold question, asking for tips on how to speed up the analysis. Since I recently looked at virus genomes (the infamous SARS group) myself, I have some ideas for this.

Now that RAxML includes all models – a practical tip

The new, faster and (by now) very option-rich version of RAxML, RAxML-NG, provides the full plethora of nucleotide substitution models, which can be confusing to the ordinary user. Hence, a practical tip based on my experiences with very different sets of nucleotide data (and from very different organisms).

There's no need to do what you can't

Modern science thrives on pretension. We can't just publish something interesting; we always feel compelled to argue why it's important and to stress its ground-breaking novelty. On the other hand, everyone can use computers, and those computers can do fancy analyses provided you have some data. And they always get it right, so why should editors and reviewers bother about the results?

What you should show in a palaeophylogenetic study

By far most palaeophylogenetic studies rely exclusively on tree inference as their methodological framework, thus ignoring the fundamental properties of the underlying data: matrices that provide few tree-like signals. A recommendation on what to show (and why).

How did I do it – a short guide to a nice graph

At the end of the 20th and into the new 21st century, phylogenies have been largely reduced to stick graphs, often quite unappealing ones. In the papers I co-authored, I always tried to enhance the graphics, and I have often been asked how I do it. So here's my protocol for a little basic tree-and-network magic.

The most common errors regarding node dating

Many molecular dating studies rely on a few, sometimes poorly understood fossils as age priors to constrain node heights (ages) in an ultrametric tree. But do the authors (peers, editors, and – ultimately – readers) know what they are doing, or what has been done? Maybe, maybe not; in any case, reading the papers can be confusing. In this post, I'll try to give a quick introduction.