Elsevier's research data "not available/will be made available on request" – what will be your choice?

What do you do when authors claim something (showing nothing) that you know can't be true (because you showed otherwise)? Just request the data they used and re-analyse it to check. But this is not how it works. An example from Elsevier's Molecular Phylogenetics & Evolution.

Why I never wanted to publish in Molecular Phylogenetics & Evolution (MPE)

All journals have better and worse papers (science and the publishing process depend on humans, and except for the pope, we are fallible), but MPE was one of the first journals on my personal blacklist. Not because of its scientific content (there have been some very good papers in MPE), but because of the peer review process I experienced when submitting my very first own paper there, and what followed.

In 2004, I chose the journal as publication platform for my very first first-author paper. It dealt with a molecular phylogeny of Acer based on some 200 cloned ITS sequences covering c. 50 species and all its main intrageneric lineages (sections, series).

My first 2003/2004 Acer tree (the 'all-compat' output of MrBayes; see also Grimm 2003), utterly flawed (according to our reviewers back then). Feel free to check out plant phylogenetic papers published back then in MPE.

After four or more months of review, we got back a decision ("reject") with the anonymous experts pointing out that my analyses were rubbish – mainly Bayesian tree inference (see above; one reviewer told us to rely on maximum parsimony instead because "Bayesian analysis may find wrong trees and needs further tests") spiced with some in-detail sequence motif analyses of generally length-polymorphic and indel regions (Grimm 2003; a bit survived: Grimm et al. 2006) – and that we should seek help from someone more experienced with phylogenetic methods before trying to re-submit the paper in a "more specialised" (= lower impact) journal.

Info for non-scientists: A "reject" in scientific publishing usually just means that you have to try your luck with another journal; you can protest against such a decision, of course, and I think we did, but the editor(-in-chief)'s answer will typically be "I conferred with my fellow editors [no names provided] and stand by the decision" (once, we succeeded in forcing a paper through despite negative reviews and the handling editor's decision to "reject": Göker & Grimm 2008). Which can mean a new pair of anonymous experts judging your study, or the same people getting annoyed by having it again on the table. The Forest of Reviews is not only shrouded by the Impermeable Fog, it is a dark and deep one harbouring many different sorts of beasts.

An example for the ITS motif analysis included in the paper (the "n" used to be filled squares representing the nucleotides/site polymorphism, but I lost the proper font)

But what puzzled me and put MPE on my never-submit-there-again list was that soon after my/our work was judged flawed and I was told that I needed help to do proper phylogenetics (see Denk et al. 2002, where I also did the molecular phylogenetic analyses on my own), I was asked to review two plant phylogenetic papers submitted to MPE. I wrote back that I was honoured to act as a reviewer for such a prestigious journal, but couldn't help wondering how I had managed to be upgraded from a know-nothing to a phylogenetic expert in such a short time (and without having published anything in between).

Another piece of info for non-scientists: Many journals require two reviews, but it can be hard (or impossible) for an editor to get even one. So, you send the paper to a Ph.D. student or inexperienced post-doc, knowing they can't afford to say no, as they may want to publish in your journal in the future. And there is little risk: the young researcher will either write a very careful and constructive review (and spend a lot of time doing it) or a harmless one for the bin.

I didn't get an invitation to review an MPE paper again for quite some time. And when I got one again, a decade later, I thankfully declined for more than one reason (all of which I pointed out to the editor). [PS I got the invitation shortly after we published Chen et al. (2015) in MPE, because J. Biogeogr. didn't want it. A tip: if J. Biogeogr. turns down your biogeographic paper despite not-so-bad reviews, MPE will probably take it.]

Over a year after my failed first and only submission to MPE, I happened to be recruited as a co-author on a paper on a moss genus, Isothecium, with quite complex data signals and (pseudo-)cryptic species. Our first author needed the impact that MPE provides: branding should not be important in science, but it is, and MPE is a solidly ranked mid-tier journal with a sufficient impact factor for those of us who are not booked on Nature, Science etc. Review took ages, or as my first author put it, was "a trip to hell": we submitted in December 2005 and waited a full nine months for the decision (the editor being clearly annoyed with her peers and, I got the impression, with MPE's peer review in general). No big surprise: MPE usually published cladograms (standard then and still, although phylograms should be obligatory, also in molecular phylogenetic studies), and we came running fat with networks. One anonymous review was so badly done that the editor decided it was not even worth sending along. The second (also anonymous) was mixed but quite comprehensive, helping us to streamline our manuscript.

Our editor wrote: "Although the reviewer expresses interest in the general subject area of the paper, the reviewer also expresses a series of reservations that preclude publication of the paper in MPE." The paper (Draper, Hedenäs & Grimm 2007), which was quite a unique (and cool) piece for the time and initiated some interest in networks in bryology in general (Google Scholar gives 49 citations), was published nonetheless, thanks to our editor. She decided to bend the journal's rules a bit: "However, if you feel that you can suitably address the concerns and issues raised by the reviewer, I would be willing to consider a revised manuscript. Also, please be advised that the revised manuscript may be subject to re-review." It was not. Our revision and response letter were submitted at the end of September; we set out again on the road to hell and were prepared to face a few more months of re-review. But just two days later, she accepted the paper as is, and I still think our editor made a good choice.


Why you should not publish in MPE

First, it's an Elsevier journal, and as a scientist you don't want to support the Coalition for Responsible Sharing.

Also, if you belong to those utopian left-ish people, publishing in an Elsevier journal equals stabbing Karl Marx (who just turned 200 and is increasingly relevant again) in the back: RELX is one of the most profitable companies in the how-to-make-the-public-hand-pay-for-shareholder-profits business.
In fact, you should not work in any capacity for an Elsevier journal until they start again, not to put their "stakeholders" (that's the scientists doing all the work for free) first (the shareholders must be first, naturally; scientific publishing is pure oligopolistic capitalism, folks!), but at least to try to make their lives easier, not more difficult.

Second, providing access to the phylogenetic data used, the basic data needed to verify claims and reconstructions in a paper, is apparently deemed irresponsible sharing. Like the editorial board of Palaeo^3, MPE's editorial board (or its reviewers, pers. obs. 2014) doesn't enforce the publication of crucial primary research data.

Information not rarely found under the heading "Research data" of a MPE paper (or other Elsevier journals)

This is also what you find under "Research data" for the recent MPE paper by Liu et al. (2018).

Their phylogenetic data, the used data matrix, could possibly be made available on request ... but only if you ask nicely the right person, namely the 4th out of seven authors.

[Note to non-scientists: in biological papers the important people are at the front or back; middle authorships can be awarded to people who have not really done anything, or who have done a lot but don't need a prominent author position. A typical award-point system is 5 points for a first authorship or asterisk authorship (= equal contribution), 3 points for the "senior" (last) position or corresponding author, and 1 point for any other place.]

It's ok to criticise others, but not to be criticised

The paper of Liu et al. is not a particularly bad paper; I came across worse biogeographic papers in MPE during my active time.

Extract of Fig. 1 in Liu et al. (2018) – showing the phylogenetic framework for their study: a cladogram masking the substantially varying branch lengths (between in- and outgroup and within the ingroup) and a topology pretty much identical to the corresponding subtree in Su et al. (2015), which used a somewhat different gene and taxon sampling. See also this post regarding the position of the outgroup-inferred root.

Technically, it's actually quite nice. They filled some blanks in the existing data on a very interesting and challenging group of plants ignored by essentially everyone else dealing in plant phylogenetics; they used up-to-date inference programmes, relied on two fossils we described last year as ingroup age constraints to re-reconstruct our dating estimates (Grímsson et al. 2017) and to avoid (much) too young ones, and finally used the BioGeoBEARS script that allows running several biogeographic inference methods at once (here: with trivial, and partly illogical results, worth a future post on its own). We didn't bother to do the latter, because based on the modern situation the result must be trivial (all major clades are conspicuously sorted; Vidal-Russell & Nickrent 2007), and most fossils (pollen) that could inform past distribution have yet to be revisited using high-resolution SEM before they can be assigned to the main lineages.

That they downplayed or ignored most of what I pointed out, documented and discussed in our papers (and during review: one or two of the reviewers of our 2016–2017 Loranthaceae papers were co-authors of Liu et al.) regarding signal issues in the current molecular data (inevitable ingroup-outgroup long-branch attraction; note also the number of still ambiguously supported branches despite using a five-gene data set with 85% gene/taxon coverage) is also not surprising. In contrast to Liu et al.'s, our analyses (example provided below; data and papers are open access) were not mentored and guided by the nigh-expert on the group, the paper's fourth author (out of seven), and hence can be considered irrelevant.

The phylogenetic framework we used in our 2017 (online) pollen paper (Grímsson et al. 2018): a rooted (with Nuytsia, based on earlier outgroup-rooted trees) bootstrap consensus network showing the best-supported and further (using maximum likelihood bootstrapping) topological alternatives for the relationships within the Loranthaceae. Boxes reflect competing alternatives supported by the underlying data (here: strict genus consensus sequences based on all data available in gene banks end of 2014; same gene sample as used by Liu et al. 2018)

And given the time constraints of peers and editors (who do not get paid for this work by Elsevier or any other publisher), one cannot expect whoever reviewed and edited Liu et al. (such information is confidential; didn't James Bond boast a Ph.D., too?) to have read two open access papers published in the year before that obviously triggered Liu et al.'s study (Grímsson, Grimm & Zetter 2017 [2018 in print]; Grímsson et al. 2017) and to have browsed through their quite large online supplements [GGZ'18, G&al'17], which include supplementary tables, figures, extended methods, and primary data and analysis files. Maybe one or the other peer even did, but his/her comments regarding the results of the earlier papers got lost or were rebutted by the authors? No reviewer contribution is mentioned in the acknowledgements of Liu et al. (which usually means the reviews were non-existent, negative or annoying). And because of peer review confidentiality, we will never know what they said, suggested or criticised.

MPE also doesn't give the name of the editor handling a manuscript, a practice found in many Elsevier journals that shields the editors responsible for accepting a paper from nasty criticism. On the journal's homepage, the editors are listed, but without contact details (why, did you get too many complaints?).

But what really bugged me was that the authors criticised our work (especially my part of it) including misleading or simply wrong statements and without substantiating them or documenting their data matrix.

Access to that matrix would allow me to respond and set things right. In this context, see also this nice Berkeley Q&A page for students: Misconceptions about science.

Our first paper (GGZ'18), in contrast, provided such access already during the review phase. It enjoyed a fierce and long battle, fought behind the curtain of peer review confidentiality over two rounds with one anonymous reviewer (likely the above-mentioned fourth author and nigh-expert), whose input was largely uninformed regarding the methodological-phylogenetic part but highly useful regarding systematic aspects. And all for nothing, when you read Liu et al. [PS: The much smoother but nevertheless productive review process of our second paper, G&al'17, published in PeerJ and the basis for Liu et al., can be viewed by everyone interested at the paper's page. Peer review transparency works, and gives committed reviewers and editors due credit.]

Grímsson et al. (2018) was submitted on 2 February 2016 and spent 73 days under review (it's a long paper, and if it were not against the rules in scientific publishing, one could have awarded our reviewers a co-authorship for their contribution); the revision was handed in a week later (having corrected everything that reviewers #1 and #2 told us, and having explained to reviewer #1 the basic data situation and what our graphs show and don't show). The second review round meant another 89 days "under review" because a third reviewer, an "expert of phylogeny" (anonymous, of course), was added, since #1 still rejected the paper and #2 found it ready to publish. The third reviewer's report, for which we had to wait three months, was a piece ignorant of the data situation and filled with commonplace statements that would take other experts in phylogenetics three minutes to write, as one such colleague assured me. My responses to it were simple: in principle, yes, but please check and run our data matrix, freely accessible for anonymous download, yourself if you don't believe what we write and show. Our final revision and re-analysis of Su et al. (2015; File S6 to Grímsson et al. 2018 [PDF]) to explain the data and signal situation to reviewer #1 (and the phylogenetic expert #3), who kept expressing some very naive beliefs regarding Su et al.'s analysis and data, was handed in after 50 days (due to holidays and us being sick of responding again) and was finally accepted for publication a month later (10 October 2017) without further comment.

Is it irresponsible to share data?

Testing Liu et al.'s claims would have been easy for them: they would just have needed to redo with their data what we (I) did before with the data I could access. But they didn't. And I have no issue with that. For professional scientists, it's publish or perish. One is well advised to do only what is necessary for a paper, thus minimising the per-paper effort.

Each peer may request something else, and you cannot possibly anticipate all requests eventually put forward by experts in the one or other aspect of your study or the other peers. My own experience in the business was, the more you invest and show in a paper, the higher the risk it will fail or be cut down (an example) during the widely applied single-blind confidential peer review. In the best case, it just increases the time "under review". And one is generally ill-advised to mention any problems with one's data. And once a paper is published, its analyses and data are rarely revisited (post-publication peer review is a fragile plant): there is little gain in correcting errors (your own or those of others).

Having perished, I have time to re-analyse data of others and can freely circulate the results. Being possibly the only other person familiar with the signals in molecular data of Loranthaceae, I wrote an e-mail to the two corresponding authors asking for the data matrix to find out what applies here: Data not available or will be made available on request?
I'd be very happy if you could provide me with the data matrix, you used to infer the tree shown in fig. 1, to evaluate some of your claims, especially regarding the (so far unfounded) statement that there can no ingroup-outgroup branching artefacts, because you excluded the extremely long-branched sistergroup of Su et al. (2015, no plastid data available) and keeping only the distinctly long-branched further sister clades. ... It says on the journal homepage under Research Data: "Data not available / Data will be made available on request". I hope the latter applies, if not, please let me know otherwise. ...

Two days later, I got a nice answer by one of the corresponding authors, Lu Limin:
Thanks for your message and comments through ResearchGate. I have forwarded your message to our coauthors. Hopefully they will response soon. I will also send you the data matrix once the other corresponding author replies :)

Another two days later, I got a mail by author #4, Daniel L. Nickrent, retired botanist, acclaimed expert on Santalales, the order including the Loranthaceae (and hosting this nice, informative webpage revolving around parasitic plants), likely our anonymous reviewer #1 who ignored everything I wrote in my response to his analysis-related comments, telling me that he advised his coauthors against fulfilling my request.
... Your tone is not collegial but adversarial, and for that reason, there is no motivation for any of our team to work with you or respond favorably to your requests. ...
Well, I didn't express any interest in working with them (my e-mails come with a link to my homepage, clearly stating that I'm out of business); I just asked for the matrix to verify or falsify some claims in the paper and, in the course of that, to show why many branches in their tree still have no (near-)unambiguous support despite data blanks being filled. To revisit my points. Good old, rarely done exploratory data analysis, possibly worth a Genealogical World of Phylogenetic Networks post as an add-on to the last one dealing with Loranthaceae data: Using consensus networks to understand poor roots. Showing why phylograms and consensus networks are obligatory when one faces ambiguous support patterns.

He also informed me that they have "more and better" data "in hand":
... Over the last few weeks I have assembled a 5-gene dataset for all Santalales - 146 of the 163 genera in the order ...  All nodes in the tree are resolved, except some along the spine [surprise, surprise] of Loranthaceae. And during this process, I discovered uncorrected sequences from my lab that were submitted to Genbank years ago [which will] be corrected shortly.
Therefore, there is no reason for you to be focusing so heavily upon sequences obtained from Genbank [if it would be obligatory to publish phylogenetic data, I would not have needed to harvest the gene banks at all] we have more and better quality [?!?] sequences in hand now and many of these will be submitted within the year when the manuscripts are submitted for publication. ...

I'm not entirely sure why Dan felt obliged to write me this. But there is a certain logic in uploading only mediocre data to publicly accessible gene banks (where everyone can go and plunder the fruits of your hard work) and using it for some emergency quick-shot paper, while keeping more and better data for the next publication already in the pipe (where one can claim a better place in the author list?) and maybe in a better journal than MPE. By the way, Liu et al. was "Received 14 November 2017; Received in revised form 3 March 2018; Accepted 7 March 2018" (see also One date that is missing in many scientific publications). Publish or perish: one cannot wait another half a year to close all gaps and use the "more and better quality sequences", which according to Dan (and the authors) will resolve the backbone.
... As I have seen recently, many phylogeny issues are resolved once one has accurate and complete sequences....
No, it won't. In cases like this, one can only get higher resolution (unambiguous branch support) by reducing the taxon set and increasing the gene sample. Cut down the bush to a data set providing a signal that is as trivial as possible.
And if it does, would one not be well advised to use that better data for reconstructing the "Historical biogeography of Loranthaceae (Santalales)"? The accuracy of the dating and the biogeographic inferences relies on well-defined branch lengths. We just pushed back the primary radiation by at least 20 Ma and linked it to the Eocene warm phase, which Liu et al. re-reconstruct and discuss at length as their result. So why the hurry? To avoid that too many people cite and see our dating study and realise one can study Loranthaceae without Dan Nickrent at the helm?!

Dan's full mail (he also pointed out that old sequences may be of lower quality and hence may inflict topological problems during tree inference, and that I could pay for filling data gaps) and my answer to it, quite a read (being a natural-born constructivist, I naturally included a series of tips and links, and used the opportunity to point out again where I see problems but also how to deal with them), can be found here. It's clear that no one outside His Majesty's Court will get the actual research data used in publications. And Limin's hope that there will be a response from his co-authors to the actual issues I raised in my ResearchGate comment on their paper has (so far) not been fulfilled either.

Elsevier's responsible sharing of scientific data in the 21st century

The freedom of science. A retired expert who is neither the first nor the second author (the two main authors, who contributed equally to the paper according to the title page) nor one of the two corresponding ("senior") authors of a seven-author paper decides who deserves access to the data used.

This may explain why there has been so little progress (phylogenetically speaking) in an old (pre-Eocene initial radiation), widespread and interesting plant group like the Loranthaceae. It would not be the first fief in systematic botany and similarly soft natural sciences to be controlled by an old man thriving on restricted access to material and data (it's usually a male at the top, even when it comes to flowers...). Someone who should be a benevolent teacher (when you are retired, your career is over), using his knowledge to facilitate and propagate research, not to contain and control it.
Instead, the Lord of the Realm comfortably robes himself in the Impermeable Fog, making research endorsed by His Majesty unassailable (Su et al. 2015, with an online supplement challenging results reported in the main text and a tree riddled by obvious signal issues and data holes; Liu et al. 2018), while being able to pull the strings as hard as possible against everyone daring to enter his Very Own Realm.

But can you blame him? No, the problem lies solely with journals like MPE and publishers like Elsevier. Above all scientific petty kingdoms lies the Great Cloud: Elsevier's lax policies when it comes to data access and documentation, but strict policies when it comes to peer review intransparence (sorry, confidentiality). Share, but share responsibly.

Sancho, prepare the mount. Time to ride against the windmills.

Source: Wikimedia Commons, by רנדום

Any (science) post should have a positive ending

Hence, a tip for those searching for challenging, freely accessible and ready-to-use data matrices to play around with. Once I had my homepage running, I started to upload all my data for anonymous download (mainly because most journals publishing our papers didn't provide an according service).

Downloadable data
Find below an (alphabetically ordered) list of archives including primary data, analyses and results files used in our evolutionary and other studies. For details refer to the original literature and text files included in each archive. Anyone is invited to use these data for whatever purpose, even if you want to show that we were wrong, but in any case, you use it at your own risk: Archives have been tested and should be free of viruses, trojans, worms, cladistics, etc., but in today's dangerous world, you never know.

I always felt that providing a link to the data matrix and one's analysis results already during the review phase could help to avoid one or the other misunderstanding between authors and reviewers, as e.g. in the case of my first try to publish an Acer phylogeny in MPE.

And I always sent my matrices to anyone who contacted me, or shared them when people were interested in them (published or not). It got me quite a few extra citations for my papers, and even a few new collaborations (after the first shock that somebody sends data and a ready-to-run matrix without asking for anything, not even a co-authorship).

Not only ideas and thoughts, but also basic scientific data should be free. 

And when it comes to (molecular) phylogenies, these basic data are small and simple files, ideally NEXUS-formatted (with the gene regions annotated).
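To illustrate how small and simple such a file is, here is a minimal sketch in Python that writes a NEXUS file with the gene regions annotated as character sets. The function name `to_nexus`, the taxon labels, the toy sequences and the region names are all made up for illustration; none of them come from the papers discussed here, and a real matrix would of course hold real alignments.

```python
def to_nexus(sequences, regions):
    """Return a minimal NEXUS string with a DATA block and a SETS block.

    sequences: dict mapping taxon name -> aligned sequence (equal lengths)
    regions:   dict mapping region name -> (start, end), 1-based, inclusive
    """
    lengths = {len(seq) for seq in sequences.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must be aligned to the same length")
    nchar = lengths.pop()
    for name, (start, end) in regions.items():
        if not (1 <= start <= end <= nchar):
            raise ValueError(f"region {name} lies outside the alignment")

    # Pad taxon names so the sequences line up in the matrix.
    width = max(len(taxon) for taxon in sequences)
    lines = [
        "#NEXUS",
        "BEGIN DATA;",
        f"  DIMENSIONS NTAX={len(sequences)} NCHAR={nchar};",
        "  FORMAT DATATYPE=DNA MISSING=? GAP=-;",
        "  MATRIX",
    ]
    for taxon, seq in sequences.items():
        lines.append(f"    {taxon.ljust(width)}  {seq}")
    lines += ["  ;", "END;", "BEGIN SETS;"]
    # One CHARSET per gene region: this is the "annotation" of the matrix.
    for name, (start, end) in regions.items():
        lines.append(f"  CHARSET {name} = {start}-{end};")
    lines.append("END;")
    return "\n".join(lines) + "\n"


# Hypothetical toy example: three taxa, two concatenated regions.
example = to_nexus(
    {
        "Taxon_A": "ACGTACGTACGT",
        "Taxon_B": "ACGTAC-TACGA",
        "Taxon_C": "ACGAACGT??GT",
    },
    {"region1": (1, 6), "region2": (7, 12)},
)
print(example)
```

With the character sets in the file, anyone re-analysing the matrix can see at a glance where one gene region ends and the next begins, and can run the regions separately or combined without having to ask the authors.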

Cited papers
Chen L-Y, Grimm GW, Wang Q-F, Renner SS. 2015. A new phylogeny for the aquatic family Aponogetonaceae, combined with northern hemisphere fossils, rejects the hypothesized Australian origin of the family. Molecular Phylogenetics and Evolution 82:111–117.
Denk T, Grimm G, Stögerer K, Langer M, Hemleben V. 2002. The evolutionary history of Fagus in western Eurasia: Evidence from genes, morphology and the fossil record. Plant Systematics and Evolution 232:213–236.
Draper I, Hedenäs L, Grimm GW. 2007. Molecular and morphological incongruence in European species of Isothecium (Bryophyta). Molecular Phylogenetics and Evolution 42:700–716.
Göker M, Grimm GW. 2008. General functions to transform associate data to host data, and their use in phylogenetic inference from sequences with intra-individual variability. BMC Evolutionary Biology 8:86.
Grimm GW. 2003. Tracing the mode and speed of intrageneric evolution - a case study of genus Acer L. and Fagus L. D.Sc. thesis. Eberhard-Karls University.
Grimm GW, Renner SS, Stamatakis A, Hemleben V. 2006. A nuclear ribosomal DNA phylogeny of Acer inferred with maximum likelihood, splits graphs, and motif analyses of 606 sequences. Evolutionary Bioinformatics 2:279–294.
Grímsson F, Grimm GW, Zetter R. 2018 [2017 online]. Evolution of pollen morphology in Loranthaceae. Grana 57:16–116.
Grímsson F, Kapli P, Hofmann C-C, Zetter R, Grimm GW. 2017. Eocene Loranthaceae pollen pushes back divergence ages for major splits in the family. PeerJ 5:e3373 [e-pub].
Vidal-Russell R, Nickrent DL. 2007. The biogeographic history of Loranthaceae. Darwiniana 45:52–54.
