Lionel Hertzog writes:

In the method section of a recent Nature article in my field of research (diversity-ecosystem function) one can read the following:

The inclusion of many predictors in statistical models increases the chance of type I error (false positives). To account for this we used a Bernoulli process to detect false discovery rates, where the probability (P) of finding a given number of significant predictors (K) just by chance is a proportion of the total number of predictors tested (N = 16 in our case: the abundance and richness of 7 and 9 trophic groups, respectively) and the P value considered significant (α = 0.05 in our case). The probability of finding three significant predictors on average, as we did, is therefore, P = [16!/(16 − 3)!3!] × 0.05^3 × (1 − 0.05)^(16 − 3) = 0.0359, indicating that the effects we found are very unlikely to be spurious. The probability of false discovery rates when considering all models and predictors fit (14 ecosystem services × 16 richness and abundance metrics) and the ones that were significant amongst them (52: 25 significant abundance predictors and 27 significant richness predictors) was even lower (P < 0.0001).

I am no statistician, but all my stats senses are turning red when reading this. What do you think?

My reply: It might be that all the science is solid here; I have not read the paper nor do I have any particular expertise in this area. But I too am disturbed by the passage above, for a couple of reasons, even beyond the usual error of reversing the probability (“the effects we found are very unlikely to be spurious”). My big problems with the above sort of analysis are:

(a) The whole null-hypothesis rejection thing. I don’t think anyone should care about the null hypothesis. In real life, there’s always variation.

(b) The idea of picking out “three significant predictors” and the whole “false discovery” thing. Again, everything’s happening and there is variation. The whole true-false thing doesn’t make sense to me in this sort of situation.

P.S. The article is called “Biodiversity at multiple trophic levels is needed for ecosystem multifunctionality,” and its authors are

Santiago Soliveres, Fons van der Plas, Peter Manning, Daniel Prati, Martin M. Gossner, Swen C. Renner, Fabian Alt, Hartmut Arndt, Vanessa Baumgartner, Julia Binkenstein, Klaus Birkhofer, Stefan Blaser, Nico Blüthgen, Steffen Boch, Stefan Böhm, Carmen Börschig, Francois Buscot, Tim Diekötter, Johannes Heinze, Norbert Hölzel, Kirsten Jung, Valentin H. Klaus, Till Kleinebecker, Sandra Klemmer, Jochen Krauss, Markus Lange, E. Kathryn Morris, Jörg Müller, Yvonne Oelmann, Jörg Overmann, Esther Pašalić, Matthias C. Rillig, H. Martin Schaefer, Michael Schloter, Barbara Schmitt, Ingo Schöning, Marion Schrumpf, Johannes Sikorski, Stephanie A. Socher, Emily F. Solly, Ilja Sonnemann, Elisabeth Sorkau, Juliane Steckel, Ingolf Steffan-Dewenter, Barbara Stempfhuber, Marco Tschapka, Manfred Türke, Paul C. Venter, Christiane N. Weiner, Wolfgang W. Weisser, Michael Werner, Catrin Westphal, Wolfgang Wilcke, Volkmar Wolters, Tesfaye Wubet, Susanne Wurst, Markus Fischer and Eric Allan.

Not wild about the approach either (even if they’re going to do something like this, they should use Prob(detections >= 3), which is 0.042, but that’s just fussing with details). Setting that aside, I’m suspicious of made-up approaches to false discovery rates when there are perfectly good FDR correction methodologies out there (e.g. see ?p.adjust in R). The article doesn’t say anything about data availability: Nature’s requirement to put a data availability statement at the end of the Methods section wasn’t announced until shortly *after* this paper was published … It’s an interesting observational data set; it would be nice to have it available for re-analysis …
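Both quantities mentioned here are easy to reproduce. Below is a minimal stdlib-Python sketch (not code from the paper): the binomial tail probability for “three or more detections,” plus my own illustrative re-implementation of the Benjamini–Hochberg step-up adjustment that R’s p.adjust(…, method = "BH") computes.

```python
from math import comb

def binom_tail(k, n, alpha=0.05):
    """P(K >= k) when K ~ Binomial(n, alpha)."""
    return sum(comb(n, j) * alpha**j * (1 - alpha)**(n - j)
               for j in range(k, n + 1))

# Probability of 3 or more chance "discoveries" among 16 tests
# at alpha = 0.05: about 0.043.
p_three_or_more = binom_tail(3, 16)

def bh_adjust(pvals):
    """Minimal Benjamini-Hochberg step-up adjustment, mirroring what
    R's p.adjust(p, method = "BH") returns. Illustrative only."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = min(running_min, 1.0)
    return adjusted
```

With a toy set of p-values, bh_adjust([0.01, 0.04, 0.03, 0.5]) matches R’s BH output: the smallest p-value adjusts to 0.04, the middle two to 0.0533, the last stays 0.5.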

Shouldn’t the authors be interested in the probability of finding three _or more_ significant predictors?

Also, the authors appear to assume that the count of p-values (for the regression coefficients) falling below alpha is a binomial random variable. I am surmising this from the formula for P in the block quote, whose latter part should read 0.05^3 × (1 − 0.05)^(16 − 3).
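Reading the formula that way does reproduce the paper’s number; a quick stdlib-Python check (my sketch, not the authors’ code):

```python
from math import comb

def binom_pmf(k, n, alpha=0.05):
    # Probability of exactly k chance "significant" results among n
    # independent tests at level alpha: C(n, k) * alpha^k * (1 - alpha)^(n - k)
    return comb(n, k) * alpha**k * (1 - alpha)**(n - k)

# C(16, 3) * 0.05^3 * 0.95^13, which matches the quoted 0.0359
p_exactly_three = binom_pmf(3, 16)
```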

Is this a reasonable assumption for the number of statistically significant regression coefficients in their context? If there is even modest collinearity among the predictors, I doubt it.

I haven’t read anything about this other than what Andrew quoted, so I don’t know what “discovery” means in this context. But it’s possible that it really is a yes/no thing. “Is the species present in this area right now” would be either true or false, for example. So I’m not 100% sure that Andrew is right that the null hypothesis doesn’t make sense. Depends on what question they are asking.

Even in this context, it is probably better to focus on “what is the kernel-weighted population density in the vicinity of point X,” which turns a binary question like “is there at least one beaver in this arbitrary box I drew on a map?” into a continuous one. The continuous question is usually more meaningful. For example, suppose you use the arbitrary political county line for counting beavers. If one afternoon a beaver goes a little farther to get a tree and crosses a county line by 100 feet, the beaver count in the neighboring county goes to zero and the beaver count in the county of interest goes to 1 … but the beaver moved a total of 100 feet.
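A tiny sketch of that contrast, with made-up one-dimensional sighting coordinates (everything here is hypothetical: the stream coordinates, the county line at x = 0, the bandwidth): the box count flips from 0 to 1 when one animal moves 100 feet, while a Gaussian-kernel-weighted density evaluated at the line is essentially unchanged.

```python
from math import exp

# Hypothetical 1-D sighting coordinates (feet along a stream);
# the county line is at x = 0, the county of interest is x >= 0.
sightings = [-3000.0, -1200.0, -50.0]
moved = [-3000.0, -1200.0, 50.0]   # the nearest animal wanders 100 feet east

def county_count(points):
    # The binary-box question: how many sightings fall in the county x >= 0?
    return sum(p >= 0 for p in points)

def kernel_density(points, x, bandwidth=2000.0):
    # Gaussian-kernel-weighted density at location x (unnormalized):
    # a smooth answer that barely reacts to a 100-foot move.
    return sum(exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)
```

Here county_count jumps from 0 to 1 after the move, while the density at the line is identical before and after, since the animal ends up the same distance from the evaluation point.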

Andrew,

Here and in the past you seem to subtly poke fun at the vast number of authors on many of these NHST-driven papers. Care to spell out what you perceive to be the connection between author count and methodology/rigor/approach? I also feel like there may be something there but am not really sure what it is. Or are you just pointing to the sort of humorous dissonance of an army of scientists conducting a simplistic analysis to cap off all the work that went into data collection?

Z:

No, I’m not poking fun. I generally think it’s appropriate to list the authors of a paper being discussed. Usually I give the authors’ names right up front, but in this case the list was long so I put it at the end. I too have written papers with many authors (for example, here), and I think that’s fine. If the convention in a particular field is to give authorship to many people involved in different aspects of a study, I have no problem with that. Indeed, I’m more bothered by some papers in econ where only a single author is listed, even though it’s clear that many people were involved in design, data collection, analysis, and writeup.

Oh, come on. Had the paper been published in PLOS ONE, only about six authors would have been listed. But you know what? A publication in Nature is a career booster, so everyone who spent even a thought on this project is given an authorship. Does this improve the quality of the paper? No: they even made up their own approach for correcting the false discovery rate, although dozens of approaches already exist. And how easy it was; impressive. Hard to believe statisticians spend years or decades working on these methods.

My intuitive concern about that statement is that it seems to implicitly/idealistically presume perfect independence among the predictors, which is rarely the case. Given some correlation among the predictors, you’ll want to take the results with a further degree of skepticism. That being said, I’ll grant the analysts credit for making an attempt at constraining what to take seriously — personally I might have gone with something like the Hommel procedure or some standard false discovery rate adjustment.

+1

Maybe I’m misunderstanding what they’re doing, but shouldn’t they be conducting a joint test (like an F-test)? Even with their many possible models, they could do FDR control over the joint test statistics.

Brad:

No. See point (b) in my post above.