[I]t is unacceptably easy to publish “statistically significant” evidence consistent with any hypothesis.

The culprit is a construct we refer to as researcher degrees of freedom. In the course of collecting and analyzing data, researchers have many decisions to make: Should more data be collected? Should some observations be excluded? Which conditions should be combined and which ones compared? Which control variables should be considered? Should specific measures be combined or transformed or both?

It is rare, and sometimes impractical, for researchers to make all these decisions beforehand. Rather, it is common (and accepted practice) for researchers to explore various analytic alternatives, to search for a combination that yields “statistical significance,” and to then report only what “worked.” The problem, of course, is that the likelihood of at least one (of many) analyses producing a falsely positive finding at the 5% level is necessarily greater than 5%.
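To put a number on that last point: if you run k independent tests at the 5% level on pure noise, the chance that at least one comes out "significant" is 1 − 0.95^k. Real analytic alternatives are usually correlated (two measures of the same construct, or their average), so the inflation is smaller than that, but it is still above 5%. The Python sketch below is a hypothetical illustration, loosely modeled on the two-correlated-dependent-variables scenario in Simmons et al. The function names, the sample size, the correlation of 0.5, and the known-variance z-test (used instead of a t-test to stay within the standard library) are all assumptions made for the example.

```python
import math
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()


def familywise_rate(k, alpha=0.05):
    """P(at least one false positive) across k *independent* tests at level alpha."""
    return 1 - (1 - alpha) ** k


def z_test_p(a, b, var):
    """Two-sided p-value for a difference in means, assuming known variance `var`."""
    se = math.sqrt(var / len(a) + var / len(b))
    z = (fmean(a) - fmean(b)) / se
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def draw_group(rng, n, r):
    """n subjects, each with two standard-normal DVs correlated at r (no true effect)."""
    dv1, dv2 = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        dv1.append(x)
        dv2.append(r * x + math.sqrt(1 - r * r) * rng.gauss(0, 1))
    return dv1, dv2


def simulate(n_per_group=20, r=0.5, alpha=0.05, n_sims=10_000, seed=1):
    """Return (rate for one pre-chosen test, rate when any of three analyses 'works')."""
    rng = random.Random(seed)
    single_hits = flexible_hits = 0
    for _ in range(n_sims):
        a1, a2 = draw_group(rng, n_per_group, r)
        b1, b2 = draw_group(rng, n_per_group, r)
        avg_a = [(x + y) / 2 for x, y in zip(a1, a2)]
        avg_b = [(x + y) / 2 for x, y in zip(b1, b2)]
        p_dv1 = z_test_p(a1, b1, 1.0)
        p_dv2 = z_test_p(a2, b2, 1.0)
        # Var((x + y) / 2) = (1 + r) / 2 when x, y are unit-variance with correlation r
        p_avg = z_test_p(avg_a, avg_b, (1 + r) / 2)
        single_hits += p_dv1 < alpha
        flexible_hits += min(p_dv1, p_dv2, p_avg) < alpha
    return single_hits / n_sims, flexible_hits / n_sims


if __name__ == "__main__":
    print(f"independent, k=3: {familywise_rate(3):.4f}")  # 1 - 0.95**3 = 0.1426
    single, flexible = simulate()
    print(f"one pre-specified test: {single:.3f}")
    print(f"report whichever of three worked: {flexible:.3f}")
```

Because the three tests are positively correlated, the "flexible" rate should land somewhere between 5% and 1 − 0.95³ ≈ 14.3%. Where exactly it falls depends on the assumed correlation, and stacking more choices (optional stopping, covariates, exclusions) pushes it higher.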

Another excellent link via Yalda Afshar. Other choice quotes: “Everything reported here actually happened” and “Author order is alphabetical, controlling for father’s age (reverse-coded).”

I [Malecki] would rank author guidelines №s 5 & 6 higher in the order.

Malecki:

See here for an earlier discussion of this paper on our blog (along with a bunch of thoughtful comments).

OK, so what are your strategies when asked to analyze data after the experiment has already been planned and the data collected? I’m confronting this situation now: trying to stick to one analysis, but experiencing some pressure from the biologists to explore different options (obviously the stated goal is not to cheat on significance but to find things that look “interesting”).

The simplest – and perhaps best – approach is probably to report it as a post hoc analysis and then see if you can replicate the result. Some analyses may cope with some forms of researcher degrees of freedom (but I’m not yet convinced that any method deals with all researcher degrees of freedom, particularly those that change the data by recoding, etc.).

If you have enough data, you can do exploratory analyses on part of the data, and then test the hypotheses you generate using the rest of the data. You can even repeat this many times with different random divisions of the data, to see how often the procedure leads to the same answer at the end.
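A rough Python sketch of that split-and-repeat idea, under loudly stated assumptions: the data are hypothetical two-group rows with k candidate measures, "exploration" just means picking the measure with the smallest p-value on one random half, and a known-variance z-test stands in for whatever test you would really use. All function names are made up for the illustration.

```python
import math
import random
from collections import Counter
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()


def z_test_p(a, b, var=1.0):
    """Two-sided p-value for a mean difference, assuming known variance (a simplification)."""
    z = (fmean(a) - fmean(b)) / math.sqrt(var / len(a) + var / len(b))
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def p_for_measure(rows, j):
    """Compare group 0 vs group 1 on candidate measure j."""
    a = [measures[j] for group, measures in rows if group == 0]
    b = [measures[j] for group, measures in rows if group == 1]
    return z_test_p(a, b)


def explore_then_confirm(rows, rng, alpha=0.05):
    """Pick the 'most interesting' measure on one random half, test it on the other."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    explore, confirm = shuffled[:half], shuffled[half:]
    n_measures = len(rows[0][1])
    best = min(range(n_measures), key=lambda j: p_for_measure(explore, j))
    return best, p_for_measure(confirm, best) < alpha


def repeated_splits(rows, n_splits=200, seed=0, alpha=0.05):
    """How often each measure gets picked, and how often the pick is confirmed."""
    rng = random.Random(seed)
    picks = Counter()
    confirmed = 0
    for _ in range(n_splits):
        best, ok = explore_then_confirm(rows, rng, alpha)
        picks[best] += 1
        confirmed += ok
    return picks, confirmed / n_splits


def make_data(n=200, k=10, effect=0.0, seed=0):
    """Hypothetical data: alternating groups, k unit-variance measures, effect only on measure 0."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        group = i % 2
        measures = [rng.gauss(0, 1) for _ in range(k)]
        measures[0] += effect * group
        rows.append((group, measures))
    return rows


if __name__ == "__main__":
    for effect in (0.0, 1.0):
        picks, rate = repeated_splits(make_data(effect=effect, seed=3))
        print(f"effect={effect}: most-picked {picks.most_common(3)}, confirmed in {rate:.0%} of splits")
```

With a real effect the same measure should keep getting picked and confirmed across splits; with pure noise the "winner" tends to wander from split to split, which is exactly the instability the repeated divisions are meant to expose.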

and the “Andrew’s probably already said something about this” is true yet again :-/ sorry! ha

Simmons et al. is a great paper, well worth revisiting. ;-) I have some discussion here: http://dynamicecology.wordpress.com/2012/02/16/must-read-paper-how-to-make-any-statistical-test-come-out-significant/

Worth noting Simmons et al.’s emphasis that the issue is not with p-values per se; being Bayesian is no panacea.

Seems like the registry for research designs recently proposed for political science experiments and discussed over at The Monkey Cage is one way to address some of the issues raised by Simmons et al. If I understand correctly, such a registry would be a way of forcing people to do what Simmons et al. recommend: report all judgment calls and exploratory analyses (basically, anything you did that wasn’t in the registered research design).

http://e-gap.org/wp/wp-content/uploads/20121025-EGAP-Proposal.pdf

Degrees of freedom + perverse incentives = optimization.

Psychological Science (the journal that published the false-positive psychology paper) had a discussion about the authors’ recommendations shortly after it came out. See here: http://hardsci.wordpress.com/2012/01/02/an-editorial-board-discusses-fmri-analysis-and-false-positive-psychology/

And they are now seriously discussing a set of initiatives that would include mandatory researcher disclosure, based in part on that critique: http://hardsci.wordpress.com/2012/10/30/psychological-science-to-publish-direct-replications-maybe/

In case anyone is interested, we wrote a very brief follow-up recently.

We try to suggest an even easier solution than what we reported in this paper.

Also, scientific reporting is compared to food labels.

http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2160588

That’s a great follow-up; thanks for sharing!

“5. If observations are eliminated, authors must also report what the statistical results are if those observations are included.”

Would this also need to be done in instances where multiple imputation is used? It makes sense to, but it may discourage imputation (which we should be encouraging).
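Guideline 5 is easy to build into an analysis script: run the same comparison twice, once on everything and once after the exclusions, and print both. A minimal Python sketch, assuming a two-group comparison summarized by Welch's t statistic and degrees of freedom (standard library only, so no p-value is computed). The function names and the cutoff-based exclusion rule in the usage example are hypothetical, and the sketch doesn't address the multiple-imputation question.

```python
import math
from statistics import fmean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (fmean(a) - fmean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def report_with_and_without(a, b, exclude):
    """Run the same comparison on all observations and after applying `exclude`."""
    kept_a = [x for x in a if not exclude(x)]
    kept_b = [x for x in b if not exclude(x)]
    results = {}
    for label, (xa, xb) in {"all observations": (a, b),
                            "after exclusions": (kept_a, kept_b)}.items():
        t, df = welch_t(xa, xb)
        results[label] = {"n": (len(xa), len(xb)),
                          "mean_diff": fmean(xa) - fmean(xb),
                          "t": t, "df": df}
    return results


if __name__ == "__main__":
    treatment = [1, 2, 3, 4, 100]  # made-up numbers with one extreme value
    control = [1, 2, 3, 4, 5]
    # Hypothetical exclusion rule: drop anything above 50
    for label, r in report_with_and_without(treatment, control, lambda x: x > 50).items():
        print(label, r)
```

Here the single extreme value flips the sign of the difference, which is the kind of thing readers can only see if both versions are reported.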

[…] statistician Andrew Gelman: 1) health disparities are associated with low life expectancy; 2) researcher “degrees of freedom” (i.e., how many ways can you fiddle with the data to get the result you want); and 3) social […]

Good stuff, but isn’t it essentially the point made by Edward Leamer in 1978 (Specification Searches) and a subsequent paper in American Economic Review?

http://www.anderson.ucla.edu/faculty/edward.leamer/books/specification_searches/specification_searches.htm