Chris Chambers and I had an enlightening discussion the other day at the blog of Rolf Zwaan, regarding the Garden of Forking Paths (go here and scroll down through the comments).
Chris sent me the following note:
I’m writing a book at the moment about reforming practices in psychological research (focusing on various bad practices such as p-hacking, HARKing, low statistical power, publication bias, lack of data sharing etc. – and posing solutions such as pre-registration, Bayesian hypothesis testing, mandatory data archiving etc.) and I am arriving at a rather unsettling conclusion: that null hypothesis significance testing (NHST) simply isn’t valid for observational research. If this is true then most of the psychological literature is statistically flawed.
I was wondering what your thoughts were on this, both from a statistical point of view and from your experience working in an observational field.
We all know about the dangers of researcher degrees of freedom. We also know how easy it is to obtain significant p values in exploratory analyses that are meaningless and misleading. Dorothy Bishop has a great example on her blog of a four-way ANOVA conducted on a null data set, in which a significant p value for at least one main effect or interaction will be found at least 50% of the time: http://deevybee.blogspot.co.uk/2013/07/why-we-need-pre-registration.html
Given the threat of researcher degrees of freedom, do you feel that NHST is ever an appropriate approach to exploratory (unregistered) inferential statistical analysis? And, given these concerns, why should anyone believe the outcome of a NHST procedure that isn’t pre-registered?
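To see where a figure like "at least 50% of the time" can come from, here is a rough sketch of the arithmetic behind the four-way ANOVA example in Chris's note. It is not Bishop's own simulation. It assumes the tests of the individual effects are independent and each run at the 0.05 level, which is only approximately true for a balanced design, and the function names are made up for illustration.

```python
import random
from math import comb


def n_effects(n_factors):
    """Count main effects plus interactions of every order."""
    return sum(comb(n_factors, k) for k in range(1, n_factors + 1))


def familywise_rate(n_tests, alpha=0.05):
    """Chance of at least one p < alpha among independent null tests."""
    return 1 - (1 - alpha) ** n_tests


def simulate(n_tests, alpha=0.05, n_sims=20000, seed=0):
    """Monte Carlo check: under the null, each p value is uniform on (0, 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        if any(rng.random() < alpha for _ in range(n_tests)):
            hits += 1
    return hits / n_sims


k = n_effects(4)  # 4 main effects + 6 + 4 + 1 interactions
print(k, round(familywise_rate(k), 3), simulate(k))
```

With four factors there are 15 effects to test, and under these assumptions the chance that at least one comes out "significant" on pure noise is a bit over one half.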
I replied: my brief answer is that different methods, derived from different philosophies, can be mathematically equivalent to each other. So, null hypothesis significance testing is equivalent to a classical confidence interval, which is equivalent to Bayes with a flat prior, which can make sense if the effect sizes are large and the measurement error is small.
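The equivalence is easiest to see in the simplest case. The sketch below assumes a single estimate that is normally distributed with a known standard error; the function name `z_test_summary` and the example numbers are hypothetical. In that setting, the two-sided p value is below 0.05 exactly when the 95% confidence interval excludes zero, and a flat prior gives a posterior of Normal(estimate, se^2), so the 95% posterior interval matches the confidence interval.

```python
from statistics import NormalDist


def z_test_summary(estimate, se, alpha=0.05):
    """Return (p value, confidence interval, flat-prior posterior interval).

    Assumes a normal sampling distribution with known standard error.
    """
    std = NormalDist()  # standard normal
    z = estimate / se
    p_value = 2 * (1 - std.cdf(abs(z)))

    # Classical interval: estimate +/- critical value * se
    crit = std.inv_cdf(1 - alpha / 2)
    ci = (estimate - crit * se, estimate + crit * se)

    # Flat prior + normal likelihood: posterior is Normal(estimate, se^2)
    posterior = NormalDist(mu=estimate, sigma=se)
    post = (posterior.inv_cdf(alpha / 2), posterior.inv_cdf(1 - alpha / 2))
    return p_value, ci, post


for est in (2.5, 1.0):  # illustrative estimates, with se = 1
    p, ci, post = z_test_summary(est, 1.0)
    print(est, p < 0.05, ci[0] > 0, ci, post)
```

The three summaries carry the same information here. Whether that information is sensible depends on whether a flat prior is a reasonable description of what we know.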
If the best and only defence to researcher degrees of freedom is pre-registration, then how can scientists securely interpret p values in observational research? How can they even interpret them in their own research, given our own unconscious bias? That is, doesn’t interpreting a p value carry the concrete requirement that no researcher dfs have been exploited?
I would also be very interested to hear your critique of pre-registration (as it applies to your research) in more detail. What is it specifically about pre-registration that would have prevented your most important discoveries? All pre-registration does is enable readers to distinguish confirmatory analysis from exploratory analysis – it doesn’t block exploratory analysis or hinder it in any way (that I can see). That being so, and assuming your major discoveries stemmed from exploratory analysis, why would having those same exploratory analyses form part of a pre-registered study make any difference to their interpretation or impact? (or would it have changed your mindset in some way, e.g. by making you more conservative in your approach?). I find this discussion intriguing because I’ve never seen pre-registration as an enemy of exploration, only as an aid to distinguish hypothesis testing from hypothesis generation.
I don’t think the existence of preregistration would have killed my results, and I support proposals in psychology and political science to allow preregistration to be done in an open way. I just wouldn’t want preregistration to be required; indeed, the concept of preregistration would seem to me to be just about impossible to apply in the analysis of public datasets such as we use in political science. And even in our analysis of non-public datasets, we learned most of what to look at after looking at the data.
And here’s Chris again:
I don’t think pre-registration should be mandatory either. Though I think it should be strongly encouraged in fields where undisclosed flexibility is identified as a major cause of false discoveries (which is certainly the case in psychology and cognitive neuroscience). As you say, it’s more challenging for areas that rely on analysis of existing datasets. In psychology I think the solution in that case is to consider all analyses of existing datasets as (by definition) exploratory and thus most valuable in terms of hypothesis generation and modeling.
Having said that, I don’t know if you were aware of this but the revised Declaration of Helsinki (to which major psychology and neuro journals adhere) now requires mandatory pre-registration. See clause 35 especially here: http://jama.jamanetwork.com/article.aspx?articleid=1760318#ResearchRegistrationandPublicationandDisseminationofResults
I took a look at the relevant section: “Every research study involving human subjects must be registered in a publicly accessible database before recruitment of the first subject.”
I’m not clear what it means for a study to be “registered.” I don’t know that this would require the analysis to be specified ahead of time.
Indeed, quite possibly not. But it raises the bar substantially and normalises the idea of saying something about what the researcher is going to do before doing it. From there it becomes not a question of “did you pre-register?” but of “what did you pre-register?”
I guess, as a start, that people will preregister the designs in their NIH proposals. I’m hoping they will be registering their data as well.
And here are my previously published thoughts on preregistration in political science.