I had a recent exchange with a news reporter regarding one of those silly psychology studies. I took a look at the article in question—this time it wasn’t published in Psychological Science or PPNAS so it didn’t get saturation publicity—and indeed it was bad, laughably bad. They didn’t just have the garden of forking paths; they very clearly ran a series of analyses until they finally reached something statistically significant, at which point they stopped, made some graphs, and presented their conclusions.
OK, fine. There’s a lot of incompetent research out there. It’s easier to do bad research than to do good research, so if the bad research keeps getting published and publicized, we can expect to see more of it.
But what about these specific errors, which we keep seeing over and over again? I can’t imagine these researchers are making these mistakes on purpose!
The only reasonable inference here is that applied statistics is hard. Doing a statistical analysis is like playing basketball or knitting a sweater. You can get better with practice.
How should we think about all this? To start with, I think we have to accept statistical incompetence not as an aberration but as the norm. The norm among researchers, thus the norm among journal referees, thus the norm among published papers.
Incompetent statistics does not necessarily doom a research paper: some findings are solid enough that they show up even when there are mistakes in the data collection and data analyses. But we’ve also seen many examples where incompetent statistics led to conclusions that made no sense but still received publication and publicity.
How should we react to this perspective?
Statisticians such as myself should recognize that the point of criticizing a study is, in general, to shed light on statistical errors, maybe with the hope of reforming future statistical education.
Journalists who are writing about quantitative research should not hold the default belief that a published analysis is correct.
Researchers and policymakers should not just trust what they read in published journals.
Finally, how can I be so sure that statistical incompetence is the norm, not an aberration? The answer is that I can’t be so sure. The way to study this would be to take a random sample of published papers, or perhaps a random sample of publicized papers, and take a hard look at their statistics. I think some people have done this. But from my own perspective, seeing some of these glaring errors that survived the journal reviewing process, and seeing them over and over, gives me the sense that we’re seeing the statistical equivalent of a bunch of saggy sweaters knitted by novices.
Also, this: statistical errors come from bad data analysis, but also from bad data collection and data processing (as in that notorious paper that defined days 6-14 as peak fertility). One message I keep sending is that we should all be thinking about data quality, not just data analysis.
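The forking-paths problem described above can be made concrete with a quick simulation. The sketch below (my own illustration, not from any of the papers discussed; the sample sizes and the number of comparisons are arbitrary assumptions) generates pure noise and then, like the researchers in question, keeps running comparisons until one comes up "significant." Even though no effect exists, a study that tries 10 comparisons will report a finding roughly 40% of the time:

```python
import math
import random

def p_value_two_sided(sample):
    """Two-sided p-value for H0: mean = 0, assuming known sd = 1 (z-test)."""
    n = len(sample)
    z = (sum(sample) / n) * math.sqrt(n)
    # Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def run_study(n_comparisons=10, n_per_group=30, rng=random):
    """Simulate one 'study' on null data: stop at the first significant result."""
    for _ in range(n_comparisons):
        sample = [rng.gauss(0, 1) for _ in range(n_per_group)]  # pure noise
        if p_value_two_sided(sample) < 0.05:
            return True  # "found" an effect that isn't there
    return False

random.seed(1)
n_studies = 2000
false_positives = sum(run_study() for _ in range(n_studies))
print(f"Chance a 10-comparison null study reports a 'finding': "
      f"{false_positives / n_studies:.2f}")
# Theoretical family-wise rate: 1 - 0.95**10, about 0.40
```

And this simulation is actually too kind: real forking paths include choices of subgroups, covariates, and outcome codings made after seeing the data, so the effective number of comparisons can be far larger than 10.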
OK, and now here’s the story that motivated these thoughts.
I received this email from Alex Kasprak:
Hello Dr Gelman,
I am a science writer at BuzzFeed . . . writing a brief round up of all the ‘scientific’ claims made about beards in 2015. I was wondering if you (or someone you know) would be able to comment on how rigorous and/or problematic the statistical methods are in three such studies:
The Association Between Men’s Sexist Attitudes and Facial Hair (Oldmeadow-2015.pdf)
– A sample of 500 men from America and India (nowhere else) shows a significant relationship between sexist views and the presence of facial hair.
A lover or a fighter? Opposing sexual selection pressures on men’s vocal pitch and facial hair (Saxton-2015.pdf)
– 20 men and 20 women rated the attractiveness and perceived masculinity of 6 men at different stages of facial hair development.
The Role of Facial and Body Hair Distribution in Women’s Judgments of Men’s Sexual Attractiveness (Dixson-2015.pdf)
– ~3000 women ranked the attractiveness of 20 men of varying degrees of body and facial hair coverage (in paired choices), finding that fuller beards and less body hair were preferred.
Please let me know if you might have time to speak with me about this. . . .
I responded as follows:
Hi Alex. I do not recommend taking this stuff seriously at all. I don’t have the energy to read all of this, but I took a quick look at the first paper and found this:
“Since a linear relationship has been found between facial hair thickness and perceived masculinity . . . we explored the relationship between facial hair thickness and sexism. . . . Pearson’s correlation found no significant relationships between facial hair thickness and hostile or benevolent sexism, education, age, sexual orientation, or relationship status.”
And then this: “We conducted pairwise comparisons between clean-shaven men and each facial hair style on hostile and benevolent sexism scores. . . . For the purpose of further analyses, participants were classified as either clean-shaven or having facial hair based on their self-reported facial hair style . . . There was a significant Facial Hair Status by Sexism Type interaction . . .”
So their headline finding appeared only because, after their first analysis failed, they shook and shook the data until they found something statistically significant. All credit to the researchers for admitting that they did this, but it was poor practice on their part to present their result in the abstract of their paper without making this clear, and too bad that the journal got suckered into publishing it. Then again, every paper can get published somewhere, and I’ve seen work just as bad that’s been published in prestigious journals such as Psychological Science or the Proceedings of the National Academy of Sciences.
As long as respectable journals will publish this work, and as long as news outlets will promote it, there will be a motivation for researchers to continue going through their data to find statistically significant comparisons.
I should blog this—is it ok if I quote you?
Kasprak replied:

Would you mind holding off on blogging until we’ve published our piece, by any chance?
And I responded:
Yes, sure, no problem at all. Right now our blog is backlogged until March 2016!
And, indeed, here we are.