I received two emails yesterday on related topics.
First, Stephen Olivier pointed me to this post by Daniel Lakens, who wrote the following open call to statisticians:
You would think that if you are passionate about statistics, then you want to help people to calculate them correctly in any way you can. . . . you’d think some statisticians would be interested in helping a poor mathematically challenged psychologist out by offering some practical advice.
I’m the right person to ask this question, since I actually have written a lot of material that helps psychologists (and others) with their data analysis. But there clearly are communication difficulties, in that my work and that of other statisticians hasn’t reached Lakens. Sometimes the contributions of statisticians are made indirectly. For example, I wrote Bayesian Data Analysis, and then Kruschke wrote Doing Bayesian Data Analysis. Our statistics book made it possible for Kruschke to write his excellent book for psychologists. This is a reasonable division of labor.
That said, I’d like to do even more. So I will make some specific suggestions for data analysis in psychology right here in this post, in the context of my next story:
Dan Kahan sent me this note:
The most egregious instance of totally bogus methods I had the misfortune to feel obliged to call foul on involved an econometrics study that purported to find that changes in law that never happened increased homicides by “lowering the cost” of committing them . . .
Actually, as you know, oftentimes investigation of a “wtf?!” report like this discloses that the problem is in the news report & not in the study.
I think you agree that many of the “bad statistics/methods” problems & even the “nonreplicability” problem are rooted in the perpetuation of a set of mindless statistical protocols associated with an ossified conception of NHT (one from which all the thought it might once have reflected was drained away & discarded decades ago).
But certainly another problem is the “wtf?!!!!!!” conception of psychology. Its distinguishing feature is its supposed discovery of phenomena that are shockingly bizarre & lack any coherent theory.
The alternative conception of psychology is the “everything is obvious — once you know the answer” one. The main point of empirical research isn’t to shock people. It’s to adjudicate disputes between competing plausible conjectures about what causes what we see. More accounts of what is going on are plausible than are true; without valid inference from observation, we will never separate the latter from the sea of the former & will drown in “just so” storytelling.
I have zero confidence in “wtf?!!!” & am convinced that it produces a steady stream of bogus, nonreplicable studies that hurts the reputation of psychology.
I have lots of confidence in EIO–OYKTA. It’s not nearly so sexy — which is good, b/c it removes the temptation to cut corners in all the familiar, petty ways that researchers do (usually by coaxing a shy “p < 0.05” to emerge w/ one or another data-manipulative come-on line). But it deals with matters that reflect real, theorized, validated mechanisms of psychology (the issue in each case is — which one?!), and ones that are important enough for researchers to keep at essentially forever, revising, correcting, improving our evolving understanding of what’s going on.
Kahan points to a much-mocked and much-criticized study by Kristina Durante, Ashley Arsena, and Vladas Griskevicius, “The Fluctuating Female Vote: Politics, Religion, and the Ovulatory Cycle,” which was reported and then retracted by CNN under the headline “Study looks at voting and hormones: Hormones may influence female voting choices.”
The relevance for the present discussion is that this paper was published in Psychological Science, a top journal in psychology. Here’s the abstract:
Each month many women experience an ovulatory cycle that regulates fertility. Whereas research finds that this cycle influences women’s mating preferences, we propose that it might also change women’s political and religious views. Building on theory suggesting that political and religious orientation are linked to reproductive goals, we tested how fertility influenced women’s politics, religiosity, and voting in the 2012 U.S. presidential election. In two studies with large and diverse samples, ovulation had drastically different effects on single versus married women. Ovulation led single women to become more liberal, less religious, and more likely to vote for Barack Obama. In contrast, ovulation led married women to become more conservative, more religious, and more likely to vote for Mitt Romney. In addition, ovulatory-induced changes in political orientation mediated women’s voting behavior. Overall, the ovulatory cycle not only influences women’s politics, but appears to do so differently for single versus married women.
I took a look at the paper, and what I found was a bunch of comparisons and p-values, some of which were statistically significant, and then lots of stories. The problem is that there are so many different things that could be compared, and all we see is some subset of the comparisons. Many of the reported effects seem much too large to be plausible. And there’s a casual use of causal language (for example, the words “influenced,” “effects,” and “induced” in the above abstract) to describe correlations.
Beyond all that, I found the claimed effects implausibly large. For example, they report that, among women in relationships, 40% in the ovulation period supported Romney, compared to 23% in the non-fertile part of their cycle. Given that surveys find very few people switching their vote preferences during the campaign for any reason, I just don’t buy it. The authors might respond that they don’t care about the magnitude of the difference, just the sign, but (a) with a difference of this size, we’re talking noise, noise, noise, and (b) one could just as easily explain this as a differential nonresponse pattern: maybe liberal or conservative women in different parts of their cycle are more or less likely to participate in a survey. It would be easy enough to come up with a story about that!
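To see how differential nonresponse alone could manufacture a gap like that, here is a minimal simulation. Every number in it is my own assumption, chosen only to illustrate the mechanism: true support is identical in both groups, and the only thing that differs is how likely supporters versus non-supporters are to complete the survey.

```python
import random

random.seed(42)

def simulated_survey(n_invited, p_support, p_respond_support, p_respond_other):
    """Apparent support among respondents when response rates differ by view."""
    supporters = others = 0
    for _ in range(n_invited):
        supports = random.random() < p_support
        p_respond = p_respond_support if supports else p_respond_other
        if random.random() < p_respond:
            if supports:
                supporters += 1
            else:
                others += 1
    return supporters / (supporters + others)

# Same true support (30%) in both groups; only the response rates differ.
# These response rates are invented to show the size of gap that can appear.
fertile = simulated_survey(2000, 0.30, 0.60, 0.40)     # supporters over-respond
nonfertile = simulated_survey(2000, 0.30, 0.40, 0.60)  # supporters under-respond
print(f"apparent support, fertile window:     {fertile:.0%}")
print(f"apparent support, non-fertile window: {nonfertile:.0%}")
```

With these made-up response rates, the apparent gap between the two groups lands in the double digits even though the underlying support is exactly the same, which is the point: a survey difference of this size is cheap to produce without any opinion change at all.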
Anyway, my point is not to slam the work of Durante et al. They did a little study, wrote it up, and submitted it to one of the leading journals in their field. It’s not their fault the journal chose to publish it.
Also, let me emphasize that I’m not saying that their claims (regarding the effects of ovulation) are false. I’m just saying that the evidence from their paper isn’t as strong as they make it out to be.
A statistician offers helpful advice for psychology researchers
My real goal here is to address the question that was brought up at the beginning of this post: What recommendations can I, as a statistician, give to psychology researchers? Here are a few, presented in the context of the paper on ovulation and political attitudes:
1. Analyze all your data. For most of their analyses, the authors threw out all the data from participants who were PMS-ing or having their period. (“We also did not include women at the beginning of the ovulatory cycle (cycle days 1–6) or at the very end of the ovulatory cycle (cycle days 26–28) to avoid potential confounds due to premenstrual or menstrual symptoms.”) That’s a mistake. Instead of throwing out one-third of their data, they should’ve included those excluded days as a third category in the analysis.
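As a sketch of what that looks like in practice, the discarded days can simply be coded as their own category so that every respondent appears in the table. The day ranges and the toy data below are my assumptions for illustration, not the paper’s exact windows or numbers:

```python
def cycle_phase(day):
    """Map a cycle day (1-28) to a phase, keeping every respondent."""
    if day <= 6 or day >= 26:
        return "pre/menstrual"  # the days the paper discarded
    return "fertile" if 7 <= day <= 14 else "non-fertile"  # assumed windows

# Toy respondents: (cycle day, supports Romney?)
toy = [(3, True), (9, False), (12, True), (19, True), (22, False), (27, False)]

support = {}
for day, romney in toy:
    phase = cycle_phase(day)
    n, k = support.get(phase, (0, 0))
    support[phase] = (n + 1, k + romney)

for phase, (n, k) in sorted(support.items()):
    print(f"{phase:<14} n={n}  support={k/n:.0%}")
```

The reader then sees all three groups side by side; if the excluded category behaves strangely, that itself is informative rather than invisible.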
2. Present all your comparisons. The paper leads us through a hopscotch of comparisons and p-values. Better just to present everything. I have no idea if the researchers combed through everything and selected the best results, or if they simply made a bunch of somewhat arbitrary decisions throughout about what to look for.
For example, I would’ve liked to see a comparison of respondents in different parts of their cycle on variables such as birth year, party identification, marital status, etc etc. Just a whole damn table (even better would be a graph but, hey, I won’t get greedy here) showing these differences for every possible variable.
Instead, what do we get? Several pages full of averages, percentages, F tests, chi-squared tests, and p-values, all presented in paragraph form. Better to have all possible comparisons in one convenient table.
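Here is a rough sketch of what such a table could look like in code: one row per variable, with both group means, the difference, and a rough standard error, instead of scattered prose. The variable names and the toy data are invented for illustration; nothing below comes from the study.

```python
import statistics

# Toy records standing in for the survey (names and values are made up):
respondents = [
    {"fertile": True,  "birth_year": 1985, "conservatism": 4.1, "religiosity": 3.0},
    {"fertile": True,  "birth_year": 1990, "conservatism": 3.2, "religiosity": 2.5},
    {"fertile": False, "birth_year": 1987, "conservatism": 3.8, "religiosity": 3.4},
    {"fertile": False, "birth_year": 1983, "conservatism": 4.5, "religiosity": 3.9},
    {"fertile": False, "birth_year": 1992, "conservatism": 2.9, "religiosity": 2.2},
]

def compare_all(records, group_var):
    """One row per variable: group means, difference, and a rough SE."""
    variables = [k for k in records[0] if k != group_var]
    rows = []
    for var in variables:
        a = [r[var] for r in records if r[group_var]]
        b = [r[var] for r in records if not r[group_var]]
        diff = statistics.mean(a) - statistics.mean(b)
        se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
        rows.append((var, statistics.mean(a), statistics.mean(b), diff, se))
    return rows

print(f"{'variable':<14}{'mean(T)':>9}{'mean(F)':>9}{'diff':>8}{'se':>7}")
for var, ma, mb, diff, se in compare_all(respondents, "fertile"):
    print(f"{var:<14}{ma:>9.2f}{mb:>9.2f}{diff:>8.2f}{se:>7.2f}")
```

The point of the design is that every variable gets a row whether or not its comparison is interesting, so the reader, not the author, decides which differences matter.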
3. Make your data public. If the topic is worth studying, you should want others to be able to make rapid progress. If there are confidentiality restrictions, remove the respondents’ identifying information. Then post the data online.
4. And now some advice for journal editors. What’s the purpose of a top journal in a field such as psychology? Psychological Science is “the flagship journal of the Association for Psychological Science . . . the highest ranked empirical journal in psychology.” A journal like that should be publishing the field’s top work. This paper is not top work, by any standard. The researchers asked a few survey questions of a bunch of people on Mechanical Turk, then did who knows how many comparisons and significance tests, reported some subset of the results, then went on to story time. It’s not innovative data collection, it’s not great theory, it’s not great data analysis, it’s not a definitive data source, it’s nothing. What it is, is headline bait that’s not obviously wrong. But is that the appropriate standard? It’s not obviously wrong to three referees, so publish it?
As a statistician, my advice is: if a paper is nothing special, you don’t have to publish it in your flagship journal. Here, as with the notorious Daryl Bem article, the journal almost seemed to feel an obligation to publish a dubious claim, just because the referees didn’t happen to find any flaws in the data collection or analysis. But if Psychological Science declines to publish an article, that’s not censorship or suppression; the authors are free to submit it to a lesser journal. For the leading journal to have such low standards is bad news for the entire field. For one thing, it encourages future researchers to focus on this sort of sloppy work.
I’m hoping the above advice will make Stephen Olivier happy. I’m not just sitting there criticizing something, or telling someone to use R instead of SPSS, or lecturing psychologists about their lack of mathematical skills. I’m giving some very specific suggestions that you, the psychology researcher, can use in your next research project (or, if you’re a journal editor, in your next publication decision).
There is, of course, lots and lots of additional advice that I and other statisticians could give. The above is just a start. But I wanted to start somewhere, just to demonstrate to Olivier (and others) that this is indeed possible.
P.S. Blogger Echidne raised similar points last year.