Selection bias in the reporting of shaky research

I’ll reorder this week’s posts a bit in order to continue on a topic that came up yesterday.

A couple days ago a reporter wrote to me asking what I thought of this paper on Money, Status, and the Ovulatory Cycle. I responded:

Given the quality of the earlier paper by these researchers, I’m not inclined to believe anything these people write. But, to be specific, I can point out some things:

– The authors define high fertility as days 8-14. Oddly enough, these same authors in their earlier paper used days 7-14. But according to womenshealth.gov, the most fertile days are between days 10 and 17. The choice of days affects the analysis, and it is not a good sign that they use different windows in different papers. (See more on this point in sections 2.3 and 3.1 of this paper: http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf)

– They perform a lot of different analyses, and many others could be performed. For example, “Study 1 indicates that ovulation boosts women’s tendency to seek relative gains when given the opportunity to possess products superior to those of other women. However, we expect that ovulation should not have the same effect on women’s choices if they have the opportunity to possess better products than men who are potential mates.” But if they found the pattern in both groups they could argue that this is consistent with their theory in a different way. They’re essentially playing a lottery where they make the rules, and they can keep coming up with ways in which they win.

– For another example, “Ethnicity, relationship status, and income had no effect on the dependent measures.” But if they had found something, this could’ve fit the story. Recall that their previous paper was all about relationship status!

– Yet another example: “A repeated measures logistic regression . . . revealed a significant interaction . . . however, when women compared their house relative to that of men, there was no difference . . .” Again, any pattern here would fit their story.

– Here’s another example: “As in Study 1, we next examined women’s choices relative to women across the full 28-day cycle.” But in their earlier paper, they did not do this: rather, they only compared days 7-14 to days 17-25, completely excluding days 1-6, 15-16, and 26-28.

– The authors say their results fit their model, for example, “Consistent with H3, we predicted that ovulation would lead women to give smaller financial offers to other women but not to men.” Where exactly is this “prediction”? Did the authors really predict this ahead of time in a public way, or are they just saying they predicted it? In either case, how many other things did they predict? It wouldn’t be so impressive if they predicted (or could have predicted) thousands of possible comparisons and then showed whichever ones worked out.

Perhaps the biggest problem with this study is that it purports to be all about effects within women, but it is a between-subjects design. That is, they do _not_ interview women at multiple points during their cycles. What they do is compare different women. That makes this sort of study close to hopeless. There’s just too much variation from person to person. What they’re doing is finding patterns in noise. If you look hard enough—and they do—you’ll find statistically significant patterns. Along with that they have a flexible theory that can explain just about anything.

To conclude, I’m not saying I think the authors “cheated” or “fished for statistical significance.” I have no idea how they did their analysis, but they very well could’ve made various data coding and data analysis decisions after seeing the data, and in addition they have a lot of choices in how to interpret their results in light of their theories. I think it’s hopeless. They might as well be reading tea leaves.
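To make the “patterns in noise” point concrete, here is a minimal simulation sketch. It is not a reanalysis of the paper: the sample size, the number of outcomes, and the candidate day windows are assumptions chosen purely for illustration. It generates pure-noise data from a between-subjects design and checks how often at least one of a handful of reasonable-looking analyst choices (different fertility windows, different outcome measures) comes out “statistically significant.”

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

n_women = 500     # assumed sample size for each simulated study
n_sims = 2000     # number of simulated studies
n_outcomes = 3    # e.g., offers to women, offers to men, product choice
windows = [(7, 14), (8, 14), (10, 17)]  # candidate "high fertility" definitions
alpha = 0.05

studies_with_a_finding = 0
for _ in range(n_sims):
    # Between-subjects design: each woman is observed once, on a random cycle
    # day, and every outcome is pure noise (no true ovulation effect anywhere).
    day = rng.integers(1, 29, size=n_women)
    outcomes = rng.normal(size=(n_women, n_outcomes))

    found_something = False
    for lo, hi in windows:                  # analyst choice 1: the day window
        high = (day >= lo) & (day <= hi)
        for k in range(n_outcomes):         # analyst choice 2: the outcome
            _, p = stats.ttest_ind(outcomes[high, k], outcomes[~high, k])
            if p < alpha:
                found_something = True
    studies_with_a_finding += found_something

print(f"Share of pure-noise studies with at least one "
      f"'significant' comparison: {studies_with_a_finding / n_sims:.0%}")
```

Even with only these nine comparisons per dataset, a substantial fraction of pure-noise studies turn up something nominally significant; adding subgroup analyses (relationship status, ethnicity) and interaction tests, as in the paper, only raises that fraction.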

The reporter thanked me and wrote that he was still trying to figure out whether to write about the paper.

This made me think a bit: it’s not an easy question. I’m negative on the paper but others might be positive. Beyond that, there’s a selection issue. Suppose you happen to be convinced that the article is worthless, and so you decide not to run the story. But somewhere else there is a reporter who swallows the press release hook, line, and sinker. This other reporter would of course run a big story. Hence the selection bias: the stories that do get published are likely to repeat the hype. Which in turn gives researchers and public relations people a motivation to do the hype in the first place. On the other hand, if you do run a skeptical story, then you’re continuing to give press to this silly study, giving it more attention than (in my opinion) it deserves.
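Here’s a toy sketch of that selection mechanism, with made-up numbers purely for illustration: suppose only a minority of reporters believe the press release, but believers almost always run a story while skeptics almost always pass. Conditioning on a story being published, coverage then looks overwhelmingly positive, no matter how many skeptics quietly declined.

```python
import numpy as np

rng = np.random.default_rng(1)

n_reporters = 1000
p_believe = 0.20          # assumed: 20% of reporters buy the press release
p_run_if_believer = 0.80  # believers usually run an enthusiastic story
p_run_if_skeptic = 0.05   # skeptics mostly pass rather than write a takedown

believes = rng.random(n_reporters) < p_believe
runs_story = np.where(believes,
                      rng.random(n_reporters) < p_run_if_believer,
                      rng.random(n_reporters) < p_run_if_skeptic)

hype_share = believes[runs_story].mean()  # published stories repeating the hype
print(f"Reporters who believe the claim: {believes.mean():.0%}")
print(f"Published stories that repeat the hype: {hype_share:.0%}")
```

With these made-up numbers, only about a fifth of reporters believe the claim, yet roughly four out of five published stories repeat the hype, because the skeptics who pass on the story leave no trace in the coverage.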

This is a reporter’s dilemma that echoes the discussion I was having with Jeff Leek about when to shoot down what we believe to be bad work and when to ignore it. If lots of other people are paying attention to the paper, then I think a journalist is doing a service by shooting it down. But if the paper is basically being ignored (or perhaps just being treated as a “politically incorrect” oddity), then there’s no point in pulling it up from obscurity just to go on about what’s wrong with it.

The problem, as I see it, comes when a claim presented with (essentially) no evidence is taken as truth and then treated as a stylized fact. And the norms of scientific publication, as well as the norms of science journalism, push toward this. If you express too much uncertainty in your scientific report, I think it becomes harder to get it published in a top journal (after all, they want to present “discoveries,” not “speculations”). And science journalism often seems to follow the researcher-as-Galileo mold.

10 thoughts on “Selection bias in the reporting of shaky research”

  1. The latest is “Barbie vs. Mrs. Potato Head,” with the claim that playing with Barbie for a very short time “crushes” little girls’ occupational dreams. Just google it. The story is all over the place.
    The effect size is, it seems to me, very small, and the measure does not in any way relate to whatever occupational “dreams” these young girls may have. Playing with Barbie may only make the girls a little more distracted and “lazy” about checking off items on a list.
    When will this nonsense stop?

  2. Pingback: “When to shoot down… bad work and when to ignore it.” | Man the Measure

  3. This is a consequence of the Gladwellisation of everything. Scientists are in effect using popular authors of airport books to promote their single, underpowered, p-hacked, unreplicable study. Then some junior education minister buys a copy on the way to a meeting and before you know it, that study is driving schools policy.

    Hopefully Kahneman’s promised train wreck will arrive sooner rather than later, but I suspect that it will be in the form of some massively ill-advised government policy and we won’t actually discover it was a train wreck until we find the smouldering ashes of the money (and whatever else got trashed).

    Perhaps there will come a day when the popular media realises that these stories are not really “Man bites dog” but the altogether less exciting “Man shows picture of injured dog, claims to have inflicted bite”. This may take a while, though, as there are so many dogs out there, and who knows, maybe this 9999th piece of fluff is the real thing. And of course, all this presumes that the journalists actually care, which is doubtless true for some percentage of them.

    One possible solution is to find ways to get the media to write about the problems. In some ways these can be more interesting than the originals because there’s the angle of “scientists spent how much government money on this?”. The recent Economist piece http://www.economist.com/news/briefing/21588057-scientists-think-science-self-correcting-alarming-degree-it-not-trouble and even, if I may be so bold, the Observer’s article on the Losada story (three full-sized pages in the print edition) are examples.

  4. Pingback: The paradox of newsworthiness - The Washington Post

  5. Pingback: Selection bias in the reporting of shaky research: An example - Statistical Modeling, Causal Inference, and Social Science

  6. Pingback: Can the science community help journalists avoid science hype? It won’t be easy. « Statistical Modeling, Causal Inference, and Social Science

  7. Pingback: (1) The misplaced burden of proof, and (2) selection bias: Two reasons for the persistence of hype in tech and science reporting « Statistical Modeling, Causal Inference, and Social Science
