Someone pointed me to this article by Isabel Scott and Nicholas Pound:
Recent authors have reported a relationship between women’s fertility status, as indexed by menstrual cycle phase, and conservatism in moral, social and political values. We conducted a survey to test for the existence of a relationship between menstrual cycle day and conservatism.
2213 women reporting regular menstrual cycles provided data about their political views. Of these women, 2208 provided information about their cycle date . . . We also recorded relationship status, which has been reported to interact with menstrual cycle phase in determining political preferences.
We found no evidence of a relationship between estimated cyclical fertility changes and conservatism, and no evidence of an interaction between relationship status and cyclical fertility in determining political attitudes. . . .
I have no problem with the authors’ substantive findings. And they get an extra bonus for not labeling day 6 as high conception risk:
Seeing this clearly-sourced graph makes me annoyed one more time at those psychology researchers who refused to acknowledge that, in a paper all about peak fertility, they’d used the wrong dates for peak fertility. So, good on Scott and Pound for getting this one right.
There’s one thing that does bother me about their paper, though, and that’s how they characterize the relation of their study to earlier work such as the notorious paper by Durante et al.
Scott and Pound write:
Our results are therefore difficult to reconcile with those of Durante et al, particularly since we attempted the analyses using a range of approaches and exclusion criteria, including tests similar to those used by Durante et al, and our results were similar under all of them.
Huh? Why “difficult to reconcile”? The reconciliation seems obvious to me: There’s no evidence of anything going on here. Durante et al. had a small noisy dataset and went all garden-of-forking-paths on it. And they found a statistically significant comparison in one of their interactions. No news here.
Scott and Pound continue:
Lack of statistical power does not seem a likely explanation for the discrepancy between our results and those reported in Durante et al, since even after the most restrictive exclusion criteria were applied, we retained a sample large enough to detect a moderate effect . . .
Again, I feel like I’m missing something. “Lack of statistical power” is exactly what was going on with Durante et al., indeed their example was the “Jerry West” of our “power = .06” graph:
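The point about low power can be sketched in a quick simulation. The numbers below are illustrative assumptions, chosen so that the power of a two-sided test at the 5% level comes out near .06: a small true effect relative to a large standard error. The simulation shows why a "significant" finding in such a setting is uninformative: the significant estimates are exaggerated several-fold and often have the wrong sign.

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative numbers (assumptions, not from the paper): a true effect
# of 2 units estimated with standard error 8.1 gives power ~ .06 for a
# two-sided z-test at alpha = 0.05.
true_effect = 2.0
se = 8.1
n_sims = 100_000

estimates = rng.normal(true_effect, se, n_sims)
significant = np.abs(estimates / se) > 1.96

power = significant.mean()
# Among the "significant" replications, how exaggerated is the estimate
# on average (Type M error), and how often is its sign wrong (Type S)?
exaggeration = np.abs(estimates[significant]).mean() / true_effect
wrong_sign = (estimates[significant] < 0).mean()

print(f"power: {power:.2f}")               # roughly .06
print(f"exaggeration factor: {exaggeration:.1f}x")
print(f"wrong-sign share: {wrong_sign:.2f}")
```

Under these assumptions the rare significant result overstates the true effect by roughly an order of magnitude, and about a quarter of the time it points in the wrong direction, which is why a noisy study "finding" an effect and a large study finding nothing need no further reconciling.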
Scott and Pound continue:
One factor that may partially explain the discrepancy is our different approaches to measuring conservatism and how the relevant questions were framed. . . . However, these methodological differences seem unlikely to fully explain the discrepancy between our results . . . One further possibility is that differences in responses to our survey and the other surveys discussed here are attributable to variation in the samples surveyed. . . .
Sure, but aren’t you ignoring the elephant in the room? Why is there any discrepancy to explain? Why not at least raise the possibility that those earlier publications were just examples of the much-documented human ability to read patterns in noise?
I suspect that Scott and Pound have considered this explanation but felt it would be politic not to explicitly suggest it in their paper.
P.S. The above graph is a rare example of a double-y-axis plot that isn’t so bad. But the left axis should have a lower bound at 0: it’s not possible for conception risk to be negative!
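For what it’s worth, here is a minimal matplotlib sketch of a double-y-axis plot that floors the conception-risk axis at zero. All the data here are made up purely for illustration; the point is only the `set_ylim(bottom=0)` call on the risk axis.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripting
import matplotlib.pyplot as plt

# Made-up data standing in for the two series in the plot discussed above
days = np.arange(1, 29)                                   # cycle day
conception_risk = np.exp(-0.5 * ((days - 13) / 3) ** 2)   # peaks mid-cycle
conservatism = 3.0 + 0.1 * np.sin(days / 4)               # flat-ish series

fig, ax1 = plt.subplots()
ax1.plot(days, conception_risk, color="tab:blue")
ax1.set_xlabel("cycle day")
ax1.set_ylabel("conception risk")
ax1.set_ylim(bottom=0)  # risk cannot be negative, so floor the axis at 0

ax2 = ax1.twinx()       # second y-axis sharing the same x-axis
ax2.plot(days, conservatism, color="tab:orange")
ax2.set_ylabel("mean conservatism score")

fig.savefig("cycle_plot.png")
```

Double-y-axis plots are usually a bad idea because the two scales can be shifted to suggest any relationship you like; constraining each axis to its natural range, as with the zero floor here, removes one of those degrees of freedom.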