
“Menstrual Cycle Phase Does Not Predict Political Conservatism”


Someone pointed me to this article by Isabel Scott and Nicholas Pound:

Recent authors have reported a relationship between women’s fertility status, as indexed by menstrual cycle phase, and conservatism in moral, social and political values. We conducted a survey to test for the existence of a relationship between menstrual cycle day and conservatism.

2213 women reporting regular menstrual cycles provided data about their political views. Of these women, 2208 provided information about their cycle date . . . We also recorded relationship status, which has been reported to interact with menstrual cycle phase in determining political preferences.

We found no evidence of a relationship between estimated cyclical fertility changes and conservatism, and no evidence of an interaction between relationship status and cyclical fertility in determining political attitudes. . . .

I have no problem with the authors’ substantive findings. And they get an extra bonus for not labeling day 6 as high conception risk:


Seeing this clearly-sourced graph makes me annoyed one more time at those psychology researchers who refused to acknowledge that, in a paper all about peak fertility, they’d used the wrong dates for peak fertility. So, good on Scott and Pound for getting this one right.

There’s one thing that does bother me about their paper, though, and that’s how they characterize the relation of their study to earlier work such as the notorious paper by Durante et al.

Scott and Pound write:

Our results are therefore difficult to reconcile with those of Durante et al, particularly since we attempted the analyses using a range of approaches and exclusion criteria, including tests similar to those used by Durante et al, and our results were similar under all of them.

Huh? Why “difficult to reconcile”? The reconciliation seems obvious to me: There’s no evidence of anything going on here. Durante et al. had a small noisy dataset and went all garden-of-forking-paths on it. And they found a statistically significant comparison in one of their interactions. No news here.

Scott and Pound continue:

Lack of statistical power does not seem a likely explanation for the discrepancy between our results and those reported in Durante et al, since even after the most restrictive exclusion criteria were applied, we retained a sample large enough to detect a moderate effect . . .

Again, I feel like I’m missing something. “Lack of statistical power” is exactly what was going on with Durante et al.; indeed, their example was the “Jerry West” of our “power = .06” graph:

[Figure: the “power = .06” graph]
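
To make the low-power point concrete, here is a minimal sketch of a power calculation for a noisy study. The inputs (a true effect of 2 units and a standard error of 8.1) are assumptions picked for illustration so that power lands near .06; they are not taken from Scott and Pound's paper, and the function name `power_and_sign_error` is made up for this example.

```python
from statistics import NormalDist

def power_and_sign_error(true_effect, se, alpha=0.05):
    """Power and wrong-sign rate for a two-sided z-test, assuming the
    estimate is unbiased and normally distributed (an assumption of
    this sketch)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)       # about 1.96 for alpha = 0.05
    shift = true_effect / se                         # true effect in standard-error units
    p_sig_pos = 1 - std_normal.cdf(z_crit - shift)   # significant with a positive estimate
    p_sig_neg = std_normal.cdf(-z_crit - shift)      # significant with a negative estimate
    power = p_sig_pos + p_sig_neg
    # Among significant results, the share whose sign is opposite to the true effect.
    wrong_sign = p_sig_neg if true_effect > 0 else p_sig_pos
    return power, wrong_sign / power

# Hypothetical inputs, chosen only so that power comes out near .06.
power, sign_error = power_and_sign_error(true_effect=2.0, se=8.1)
print(f"power = {power:.3f}, wrong-sign rate given significance = {sign_error:.2f}")
```

With inputs like these, a statistically significant result is barely more likely than it would be under pure noise, and when one does turn up it has a real chance of pointing in the wrong direction.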

Scott and Pound continue:

One factor that may partially explain the discrepancy is our different approaches to measuring conservatism and how the relevant questions were framed. . . . However, these methodological differences seem unlikely to fully explain the discrepancy between our results . . . One further possibility is that differences in responses to our survey and the other surveys discussed here are attributable to variation in the samples surveyed. . . .

Sure, but aren’t you ignoring the elephant in the room? Why is there any discrepancy to explain? Why not at least raise the possibility that those earlier publications were just examples of the much-documented human ability to read patterns in noise?

I suspect that Scott and Pound have considered this explanation but felt it would be politic not to explicitly suggest it in their paper.

P.S. The above graph is a rare example of a double-y-axis plot that isn’t so bad. But the left axis should have a lower bound at 0: it’s not possible for conception risk to be negative!

July 4th

Lucky to have been born an American.

“Why should anyone believe that? Why does it make sense to model a series of astronomical events as though they were spins of a roulette wheel in Vegas?”

Deborah Mayo points us to a post by Stephen Senn discussing various aspects of induction and statistics, including the famous example of estimating the probability the sun will rise tomorrow. Senn correctly slams a journalistic account of the math problem:

The canonical example is to imagine that a precocious newborn observes his first sunset, and wonders whether the sun will rise again or not. He assigns equal prior probabilities to both possible outcomes, and represents this by placing one white and one black marble into a bag. The following day, when the sun rises, the child places another white marble in the bag. The probability that a marble plucked randomly from the bag will be white (ie, the child’s degree of belief in future sunrises) has thus gone from a half to two-thirds. After sunrise the next day, the child adds another white marble, and the probability (and thus the degree of belief) goes from two-thirds to three-quarters. And so on. Gradually, the initial belief that the sun is just as likely as not to rise each morning is modified to become a near-certainty that the sun will always rise.

[The above quote is not by Senn; it’s a quote of something he disagrees with!]

Canonical and wrong. X and I discuss this problem in section 3 of our article on the history of anti-Bayesianism (see also rejoinder to discussion here). We write:

The big, big problem with the Pr(sunrise tomorrow | sunrise in the past) argument is not in the prior but in the likelihood, which assumes a constant probability and independent events. Why should anyone believe that? Why does it make sense to model a series of astronomical events as though they were spins of a roulette wheel in Vegas? Why does stationarity apply to this series? That’s not frequentist, it is not Bayesian, it’s just dumb. Or, to put it more charitably, it is a plain vanilla default model that we should use only if we are ready to abandon it on the slightest pretext.

Strain at the gnat that is the prior and swallow the ungainly camel that is the iid likelihood. Senn’s discussion is good in that he keeps his eye on the ball, or rather knits his row straight, without getting distracted by stray bits of yarn.
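
For concreteness, here is a small sketch of the marble-bag updating from the journalistic account quoted above (it is Laplace's rule of succession). It only reproduces the arithmetic, and it bakes in exactly the constant-probability, independent-events likelihood that the passage above objects to. The function name `belief_in_sunrise` is made up for illustration.

```python
from fractions import Fraction

def belief_in_sunrise(sunrises_seen):
    """Share of white marbles in the bag: start with one white and one
    black marble, then add one white marble per observed sunrise."""
    white = 1 + sunrises_seen
    total = 2 + sunrises_seen
    return Fraction(white, total)

for n in range(4):
    print(n, belief_in_sunrise(n))   # 1/2, then 2/3, then 3/4, ...
```

The number creeps toward 1 no matter what, which is the point of the complaint: all the work is being done by the assumed likelihood, not by anything we know about astronomy.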

Humility needed in decision-making

Brian MacGillivray and Nick Pidgeon write:

Daniel Gilbert maintains that people generally make bad decisions on risk issues, and suggests that communication strategies and education programmes would help (Nature 474, 275–277; 2011). This version of the deficit model pervades policy-making and branches of the social sciences.

In this model, conflicts between expert and public perceptions of risk are put down to the difficulties that laypeople have in reasoning in the face of uncertainties rather than to deficits in knowledge per se.

Indeed, this is the “Nudge” story we hear a lot: the idea is that our well-known cognitive biases are messing us up, and policymakers should be accounting for this.

But MacGillivray and Pidgeon take a more Gigerenzian view:

There are three problems with this stance.

First, it relies on a selective reading of the literature. . . .

Second, it rests on some bold extrapolations. For example, it is not clear how the biases Gilbert identifies in the classic ‘trolley’ experiment play out in the real world. Many such reasoning ‘errors’ are mutually contradictory — for example, people have been accused of both excessive reliance on and neglect of generic ‘base-rate’ information to judge the probability of an event. This casts doubt on the idea that they reflect universal or hard-wired failings in cognition.

The third problem is the presentation of rational choice theory as the only way of deciding how to handle risk issues.

They conclude:

Given that many modern risk crises stem from science’s inability to foresee the dark side of technological progress, a little humility from the rationality project wouldn’t go amiss.

Recently in the sister blog

Where does Mister P draw the line?

Bill Harris writes:

Mr. P is pretty impressive, but I’m not sure how far to push him in particular and MLM [multilevel modeling] in general.

Mr. P and MLM certainly seem to do well with problems such as eight schools, radon, or the Xbox survey. In those cases, one can make reasonable claims that the performance of the eight schools (or the houses or the interviewees, conditional on modeling) are in some sense related.

Then there are totally unrelated settings. Say you’re estimating the effect of silicone spray on enabling your car to get you to work: fixing a squeaky door hinge, covering a bad check you paid against the car loan, and fixing a bald tire. There’s only one case where I can imagine any sort of causal or even correlative connection, and I’d likely need persuading to even consider trying to model the relationship between silicone spray and keeping the car from being repossessed.

If those two cases ring true, where does one draw the line between them? For a specific example, see “New drugs and clinical trial design in advanced sarcoma: have we made any progress?” (linked from here). The discussion covers rare but somewhat related diseases, and the challenge is to do clinical studies with sufficient power from number of participants in aggregate and by disease subtype.

Do you know if people have successfully used MLM or Mr. P in such settings? I’ve done some searching and not found anything I recognized.

I suspect that the real issue is understanding potential causal mechanisms, but MLM and perhaps Mr. P. sound intriguing for such cases. I’m thinking of trying fake data to test the idea.

I have a few quick thoughts here:

– First, on the technical question about what happens if you try to fit a hierarchical model to unrelated topics: if the topics are really unrelated, there should be no reason to expect the true underlying parameter values to be similar, hence the group-level variance will be estimated to be huge, hence essentially no pooling. The example I sometimes give is: suppose you’re estimating 8 parameters: the effects of SAT coaching in 7 schools, and the speed of light. These will be so different that you’re just getting the unpooled estimate. The unpooled estimate is not the best—you’d rather pool the 7 schools together—but it’s the best you can do given your model and your available information.

– To continue this a bit, suppose you are estimating 8 parameters: the effects of a fancy SAT coaching program in 4 schools, and the effects of a crappy SAT coaching program in 4 other schools. Then what you’d want to do is partially pool each group of 4 or, essentially equivalently, to fit a multilevel regression at the school level with a predictor indicating the prior assessment of quality of the coaching program. Without that information, you’re in a tough situation.

– Now consider your silicone spray example. Here you’re estimating unrelated things so you won’t get anything useful from partial pooling. Bayesian inference can still be helpful here, though, in that you should be able to write down informative priors for all your effects of interest. In my books I was too quick to use noninformative priors.
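
Here is a rough sketch of the first point, using a crude moment-based estimate of the group-level variance instead of a full Bayesian fit (so this is a stand-in for the hierarchical model, not the model itself). The school numbers are the first seven entries of the classic eight-schools data; the "speed of light" entry and its standard error are placeholder values, and the function name `partial_pool` is made up for this example.

```python
def partial_pool(y, sigma):
    """Normal-normal partial pooling with a method-of-moments
    (DerSimonian-Laird style) estimate of the group-level variance tau^2."""
    w = [1.0 / s**2 for s in sigma]
    sum_w = sum(w)
    mu_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    q = sum(wi * (yi - mu_fixed) ** 2 for wi, yi in zip(w, y))
    c = sum_w - sum(wi**2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (len(y) - 1)) / c)
    # Re-estimate the common mean using the total (within + between) variance.
    w_star = [1.0 / (s**2 + tau2) for s in sigma]
    mu = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    # Pooling factor: 1 means complete pooling, 0 means no pooling.
    pooling = [s**2 / (s**2 + tau2) for s in sigma]
    theta = [p * mu + (1.0 - p) * yi for p, yi in zip(pooling, y)]
    return tau2, pooling, theta

schools_y = [28, 8, -3, 7, -1, 1, 18]       # SAT coaching effect estimates
schools_se = [15, 10, 16, 11, 9, 11, 10]    # their standard errors
light_y, light_se = 299_792.458, 1.0        # km/s; the standard error is a placeholder

tau2_schools, pool_schools, _ = partial_pool(schools_y, schools_se)
tau2_mixed, pool_mixed, theta_mixed = partial_pool(
    schools_y + [light_y], schools_se + [light_se]
)
print("7 schools alone: tau^2 =", tau2_schools, "pooling =", pool_schools)
print("7 schools + speed of light: tau^2 =", tau2_mixed)
print("largest pooling factor:", max(pool_mixed))
```

With the seven schools alone, this estimator pools heavily. Add the unrelated eighth parameter and the estimated group-level variance blows up, the pooling factors drop to essentially zero, and you're back to the unpooled estimates, including for the seven schools you would have liked to pool.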

Hey, this is what Michael LaCour should’ve done when they asked him for his data

A note from John Lott

The other day, I wrote:

It’s been nearly 20 years since the last time there was a high-profile report of a social science survey that turned out to be undocumented. I’m referring to the case of John Lott, who said he did a survey on gun use in 1997, but, in the words of Wikipedia, “was unable to produce the data, or any records showing that the survey had been undertaken.” Lott, like LaCour nearly two decades later, mounted an aggressive, if not particularly convincing, defense.

Lott disputes what is written on the Wikipedia page. Here’s what he wrote to me, first on his background:

You probably don’t care, but your commentary is quite wrong about my career and the survey. Since most of the points that you raise are dealt with in the post below, I will just mention that you have the trajectory of my career quite wrong. My politically incorrect work had basically ended my academic career in 2001. After having had positions at Wharton, University of Chicago, and Yale, I was unable to get an academic job in 2001 and spent 5 months being unemployed before ending up at a think tank AEI. If you want an example of what had happened you can see here. A similar story occurred at Yale where some US Senators complained about my research. My career actual improved after that, at least if you judge it by getting academic appointments. For a while universities didn’t want to touch someone who would get these types of complaints from high profile politicians. I later re-entered academia, though eventually I got tired of all the political correctness and left academia.

Regarding the disputed survey, Lott points here and writes:

Your article gives no indication that the survey was replicated nor do you explain why the tax records and those who participated in the survey were not of value to you. Your comparison to Michael LaCour is also quite disingenuous. Compare our academic work. As I understand it, LaCour’s data went to the heart of his claim. In my case we are talking about one paragraph in my book and the survey data was biased against the claim that I was making (see the link above).

I have to admit I never know what to make of it when someone describes me as “disingenuous,” which according to the dictionary, means “not candid or sincere, typically by pretending that one knows less about something than one really does.” I feel like responding, truly, that I was being candid and sincere! But of course once someone accuses you of being insincere, it won’t work to respond in that way. So I can’t really do anything with that one.

Anyway, Lott followed up with some specific responses to the Wikipedia entry:

The Wikipedia statement . . . is completely false (“was unable to produce the data, or any records showing that the survey had been undertaken”). You can contact tax law Professor Joe Olson who went through my tax records. There were also people who have come forward to state that they took the survey.

A number of academics and others have tried to correct the false claims on Wikipedia but they have continually been prevented from doing so, even on obviously false statements. Here are some posts that a computer science professor put up about his experience trying to correct the record at Wikipedia.

I hope that you will correct the obviously false claim that I “was unable to produce the data, or any records showing that the survey had been undertaken.” Now possibly the people who wrote the Wikipedia post want to dismiss my tax records or the statements by those who say that they took the survey, but that is very different than them saying that I was unable to produce “any records.” As to the data, before the ruckus erupted over the data, I had already redone the survey and gotten similar results. There are statements from 10 academics who had contemporaneous knowledge of my hard disk crash where I lost the data for that and all my other projects and from academics who worked with me to replace the various data sets that were lost.

I don’t really have anything to add here. With LaCour there was a pile of raw data and also a collaborator, Don Green, who recommended to the journal that their joint paper be withdrawn. The Lott case happened two decades ago, there’s no data file and no collaborator, so any evidence is indirect. In any case, I thought it only fair to share Lott’s words on the topic.

Introducing StataStan

Thanks to Robert Grant, we now have a Stata interface! For more details, see:

Jonah and Ben have already kicked the tires, and it works. We’ll be working on it more as time goes on as part of our Institute of Education Sciences grant (turns out education researchers use a lot of Stata).

We welcome feedback, either on the Stan users list or on Robert’s blog post. Please don’t leave comments about StataStan here — I don’t want to either close comments for this post or hijack Robert’s traffic.

Thanks, Robert!

P.S. Yes, we know that Stata released its own Bayesian analysis package, which even provides a way to program your own Bayesian models. Their language doesn’t look very flexible, and the MCMC sampler is based on Metropolis and Gibbs, so we’re not too worried about the competition on hard problems.

God is in every leaf of every probability puzzle

Radford shared with us this probability puzzle of his from 1999:

A couple you’ve just met invite you over to dinner, saying “come by around 5pm, and we can talk for a while before our three kids come home from school at 6pm”.

You arrive at the appointed time, and are invited into the house. Walking down the hall, your host points to three closed doors and says, “those are the kids’ bedrooms”. You stumble a bit when passing one of these doors, and accidently push the door open. There you see a dresser with a jewelry box, and a bed on which a dress has been laid out. “Ah”, you think to yourself, “I see that at least one of their three kids is a girl”.

Your hosts sit you down in the kitchen, and leave you there while they go off to get goodies from the stores in the basement. While they’re away, you notice a letter from the principal of the local school tacked up on the refrigerator. “Dear Parent”, it begins, “Each year at this time, I write to all parents, such as yourself, who have a boy or boys in the school, asking you to volunteer your time to help the boys’ hockey team…” “Umm”, you think, “I see that they have at least one boy as well”.

That, of course, leaves only two possibilities: Either they have two boys and one girl, or two girls and one boy. What are the probabilities of these two possibilities?

NOTE: This isn’t a trick puzzle. You should assume all things that it seems you’re meant to assume, and not assume things that you aren’t told to assume. If things can easily be imagined in either of two ways, you should assume that they are equally likely. For example, you may be able to imagine a reason that a family with two boys and a girl would be more likely to have invited you to dinner than one with two girls and a boy. If so, this would affect the probabilities of the two possibilities. But if your imagination is that good, you can probably imagine the opposite as well. You should assume that any such extra information not mentioned in the story is not available.

As a commenter pointed out, there’s something weird about how the puzzle is written, not just the charmingly retro sex roles but also various irrelevant details such as the time of the dinner. (Although I can see why Radford wrote it that way, as it was a way to reveal the number of kids in a natural context.)

The solution at first seems pretty obvious: As Radford says, the two possibilities are:
(a) 2 boys and 1 girl, or
(b) 1 boy and 2 girls.
If it’s possibility (a), the probability of the random bedroom being a girl’s is 1/3, and the probability of getting that note (“I write to all parents . . . who have a boy or boys at the school”) is 1, so the probability of the data is 1/3.
If it’s possibility (b), the probability of the random bedroom being a girl’s is 2/3, and the probability of getting the school note is still 1, so the probability of the data is 2/3.
The likelihood ratio is thus 2:1 in favor of possibility (b).
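
As a quick check on that 2:1 ratio, here is a brute-force count. It assumes boys and girls are equally likely, the bedroom you stumble into is chosen at random, and any family with at least one boy gets the note; those assumptions are the simple reading of the puzzle, not anything stated beyond it.

```python
from fractions import Fraction
from itertools import product

counts = {"2 boys, 1 girl": 0, "1 boy, 2 girls": 0}
for kids in product("BG", repeat=3):    # eight equally likely families (assumed)
    if "B" not in kids:                 # the hockey note rules out all-girl families
        continue
    for room in kids:                   # the door you push open, chosen at random
        if room == "G":                 # you saw a girl's bedroom
            key = "2 boys, 1 girl" if kids.count("B") == 2 else "1 boy, 2 girls"
            counts[key] += 1

ratio = Fraction(counts["1 boy, 2 girls"], counts["2 boys, 1 girl"])
print(counts, "ratio in favor of (b):", ratio)
```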

Case closed . . . but is it?

Two complications arise. First, as commenter J. Cross pointed out, if the kids go to multiple schools, it’s not clear what the probability of getting that note is, but a first guess would be that the probability of you seeing such a note on the fridge is proportional to the number of boys in the family. Actually, even if there’s only one school the kids go to, it might be more likely to see the note prominently on the fridge if there are 2 boys: presumably, the probability that at least one boy is interested in hockey is higher if there are two boys than if there’s only one.

The other complication is the prior odds. Pr(boy birth) is about .512, so the prior odds are .512/.488 in favor of 2 boys and 1 girl rather than 2 girls and 1 boy.
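
Here is a sketch that puts the pieces together: the prior odds, the bedroom likelihood, and (optionally) the guess that the chance of seeing the note is proportional to the number of boys. The function name and the `note_scales_with_boys` switch are made up for this example, and the proportionality is only the "first guess" described above, not a fact about school mailings.

```python
def posterior_prob_two_boys(p_boy=0.512, note_scales_with_boys=False):
    """Posterior probability of (a) 2 boys and 1 girl, versus (b) 1 boy and 2 girls."""
    p_girl = 1 - p_boy
    prior_a = 3 * p_boy**2 * p_girl     # three birth orders with 2 boys, 1 girl
    prior_b = 3 * p_boy * p_girl**2     # three birth orders with 1 boy, 2 girls
    like_a = 1 / 3                      # a random bedroom is the girl's
    like_b = 2 / 3                      # a random bedroom is one of the two girls'
    if note_scales_with_boys:
        like_a *= 2                     # two boys: twice the chance of seeing the note
        like_b *= 1                     # one boy: baseline chance
    return prior_a * like_a / (prior_a * like_a + prior_b * like_b)

print("note seen for sure:      ", round(posterior_prob_two_boys(), 3))
print("note proportional to boys:", round(posterior_prob_two_boys(note_scales_with_boys=True), 3))
```

Under the proportional-note guess, the 2:1 bedroom evidence is exactly cancelled and the answer falls back to the prior odds, so the "obvious" answer of 2/3 for possibility (b) depends heavily on how you model that note.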

This is just to demonstrate that, as Feynman could’ve said in one of his mellower moments, God is in every leaf of every tree: Just about every problem is worth looking at carefully. It’s the fractal nature of reality.