This story started for me three years ago with a pre-election article by Tyler Cowen and Kevin Grier entitled “Will Ohio State’s football team decide who wins the White House?” Cowen and Grier wrote:
Economists Andrew Healy, Neil Malhotra, and Cecilia Mo . . . examined whether the outcomes of college football games on the eve of elections for presidents, senators, and governors affected the choices voters made. They found that a win by the local team, in the week before an election, raises the vote going to the incumbent by around 1.5 percentage points. When it comes to the 20 highest attendance teams—big athletic programs like the University of Michigan, Oklahoma, and Southern Cal—a victory on the eve of an election pushes the vote for the incumbent up by 3 percentage points.
Hey, that’s a big deal (and here’s the research paper with the evidence). As Cowen and Grier put it:
That’s a lot of votes, certainly more than the margin of victory in a tight race.
Upon careful examination, though, I concluded:
There are multiple games in multiple weeks in several states, each of which, according to the analysis, operates on the county level and would have at most a 0.2% effect in any state. So there’s no reason to believe that any single game would have a big effect, and any effects there are would be averaged over many games.
So I wasn’t so disturbed about the legitimacy of the democratic process. That said, it still seemed a little bit bothersome that football games were affecting election outcomes at all.
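To make the dilution argument concrete, here is a back-of-the-envelope sketch in Python. The 1.5-point county-level effect is the figure quoted above; the county vote shares are hypothetical numbers chosen for illustration, not values from the paper, and the function name is my own.

```python
def statewide_shift(county_effect_pp, county_vote_share):
    """Shift in the statewide vote (in percentage points) when an effect of
    county_effect_pp applies only to a county that casts county_vote_share
    of the state's votes."""
    return county_effect_pp * county_vote_share


# 1.5 points is the county-level estimate quoted above.
# The vote shares below are hypothetical, for illustration only.
for share in (0.02, 0.05, 0.10):
    shift = statewide_shift(1.5, share)
    print(f"county share {share:.0%}: statewide shift {shift:.3f} points")
```

Under this simple scaling, a home county would have to cast more than about 13% of a state's votes (0.2 / 1.5) before a single game moved the statewide total by 0.2 points.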
The next chapter in the story came a couple days ago, when Anthony Fowler pointed me to a paper he wrote with B. Pablo Montagnes, arguing that this college football effect was nothing but a meaningless data pattern, the sort of thing that might end up getting published in a rag like Psychological Science:
We reassess the evidence and conclude that there is likely no such effect, despite the fact that Healy et al. followed the best practices in social science and used a credible research design. Multiple independent sources of evidence suggest that the original finding was spurious—reflecting bad luck for researchers rather than a shortcoming of American voters.
We might worry that this surprising result is a false positive, arising from some combination of multiple testing (within and across research teams), specification searching, and bad luck.
I discussed this all yesterday in the sister blog, concluding that voters aren’t as “irrational and emotional” as is sometimes claimed.
Neil Malhotra, one of the authors of the original paper in question, pointed to two replies that he and his colleagues wrote (here and here).
And Anthony Fowler pointed to two responses (here and here) that he and his colleague wrote to the above-mentioned replies.
And in an email to Malhotra, I wrote: My quick thought is that the framework of “is it or is it not a false positive” is not so helpful, and I prefer thinking of all these effects as existing, but with lots of variation, so that one has to be careful about drawing general conclusions from a particular study.
Three yards and a cloud of dust
Where do the two parties stand now? Here’s Malhotra:
I think the findings of the Fowler/Montagnes paper are incorrectly interpreted by the authors. Instead of a framing of “the original effect was a false positive,” I think a more accurate/appropriate framing is:
1. Some independent, separate tests conducted by Fowler/Montagnes are not consistent with the original study. Therefore, in a meta-analytic sense, the overall literature produces mixed results. Fowler/Montagnes’ results in no way mean that the original study was “wrong.” However, we argue that these independent tests are not appropriate to test the hypothesis that mood affects voting. For example, it’s not surprising to us that NFL football outcomes do not influence elections since NFL teams are located in large metropolises where there are many competing sources of entertainment. On the other hand, the fate of the Oklahoma Sooners in Norman, OK, is the main event in the town. Further, single, early regular season games in the NFL are less important than later, regular season games in NCAA football. So the dosage of good/bad mood is much lower in the NFL study. Now readers may agree/disagree with me about the validity of the NFL test. But the important thing to realize is that the NFL study is a different, separate test. It doesn’t tell us anything about whether the original study is a “false positive” or is incorrect.
2. Some tests on sub-samples of the original dataset show that there is heterogeneity in the effect. For example, Fowler/Montagnes show that the effect does not seem to be there when voters appear to have more information (e.g., in open-seat races, and when there is partisan competition). This is very theoretically interesting (and cool) heterogeneity, and it definitely changes our interpretation of the original findings. However, these are not “replications” of the original result, and do not speak to whether the original results are false positives.
In sum, I am very open to criticisms of my research. I think this new paper definitely changes my opinion of the scope of the original findings. However, I do not think it is accurate to say that the original findings are incorrect or that the findings were obtained by “bad luck.” There are some auxiliary tests conducted by Fowler/Montagnes that either support our results (e.g., that geographically proximate locations outside the home county also respond to the team’s wins/losses) or don’t make much sense (e.g., Texas is a minor college football team?), but we will let interested readers weigh the evidence.
And here’s Fowler:
Of course, we wouldn’t conclude that the effect of college football games on elections is exactly zero, but our independent tests suggest that most likely the effect is substantively very small and Healy et al.’s original results were significant overestimates.
If their purported effects are genuine, we would expect them to vary in particular ways, but none of these independent tests are consistent with the notion that football games and subsequent mood influence elections. So by examining treatment effect heterogeneity in theoretically motivated ways, we reassess the credibility of the original result.
We’re getting closer. I’ll invoke the Edlin effect and say that I think the originally published estimates are indeed probably too high (and, as noted near the beginning of this post, even the effects as reported would have a much, much smaller effect on elections than you might think based on a naive interpretation of the numerical estimates of direct causal effects). Based on my own statistical tastes, I’d prefer not to “test the hypothesis that mood affects voting” but instead to think about variation, and I like the part of Malhotra’s note that discusses this.
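One reason to expect published estimates to run high is the statistical significance filter: when a true effect is small relative to its standard error, the estimates that happen to clear the significance threshold are necessarily large ones. The simulation below is a minimal sketch of that general mechanism. The true effect and standard error are hypothetical values, not numbers from Healy et al. or Fowler and Montagnes, and the function name is my own.

```python
import random
import statistics


def exaggeration_ratio(true_effect, std_error, n_sims=20_000, seed=0):
    """Simulate noisy estimates of true_effect, keep only those that are
    statistically significant at the conventional 5% level, and return the
    average magnitude of the kept estimates divided by the true effect."""
    rng = random.Random(seed)
    significant = []
    for _ in range(n_sims):
        estimate = rng.gauss(true_effect, std_error)
        if abs(estimate) > 1.96 * std_error:
            significant.append(abs(estimate))
    return statistics.mean(significant) / true_effect


# Hypothetical scenario: a true effect of 0.5 points measured with a
# standard error of 1.0 point.
print(exaggeration_ratio(true_effect=0.5, std_error=1.0))
```

In this assumed scenario every significant estimate exceeds 1.96 points in magnitude, so the estimates that survive the filter overstate the 0.5-point true effect by a factor of about four or more.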
When it comes to the effects of mood on voting, I think that the point of studying things like football games is not that their political effects are large (that is, let’s ignore that original Slate article) but rather that sporting events have a big random component, so these games can be treated as a sort of natural experiment. As Fowler, Malhotra, and their colleagues all recognize, such analyses can be challenging, not so much because of the traditional “identification” problem familiar to statisticians, econometricians, and political scientists (although, yes, one does have to worry about such concerns), but rather because of the garden of forking paths, all the possible ways that one could chop up these data and put them back together again.
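As a rough illustration of why the number of available analyses matters, the sketch below simulates data with no effect at all and counts how often at least one of several analyses comes out statistically significant. This is a crude simplification: it treats each path as an independent test that is actually run, whereas the forking-paths concern also covers analysis choices that depend on the data even when only one analysis is reported. The number of paths and the function name are hypothetical.

```python
import random


def any_significant_rate(n_paths, n_sims=5000, seed=1):
    """Share of simulated null datasets in which at least one of n_paths
    independent z-statistics exceeds 1.96 in absolute value."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        z_stats = [rng.gauss(0.0, 1.0) for _ in range(n_paths)]
        if any(abs(z) > 1.96 for z in z_stats):
            hits += 1
    return hits / n_sims


# Compare a single pre-specified analysis with ten possible analyses.
for k in (1, 10):
    print(k, any_significant_rate(k))
```

With independent tests the chance of at least one significant result is 1 - 0.95^k, which is about 40% for k = 10, so the simulated rate for ten paths should land near that value rather than near the nominal 5%.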
There is no two-minute warning here. There is no game that is about to end. Research on these topics will continue. I agree with both Malhotra and Fowler that the way to think about these problems is by wrestling with the details. And I do think these discussions are good, even if they can take on a slightly adversarial flavor. I’d like the mood-and-politics literature to stay grounded, to not end up like the Psychological-Science-style evolutionary psychology literature, where effects are all huge and where each new paper reports a new interaction. Now that we know that a subfield can be spun out of nothing, we should be careful to put research findings into context—something that both Malhotra and Fowler are trying to do, each in their own way.
To put it another way, they’re disagreeing about effect sizes and strength of evidence, but they both accept the larger principle that research studies don’t stand alone, and Healy et al. are open and accepting of criticism. Which one would think would be a given in science, but it’s not, so let’s appreciate how this is going.
And to get back to football for a moment: As a political scientist this is all important to me because, to the extent that things like sporting events sway people’s votes, that’s a problem—but, to the extent that such irrationalities are magnified by statistical studies and hyped by the news media, that’s a problem too, in that this can be used to disparage the democratic process.
Fowler saw the above and added:
Some of Neil’s comments reflect a misunderstanding of our paper. For example, we did not show that “the effect does not seem to be there when voters appear to have more information” and we never wrote anything along those lines. In the test he’s alluding to (Table 1, “By incumbent running”), we find that the purported effect of football games on incumbent party support is no greater when the incumbent actually runs for reelection. One of our predictions is that if football games meaningfully influence incumbent party support by affecting voter mood, we should expect a bigger effect when the incumbent is actually running. However, the interactive point estimate is actually negative and statistically insignificant, meaning that we don’t find variation in the direction one would expect if the effect is genuine, and if anything, the variation goes in the wrong direction.
Neil’s comments also sound to us like ex-post rationalization. He argues that we shouldn’t expect NFL games to influence local mood in the same way that college football games do, but the local television ratings suggest the opposite. In another interview (here), Neil justified the notion that football games influence mood by citing Card and Dahl who find that football games influence domestic violence. But Card and Dahl analyze NFL games, and they provide arguments as to why the NFL provides the best opportunity to estimate the effects of changes in local mood. In our view, Healy et al. contradict themselves by rationalizing the null effect in the NFL while at the same time citing an NFL study as evidence that football games influence mood.