I’m doing my first preregistered replication. And it’s a lot of work!
We’ve been discussing this for a while: here’s something I published in 2013 in response to proposals by James Moneghan and by Macartan Humphreys, Raul Sanchez de la Sierra, and Peter van der Windt for preregistration in political science, and here’s a blog discussion (“Preregistration: what’s in it for you?”) from 2014.
Several months ago I decided I wanted to perform a preregistered replication of my 2013 AJPS paper with Yair on MRP. We found some interesting patterns of voting and turnout, but I was concerned that perhaps we were overinterpreting patterns from a single dataset. So we decided to re-fit our model to data from a different poll. That paper had analyzed the 2008 election using pre-election polls from Pew Research. The 2008 Annenberg pre-election poll was also available, so why not try that too?
Since we were going to do a replication anyway, why not preregister it? This wasn’t as easy as you might think. First step was getting our model to fit with the old data; this was not completely trivial given changes in software, and we needed to tweak the model in some places. Having checked that we could successfully duplicate our old study, we then re-fit our model to two surveys from 2004. We then set up everything to run on Annenberg 2008. At this point we paused, wrote everything up, and submitted to a journal. We wanted to time-stamp the analysis, and it seemed worthwhile to do this in a formal journal setting so that others could see all the steps in one place. The paper (that is, the preregistration plan) was rejected by the AJPS. They suggested we send it to Political Analysis, but they ended up rejecting it too. Then we sent it to Statistics, Politics, and Policy, which agreed to publish the full paper: preregistration plan plus analysis.
But, before doing the analysis, I wanted to time-stamp the preregistration plan. I put the paper up on my website, but that’s not really preregistration. So then I tried Arxiv. That took a while too: at first they were thrown off by the paper being incomplete (by necessity, as we want to first publish the article with the plan but without the replication results). But they finally posted it.
The Arxiv post is our official announcement of preregistration. Now that it’s up, we (Rayleigh, Yair, and I) can run the analysis and write it up!
What have we learned?
Even before performing the replication analysis on the 2008 Annenberg data, this preregistration exercise has taught me some things:
1. The old analysis was not in runnable condition. We and others are now in position to fit the model to other data much more directly.
2. There do seem to be some problems with our model in how it fits the data. To see this, compare Figure 1 to Figure 2 of our new paper. Figure 1 shows our model fit to the 2008 Pew data (essentially a duplication of Figure 2 of our 2013 paper), and Figure 2 shows this same model fit to the 2004 Annenberg data.
So, two changes: Pew vs. Annenberg, and 2008 vs. 2004. And the fitted models look qualitatively different. The graphs take up a lot of space, so I’ll just show you the results for a few states.
We’re plotting the probability of supporting the Republican candidate for president (among the supporters of one of the two major parties; that is, we’re plotting the estimates of R/(R+D)) as a function of respondent’s family income (divided into five categories). Within each state, we have two lines: the brown line shows estimated Republican support among white voters, and the black line shows estimated Republican support among all voters in the state. The y-axis goes from 0 to 100%.
From Figure 1:
From Figure 2:
You see that? The fitted lines are smoother in Figure 2 than in Figure 1, and they seem to be tied closer to the data points. It appears as if this is coming from the raw data, which seem in Figure 2 to be closer to clean monotonic patterns.
My first thought was that this had something to do with sample size. OK, that was my third thought. My first thought was that it was a bug in the code, and my second thought was that there was some problem with the coding of the income variable. But I don’t think it was any of these things. Annenberg 2004 had a larger sample than Pew 2008, so we re-fit to two random subsets of those Annenberg 2004 data, and the resulting graphs (not shown in the paper) look similar to the Figure 2 shown above; they were still a lot smoother than Figure 1, which shows results from Pew 2008.
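For readers who want to try this kind of sample-size check themselves, here is a rough sketch in Python. It is not our actual code: the real check re-fit the full multilevel model to each subset, whereas this only computes the raw two-party share R/(R+D) by income category. The data are synthetic, and the function names, the subset size of 2000, and the income coding 1 through 5 are all assumptions made up for illustration.

```python
import random
from collections import defaultdict


def two_party_share(respondents):
    """Return {income_category: R / (R + D)} from (income, vote) pairs."""
    counts = defaultdict(lambda: {"R": 0, "D": 0})
    for income, vote in respondents:
        if vote in ("R", "D"):  # keep only major-party supporters
            counts[income][vote] += 1
    return {
        income: c["R"] / (c["R"] + c["D"])
        for income, c in sorted(counts.items())
    }


def random_subset(respondents, n, seed):
    """Draw n respondents without replacement, reproducibly."""
    return random.Random(seed).sample(respondents, n)


# Synthetic stand-in for the larger survey (NOT real poll data):
# Republican support rises with income category by construction.
rng = random.Random(0)
big_survey = [
    (income, "R" if rng.random() < 0.35 + 0.05 * income else "D")
    for income in rng.choices(range(1, 6), k=5000)
]

# Hypothetical target size, standing in for the smaller survey's sample size.
target_n = 2000

# Two random subsets, as in the check described above.
for seed in (1, 2):
    subset = random_subset(big_survey, target_n, seed)
    shares = two_party_share(subset)
    print(seed, {k: round(v, 2) for k, v in shares.items()})
```

If the subsets still look about as smooth as the full sample, then sample size alone is an unlikely explanation for the difference, which is what we found with the model fits.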
We discuss this at the end of Section 2 of our new paper and don’t come to any firm conclusions. We’ll see what turns up with the replication on Annenberg 2008.
Anyway, the point is:
– Replication is not so easy.
– We can learn even from setting up the replications.
– Published results (even from me!) are always only provisional and it makes sense to replicate on other data.