
Was it really necessary to do a voting experiment on 300,000 people? Maybe 299,999 would’ve been enough? Or 299,998? Or maybe 2000?


There’s been some discussion recently about an experiment done in Montana, New Hampshire, and California, conducted by three young political science professors, in which letters were sent to 300,000 people, in order to (possibly) affect their voting behavior. It appears that the plan was to follow up after the elections and track voter turnout. (Some details are in this news report from Dylan Scott.)

The Montana experiment was particularly controversial, with disputes about the legality and ethicality of sending people an official-looking document with the Montana state seal, the official-looking title of “2014 Montana General Election Voter Information Guide,” and the instructions, “Take this to the polls!”

The researchers did everything short of sending people the message, “I am writing you because I am a prospective doctoral student with considerable interest in your research,” or, my all-time favorite, “Our special romantic evening became reduced to my wife watching me curl up in a fetal position on the tiled floor of our bathroom between rounds of throwing up.”

There’s been a bunch of discussion of this on the internet. From a statistical standpoint, the most interesting reaction was this, from Burt Monroe commenting on Chris Blattman’s blog. Burt writes:

In my “big data guy” role, I have to ask why 100,000? As far as I can tell there’s exactly one binary treatment, or is there some complicated factorial interaction that leads to a number that big? For that to be the right power number, the treatment effect has to be microscopic and the instrument has to be incredibly weak — that is, it has to be a poorly designed experiment of a substantively unimportant question. Conversely, if the team believes the treatment will work, 100000 is surely big enough to actually change outcomes. . . .

Related to this is the following quote from the Dylan Scott news article, from a spokesman for one of the universities involved, that the now-controversial study “was not intended to favor any particular candidate, party, or agenda, nor is it designed to influence the outcome of any race.”

But a treatment that affects voter turnout will, in general, influence the outcome of the race (the intervention in Montana involved giving voters an estimate of the ideological leanings of the candidates in a nonpartisan (or, according to Chris Blattman, a “technically nonpartisan”) judicial election, along with the prominently-displayed recommendation to “Take this to the polls!”).

So I’m not sure how they could say the study was not designed to influence the outcome—unless what was done was to purposely pick non-close elections where a swing of a few hundred votes wouldn’t make a difference in who won. Even then, though, the election could be affected if people are motivated by the flyer to come out to vote in the first place, assuming there are any close elections in the states or districts in question.

I will set aside any ethical questions about the extent to which academic researchers should be influencing elections or policy, via experiments or other activities. Chris Blattman makes a good point that ultimately in our research we typically have some larger political goal, whether it’s Robert Putnam wanting to increase Americans’ sense of local community, or Steven Levitt wanting more rational public policy, or various economists wanting lower tariff barriers, and research is one way to attain a political end. The election example is a bit different in that the spokesman for one of the universities involved is flat-out denying the intention of any agenda, but Chris might say that, whether or not that spokesman knows what he’s talking about, having an agenda is basically ok.

To put it another way: Suppose the researchers in question had done this project, not using foundation funding, but under the auspices of an explicitly political organization such as the Democratic or Republican party, and with an avowed goal of increasing voter turnout for one candidate in particular. (One of the researchers involved in this study apparently has a political consulting company, so perhaps he is already doing such a study.) Lots of social scientists work with political or advocacy organizations in this way, and people don’t generally express any problems with that. To the extent that the study in question is, ummm, questionable, it’s the idea that it violates an implicit wall separating research with a particular political or commercial purpose from research with no such purpose. I have mixed feelings here. On one hand, Chris is right that no such sharp boundary exists. On the other hand, I understand how voters in Montana might feel a bit like “lab rats” after hearing about this experiment—even while a comparable intervention done for one party or another would be considered to be just part of modern political advertising.

But I don’t want to get into that here. What I do want to discuss is the statistical question raised by Burt Monroe in the above quote: why a sample size of 100,000 for that Montana experiment? As Monroe points out, 100,000 mailers really could be enough to swing an election. But if you need 100,000, you must really be studying small effects, or maybe localized effects?

When we study public opinion it can be convenient to have a national sample size of 50,000 or more, in order to get enough information on small states interacted with demographic slices (see, for example, this paper with Yair). And in our Xbox survey we had hundreds of thousands of respondents, which was helpful in allowing us to poststratify and also to get good estimates day by day during the pre-election period.

In this election-mailers study it’s not clear why such a large sample was needed, but I’m guessing there was some good reason. The study was expensive and I assume the researchers had to justify the expense and the sample size. A bad reason would be that they expected a very small and noisy effect—as Burt said, in that case it’s not clear why it would be worth studying in the first place. A good reason might be that they’re interested in studying effects that vary by locality. Another possibility is that they are studying spillover effects, the idea that sending a mailer to household X might have some effect on neighboring households. Such a spillover effect is probably pretty small so you’d need a huge sample size to estimate it, but then again if it’s that small it’s not clear it’s worth studying, at least not in that way.
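To put a rough number on “microscopic,” here is a back-of-the-envelope sketch of the smallest turnout effect such a design could reliably detect. It rests on assumptions the post doesn’t specify: a simple two-arm design with half the sample mailed, baseline turnout of 50% in both arms, a two-sided test at the 5% level, and 80% power. It also ignores Burt’s point about a weak instrument (people who never read the mailer), which would make the detectable effect among actual readers larger still.

```python
from statistics import NormalDist

def min_detectable_effect(n_total, p_base=0.5, alpha=0.05, power=0.8):
    """Smallest difference in turnout proportions detectable with the given power.

    Assumes a 50/50 split into treated and control arms, the same baseline
    turnout p_base in both arms, and a two-sided z-test for a difference in
    proportions.
    """
    n_arm = n_total / 2
    # Standard error of the difference in two independent proportions.
    se = (p_base * (1 - p_base) * (2 / n_arm)) ** 0.5
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) * se

for n in (2_000, 100_000, 300_000):
    mde = min_detectable_effect(n)
    print(f"n = {n:>7,}: detectable effect ~ {100 * mde:.1f} percentage points")
```

Under these assumptions the detectable difference is roughly 6 percentage points at n = 2,000 and under 1 point at n = 100,000, which is the sense in which a sample that large only makes sense for a tiny average effect, or for subgroup, local, or spillover effects.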

P.S. Full disclosure: I do commercial research (Stan is partly funded by Novartis) and my collaborator Yair Ghitza works at Catalist, a political analysis firm that does work for Democrats. And my office is down the hall from that of Don Green, who I think has done work for Democrats and Republicans. So not only do I accept partisan or commercial work in principle, I support it in practice as well. I’m not close enough to the details of such interventions to know if it’s common practice to send out fake-official mailers, but I guess it’s probably done all the time.

P.P.S. No worry about the prof who was last seen lying curled up in the fetal position on his bathroom floor. He landed on his feet and is a professor at the Stanford Business School. Amusingly enough, he is offering “breakfast briefings” (no throwing up involved, I assume) and studies “how employees can develop healthy patterns of cooperation.”

Statistical distribution of incomes in different countries, and a great plot

This post is by Phil Price.

This article in the New York Times is pretty good, and the graphics are excellent…especially the interactive graphic halfway down, entitled “American Incomes Are Losing Their Edge, Except at the Top” (try mousing over the gray lines and see what happens).

The plot attempts to display the statistical distribution of incomes in about 10 different countries. That alone is not so easy; one natural idea is to display a bunch of histograms in a small multiples plot. But the plot also tries to show how each of the distributions has changed since 1980. I can think of other approaches to this plot that might be worth trying, but I’m not sure any of them would be better. Nicely done, NYT graphics team.

If I wanted to make interactive graphics like this myself, I could presumably figure it out. But suppose I want to do it routinely, as part of exploratory data analysis. I don’t necessarily need polish, just the basics. I’d like to work within R but am open to other possibilities. What are my options? GGobi? iPlots? Anything else worth considering?


I love it when I can respond to a question with a single link

Shira writes:

This came up from trying to help a colleague of mine at Human Rights Watch.

He has several completely observed variables X, and a variable with 29% missing, Y. He wants a histogram (and other descriptive statistics) of a “filled in” Y.

He can regress Y on X, and impute missing Y’s from their fully observed X values (from the posterior predictive distribution). If he wants a histogram of the “filled in” Y, what would you recommend to him? Is there a good way to display this, taking the uncertainty in the imputed Y’s into account?

My reply:

Body-slam on the sister blog

John Ahlquist and Scott Gehlbach nail it.

Yes, I’ll help people for free but not like this!

I received the following (unsolicited) email:

Dear Sir or Madam,

My name is **; I am a graduate student, working on my thesis in **. A vital part of my research is performing a joint cluster analysis of attributional and relational data on **.

I have tried to collaborate with the statisticians at ** and **, though neither department has been able to come up with a viable solution. May I please ask a few minute of your time to look at my problem and help me with your guidance?

I am attaching the two data sets for relational and attributional data in case you want to glance at it.

Thank you in advance for your consideration and any time you can spare!



“Dear Sir or Madam,” huh?

And sometimes they don’t spam you at all

I received the following email:

Dear Dr. Gelman,

As a way of introduction, my name is . . . and I am very interested in studying in Columbia’s PhD statistics program. For the past 2 ½ years, I’ve worked as an analyst for . . . I am writing to communicate my interest in your research area. I was particularly interested in your publication titled . . . I believe my scholastic and work experiences have prepared me well to be a strong contributor to the field of statistics. . . . I would love the opportunity to discuss your research and Columbia’s statistics department in greater depth. . . .

I replied, generically:

Hi, thanks for your note. You should apply to our program. I’m not on the admissions committee this year but if you are admitted and choose to attend, you will be free to study with any of the faculty in our department, including me. We have several interesting projects involving statistical modeling and social science and there’s lots of important research to be done!

And, a bit later, he wrote:

Thank you for your response. . . . I am planning a visit to New York City from . . . If you are available sometime in that interval to have an informal discussion about Columbia’s statistics department I would love to meet.

I proposed a date and time, and then I wrote:

Just please tell me you’re not this guy:

If you are, I will scream.

He replied, assuring me that he is a real human. We shall see.

2 on chess

Is it really “often easier to win a rematch than to defend a championship”?

The quoted bit above comes from Tyler Cowen, writing about the Anand/Carlsen world championship rematch. I’m still not used to the idea of a new world championship match every year but I guess why not?

Anyway, here’s my question. Tyler Cowen knows a lot more about chess than I do, so I’m inclined to take his word on these things, but . . . really? Is it really often easier to win a rematch than to defend a championship? My impression is that chess rematches are generally won by the victor of the first match. Consider Karpov-Korchnoi and Kasparov-Karpov (not counting that first match that never came to an end). But I’m no expert on chess history, and I imagine there could be lots of data at lower levels (for example, national championship matches)?

So does anyone know? My intuition would be that, statistically, it would be unlikely for the former champion who lost the first match to then win the rematch, but maybe not? Or maybe I’m missing Cowen’s point?

Chess piece survival rates: some graphical possibilities here

Another Tyler Cowen link led me to this graph of chess piece survival rates, posted by Enrique Guerra-Pujol:


This is great. But now let me complain a bit about the display:

1. The lines are too thick. They’re so thick that this is about all I notice. They’re as thick as the black frames of hipster glasses. Bad news. The display just doesn’t look like a chessboard.

2. “48.6%”? Huh? This is the sort of hyper-precision we don’t like here. It’s hard to see what’s going on with that sort of detail.

3. Kings do get captured—that’s just checkmate, right? So this could be shown too.

OK, the graph can be better. But here’s what I really want to see:

Now that we have the idea of a spatial display for summarizing chess games, I’d like to see where the pieces get captured. I’m not sure of the best way of doing this, but perhaps separate mini-chessboards for each piece, each with a heat map showing where that piece gets captured? The maps for the pawns could be combined in some clever way, as they will mostly be captured in their own files, I assume.
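Here is a minimal sketch of the bookkeeping behind such a display. It assumes, hypothetically, that the game records have already been reduced to (captured piece, square) pairs; the piece labels and the sample records are made up for illustration. The resulting 8-by-8 grid of counts could then be handed to any heat-map routine, one mini-board per piece.

```python
FILES = "abcdefgh"

def square_to_index(square):
    """Map algebraic notation such as 'e4' to (row, col), with row 0 = rank 8."""
    col = FILES.index(square[0])
    row = 8 - int(square[1])
    return row, col

def capture_grid(captures, piece):
    """8-by-8 counts of where `piece` was captured; rows run from rank 8 down to rank 1."""
    grid = [[0] * 8 for _ in range(8)]
    for captured_piece, square in captures:
        if captured_piece == piece:
            row, col = square_to_index(square)
            grid[row][col] += 1
    return grid

# Hypothetical records: (captured piece, square where it was captured).
captures = [("white_queen", "d5"), ("white_queen", "d5"),
            ("black_knight", "f6"), ("white_queen", "h4")]
grid = capture_grid(captures, "white_queen")
for rank, row in zip(range(8, 0, -1), grid):
    print(rank, " ".join(str(count) for count in row))
```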

P.S. Also it would be fun to see survival times: not just whether the piece was captured, but when it was captured. Come to think of it, it would be easy (given the data) to plot survival curves for each of the pieces, most simply with 16 graphs showing, for each piece, the curve for Black and the curve for White.
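Survival curves need a little care, because a piece still on the board when the game ends hasn’t been captured; it has just been censored at the end of the game. A standard way to handle that is the Kaplan-Meier estimator. Here is a minimal sketch, assuming each piece’s history has been reduced to a move number (the capture move, or the final move of the game) plus a captured/not-captured flag; the numbers in the example are made up.

```python
def kaplan_meier(times, captured):
    """Kaplan-Meier survival estimate.

    times[i]: move at which piece i was captured, or the last move of its game.
    captured[i]: True if captured, False if still on the board (censored).
    Returns (move, estimated probability of surviving past that move) pairs.
    """
    curve = []
    surv = 1.0
    for t in sorted({t for t, c in zip(times, captured) if c}):
        # Pieces still on the board just before move t (censored at t still count).
        at_risk = sum(1 for u in times if u >= t)
        events = sum(1 for u, c in zip(times, captured) if u == t and c)
        surv *= 1 - events / at_risk
        curve.append((t, surv))
    return curve

# Hypothetical data for one piece across five games.
print(kaplan_meier([10, 20, 20, 30, 40], [True, True, False, True, False]))
# approximately [(10, 0.8), (20, 0.6), (30, 0.3)]
```

Running this separately for White’s and Black’s copy of each piece would give the 16 pairs of curves.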

On deck this week

Mon: 2 on chess

Tues: Yes, I’ll help people for free but not like this!

Wed: I love it when I can respond to a question with a single link

Thurs: Sokal: “science is not merely a bag of clever tricks . . . Rather, the natural sciences are nothing more or less than one particular application — albeit an unusually successful one — of a more general rationalist worldview”

Fri: Boo! Who’s afraid of availability bias?

Sat: This is where they publish the stuff that can’t make it into Psychological Science

Sun: Ray Could Write

Solution to the sample-allocation problem

See this recent post for background.

Here’s the question:

You are designing an experiment where you are estimating a linear dose-response pattern with a dose x that can take on the values 1, 2, and 3, and a continuous response. Suppose that there is no systematic error and that the measurement variance is proportional to x. You have 100 people in your experiment. How should you allocate them among the x=1, 2, and 3 conditions to best estimate the dose-response slope?

And here’s the answer:  The estimate of the regression slope is simply (ybar_3 – ybar_1)/2 [ummm, not in general, see George's comment here], where ybar_1 and ybar_3 are the average responses at x=1 and x=3, and the variance of this estimate is (sigma_1^2/n_1 + sigma_3^2/n_3)/4. There’s no use putting any observations on the x=2 condition, so we can write n_3=100-n_1, and the variance we are trying to minimize is (sigma_1^2/n_1 + sigma_3^2/(100-n_1))/4.  Differentiate this with respect to n_1 (ignoring the constant factor of 1/4) and set the derivative to 0:

-sigma_1^2/n_1^2 + sigma_3^2/(100-n_1)^2 = 0.

Thus the optimum occurs when n_1/(100-n_1) = sigma_1/sigma_3.

It says above that the variance is proportional to x, so sigma_1/sigma_3 = 1/sqrt(3), thus n_1/(100-n_1) = 1/sqrt(3).

That is, n_1 and n_3 are in the ratio 1 to sqrt(3).  So n_1 = 100*1/(1+sqrt(3)) and n_3 = 100*sqrt(3)/(1+sqrt(3)), which gives n_1=37 and n_3=63.  (I rounded because you can’t take a fraction of a person.)
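As a check on the algebra, here is a brute-force search over n_1. It takes the setup above at face value: no one at x=2, the slope estimated as the difference in mean responses at x=3 and x=1 divided by 2, and measurement variance equal to x (the proportionality constant drops out of the comparison, so setting it to 1 is harmless).

```python
import math

N = 100  # total number of people

def slope_variance(n1, sigma2=lambda x: x):
    """Variance of the two-group slope estimate with n1 people at x=1, N - n1 at x=3.

    sigma2(x) is the measurement variance at dose x; the default assumes it equals x.
    """
    n3 = N - n1
    return (sigma2(1) / n1 + sigma2(3) / n3) / 4

best = min(range(1, N), key=slope_variance)
closed_form = N / (1 + math.sqrt(3))
print(best, N - best, round(closed_form, 1))  # 37 63 36.6
```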

Of my three problems on the exam, this one was the most straightforward.  It’s a problem that can be solved by brute force.  So I was surprised and disappointed that none of the students came even close to the right answer.

I guess it’s a harder problem than I thought it was (or maybe I’m overgeneralizing from the n=4 students who took the exam).

P.S. In comments some people asked why we are assuming linearity; why not estimate a nonlinear relation? The answer is that, with moderate data, it can be difficult to estimate something nonlinear, especially if you’re going to insist on statistical significance. See here for an article from 2000 exploring this point.

Solution to the problem on the distribution of p-values

See this recent post for background.

Here’s the question:

It is sometimes said that the p-value is uniformly distributed if the null hypothesis is true. Give two different reasons why this statement is not in general true. The problem is with real examples, not just toy examples, so your reasons should not involve degenerate situations such as zero sample size or infinite data values.

And here’s the answer:  Reason 1 is that if you have a discrete sample space, the p-value will have a discrete distribution.  Reason 2 is that if you have a composite null hypothesis, the p-value will, except in some special cases, depend on the value of the nuisance parameter.
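Both reasons can be seen with exact calculations rather than simulation. This sketch uses illustrative choices that are not from the exam: for reason 1, a one-sided exact binomial test with n=10 and null probability 0.5; for reason 2, a one-sided z-test of the composite null mu ≤ 0, where the unknown location of mu inside the null plays the role of the nuisance parameter. Each function returns the probability that the p-value falls at or below alpha, which would equal alpha if the p-value were uniform.

```python
from math import comb
from statistics import NormalDist

def binomial_rejection_rate(n, alpha):
    """Exact P(p-value <= alpha) when X ~ Binomial(n, 0.5) and the p-value is P(X >= x)."""
    def upper_tail(k):
        return sum(comb(n, j) for j in range(k, n + 1)) / 2**n
    # The p-value shrinks as x grows, so the test rejects exactly when x >= k_star.
    k_star = min(k for k in range(n + 2) if upper_tail(k) <= alpha)
    return upper_tail(k_star)

def z_test_rejection_rate(delta, alpha):
    """P(p-value <= alpha) for the one-sided z-test of H0: mu <= 0.

    delta = sqrt(n) * mu / sigma is the true standardized mean; delta <= 0 is inside H0.
    """
    z = NormalDist()
    critical = z.inv_cdf(1 - alpha)
    return 1 - z.cdf(critical - delta)

print(binomial_rejection_rate(10, 0.05))   # 0.0107421875 (= 11/1024), not 0.05
print(z_test_rejection_rate(0.0, 0.05))    # about 0.05, at the boundary of H0
print(z_test_rejection_rate(-1.0, 0.05))   # about 0.004, inside H0
```

The binomial p-value can take only a handful of values, so the rejection rate at 0.05 is about 0.011; the z-test p-value is uniform only at the boundary mu = 0 and is pushed toward 1 everywhere else in the null.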

All 4 of the students gave reason 1, but none of them gave reason 2.  And none of them gave any other good reason.

We’ll do the final question tomorrow.