Problems with Census data

Following this link from John Sides, I read this blog post by Justin Wolfers on a problem with U.S. Census data discovered by Trent Alexander, Michael Davern and Betsey Stevenson:

The authors compare the official census count (based on the tallying up of all Census forms) with their own calculations, based on the sub-sample released for researchers (the “public use micro sample,” available through IPUMS). If all is well, then the authors’ estimates should be very close to 100% of the official population count. But they aren’t:


The two estimates are pretty similar for those younger than 65. But then things go haywire, with the alternative estimates disagreeing by as much as 15%. . . .

What’s the source of the problem? The Census Bureau purposely messes with the microdata a little, to protect the identity of each individual. . . . But the problem arose because of a programming error in how the Census Bureau ran these procedures. The right response is obvious: fix the programs, and publish corrected data. Unfortunately, the Census Bureau has refused to correct the data.

Huh? There must be something I’m missing here. Also, apparently there are similar problems with the American Community Survey and the Current Population Survey.
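For anyone who wants to run this kind of check on their own extract, here is a minimal sketch of the comparison described above: sum the person weights in the microdata within each age group, divide by the official count, and flag groups where the ratio strays from 1. The field names (age_group, weight), the 5% tolerance, and all the numbers are made up for illustration; they are not the authors' code or real Census figures.

```python
from collections import defaultdict


def weighted_counts(records, group_key="age_group", weight_key="weight"):
    """Estimate population per group by summing person weights."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec[group_key]] += rec[weight_key]
    return dict(totals)


def coverage_ratios(estimated, official):
    """Ratio of the microdata estimate to the official count, per group."""
    return {g: estimated.get(g, 0.0) / official[g] for g in official}


def flag_discrepancies(ratios, tolerance=0.05):
    """Groups whose ratio differs from 1 by more than the tolerance."""
    return sorted(g for g, r in ratios.items() if abs(r - 1.0) > tolerance)


# Toy microdata: each record stands for `weight` people (hypothetical values).
sample = [
    {"age_group": "under 65", "weight": 20.0},
    {"age_group": "under 65", "weight": 20.0},
    {"age_group": "under 65", "weight": 20.0},
    {"age_group": "65 and over", "weight": 20.0},
    {"age_group": "65 and over", "weight": 14.0},
]

# Toy "official" counts (hypothetical values).
official = {"under 65": 60, "65 and over": 40}

estimated = weighted_counts(sample)
ratios = coverage_ratios(estimated, official)
flagged = flag_discrepancies(ratios)

for group, ratio in ratios.items():
    print(f"{group}: {ratio:.0%} of official count")
print("check these groups:", flagged)
```

With real data you would read the records from your extract and the official counts from the published tables, and you would probably want finer age groups, split by sex, since that is where the trouble showed up.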

P.S. There’s one place where I disagree with Justin, though. He writes:

The microdata suggest that there are more very old men than very old women — I know some senior women who wish this were true!

You better do the decision analysis on this one carefully. If “this were true,” presumably it would mean that these senior women might very well be wishing themselves to be dead!

P.P.S. Usually there’s little point to me linking to a blog that gets 100 times our readership, but this is important enough, both to political scientists and more generally in sending the message that we must always check our data, that I wanted to highlight it here. As Daniel Lee can tell you, I spend lots and lots and lots of time checking my data. Even so, I’ve used IPUMS and never checked this!


  1. Jeff Lax says:

    Did they use the weights in the PUMS file?

  2. Jeff Lax says:

    They're just trying to kill Mr. P.

  3. Charles Sutton says:

    If "this were true," presumably it would mean that these senior women might very well be wishing themselves to be dead!

    Huh? Could just mean that they're wishing their competition dead (i.e., other old ladies), or, more happily, wishing that men survive longer than they do.

  4. Andrew Gelman says:

    Jeff: Yeah, everything happens to us.

    Charles: Well, sure, if all the men could live longer, that would be great. My point was that Justin's elderly women friends might very well be wishing their competition were dead, but, logically speaking, that can be thought of as a wish of an increased probability that they might be dead too.

  5. Dan says:

    Andrew –
    The reason the Census Bureau refuses to correct the problem is that they're afraid that if they do so, less-than-scrupulous researchers will be able to back out the precise procedure the Census uses to protect subjects' identities (by comparing the old and new data). They also apparently refuse to draw a new, corrected, 5% IPUMS for the affected years; I'm not sure if this is because it would be too much work, or because the privacy concerns remain.