Problems with Census data

Posted on February 2, 2010 1:21 PM by Andrew

Following this link from John Sides, I read this blog post by Justin Wolfers on a problem with U.S. Census data discovered by Trent Alexander, Michael Davern and Betsey Stevenson:

The authors compare the official census count (based on the tallying up of all Census forms) with their own calculations, based on the sub-sample released for researchers (the “public use micro sample,” available through IPUMS). If all is well, then the authors’ estimates should be very close to 100% of the official population count. But they aren’t:

The two estimates are pretty similar for those younger than 65. But then things go haywire, with the alternative estimates disagreeing by as much as 15%. . . .

What’s the source of the problem? The Census Bureau purposely messes with the microdata a little, to protect the identity of each individual. . . . But the problem arose because of a programming error in how the Census Bureau ran these procedures. The right response is obvious: fix the programs, and publish corrected data. Unfortunately, the Census Bureau has refused to correct the data.

Huh? There must be something I’m missing here. Also, apparently there are similar problems with the American Community Survey and the Current Population Survey.

P.S. There’s one place where I disagree with Justin, though. He writes:

The microdata suggest that there are more very old men than very old women — I know some senior women who wish this were true!

You better do the decision analysis on this one carefully. If “this were true,” presumably it would mean that these senior women might very well be wishing themselves to be dead!

P.P.S. Usually there’s little point to me linking to a blog that gets 100 times our readership, but this is important enough, both to political scientists and more generally in sending the message that we must always check our data, that I wanted to highlight it here. As Daniel Lee can tell you, I spend lots and lots and lots of time checking my data. Even so, I’ve used IPUMS and never checked this!
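For anyone who wants to run this sort of check on their own extract, here is a minimal sketch in Python of the comparison described above: sum the person weights in the microdata within each age-by-sex cell, divide by the official count for that cell, and flag the cells where the ratio strays far from 1. Everything in it is an assumption made for illustration: the function names, the (age, sex, weight) record layout, and the 5% tolerance are mine, and the numbers are invented toy values, not Census figures. A real IPUMS extract has its own variable names and weights.

```python
from collections import defaultdict


def weighted_counts(records):
    """Sum person weights within each (age, sex) cell.

    `records` is an iterable of (age, sex, weight) tuples, a hypothetical
    layout chosen for this sketch rather than the actual PUMS format.
    """
    totals = defaultdict(float)
    for age, sex, weight in records:
        totals[(age, sex)] += weight
    return dict(totals)


def ratio_to_official(estimates, official):
    """Microdata estimate divided by the official count, per cell."""
    return {
        cell: estimates.get(cell, 0.0) / count
        for cell, count in official.items()
        if count > 0
    }


def flag_cells(ratios, tolerance=0.05):
    """Return the cells whose ratio differs from 1 by more than `tolerance`."""
    return sorted(cell for cell, r in ratios.items() if abs(r - 1.0) > tolerance)


# Toy numbers, invented purely for illustration (NOT Census figures).
toy_microdata = [
    (40, "F", 20.0), (40, "F", 20.0), (40, "M", 20.0), (40, "M", 20.0),
    (85, "F", 20.0), (85, "M", 20.0), (85, "M", 20.0),
]
toy_official = {(40, "F"): 40, (40, "M"): 41, (85, "F"): 30, (85, "M"): 32}

ratios = ratio_to_official(weighted_counts(toy_microdata), toy_official)
flagged = flag_cells(ratios)
for cell in flagged:
    print(cell, round(ratios[cell], 2))
```

In the toy data only the two age-85 cells get flagged. The point of doing this by cell rather than in aggregate is that a total can look fine even when individual age-by-sex groups are badly off.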

5 thoughts on “Problems with Census data”

Jeff Lax on February 2, 2010 at 9:34 am said:

Did they use the weights in the PUMS file?

Jeff Lax on February 2, 2010 at 10:01 am said:

They're just trying to kill Mr. P.

Charles Sutton on February 2, 2010 at 12:33 pm said:

If "this were true," presumably it would mean that these senior women might very well be wishing themselves to be dead!

Huh? Could just mean that they're wishing their competition dead (i.e., other old ladies), or, more happily, wishing that men survive longer than they do.

Andrew Gelman on February 2, 2010 at 1:45 pm said:

Jeff: Yeah, everything happens to us.

Charles: Well, sure, if all the men could live longer, that would be great. My point was that Justin's elderly women friends might very well be wishing their competition were dead, but, logically speaking, that can be thought of as a wish for an increased probability that they might be dead too.

Dan on February 3, 2010 at 6:38 am said:

Andrew —
The reason the Census Bureau refuses to correct the problem is that they're afraid that if they do so, less-than-scrupulous researchers will be able to back out the precise procedure the Census uses to protect subjects' identities (by comparing the old and new data). They also apparently refuse to draw a new, corrected, 5% IPUMS sample for the affected years; I'm not sure if this is because it would be too much work, or because the privacy concerns remain.
