Dale Lehman writes:

I’ve been looking at the work of the Equality of Opportunity Project and noticed that you had commented on some of their work.

Since you are somewhat familiar with the work, and since they do not respond to my queries, I thought I’d ask you about something that is bothering me. I, too, was somewhat put off by their repeated use of the word “causation.” But what really concerns me is that the work appears to be based on taking huge samples (millions of people) and doing the analysis on aggregations of them into deciles. Isn’t this demonstrating ecological correlation? That would be fine, except that their interpretations all involve predictive and causative statements at the individual level. In other words, they find a close relationship between various aggregate measures (such as the percentile income rank and the percent attending college) and then interpret that correlation as representing individual correlations. The individual correlations are guaranteed to be weaker than the aggregate ones, and perhaps not even in the same direction.

There is significant effort in this work and it will take me a long time to understand exactly what they have done, but I thought you might be able to save me a bunch of time by telling me whether this is something worth pursuing. I would think that these researchers would be well aware of ecological correlations, but I was constantly puzzled by why their scatterplots have so few points when the sample sizes are so large. Finding a strong linear correlation between aggregate measures conveys a compelling story—but it may not be a true story.

My reply: I’m not sure. This update (which Lehman pointed me to) shows a bunch of individual-level results as well. So it reminds me of our Red State Blue State project where we used individual-level data where possible but also examined aggregate patterns.
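To make Lehman's concern concrete, here is a minimal simulation sketch. It is not the Equality of Opportunity Project's data or method: the sample size, the slope of 0.3, the noise level, and the function names are all assumptions chosen for illustration. It generates a weak individual-level linear relationship, averages the outcome within deciles of the predictor, and compares the two correlations.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate(n=100_000, slope=0.3, n_bins=10, seed=0):
    """Return (individual-level r, r between bin means).

    All parameters are hypothetical; nothing here is taken from the study.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    # Weak individual-level relationship: most of the variance in y is noise.
    y = [slope * xi + rng.gauss(0, 1) for xi in x]

    # Sort individuals by x and cut them into equal-size bins (deciles).
    order = sorted(range(n), key=x.__getitem__)
    size = n // n_bins
    bin_x, bin_y = [], []
    for b in range(n_bins):
        idx = order[b * size:(b + 1) * size]
        bin_x.append(sum(x[i] for i in idx) / len(idx))
        bin_y.append(sum(y[i] for i in idx) / len(idx))

    return pearson(x, y), pearson(bin_x, bin_y)


individual_r, binned_r = simulate()
print(f"individual-level correlation: {individual_r:.3f}")
print(f"correlation of decile means:  {binned_r:.3f}")
```

With these settings the individual-level correlation is modest by construction, while the ten bin means fall almost exactly on a line, because averaging thousands of people per bin removes nearly all of the individual noise. The sketch shows only the weakening; it does not demonstrate a reversal of sign, which would require group-level confounding that this simple model leaves out.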

One thing about the data as presented on FiveThirtyEight struck me. The graph of income at age 30 vs. parental income, using percentiles, shows regression to the mean. Children whose parents had about the median income earned about the median income. For parents below the 50th percentile, their children’s earnings were in a higher percentile. For parents above the 50th percentile, children’s earnings were in a lower percentile.

That’s a clear sign of regression to the mean. It would be interesting to see if that movement continues in the next generation.
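The percentile pattern described above can be reproduced with a small sketch. It assumes parent and child incomes (on some transformed scale) are jointly normal with correlation 0.4; that number, the sample size, and the function names are illustrative assumptions, not estimates from the study.

```python
import random


def percentile_ranks(values):
    """Map each value to its percentile rank in (0, 100)."""
    n = len(values)
    order = sorted(range(n), key=values.__getitem__)
    ranks = [0.0] * n
    for position, i in enumerate(order):
        ranks[i] = 100.0 * (position + 0.5) / n
    return ranks


def child_rank_by_parent_decile(n=50_000, rho=0.4, seed=1):
    """Mean child percentile rank within each parent-income decile.

    rho is a hypothetical parent-child correlation, not a fitted value.
    """
    rng = random.Random(seed)
    parent = [rng.gauss(0, 1) for _ in range(n)]
    # Child income: correlated with parent income, same marginal variance.
    child = [rho * p + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1) for p in parent]

    p_rank = percentile_ranks(parent)
    c_rank = percentile_ranks(child)

    sums, counts = [0.0] * 10, [0] * 10
    for pr, cr in zip(p_rank, c_rank):
        d = min(int(pr // 10), 9)  # parent decile index, 0 = bottom
        sums[d] += cr
        counts[d] += 1
    return [s / c for s, c in zip(sums, counts)]


decile_means = child_rank_by_parent_decile()
for d, m in enumerate(decile_means):
    print(f"parent decile {d + 1:2d}: mean child percentile {m:5.1f}")
```

In this setup, children of bottom-decile parents average well above their parents' percentile and children of top-decile parents average well below theirs, with the middle deciles staying near 50. Any parent-child correlation below 1 produces this shape, so the pattern alone does not say anything about mechanisms.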