Hal Pashler wrote in about a recent paper, “Labor Market Returns to Early Childhood Stimulation: a 20-year Followup to an Experimental Intervention in Jamaica,” by Paul Gertler, James Heckman, Rodrigo Pinto, Arianna Zanolini, Christel Vermeersch, Susan Walker, Susan M. Chang, and Sally Grantham-McGregor. Here’s Pashler:
Dan Willingham tweeted: @DTWillingham: RCT from Jamaica: Big effects 20 years later of intervention—teaching parenting/child stimulation to moms in poverty http://t.co/rX6904zxvN
Browsing pp. 4 ff, it seems the authors are basically saying “hey the stats were challenging, the sample size tiny, other problems, but we solved them all—using innovative methods of our own devising!—and lo and behold, big positive results!”.
So this made me think (and tweet) basically that I hope the topic (which is pretty important) will happen to interest Andy Gelman enough to incline him to give us his take. If you happen to have time and interest…
My reply became this article at Symposium magazine. In brief:
The two key concerns seem to be: (1) very small sample size (thus, unless the effect is huge, it could get lost in the noise) and (2) correlation of the key outcome (earnings) with emigration.
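To get a feel for concern (1), here's a back-of-the-envelope power calculation. All numbers are hypothetical illustrations, not taken from the paper: suppose log earnings have a standard deviation of 0.8 and the treatment multiplies earnings by 1.42 (about 0.35 on the log scale):

```python
from math import ceil, log
from statistics import NormalDist

# Hypothetical inputs, for illustration only (not from the paper):
sd_log = 0.8            # assumed SD of log earnings
effect_log = log(1.42)  # a 42% earnings gain, ~0.35 on the log scale
alpha, power = 0.05, 0.80

# Standard two-sample power formula for a difference in means
z = NormalDist().inv_cdf
n_per_arm = ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2
                 * (sd_log / effect_log) ** 2)

print(n_per_arm)  # dozens of children per arm, even for a 42% effect
```

Under these made-up but not implausible numbers you'd want on the order of 80 children per arm to reliably detect even a 42% effect, which shows how easily a smaller true effect could get lost in the noise in a study of this size.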
The analysis generally seems reasonable (as one would expect given that Heckman is a coauthor) but what I’d really like to see are graphs of the individual observations. And, as always in such settings, I’d like to see the raw comparison—what are these earnings, which, when averaged, differ by 42%? I’d also like to see these data broken down by emigration status. The emigration issue did worry me a bit. Once I have a handle on the raw comparisons, then I’d like to see how this fits into the regression analyses.
Overall I have no reason to doubt the direction of the effect—psychosocial stimulation should be good, right?—but I’m skeptical of the 42% claim, for the usual reasons of the statistical significance filter. An example where this might be happening is in the very last paragraph on page 6 that continues onto the top of page 7. There they are doing lots of hypothesizing based on some comparisons being statistically significant and others being non-significant (at least, that’s what I think they meant when they wrote of “strong and lasting effects” in one case and “no long-term effect” in the other). There’s nothing wrong with speculation but at some point you’re chasing noise and picking winners, which leads to overestimates of magnitudes of effects.
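The significance-filter worry can be demonstrated with a small simulation. The numbers below are hypothetical, chosen only to represent an underpowered design: a true effect equal to its standard error, replicated many times, reporting only the statistically significant estimates:

```python
import numpy as np

rng = np.random.default_rng(1)
true_effect = 0.10   # hypothetical true effect
se = 0.10            # hypothetical standard error: true z = 1, an underpowered design
n_sims = 100_000

est = rng.normal(true_effect, se, n_sims)     # replicated noisy estimates
sig = np.abs(est) > 1.96 * se                 # two-sided significance at 5%
exaggeration = est[sig].mean() / true_effect  # inflation among "publishable" results

print(f"significant in {sig.mean():.0%} of replications")
print(f"significant estimates average {exaggeration:.1f}x the true effect")
```

In this setup, the estimates that clear the significance bar average more than double the true effect, which is exactly why a headline number like 42%, selected partly for its significance, is likely an overestimate.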
So those are my thoughts. My goal here is not to “debunk” but to understand and quantify.
I also talk a bit about the political context of those debates.
Here’s my conclusion (for now):
Where does that leave us, then? If we can’t really trust the headline number from a longitudinal randomized experiment, what can we do? We certainly can’t turn around and gather data on a few thousand more children. And even if we could, we’d have to wait another 20 years for the results. What can we say right now?
My unsatisfactory answer: I’m not sure. The challenge is that earnings are highly variable. We could look at the subset of participants who did not emigrate, or, if there is a concern that the treatment could affect emigration, we could perform an analysis such as principal stratification that matches approximately equivalent children in the two groups to estimate the effect among the children who would not have emigrated under either condition. Given that there were four groups, I’d do some alternative analyses rather than simply pooling multiple conditions, as was done in the article. But I’m still a little bit stuck. On one hand, given the large variability in earnings, it’s going to be difficult to learn much from this sort of small-sample between-person study. On the other hand, there aren’t a lot of good experimental studies out there, so it does seem like this one should inform policy in some way. In short, we need to keep on thinking of ways to extract the useful information out of this study in a larger policy context.
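P.S. To see why simply restricting to the observed non-emigrants can mislead when treatment affects who emigrates, here's a toy simulation. Everything in it is hypothetical (the ability variable, the thresholds, the 0.3 effect); in the simulation we can observe the "never-emigrant" principal stratum directly, which is the quantity a principal stratification analysis would try to approximate from data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
u = rng.normal(size=n)                 # latent earning ability (hypothetical)
treat = rng.integers(0, 2, size=n)     # randomized treatment

# Hypothetical mechanism: treatment lowers the emigration threshold,
# so more of the higher-ability children leave under treatment.
emigrate = np.where(treat == 1, u > 0.5, u > 1.0)

true_effect = 0.3                      # hypothetical effect on log earnings
y = 1.0 + true_effect * treat + 0.5 * u + rng.normal(scale=1.0, size=n)

# Naive: compare the observed stayers in each arm. Biased, because the
# treated stayers are a lower-ability slice than the control stayers.
naive = (y[(treat == 1) & ~emigrate].mean()
         - y[(treat == 0) & ~emigrate].mean())

# Principal stratum: children who would stay under EITHER condition
# (here u <= 0.5). Known only because this is a simulation.
stayers = u <= 0.5
strat = (y[(treat == 1) & stayers].mean()
         - y[(treat == 0) & stayers].mean())

print(f"naive stayer comparison:  {naive:.2f}")
print(f"never-emigrant stratum:   {strat:.2f}  (truth: {true_effect})")
```

In this toy world the naive stayer comparison understates the true effect by a substantial margin, while the never-emigrant stratum recovers it, which is the motivation for the principal stratification approach mentioned above.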