Elan B. writes:
I saw this JAMA Pediatrics article [by Julia Raifman, Ellen Moscoe, and S. Bryn Austin] getting a lot of press for claiming that LGBT suicide attempts went down 14% after gay marriage was legalized.

The heart of the study is comparing suicide attempt rates (in the last 12 months) before and after exposure, i.e., gay marriage legalization in their state. For LGBT teens, this dropped from 28.5% to 24.5%.

To test whether this drop was just part of an ongoing decline in LGBT suicide attempts, they do a placebo test, checking whether rates dropped 2 years before legalization. In the text of the article, they simply state that there is no drop.

But then you open up the supplement and find that about half of the drop in rates — 2.2 percentage points — already came 2 years before legalization. However, since 0 is contained in the 95% confidence interval, it’s not significant! Robustness check passed.

In figure 1 of the article, they graph suicide attempts before legalization to show they’re flat, but even though they have the data for some of the states, they don’t show LGBT rates.

Very suspicious to me. What do you think?
My reply: I wouldn’t quite say “suspicious.” I expect these researchers are doing their best; these are just hard problems. What they’ve found is an association which they want to present as causation, and they don’t fully recognize that limitation in their paper.
Here are the key figures:
And from here it’s pretty clear that the trends are noisy, so that little differences in the model can make big differences in the results, especially when you’re playing the statistical significance game. That’s fine—if the trends are noisy, they’re noisy, and your analysis needs to recognize this, and in any case it’s a good idea to explore such data.
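To get a feel for just how noisy such trends can be, here is a minimal simulation sketch. The sample size and rate below are invented for illustration, not taken from the paper: even when the true rate is perfectly flat, a trend line fit to a handful of yearly survey estimates bounces around by a few tenths of a percentage point per year, which is plenty to flip the sign or significance of a small effect.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: 6 yearly surveys of n respondents each, with a
# flat true attempt rate of 27% -- i.e., no trend at all.
years = np.arange(6)
n = 1500
true_rate = 0.27

# Refit a least-squares trend line to 1000 simulated datasets and see
# how much the estimated slope (percentage points per year) varies.
slopes = []
for _ in range(1000):
    rates = rng.binomial(n, true_rate, size=6) / n
    slope = np.polyfit(years, rates, 1)[0]  # slope of fitted line
    slopes.append(100 * slope)

slopes = np.array(slopes)
print(f"slope spread (sd): {slopes.std():.2f} points/year")
```

With these (made-up) numbers the slope estimates have a standard deviation of roughly a quarter of a percentage point per year, so over a six-year window the apparent trend can easily swing by more than a full point in either direction purely from sampling noise.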
I also share Elan’s concern about the whole “robustness check” approach to applied statistics, in which a central analysis is presented and then various alternatives are presented, with the goal of showing the same thing as the main finding (for perturbation-style robustness checks) or showing nothing (for placebo-style robustness checks).
One problem with this mode of operation is that robustness checks themselves have many researcher degrees of freedom, so it’s not clear what we can take from these. Just for example, if you do a perturbation-style robustness check and you find a result in the same direction but not statistically significant (or, as the saying goes, “not quite” statistically significant), you can call it a success because it’s in the right direction and, if anything, it makes you feel even better that the main analysis, which you chose, succeeded. But if you do a placebo-style robustness check and you find a result in the same direction but not statistically significant, you can just call it a zero and claim success in that way.
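The placebo case is easy to demonstrate by simulation. Here is a sketch using invented numbers loosely inspired by the figures Elan quotes (a real 2.2-point pre-legalization drop, with a hypothetical sample size of 2000 per period): a two-proportion z-test frequently fails to reach significance even though the true placebo effect is half the size of the headline effect, so the "robustness check" passes mostly through low power.

```python
import numpy as np

rng = np.random.default_rng(0)

def two_prop_z(x1, n1, x2, n2):
    """Two-proportion z-test; returns (estimated drop, |z|)."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)  # pooled proportion
    se = np.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return p1 - p2, abs(p1 - p2) / se

n = 2000  # hypothetical per-period sample (not from the paper)
true_before, true_after = 0.285, 0.263  # a real 2.2-point "placebo" drop

sims = 10_000
passes = 0
for _ in range(sims):
    x1 = rng.binomial(n, true_before)
    x2 = rng.binomial(n, true_after)
    _, z = two_prop_z(x1, n, x2, n)
    if z < 1.96:  # "not significant" -> placebo check "passes"
        passes += 1

print(f"placebo check 'passes' in {passes / sims:.0%} of simulations")
```

Under these assumptions the check "passes" well over half the time despite a genuine pre-trend, which is exactly the asymmetry described above: absence of significance gets read as absence of effect.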
So I think there’s a problem in that there’s a pressure for researchers to seek, and claim, more certainty and rigor than is typically possible from social science data.

If I’d written this paper, I think I would’ve started with various versions of the figures above, explored the data more, then moved to the regression line, but always going back to the connection between model, data, and substantive theories. But that’s not what I see here: in the paper at hand, there’s the more standard pattern of some theory and exploration motivating a model, then statistical significance is taken as tentative proof, to be shored up with robustness studies, then the result is taken as a stylized fact and it’s story time.

There’s nothing particularly bad about this particular paper, indeed their general conclusions might well be correct (or not). They’re following the rules of social science research and it’s hard to blame them for that. I don’t see this paper as “junk science” in the way of the himmicanes, air rage, or ages-ending-in-9 papers (I guess that’s why it appeared in JAMA, which is maybe a bit more serious-minded than PPNAS or Lancet); rather, it’s a reasonable bit of data exploration that could be better. I’d say that a recognition that it is data exploration could be a first step to encouraging researchers to think more seriously about how best to explore such data. If they really do have direct data on suicide rates of gay people, that would seem like a good place to look, as Elan suggests.