Ole Rogeberg points me to a discussion of a discussion of a paper:
Did pre-release of my [Rogeberg's] PNAS paper on methodological problems with Meier et al’s 2012 paper on cannabis and IQ reduce the chances that it will have its intended effect? In my case, serious methodological issues related to causal inference from non-random observational data became framed as a conflict over conclusions, forcing the original research team to respond rapidly and insufficiently to my concerns, and prompting them to defend their conclusions and original paper in a way that makes a later, more comprehensive reanalysis of their data less likely.
This fits with a recurring theme on this blog: the defensiveness of researchers who don’t want to admit they were wrong. Setting aside cases of outright fraud and plagiarism, I think the worst case remains that of psychologists Neil Anderson and Deniz Ones, who denied any problems even in the presence of a smoking gun of a graph revealing their data error. (Also we’ve gone over the sad story of Ron Unz and David Brooks, but these guys fall in a different category, as their primary occupations are politics and journalism, not research.)
Rogeberg’s story is a bit different from some of the others we’ve discussed, in that he suspects that he (inadvertently) pushed the authors of the original paper (Terrie Moffitt, Avshalom Caspi, and Madeline Meier) into a position where they “have painted themselves into a corner psychologically: By defending their original claim and methodology rather than being open to a proper re-examination of the evidence, it has become more difficult for them to do a fair analysis later without losing face if their original effect estimates were exaggerated or turn out to be non-robust.”
Ultimately, it is the fault of these researchers and nobody else for making and defending a scientific error (here I am assuming for the purposes of argument that Rogeberg’s criticisms are valid; I have not gone through and examined the articles under discussion), but I agree that it is also, as Rogeberg puts it, “a bit disappointing, as well as sad.”
Rogeberg retells the story in convenient blog-friendly form. It’s a bit long, but this is a statistics blog and I think it’s valuable to have the details:
Basically, the original paper (which is available here) used a simple variant of a difference-in-differences analysis. The researchers sorted people into groups according to whether or not they had used cannabis and according to the number of times they had been scored as dependent. They then compared IQ-changes between age 13 and 38 across these groups, and found that IQ declined more in the groups with heavier cannabis-exposure. The effect seemed to be driven by adolescent-onset smokers, and it seemed to persist after they quit smoking.
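To make the group comparison concrete, here is a minimal Python sketch of that kind of analysis. The records, the three-level exposure coding, and the function names are all invented for illustration; this is not the Dunedin data and not the original authors' code.

```python
from statistics import mean

# Hypothetical records: (exposure_group, IQ at age 13, IQ at age 38).
# Assumed coding: 0 = never used, 1 = used but never dependent,
# 2 = scored as dependent at least once.
records = [
    (0, 100, 101), (0, 105, 105), (0, 98, 99),
    (1, 101, 100), (1, 97, 96), (1, 103, 103),
    (2, 99, 95), (2, 102, 97), (2, 96, 92),
]

def mean_change_by_group(rows):
    """Average within-person IQ change (age 38 minus age 13) per group."""
    changes = {}
    for group, iq_13, iq_38 in rows:
        changes.setdefault(group, []).append(iq_38 - iq_13)
    return {g: mean(v) for g, v in sorted(changes.items())}

def diff_vs_reference(group_means, reference=0):
    """Each group's mean change minus the reference group's mean change."""
    base = group_means[reference]
    return {g: m - base for g, m in group_means.items() if g != reference}

group_means = mean_change_by_group(records)
contrasts = diff_vs_reference(group_means)
print(group_means)
print(contrasts)
```

The second difference (group change minus never-user change) is the quantity that gets read as a cannabis effect. That reading is only justified if the groups would have changed alike had their cannabis use been the same.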
The data used for this study was stunning: Participants in the Dunedin Study, a group of roughly 1000 individuals born within 12 months of one another in the city of Dunedin in New Zealand, had been followed from birth to age 38. They had been measured regularly and scored on a number of dimensions through interviews, IQ tests, teacher and parent interviews, blood samples, etc., and are probably amongst the most intensively researched people on the planet: The study website states that roughly 1100 publications have been based on the sample so far, which is more than one publication per participant on average ;)
Despite this impressive data, there were some things I found wanting in the analysis. My own experience with difference in differences methods comes from empirical labor economics, and this experience had led me to expect a number of robustness checks and supporting analyses that this article lacked. This is not surprising: Different disciplines can face similar methodological issues, yet still develop more or less independently of each other. In such situations, however, there will often be good reasons for “cross-pollination” of practices and methods. For instance, experimental economics owes a large debt to psychology, and the use of randomized field trials in development and labor economics owes a large debt to the use of randomized clinical trials in medicine.
The cannabis-and-IQ analysis basically compares average changes in IQ across groups with different cannabis use patterns. Since we haven’t randomized “cannabis use patterns” over the participants, we have an obvious and important selection issue: The traits or circumstances that caused some people to begin smoking pot early, and that caused some of these to become heavily dependent for a long time, can themselves be associated with (or be) variables that also affect the outcome we are interested in. The central assumption, in other words, is that the groups would have had the same IQ-development if their cannabis use had been similar. Since this is the central assumption required for this method to validly identify an effect of cannabis, it is crucial that the researchers provide evidence sufficient to evaluate the appropriateness of this assumption. To be specific, and to show what kind of things I wanted the researchers to provide, you would want to:
- Establish that the units compared were similar prior to the treatment being studied – e.g., provide a table showing how the different cannabis-exposure groups differed prior to treatment on a number of variables.
- Establish a common trend – Since the identifying assumption is that the groups would have had the same development if they had had the same “treatment”, then clearly the development prior to the treatments should be similar. In the Dunedin study, they measured IQ at a number of ages, and average IQ changes in various periods could be shown for each group of cannabis users.
- Control for different sets of possible confounders. To show that the estimates that are of interest are robust, you would want to show estimates for a number of multivariate regressions that control for increasing numbers (and types) of potential confounders. The stability of the estimated effects and their magnitude can then be assessed, and the danger of confounding better evaluated: What happens if you add risk factors that are associated with poor life outcomes (childhood peer rejection, conduct disorders, etc.), or if you include measures of education, jailtime, unemployment, etc.? If the effect estimate of cannabis on IQ changes a lot, then this suggests that selection issues are important, and that confounders (both known and unknown) must be taken seriously. Adding important confounders will also help estimation of the effect we are interested in: Since they explain variance within each group (as well as some of the variance between the groups), they help reduce standard errors on the estimates of interest.
- Establish sensitivity of results to methodological choices. Just as we want to know how sensitive our results are to the control variables we add, we also want to know how sensitive they are to the specific methodological choices we have made. In this instance, it would be interesting to allow for pre-existing individual level trends: Assume that people have different linear trends to begin with. To what extent are these differing pre-existing trends shifted in similar ways by later use patterns of cannabis? By adding in earlier IQ-measurements for each individual (which are available from the Dunedin study), such “random growth estimators” would be able to account for any (known or unknown) cause that systematically affected individual trajectories in both pre- and post-treatment periods. Another example is the linear trend variable they use for cannabis exposure, which presumably gives a score of 1 to never users, 2 to users who were never dependent, 3 to those scored as dependent once and so on. This is the variable that they check for significance – and it would be
- Provide other diagnostic analyses, for instance by considering the variance of the outcome variable within each treatment group (how much did IQ change differ within each treatment group?). In this way, we could tell whether we seemed to be dealing with a very clear, uniform effect that affects most individuals equally, or whether it was a very heterogeneous effect whose average value was largely driven by high-impact subgroups.
- Discuss alternative mechanisms. What potential mechanisms can be behind this, and what alternative tests can we develop to distinguish between these? For instance, let us say you identify what seems to be a causal effect of cannabis use and dependency, but its magnitude is strongly reduced (but not eliminated) when you add in various potential confounders, such as educational level. As the authors of the original paper note (when education turns out to affect the effect size), education could be a mediating factor in the causal process whereby cannabis affects IQ. However, this would mean that the permanent, neurotoxic effect they are most concerned with would be smaller, because part of the measured effect would be due to the effect of cannabis on education multiplied by the effect of this education on IQ. The evidence thus suggests that the direct “neurotoxic” effect is only part of what is going on. It also suggests that we might want to look for evidence to assess how strongly cannabis use causally affects education, to better understand the determinants of this process. For instance, even if there was only a temporary effect of cannabis on cognition, ongoing smokers would do more poorly in school or college, which might then influence later job prospects and long term IQ. The effect doesn’t even have to be through IQ: If pot smoking makes you less ambitious (either because of stoner subculture or psychological effects), the effect may still have long term consequences by altering educational choices and performance. Put differently: If the mechanism is via school, then even transitory effects of cannabis become important when they coincide with the period of education.
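The "control for confounders" check in the list above can be sketched in a few lines of Python. The twelve data points, the binary `risk` variable, and the helper names are made up for illustration. The adjustment is done by residualizing both exposure and outcome on the confounder, which yields the same coefficient as a regression that includes both variables.

```python
from statistics import mean

def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residualize(v, z):
    """Part of v left over after a linear fit on z."""
    b = ols_slope(z, v)
    a = mean(v) - b * mean(z)
    return [vi - (a + b * zi) for vi, zi in zip(v, z)]

# Hypothetical toy data, one entry per person.
risk     = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]          # childhood risk factor
exposure = [0, 0, 0, 0, 1, 1, 0, 1, 1, 2, 2, 2]          # cannabis exposure score
change   = [1, 0, 1, -1, 0, 1, -3, -4, -3, -4, -5, -3]   # IQ change, age 13 to 38

# Step 1: no controls.
naive = ols_slope(exposure, change)

# Step 2: control for the risk factor by removing it from both variables.
adjusted = ols_slope(residualize(exposure, risk), residualize(change, risk))

print(f"unadjusted slope: {naive:.3f}")
print(f"adjusted slope:   {adjusted:.3f}")
```

In this toy data set most of the apparent exposure effect disappears once the risk factor is held fixed. A shift of that size is the kind of instability that would signal a selection problem. In a real analysis the step would be repeated with progressively larger sets of controls.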
So that’s the potential statistics problem. Now for the story of what happened next:
When I originally started looking into this last August, I sent an e-mail to the corresponding author asking for a couple of tables with information on “pre-treatment” differences between the exposure groups. I did not receive this. This is quite understandable, given that they were experiencing a media-blitz and most likely had their hands full. I therefore turned to past publications on the Dunedin cohort to see if I could find the relevant information there.
It turned out that I could – to some extent. Early onset cannabis use appeared to be correlated with a number of risk factors, and these risk factors were also correlated with poor life outcomes (low and poor education, crime, low income, etc.). The risk factors were also correlated with socioeconomic status.
The next question was whether these factors could affect IQ. One recent model of IQ (the Flynn-Dickens model) strongly suggested they would. The model sees IQ as a style or habit of thinking – a mental muscle, if you like – which is influenced by the cognitive demands of your recent environment. School, home environment, jobs and even the smartness of your friends are seen as in a feedback loop with IQ: High initial IQ gives you an interest in (and access to) the environments that in turn support and strengthen IQ. Since the risk factors mentioned above would serve to push you away from such cognitively demanding environments, it seemed plausible that they would affect long term IQ negatively by pushing you into poorer environments than your initial IQ would have suggested.
A couple of further parts to this potential mechanism can be noted (both discussed here): It seems that high-SES kids have a higher heritability of IQ than low-SES kids, which researchers often interpret as due to environmental thresholds: If your environment is sufficiently good, variation in your environment will have small effects on your IQ. If, however, your environment is poorer, similar variation will have larger effects. Put differently: The IQ of low-SES kids is more affected by changes to their environment than that of high-SES kids.
Also, there is a (somewhat counterintuitive, at first glance) result which shows that average IQ heritability increases with age. One interpretation of this is that our genetic disposition causes us to self-select or be sorted into specific environments as we age. The environment we end up with is therefore more determined by our genetic heritage than our childhood environment, where our family and school were, in some sense, “forced environments.”
In my research article, I refer to various empirical studies supporting these mechanisms and their effects, for instance past studies that find SES, jailtime, and education to be associated with the rate of change in cognitive abilities at different ages. Putting these pieces together, the risk factors that make you more likely to take up pot smoking in adolescence, and that raise your risk of becoming dependent, also shift you into poorer environments than your initial IQ would predict in isolation. Additionally, these shifts are more likely for kids in lower-SES groups (since the risk factors are correlated with SES), and these kids also have IQs that are more sensitive to environmental changes. Finally, for the same reason, the forced environment of schooling is likely to raise childhood IQ more for the low-SES kids (because it is a larger improvement on their prior environments, and because their IQs are more sensitive to environmental influences). SES, then, is in some sense a summary variable that is related to a number of the relevant factors, in that low SES
- correlates with risk factors that influence, on the one hand, adolescent cannabis use and dependency and, on the other hand, poorer life outcomes,
- signals a heightened sensitivity to environmental factors (the SES-heritability difference in childhood), and
- probably reflects the magnitude of the extra cognitive demands imposed by school relative to home environment.
For these reasons, SES seemed like a good variable to use in a mathematical model to capture these relationships. However, it should be obvious from my description of this mechanism that we should expect the mechanism to work even within a socioeconomic group: Even within this group, those with high levels of risk factors will experience poorer life outcomes, which may reduce their IQs. They will also most likely have higher probabilities of beginning cannabis smoking. At the same time, we would expect a smaller effect within a specific socioeconomic group than we would across the whole population.
However, I simplified this by using SES in three levels and created a mathematical model with these effects, using effect sizes drawn from past research literature where I could find them. Using the methods used in the original study, I tested my simulated data and found that the statistical methods identified the same type and magnitude of effects here as they had in the actual study data. This, of course, does not prove or establish that there is no effect of cannabis on IQ. What it does is show that the methods they used were insufficient to rule out other hypotheses, that the original effect estimates may be overestimated, and that we need to look more deeply into the matter, using the kind of robustness checks and specification tests I discussed above.
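The logic of that simulation exercise can be illustrated with a deliberately stripped-down toy in Python. This is not Rogeberg's model. The usage probabilities, the IQ drifts by SES level, and the noise level are invented, and the toy omits the within-SES mechanism and the schooling effect described above. The only point is that data generated with no cannabis effect at all can still show a gap between users and non-users.

```python
import random
from statistics import mean

def simulate(n=20000, seed=1):
    """Generate (ses, user, iq_change) rows with NO causal cannabis effect."""
    rng = random.Random(seed)
    # Assumed, illustrative parameters indexed by SES level (0 = low, 2 = high).
    p_use = {0: 0.40, 1: 0.25, 2: 0.10}   # probability of adolescent cannabis use
    drift = {0: -4.0, 1: -2.0, 2: 0.0}    # mean IQ change from age 13 to 38
    rows = []
    for _ in range(n):
        ses = rng.choice([0, 1, 2])
        user = rng.random() < p_use[ses]
        change = drift[ses] + rng.gauss(0, 5)  # note: no term for cannabis use
        rows.append((ses, user, change))
    return rows

def user_gap(rows):
    """Mean IQ change of users minus mean IQ change of non-users."""
    users = [c for _, u, c in rows if u]
    non_users = [c for _, u, c in rows if not u]
    return mean(users) - mean(non_users)

def user_gap_within_ses(rows):
    """Average of the user/non-user gaps computed separately per SES level."""
    return mean(user_gap([r for r in rows if r[0] == level]) for level in (0, 1, 2))

rows = simulate()
naive = user_gap(rows)
stratified = user_gap_within_ses(rows)
print(f"pooled gap (users minus non-users): {naive:.2f}")
print(f"gap after stratifying on SES:       {stratified:.2f}")
```

With these made-up numbers the pooled comparison shows users declining more than non-users, while the gap within SES levels is close to zero. A method that cannot tell this world apart from one with a real cannabis effect has not ruled out the alternative explanation.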
In my mind [Rogeberg writes], this should be just the normal process of science – an ongoing dialogue between different researchers. We know that replications of results often fail, and that acting on flawed results can have negative consequences (see here for an interesting popular science account of one such case). A statistical model by medical researcher Ioannidis (at the centre of this entertaining profile) suggests that new results based on exploratory epidemiological studies of observational data will be wrong 80% of the time. The Dunedin study on cannabis and IQ would, it seems, fit into this category. After all, by the time you’ve published more than 1100 papers on a group of individuals, it seems relatively safe to say that you have moved into “exploratory” mode.
More from Rogeberg here.
I have nothing to add to all of this. It’s just an interesting story, one more example of the tenaciousness of researchers when subject to criticism.