Struggles over the criticism of the “cannabis users and IQ change” paper

Ole Rogeberg points me to a discussion of a discussion of a paper:

Did pre-release of my [Rogeberg's] PNAS paper on methodological problems with Meier et al’s 2012 paper on cannabis and IQ reduce the chances that it will have its intended effect? In my case, serious methodological issues related to causal inference from non-random observational data became framed as a conflict over conclusions, forcing the original research team to respond rapidly and insufficiently to my concerns, and prompting them to defend their conclusions and original paper in a way that makes a later, more comprehensive reanalysis of their data less likely.

This fits with a recurring theme on this blog: the defensiveness of researchers who don’t want to admit they were wrong. Setting aside cases of outright fraud and plagiarism, I think the worst case remains that of psychologists Neil Anderson and Deniz Ones, who denied any problems even in the presence of a smoking gun of a graph revealing their data error. (Also we’ve gone over the sad story of Ron Unz and David Brooks, but these guys fall in a different category, as their primary occupations are politics and journalism, not research.)

Rogeberg’s story is a bit different from some of the others we’ve discussed, in that he suspects that he (inadvertently) pushed the authors of the original paper (Terrie Moffitt, Avshalom Caspi, and Madeline Meier) into a position where they “have painted themselves into a corner psychologically: By defending their original claim and methodology rather than being open to a proper re-examination of the evidence, it has become more difficult for them to do a fair analysis later without losing face if their original effect estimates were exaggerated or turn out to be non-robust.”

Ultimately, it is the fault of these researchers and nobody else for making and defending a scientific error (here I am assuming for the purposes of argument that Rogeberg’s criticisms are valid; I have not gone through and examined the articles under discussion), but I agree that it is also, as Rogeberg puts it, “a bit disappointing, as well as sad.”


Rogeberg retells the story in convenient blog-friendly form. It’s a bit long, but this is a statistics blog and I think it’s valuable to have the details:

Basically, the original paper (which is available here) used a simple variant of a difference-in-differences analysis. The researchers sorted people into groups according to whether or not they had used cannabis and according to the number of times they had been scored as dependent. They then compared IQ-changes between age 13 and 38 across these groups, and found that IQ declined more in the groups with heavier cannabis-exposure. The effect seemed to be driven by adolescent-onset smokers, and it seemed to persist after they quit smoking.
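
In symbols, the comparison described above can be sketched as follows. The notation is assumed here for illustration and is not taken from the paper: g indexes the cannabis-exposure groups, g = 0 is the never-users, and the bars denote group averages at ages 13 and 38.

```latex
\hat{\delta}_g
  = \left(\overline{IQ}_{g,38} - \overline{IQ}_{g,13}\right)
  - \left(\overline{IQ}_{0,38} - \overline{IQ}_{0,13}\right)
```

Reading \hat{\delta}_g as an effect of cannabis requires that group g would have shown the same average IQ change as group 0 had its cannabis use been the same.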

The data used for this study was stunning: Participants in the Dunedin Study, a group of roughly 1000 individuals born within 12 months of one another in the city of Dunedin in New Zealand, had been followed from birth to age 38. They had been measured regularly and scored on a number of dimensions through interviews, IQ tests, teacher and parent interviews, blood-samples etc, and are probably amongst the most intensively researched people on the planet: The study website states that roughly 1100 publications have been based on the sample so far, which is more than one publication per participant on average ;)

Despite this impressive data, there were some things I found wanting in the analysis. My own experience with difference in differences methods comes from empirical labor economics, and this experience had led me to expect a number of robustness checks and supporting analyses that this article lacked. This is not surprising: Different disciplines can face similar methodological issues, yet still develop more or less independently of each other. In such situations, however, there will often be good reasons for “cross-pollination” of practices and methods. For instance, experimental economics owes a large debt to psychology, and the use of randomized field trials in development and labor economics owes a large debt to the use of randomized clinical trials in medicine.

The cannabis-and-IQ analysis basically compares average changes in IQ across groups with different cannabis use patterns. Since we haven’t randomized “cannabis use patterns” over the participants, we have an obvious and important selection issue: The traits or circumstances that caused some people to begin smoking pot early, and that caused some of these to become heavily dependent for a long time, can themselves be associated with (or be) variables that also affect the outcome we are interested in. The central assumption, in other words, is that the groups would have had the same IQ-development if their cannabis use had been similar. Since this is the central assumption required for this method to validly identify an effect of cannabis, it is crucial that the researchers provide evidence sufficient to evaluate the appropriateness of this assumption. To be specific, and to show what kind of things I wanted the researchers to provide, you would want to:

  • Establish that the units compared were similar prior to the treatment being studied – e.g., provide a table showing how the different cannabis-exposure groups differed prior to treatment on a number of variables.
  • Establish a common trend – Since the identifying assumption is that the groups would have had the same development if they had had the same “treatment”, then clearly the development prior to the treatments should be similar. In the Dunedin study, they measured IQ at a number of ages, and average IQ changes in various periods could be shown for each group of cannabis users.
  • Control for different sets of possible confounders. To show that the estimates that are of interest are robust, you would want to show estimates for a number of multivariate regressions that control for increasing numbers (and types) of potential confounders. The stability of the estimated effects and their magnitude can then be assessed, and the danger of confounding better evaluated: What happens if you add risk factors that are associated with poor life outcomes (childhood peer rejection, conduct disorders etc), or if you include measures of education, jailtime, unemployment, etc.? If the effect estimate of cannabis on IQ changes a lot, then this suggests that selection issues are important, and that confounders (both known and unknown) must be taken seriously. Adding important confounders will also help estimation of the effect we are interested in: Since they explain variance within each group (as well as some of the variance between the groups), they help reduce standard errors on the estimates of interest.
  • Establish sensitivity of results to methodological choices. Just as we want to know how sensitive our results are to the control variables we add, we also want to know how sensitive they are to the specific methodological choices we have made. In this instance, it would be interesting to allow for pre-existing individual level trends: Assume that people have different linear trends to begin with. To what extent are these differing pre-existing trends shifted in similar ways by later use patterns of cannabis? By adding in earlier IQ-measurements for each individual (which are available from the Dunedin study), such “random growth estimators” would be able to account for any (known or unknown) cause that systematically affected individual trajectories in both pre- and post-treatment periods. Another example is the linear trend variable they use for cannabis exposure, which presumably gives a score of 1 to never users, 2 to users who were never dependent, 3 to those scored as dependent once and so on. This is the variable that they check for significance – and it would be
  • Provide other diagnostic analyses, for instance by considering the variance of the outcome variable within each treatment group (how much did IQ change differ within each treatment group?). In this way, we could tell whether we seemed to be dealing with a very clear, uniform effect that affects most individuals equally, or whether it was a very heterogeneous effect whose average value was largely driven by high-impact subgroups.
  • Discuss alternative mechanisms. What potential mechanisms can be behind this, and what alternative tests can we develop to distinguish between these? For instance, let us say you identify what seems to be a causal effect of cannabis use and dependency, but its magnitude is strongly reduced (but not eliminated) when you add in various potential confounders, for instance educational level. As the authors of the original paper note (when education turns out to affect the effect size), education could be a mediating factor in the causal process whereby cannabis affects IQ. However, this would mean that the permanent, neurotoxic effect they are most concerned with would be smaller, because part of the measured effect would be due to the effect of cannabis on education multiplied by the effect of this education on IQ. The evidence thus suggests that the direct “neurotoxic” effect is only part of what is going on. It also suggests that we might want to look for evidence to assess how strongly cannabis use causally affects education, to better understand the determinants of this process. For instance, even if there was only a temporary effect of cannabis on cognition, ongoing smokers would do more poorly in school or college, which might then influence later job prospects and long term IQ. The effect doesn’t even have to be through IQ: If pot smoking makes you less ambitious (either because of stoner subculture or psychological effects), the effect may still have long term consequences by altering educational choices and performance. Put differently: If the mechanism is via school, then even transitory effects of cannabis become important when they coincide with the period of education.
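
The "common trend" item in the list above can be made concrete with a minimal sketch. Everything in it is assumed for illustration: the records are made-up toy values, the group labels and the childhood measurement age (7) are hypothetical, and `mean_change_by_group` is a helper invented here, not anything from the original paper or from Rogeberg's article.

```python
from collections import defaultdict
from statistics import mean


def mean_change_by_group(records, start, end):
    """Average IQ change between two measurement ages, per exposure group."""
    changes = defaultdict(list)
    for r in records:
        changes[r["group"]].append(r["iq"][end] - r["iq"][start])
    return {group: mean(vals) for group, vals in changes.items()}


# Toy records (made up): IQ at a childhood age, at 13, and at 38.
records = [
    {"group": "never used", "iq": {7: 100, 13: 101, 38: 102}},
    {"group": "never used", "iq": {7: 104, 13: 104, 38: 104}},
    {"group": "dependent", "iq": {7: 98, 13: 95, 38: 90}},
    {"group": "dependent", "iq": {7: 102, 13: 100, 38: 96}},
]

pre = mean_change_by_group(records, 7, 13)    # before any cannabis exposure
post = mean_change_by_group(records, 13, 38)  # the period the paper compares

# Gap in average IQ change between the groups, before and after exposure.
pre_gap = pre["dependent"] - pre["never used"]
post_gap = post["dependent"] - post["never used"]
print(pre_gap, post_gap)
```

In this toy data the groups already diverge before anyone could have started smoking, so the later gap cannot be attributed to cannabis without further argument. With real data one would also want to put the two periods on a comparable scale (for instance, change per year) and report uncertainty.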

So that’s the potential statistics problem. Now for the story of what happened next:

When I originally started looking into this last August, I sent an e-mail to the corresponding author asking for a couple of tables with information on “pre-treatment” differences between the exposure groups. I did not receive this. This is quite understandable, given that they were experiencing a media-blitz and most likely had their hands full. I therefore turned to past publications on the Dunedin cohort to see if I could find the relevant information there.

It turned out that I could – to some extent. Early onset cannabis use appeared to be correlated with a number of risk factors, and these risk factors were also correlated with poor life outcomes (low and poor education, crime, income etc.). The risk factors were also correlated with socioeconomic status.

The next question was whether these factors could affect IQ. One recent model of IQ (the Flynn-Dickens model) strongly suggested they would. The model sees IQ as a style or habit of thinking – a mental muscle, if you like – which is influenced by the cognitive demands of your recent environment. School, home environment, jobs and even the smartness of your friends are seen as in a feedback loop with IQ: High initial IQ gives you an interest in (and access) to the environments that in turn support and strengthen IQ. Since the risk factors mentioned above would serve to push you away from such cognitively demanding environments, it seemed plausible that they would affect long term IQ negatively by pushing you into poorer environments than your initial IQ would have suggested.

A couple of further parts to this potential mechanism can be noted (both discussed here): It seems that high-SES kids have a higher heritability of IQ than low-SES kids, which researchers often interpret as due to environmental thresholds: If your environment is sufficiently good, variation in your environment will have small effects on your IQ. If, however, your environment is poorer, similar variation will have larger effects. Put differently: The IQ of low-SES kids is more affected by changes to their environment than that of high-SES kids.

Also, there is a (somewhat counterintuitive, at first glance) result which shows that average IQ heritability increases with age. One interpretation of this is that our genetic disposition causes us to self-select or be sorted into specific environments as we age. The environment we end up with is therefore more determined by our genetic heritage than our childhood environment, where our family and school were, in some sense, “forced environments.”

In my research article, I refer to various empirical studies supporting these mechanisms and their effects. For instance, past studies that find SES, jailtime, and education to be associated with the rate of change in cognitive abilities at different ages. Putting these pieces together, the risk factors that make you more likely to take up pot smoking in adolescence, and that raise your risk of becoming dependent, also shift you into poorer environments than your initial IQ would predict in isolation. Additionally, these shifts are more likely for kids in lower-SES groups (since the risk factors are correlated with SES), and these also have an IQ more sensitive to environmental changes. Finally, for the same reason, the forced environment of schooling is likely to raise childhood IQ more for the low SES kids (because it is a larger improvement on their prior environments, and because their IQs are more sensitive to environmental influences). SES, then, is in some sense a summary variable that is related to a number of the relevant factors, in that low SES

  1. correlates with risk factors that influence, on the one hand, adolescent cannabis use and dependency and, on the other hand, poorer life outcomes, and
  2. signals a heightened sensitivity to environmental factors (the SES-heritability difference in childhood)
  3. probably reflects the magnitude of the extra cognitive demands imposed by school relative to home environment

For these reasons, SES seemed like a good variable to use in a mathematical model to capture these relationships. However, it should be obvious from my description of this mechanism that we should expect the mechanism to work even within a socioeconomic group: Even within this group, those with high levels of risk factors will experience poorer life outcomes, which may reduce their IQs. They will also most likely have higher probabilities of beginning cannabis smoking. At the same time, we would expect a smaller effect within a specific socioeconomic group than we would across the whole population.

However, I simplified this by using SES in three levels and created a mathematical model with these effects, using effect sizes drawn from past research literature where I could find it. Using the methods used in the original study, I tested my simulated data and found the statistical methods identified the same type and magnitude of effects here as they had in the actual study data. This, of course, does not prove or establish that there is no effect of cannabis on IQ. What it does is to show that the methods they used were insufficient to rule out other hypotheses, that the original effect estimates may be overestimated, and that we need to look more deeply into the matter, using the kind of robustness checks and specification tests I discussed above.
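
A toy version of this kind of simulation is sketched below. It is not Rogeberg's model: the usage probabilities, IQ drifts, and noise level are hypothetical numbers chosen only to illustrate the logic, and the sketch deliberately lets IQ change depend on SES alone, with zero causal effect of cannabis.

```python
import random
from statistics import mean

LEVELS = ("low", "mid", "high")


def simulate(n=20000, seed=0):
    """Simulate (SES, user, IQ change) rows with no causal cannabis effect."""
    rng = random.Random(seed)
    # Hypothetical parameters, for illustration only.
    p_use = {"low": 0.5, "mid": 0.3, "high": 0.1}    # P(adolescent use | SES)
    drift = {"low": -4.0, "mid": 0.0, "high": 2.0}   # mean IQ change, 13 to 38
    rows = []
    for _ in range(n):
        ses = rng.choice(LEVELS)
        user = rng.random() < p_use[ses]
        change = rng.gauss(drift[ses], 5.0)  # depends on SES only
        rows.append((ses, user, change))
    return rows


def gap(rows):
    """Mean IQ change of users minus mean IQ change of non-users."""
    users = [c for _, u, c in rows if u]
    non_users = [c for _, u, c in rows if not u]
    return mean(users) - mean(non_users)


rows = simulate()
naive = gap(rows)  # pooled comparison, as in a simple group contrast
within = {lvl: gap([r for r in rows if r[0] == lvl]) for lvl in LEVELS}
print(round(naive, 2), {k: round(v, 2) for k, v in within.items()})
```

The pooled comparison shows users declining more than non-users even though cannabis does nothing in this simulated world, while the gap within each SES level stays close to zero. The real argument is subtler, since the mechanism is expected to operate within SES groups as well, but the sketch shows why a group contrast alone cannot rule out the alternative story.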

And thus:

In my mind [Rogeberg writes], this should be just the normal process of science – an ongoing dialogue between different researchers. We know that replication of results often fails, and that acting on flawed results can have negative consequences (see here for an interesting popular science account of one such case). A statistical model by medical researcher Ioannidis (at the centre of this entertaining profile) suggests that new results based on exploratory epidemiological studies of observational data will be wrong 80% of the time. The Dunedin study on cannabis and IQ would, it seems, fit into this category. After all, by the time you’ve published more than 1100 papers on a group of individuals, it seems relatively safe to say that you have moved into “exploratory” mode.
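
For readers who want to see where a figure like 80% can come from, here is a hedged sketch of the positive predictive value calculation from Ioannidis's 2005 essay, written out as I recall it. The function name and the input values (power 0.80, prior odds 1:10, bias 0.30, significance level 0.05) are assumptions for illustration; check them against the original before relying on them.

```python
def ppv(power, prior_odds, alpha=0.05, bias=0.0):
    """Share of claimed findings that are true, given power, odds, and bias.

    power      : probability of detecting a true relationship (1 - beta)
    prior_odds : ratio of true to false relationships among those tested (R)
    alpha      : significance threshold
    bias       : share of would-be null results reported as findings (u)
    """
    beta = 1.0 - power
    r, u = prior_odds, bias
    true_findings = power * r + u * beta * r
    all_findings = r + alpha - beta * r + u - u * alpha + u * beta * r
    return true_findings / all_findings


# Assumed inputs for an adequately powered exploratory observational study.
share_true = ppv(power=0.80, prior_odds=0.10, bias=0.30)
print(round(share_true, 2), round(1 - share_true, 2))
```

With these inputs roughly one claimed finding in five is true, which matches the "wrong 80% of the time" figure. Setting `bias=0.0` raises the share of true findings considerably, which shows how much of the result is driven by the assumed bias term.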

More from Rogeberg here.

I have nothing to add to all of this. It’s just an interesting story, one more example of the tenaciousness of researchers when subject to criticism.


  1. Anonymous says:

    It’s incredible to me that researchers who are supposed to be professionals at working with observational data are this boneheaded about interpreting it. There seems to be a strain of public health researchers who think that if their study is big and laborious enough, causal inference will magically emerge from their data.

    I’m not sure whether to be ashamed of PNAS for publishing this junk, or praise them for accepting the criticism as an article.

  2. I blame funding environments and tenure committees. Researchers who are honestly having a genuine discussion about the merits of different ideas, and who are extremely careful about controls and methodology and about not making statements that haven’t been checked carefully, seem to lose out when it comes to funding and tenure compared to those who make overinflated claims, fish for p-values, and staunchly defend controversial and probably untenable ideas. It helps that getting a lot of citations from people trying to shoot you down increases your citation metrics. Also, in certain fields, for example biology, the progress in the field has been so fast that the people on tenure committees compare the progress of recent assistant professors to their own progress when they were assistant professors, which is completely ridiculous. In 1995 you could publish “We knocked out a gene in a mouse!”; now you need a triple conditional knockout plus genomic analysis and a functional assay showing that such and such gene regulates such and such other gene in such and such a tissue to produce such and such an effect. Orders of magnitude more work.

    This blog format with open commentary is in my opinion a far better model for honest academic study. What we need is a sort of standardized interface to the internet archive, and a WordPress plugin. You put a pdf of your manuscript on your blog, your blog publishes a pdf copy to a special portion of the internet archive where it’s permanently stored and accessible and indexed automatically, the internet archive links to your blog for open anonymous or otherwise commentary on the topic, and a bunch of meta “journals” spring up to collect and recommend research articles from this public archive by topic. To hell with traditional journals and un-accountable anonymous peer review.

    • I know the arXiv exists, but it’s more specialized. We need just a huge repository for all fields, and individual researchers promoting their own work by discussing the formal articles in blog-type accompaniment format.

      • Entsophy says:

        I agree completely. The only part of the current system worth saving is the value of science communication as described by Gelman in some previous post. And with modern technology this is basically trivial to achieve.

        Some worry that if there is a free-for-all where anyone can submit a paper, there will be no way to winnow out the important work. Nothing could be further from the truth. People and organizations will arise organically to separate out the wheat from the chaff.

        I saw this first hand in Iraq. There was a civilian intel analyst up north who was famous for putting out a secret email report every week highlighting the significant intel events and explaining in depth their importance. This informal email report was considerably better than the official all-source fusion intel summaries put out by the military in the same area. The guy who wrote it wasn’t getting paid extra for this service; he just did it because he liked to, and he did it regularly for years (far longer than a typical army unit tour). Famous movie reviewers like Ebert provided the same kind of service and there’s no reason to think the same wouldn’t happen for research papers. This is one area where there’s absolutely no need to fear anarchy.

      • Rahul says:

        +1 for that! The big flaw of the current system is its excessive (perhaps full) reliance on stuffy, slow modes of scholarly back and forth (Short Communications, Letters to Editors, etc.) to resolve issues.

        A quick, short, transparent, perhaps moderated commenting system would make things so much better!

        Even arXiv is not so comment-friendly, is it?

        • Nick Cox says:

          Optimists for open access to anything anyone wants to write are evidently also optimists for mechanisms emerging that will allow identification of the best work and criticism of weaknesses. That’s fine by me, but don’t kid yourselves that you won’t be reinventing the work of editing and reviewing, albeit under different names and in different forms. Somebody has to read the junk to establish that it is junk.

          Otherwise put, I sense here angry younger people frustrated by some of their early career experiences. No rejoicing at that, but the more you have acted as (a) published author _and_ (b) journal reviewer _and_ (c) journal editor, the more you realize that review processes by traditional journals do also improve weak papers and reject lousy ones that don’t deserve the light of day. It’s not all villainy and tyranny and repression.

          • Rahul says:

            What about a hybrid model? As a start why don’t established journals make commenting and criticism easier?

            Along with free Abstracts why not allow commenting and let people read comments freely. Let Google index it too.

            Any downsides?

            • Nick Cox says:

              My experience, which includes being a journal editor, is that nothing inhibits journals from publishing comments and criticisms so much as the unwillingness of people to submit them. But if peer review is fair for “original” contributions, it applies to comments and criticisms too.

              There are journals based on invited discussion, such as Behavioral and Brain Sciences. Why isn’t that mode of journal more popular?

              • Rahul says:

                Can you cite a (well read) journal that has allowed open, online, anonymous comments and faced this unwillingness you mention?

                Why isn’t traditional-format peer review for comments ideal? You tell me: You were a journal editor; what percent of your articles did you ever publish a follow-up comment or critique on?

          • Entsophy says:

            If we’re publishing exponentially more improved peer reviewed papers all of which convincingly claim some advance then why don’t we have exponential improvements in the sciences? Why for example, does a field like Economics, which is as obsessed with the publication policies of their top journals as any field could be, have about the same predictive ability it had 4 decades ago?

            When most fields are in this kind of a rut then it’s a good idea to introduce some anarchy. Think of it as a kind of “simulated annealing”. And what better way to introduce some chaos than having a system which is open and the main channel for communication, but completely useless for tenure/funding decisions?

            • Nick Cox says:

              If you are right, you don’t need to dismantle existing systems. You can introduce new systems and the old ones will wither away “organically”. I am less sanguine that order comes out of chaos; I think there is more mysticism than logic in that view.

              • Entsophy says:

                Can you think of any instance where people had to find a few high quality widgets/ideas/content among a mass of dross in which intermediaries didn’t spring up organically to help sort it out? What you’re calling mysticism is actually the norm.

                Since we’re all empiricists here it’s worth looking at the record of peer review. The thousands of pages written by Newton, the Bernoullis, and Euler weren’t peer reviewed. How do their discoveries match up to the last 5000 statistically sound peer reviewed psychology papers?

                Their discoveries were made at a time when communication was far harder, nutrition was poor and resources of all kinds were dramatically less. There were no computers, no air conditioning, and no easy electric lights to work at night. Moreover, they were working on problems which throughout most of history were considered far harder and less amenable to understanding than the study of the human mind. That we tend to hold the opposite view today is testament to how successful they were. They did all this, and we all learned about it, without any kind of peer review. Somehow just the desire to have their work stand above the dross and be read by likeminded souls was all the corrective influence they needed.

                If that’s the chaos you’re afraid of then I say let’s have more of it. If in the process we destroy “publication” as a way of assigning rewards to scientists then so much the better. At the very least it will free scientists from wasting most of their productive lives gaming the system.

          • No doubt traditional review improves weak papers and rejects lousy ones. I think the “rejecting lousy ones” from seeing the light of day is not a feature, it’s a bug. Sure some of them are completely garbage, but others are deemed wrong by people who don’t understand them, or deemed “not interesting enough” for “fancy” journals etc.

            In my model, people would either ignore them as being uninteresting, or attack them as being wrong, but they would *all* be available. Basically I argue that *making available* and *editing and stamping approval* are two orthogonal functions that we need to separate ASAP. We’ve all learned a significant amount of statistical ideas from attacks on wrong methodology in this blog, so it isn’t clear that the journal silent rejection method is good for scholarly correctness.

            Finally we definitely need some way to filter good articles, but as Entsophy points out, that tends to occur organically anyway, and we see that already on the internet in the age of search engines.

    • “It helps that getting a lot of citations from people trying to shoot you down increases your citation metrics.”

      All excellent comments.

    • phayes says:

      “This blog format with open commentary is in my opinion a far better model for honest academic study”

      I like the idea but intellectual ‘dishonesty’ /is/ the problem, IMHO, and the informality of the blog format can be advantageous to those who are prepared to argue lazily, sloppily and fallaciously.

  3. Roger says:

    So is cannabis harmful to health or not?

  4. Rahul says:

    Why are we so hung up on making researchers “admit they were wrong,” especially in cases like this where there’s no outright fraud or falsification? Aren’t they arguing over methods; about what should have been controlled and wasn’t?

    If so, we have one person (Rogeberg) who thinks they erred and at least 4 or 5 (original authors, referees) who say they didn’t. I’m especially confused about why Andrew can be sad about a hypothetical, i.e. “I am assuming for the purposes of argument that Rogeberg’s criticisms are valid; I have not gone through and examined the articles.” Is there reason to believe Rogeberg more than the original authors?

    I feel the best response is a counter-publication. If the same journal refuses then in other journals. And if that’s not happening, then aren’t the journals and referees too more at fault? Why not let third parties debate out the goodness of a paper; why do we attach so much importance to hearing the authors admit they were wrong?

    • Anonymous says:

      It’s not a matter of multiple legitimate methods leading to multiple plausible interpretations. It is simply incorrect to state that the claim they make should follow from the methods that they used.

      Their error isn’t about the causal effect of cannabis, their error was regarding whether they were entitled to make a claim regarding the causal effect of cannabis based on what they have done. It’s simply wrong to think that they could make any such causal claim with the data and analysis that they have.

    • Michael says:

      I think that the issue is not so much being wrong, but being avoidably wrong and then defensive. The idea that a correlation arising in observational data should be scrutinized by accounting for plausible confounders before being considered for publication is not too much to expect. This is not failure to apply some particular cutting edge method – this is a well-established basic principle. It was clearly articulated as early as 1921, but I am sure has been well-understood in various contexts since before then. To be defensive about somebody who is curious about whether results, and associated causal inference, are changed if a plausible alternative hypothesis was investigated…that’s worth getting frustrated over.

      The idea that it has been reviewed and is therefore credible, is dangerous. There is every possibility that this is not the first journal to which this work was submitted. It could be that multiple reviewers have already pointed out flaws. But more importantly, and as above, a plausible alternative hypothesis should hold more weight.

      • Rahul says:

        No, my point was not that “reviewed and is therefore credible”.

        I’m saying tear it apart, but if it is so obviously wrong, why not pillory the reviewers too? It doesn’t matter if this is the 10th journal they submitted to. If reviewers let an obviously bad article pass they share blame.

        If multiple reviewers pointed out fatal flaws, how desperate were the editors to publish it notwithstanding?

        • Anonymous says:

          The reviewers should be blamed. One problem is that certain fields have poor standards of evidence, and in public health, researchers are easily impressed by large studies and multi-modal measurements. Researchers in this field tend to be more impressed by large, poorly identified studies than controlled, well-identified ones and often invoke the “real world” terminology quoted in Rogeberg’s blog post – “whether X actually is Y in the real world and how much”. To them, explanatory factors aren’t controlled in the “real world”, so these types of observational studies are considered more “practical”.

          • Rahul says:

            I wonder: Would there be a downside in publishing with each accepted paper the names of who reviewed and OKayed it?

            I can see why reviewers might want to remain anonymous post a scathing review or rejection but if they accepted a paper is there a good reason to keep them veiled?

            On the plus side, having to stand by their approval publicly might incentivize a little more scrupulous reviewing?

            • Nick Cox says:

              I have often seen, in the Earth sciences,

              1. Names of reviewers published with a published paper.

              2. Reviewers declaring themselves to the author despite a default journal convention of anonymous reviewing.

        • Anonymous says:

          (same anon)
          the problem is that PNAS will send this paper to other researchers within the field. If they had sent it to the right “non-experts”, these flaws would have been obvious. However, there are batch effects where entire fields are blind to systemic problems with how evidence of causality is evaluated – public health, macroeconomics, genomics…

          • Andrew says:

            What really upset me were the examples of economists Oster and Ashraf/Galor who published seriously flawed studies and then, when called on it, lashed out at their critics (public health researchers in one case, anthropologists in the other), secure in their belief that, as economists, they could not possibly learn anything useful from people in a different scholarly field.

    • Ole Rogeberg says:

      Hi Rahul,
      I agree that the best response is a counter-publication, but that is what happened: I did submit my response to the same journal, it went through the same review process as an ordinary PNAS article, was revised twice to take into account responses from the referees, and was published. The original authors replied in a letter to the editor (gated, but here: ) and I was given a 500 word reply (ungated, available here: ).


  5. walker says:

    (From a naive view) How has Rogeberg’s analysis advanced knowledge at all, other than our now knowing that SES could indeed be a confounder? It helps to build a plausible story. But at what cost? Does a plausible story make it any more likely that this is part of the true story? Does it make it any less likely that some other story is the true story? We can find confounders all day, but are we any closer to answering the question (does cannabis attenuate IQ growth)?

    So, these lead to my real question, which I am currently struggling with. Is it even worth looking for a correlation between cannabis use and IQ (that is, doing the study)? If we do not find one, this is not evidence that cannabis is not causal. If we do find one, this is no evidence that it is causal. If the study is worth the time and money, why? What have we learned? What is the proper way to present the results?

    • K? O'Rourke says:

      > If the study is worth the time and money, why?
      Excellent question, which may now be: is a re-analysis, another study and a meta-analysis worth it?

      There simply are not _better_ sources of evidence and it’s hard to rule out an extreme and convincing result (although that rarely happens).

      > What have we learned?
      No one (other than perhaps the authors) can discern this.

      > What is the proper way to present the results?
      An adequately documented (anonymised) data set with analysis scripts documenting the purpose of each step in the analysis.

    • Roger says:

      My guess is that any study claiming a negative effect for cannabis would be criticized for not ruling out all possible confounders.

    • Ole Rogeberg says:

      I did try to go beyond the “x may be a confounder” type criticism by also discussing a set of new analyses that the authors could use on their data to better distinguish between the two interpretations of the correlations. This was not followed up in the reply from the authors (here, possibly gated: ), which I pointed out in my reply (ungated, here: ).

  6. K? O'Rourke says:

    > tenaciousness of researchers when subject to criticism.

    Unfortunately, it likely is just human nature, and we all fall into it when we feel we are being undoubtedly good.

    The below is an email from an evidence-based group that likely would define _open to criticism_ as their first and foremost quality. In fact, the email was sent by Georgia Salanti, who seems to be described just like that in the profile link given in the post.

    I was surprised at the time but probably should not have been.

    Dear Dr O’Rourke,

    We have taken the decision to suspend you from the Cochrane Statistical Methods Group discussion list, SMGlist.

    We have previously warned you about sending confrontational emails to the list. Your most recent posts to the list about noninferiority trials were not politely worded, and we have received several off-list adverse comments from long-standing list members. We are concerned that the tone of such exchanges detracts from the usual collaborative spirit and may deter younger list members from contributing and may lead to resignations from the email list. As you have not modified the tone of your posts since previous warning, we feel we had no option but to suspend you from the list.

    The SMG co-convenors,
    Doug Altman, Joseph Beyene, Jo McKenzei, Steff Lewis and Georgia Salanti

    • Rahul says:

      Can you also post your original mail that prompted the ban? It’s only fair so that we can judge if they reacted fairly or not.

  7. K? O'Rourke says:

    Yes, but I should only provide what I sent (not what I was responding to or comments back to me)

    Subject: Re: [smglist] Meta analysis of non inferiority trials

    With all due respect, I believe I need to be very direct if not impolite about this.

    I think this is ludicrous – “no reason to view results from noninferiority trials differently”

    In noninferiority trials (as in all indirect comparisons) Rx versus Placebo estimates require one to make up data (exactly as it occurred in earlier historical Placebo controlled trials) but rather than being explicit about that making up of the data (e.g. calling it an informative prior) vague assumptions are stated that would justify the making up as essentially risk free. Unfortunately there is no good way to check those assumptions and they are horribly non-robust.

    Exactly how much less worse noninferiority studies are than observational studies is a good question – the earlier multiple bias analysis literature, especially R Wolpert did address this but it seems to have been forgotten. Also Stephen Senn has written a somewhat humorous paper for drug regulators (with huge historical trials the made up SE will almost be zero)

    - hopefully they won’t miss the point.



  8. Nick Cox says:

    Rahul asks me [my insertions are in square brackets]

    “Can you cite a (well read) journal that has [a] allowed open, online, anonymous comments and [b] faced this unwillingness you mention?”

    I don’t know any [a] so [b] is not something I can answer. Not your question, but I also don’t see much point to “anonymous comments” of this kind. Elsewhere in this thread you want reviewers to be named and now you want to allow commenters to be anonymous….

    “Why isn’t traditional-format peer review for comments ideal? You tell me: You were a journal editor; what percent of your articles did you ever publish a follow-up comment or critique on?”

    Not “were”, “am”. Very few, because we typically don’t get submissions of that type.

    • Rahul says:

      “you want reviewers to be named and now you want to allow commenters to be anonymous”

      Sorry, I do not see the contradiction here. There’s utility to having this dichotomy. In general, don’t we expect there to be a power gradient (or reputational, seniority, career-security, expertise, etc.) between a commentator and a reviewer?

      Do we require whistle-blowers to reveal identities? Do we require suggestion boxes / complaint boxes to throw away anonymous notes? Do tip-lines not allow anonymous calls?

      Note, I’m not saying comments have to be un-moderated. Moderation can weed out most of the spam / abuse / ad-hominems etc.

      Why the persistent antipathy to anonymity?

      • Nick Cox says:

        I don’t know why you infer that. For example, perhaps you missed my earlier post commenting on disclosure of names in the Earth sciences as a common practice, and I will spell out my implication: there are many positives to the idea.

        I am asking _you_ why you think it’s consistent to want named authors, named reviewers and anonymous commentators. Named authors, named reviewers, named commentators by default might be a very good basic system. If people insist on being anonymous or pseudonymous, then so be it, although journals could take very different lines on that. I doubt I would respect any journal in which most commentators concealed their names. I’d expect sharp commentators mostly to want the credit for cogent comments.

        I am focusing on journal publication. Feel free to broaden the discussion to any kind of commentary if you like; I don’t feel obliged to follow you. I don’t think (e.g) whistle-blowing where there might be serious personal risk is an exact analogue, although it’s always possible even for academic work (e.g. criticising a boss or colleague, etc.).

        I am glad that you think moderation is likely to be necessary. That’s been my main point in this thread. Peer review may look like a poor system, but those wanting to replace it with something else will find themselves re-creating much of it in different ways. Elsewhere I participate in web forums whose mechanisms for editing, up- and down-voting, removing posts, sanctioning posters of low quality, etc., etc. exceed in complexity any journal I know about by about two orders of magnitude. Also, that’s a lot of work!

  9. Nick Cox says:

    Entsophy asks

    “Since were [sic] all empiricists here it’s worth looking at the record of peer review. The thousands of pages written by Newton, the Bernoulli’s, and Euler weren’t peer reviewed. How do their discoveries match up to the last 5000 statistically sound peer reviewed psychology papers?”

    This is one of the most bizarre, loaded and unhistorical comparisons I have ever read. Sure, absence of peer review did not stop geniuses doing great work. Whoever claimed otherwise? Much else — in fact almost everything — that was also published in the 17th and 18th centuries was trivial, mediocre or wrong by many standards, and at best some historians know about some of it.

    Conversely, whoever claims that a typical sample of modern peer-reviewed publications from any discipline (I hold no animus against psychology — why single that out) averages genius level? That would be absurd, and the failure to reach that level is not an indictment of peer review.

    In abstraction, like many other people, I readily agree that there are too many papers and too many conventional journals. So, the solution is total free-for-all via the internet?

    Academic publishing is in transition and radically new forms are likely to emerge. I am just saying: Don’t demonise the present system as all bad and fantasise about pie in the sky replacements that are going to be all good.

    Finally, Entsophy writes that we need to

    “free scientists from waste [sic] most of their productive lives gaming the system”.

    I feel sad for him that he has such a starkly negative view. What publication experiences led him to such negativity?

    • Entsophy says:

      “published in the 17th and 18th centuries was trivial, mediocre or wrong by many standards”

      Exactly right. There was far more of this stuff back then than anyone who’s never seen a journal from that time would ever guess. It was exactly my point that Newton et al. had no trouble being seen for what they were and rising to the top, all without peer review. On the other hand, when Royal Societies and the like did become powerful and did have something like peer review, they amassed quite a history of squelching correct and innovative work from outsiders. Galois in France and Waterston in England are but two examples from the 1800s.

      “… is not an indictment of peer review.”

      It’s an indictment of using journals for tenure/status/funding. As long as you have peer reviewed journals they will be used thus. The journal system has been gamed so much in most fields like Economics that it’s openly admitted few ever read the journals and most real communication occurs through back channels and things like working papers. The point wasn’t to get rid of peer review, it’s to return to using publications solely for communication.

      “the solution is total free-for-all via the internet?”

      Yes, because it gives what I’ll call “superstars of peer review” a chance to do their stuff. In my original post I mentioned an intelligence analyst who was doing that function better than official channels. The French monk Mersenne did exactly the same thing for the mathematical sciences in the 1600s. Who could do that now? I’m not sure, but it’s unlikely to be someone you’d pick to be editor of a major journal. Then again maybe it would be. Maybe you for example could make a lot of money being a superstar of peer review (a “Mersenne” for the 21st century) the same way some professors are going to make millions off MOOCs.

      “What publication experiences led him to such negativity?”

      Well I have a pretty wide background in academia, so here’s what I saw first hand:

      I saw Topologists randomly change “finite” to “countable” in some worthless math definition because they knew doing so would automatically make all their trivial theorems new and hence publishable.

      I saw Experimental Physicists do boring research they knew was going nowhere because the research was guaranteed to get a publishable paper which would lead to a new grant which they hoped would lead to real research.

      I saw Economics graduate students being told they had to find a new gimmicky data set that had nothing to do with economics because analyzing it with some trivial statistics was the easiest way to make a name for themselves.

      I saw Statisticians being told to publish their half digested work immediately and not to wait until it had matured and had been simplified through deeper insight. That way they could get a steady string of papers out making corrections and additions to the original paper.

      I’ve seen Bayesian statisticians use standard frequentist hypothesis tests for biostatistics related stuff instead of the methods they wanted to use, because they knew it would get through peer review easier.

      I could go on and on. But there are two things I haven’t seen much of: (1) I haven’t seen researchers work in the old style the way Euler did, and (2) I haven’t seen researchers get results like Euler did.

  10. John says:

    Of course, it could just be that the exponential growth in scientific publications reveals a growth in this kind of researcher and response from a researcher. I have worked with many, many people who are more than willing to admit their mistakes. They’re usually very good researchers.

    Would any of the proposed solutions really eliminate the staunchly obstinate newsy researcher?

  11. jrc says:

    I figured someone on here ought to give a “yay academics” story on this thread, not because I think all academics are honest intellectuals, but just to point out that some are. I’ve been working for a while now on a paper that is essentially a “people have been doing this wrong” paper. Almost everyone I’ve talked to, including people who have made the mistake, has been incredibly kind to me and responded with encouragement and support.

    In particular, I met one researcher at a conference, we chatted for a while, and after pointing out to them the problem, I asked if they would be willing to run my new specifications on their data. The next week, we spent an evening emailing code and results back and forth, and I ended up learning something new that improved my paper and they got new insights into their results. Now we’re quite friendly, email occasionally, and find some time to chat when we see each other at conferences. This person (who is quite senior to me and a growing name in the field) could easily have 1) refused to answer my initial email before the conference; 2) not spent an hour of their time chatting with me after their talk; 3) not engaged me in a back/forth that could expose a fundamental flaw in their work; or 4) started bad-mouthing me as a troublemaker. Instead, they chose to engage with me, we both learned a lot, and I now consider them a friend.

    So it’s not all terrible out there. Sure, I’ve had one or two people be a bit defensive in emails, but really it’s been a positive experience for me. It’s also greatly improved my work, which shifted from being essentially a ‘take-down’ paper, to a methodology paper that speaks to a much broader literature.

    One last point: In my field (applied microeconomics), the “peer review” process is not just a publication thing, but starts with seminars and conference presentations, where tough questions are asked and solutions/fixes/alternatives suggested. Every paper that’s been published has gone through a face-to-face grilling by smart people, usually many times. Maybe that makes us a bit more resilient to negative feedback and more open to criticism.

    • Andrew says:


      I agree that econ, as a field, seems much more serious about peer-reviewing even before a paper is submitted to a journal. Even there, serious mistakes can slip through (we’ve discussed the cases of Oster, Ashraf and Galor, and Reinhart and Rogoff). Particularly scary is when economists receive valuable negative feedback from researchers in other fields and just dismiss it. But no system is perfect.

      • jrc says:


        I completely agree. I don’t think any field is immune to the problem, and I would never hold up economists as paradigms of humble empiricism. There are good and bad researchers in any field. And applied micro researchers have the added danger that our discipline currently sees itself as almost limitless in its empirical domain, so we end up walking into some very stupid walls. Interestingly, this methods paper started with me running into a stupid wall on another discipline’s turf. I just got lucky because I work closely with people in the associated fields and realized my (massive) mistake before it was too late. Turns out, I wasn’t the only one who had made it, hence the paper.