Hurricanes/himmicanes extra: Again with the problematic nature of the scientific publication process

Jeremy Freese has the story.

To me, the sad thing is not that people who don’t understand statistics are doing research. After all, statistics is hard, and to require statistical understanding of all quantitative researchers would be impossible to enforce in any case. Indeed, if anything, one of the goals of the statistical profession is to develop tools such as regression analysis that can be used well even by people who don’t know what they are doing.

And the sad thing is not that the Proceedings of the National Academy of Science publishes weak work. After all, there’s a reason that PNAS is known as one of “the tabloids.”

To me, the sad thing is that certain researchers, after getting well-informed and open criticism, respond with defensive foolishness rather than recognizing that maybe—just maybe—some other people know more about statistics than they do.

Incompetence is not so bad—all of us are incompetent at times—but, as scientists, we should try to recognize the limitations of our competence.

It’s too bad because the himmicanes people always had an easy out: they could just say that their claims are purely speculative, that they’re drawing attention to a potentially important issue, that yes they did make some statistics errors but that’s not surprising given that they’re not statistics experts, and that they recommend that others follow up on this very important work. They could respond to the points of Freese and others that their claims are not convincing given huge variation and small sample size, by agreeing, and saying that they regret that their work was so highly publicized but that they think the topic is important. Their defensiveness is completely inappropriate.

I blame society

To return to one of our general themes of the past year or so, I think a key problem here is the discrete nature of the scientific publication process. The authors of the himmicanes/hurricanes paper appear to have the attitude, so natural among those of us who do science for a living, that peer-reviewed publication is a plateau or resting place: the idea is that acceptance in a journal—especially a highly selective journal such as PNAS—is difficult and is a major accomplishment, and that should be enough. Hence the irritation at carpers such as Freese who won’t let well enough alone. From the authors’ point of view, they’ve already done the work and had it approved, and it seems a bit like double jeopardy to get their work criticized in this way.

I understand this attitude, but I don’t have to like it.

P.S. Above image from classic Repo Man clip here.

39 thoughts on “Hurricanes/himmicanes extra: Again with the problematic nature of the scientific publication process”

  1. I don’t agree that the hurricane paper has an out, at least not for anything beyond the main conclusion that journalists spread across the Internet. Proper peer review really shouldn’t overlook the terrible p-hacking practices that were also undertaken in the study.

    Honestly, a quite tenable position is that anyone in any field who presents data having made arbitrary choices between subject z-scores and raw values on Likert scales without justification (small n’s and other problems merely icing on top of that cake) should be permanently fired from their job, tenured or not, and a fair number of math or statistics postdocs should be recruited and given a chance at those positions. That’s the message that should get out. The controversial position is that editors who let such papers slide through review in the social sciences also need to be in line for that chopping block.

    • I hope this was a trolling effort, because your idea about replacing everyone with statisticians is the single dumbest thing I’ve read in a long time. Society would be far worse off with statisticians who are utterly lacking in a proper content background suddenly thrown into an area of applied research. After all, who wouldn’t want to participate in medical research being conducted by postdoc mathematicians and statisticians? Everyone knows mathematicians and statisticians are so much smarter than everyone else that they can spend just a few weeks developing the same depth of content expertise that others spend years on.

      On a related note, I think academic statisticians should be required to do applied research. I also think they should be required to disseminate their research to other relevant fields by publishing in their journals or presenting at their conferences. Not all of their research would have to be done this way, but at least some of it would. Too many statisticians play it safe by publishing only where other statisticians are their main audience. They do little to nothing to elevate other fields, relying instead on the questionable reasoning that other statisticians will learn from their work and that those statisticians will disseminate instead. Or they simply choose to poke at others’ research while contributing little if any research of their own, lest they suddenly find themselves in a glass house that might not be conducive to the stones they like to throw.

      • That would be a dumb statement, if only he had actually said it.

        “Indeed, if anything, one of the goals of the statistical profession is to develop tools such as regression analysis that can be used well even by people who don’t know what they are doing.”

        He never said that we should replace everyone with statisticians. He wants people to own up to their mistakes, so that we as a community can help each other conduct better science.

  2. After all, there’s a reason that PNAS is known as one of “the tabloids.”

    Is the PNAS==Tabloid association commonly held? I heard it for the first time on this blog. In my discipline PNAS is very highly regarded.

    All the PNAS+Tabloid hits on Google seem to come back to Andrew too. Is there enough usage to employ the “is known as” phrase here?

    Or is it just “Personally I think of PNAS as a tabloid, but that’s just me”?

    • I think a sentiment to that effect, although not often put in so many words, is widely held in the social sciences, because the quality of social science papers in PNAS/Science/Nature is not – generally – on par with those in the respective disciplines’ own top outlets.

    • PNAS, Science, and Nature are all known to publish provocative papers that are not quantitatively rigorous. There are countless examples where the implications of the research are given priority over the amount/quality of evidence provided by the analysis and post-publication review tears the paper apart. As folks have pointed out before, part of this is the publicity that articles in these journals get (press releases, etc.) which increases the scrutiny and “exposure rate” of the poor science. I agree with Andrew that peer review acceptance is a poor indicator of article quality.

      Very frustrating when someone is confronted with a damaging fact and labels it simply as a differing opinion. That’s the American way though…

      • Peer review may indeed be a poor indicator of article quality (I don’t know) but if so, do you have any alternatives as better indicators of article quality?

        PNAS, Science, and Nature may be publishing flawed papers, but are they any worse than others? Do we have evidence that a set of journals are much better in this respect than PNAS, Science, and Nature?

        I just think you notice crap more when Nature etc. carries it. When some Unknown Journal publishes the same crap it’s just less likely to attract attention.

        I see no evidence that the referees or review process at PNAS / Nature / Science is any more lax than at other journals in the field.

        • Rahul:

          I assume that peer review supplies a positive signal. But it’s a weak signal, much weaker than what we can learn from looking at a paper carefully and considering its claims in light of outside information. The problem is when people use peer review not as a weak signal but as a definitive cutpoint or as a defense for weak work.

        • Andrew:

          Fair enough. But that impugns peer review in itself and not PNAS / Nature / Science.

          I think the “tabloid” criticism is unfair. PNAS / Nature / Science do as crappy a peer review as most other journals.

          The tabloid critique makes PNAS / Nature / Science seem especially bad, which presupposes that there exists a cohort of much higher-quality journals. Which I don’t think is true.

        • Rahul:

          I think that, when it comes to social science, PNAS/Nature/Science are more tabloidy than the leading journals in the subject fields (for example, American Political Science Review). But my impression is that a poli sci article in PNAS/Nature/Science will get more attention than an APSR article. So I think “tabloid” is fair.

        • Andrew:

          Do you have any hard / quantitative evidence for the assertion that “PNAS/Nature/Science are more tabloidy than the leading journals in the subject fields”? Or is it just intuition / feel / anecdotal experience?

          PS. How exactly do you define “tabloidy”? You seem to be using it in a somewhat circular sense.

        • Yes Rahul, I clearly stated that these journals get more publicity and thus the exposure rate of crappy papers is higher than in lower journals. But the high-quality technical journals (at least in ecology, with which I’m familiar) are less likely to make provocative claims and more likely to be statistically sound. I think scientists can at times be blinded by the publicity potential for an impressive finding before confirming the robustness of their conclusions. This is the problem with PNAS, Science, and Nature – they can lend themselves to chasing a story at the expense of getting the story right. Plenty of good scientists can do both but the incentives drive others to cut corners.

          And you can spend quite a bit of time on Google reading tirades about peer review, with some discussion of potential solutions. The problems are well known. I think Andrew’s point is that publication of a paper simply means it went through a potentially flawed process and came out the other side successfully. You might be able to say that most of the time this process works well enough. But certainly not all of the time. So it’s lame for authors to field scrutiny with weak and stubborn defenses as if the paper has become untouchable once printed.

    • PNAS is subject to log rolling by national academy members who can take papers through an expedited review, where an academy member acts as the editor for the papers they choose (they also have a normal review channel).

      Needless to say there have been abuses. There is also special handling for papers by academy members.

  3. I think the authors still think they are right – I’ve had a bit of email contact with them over my criticisms, and that’s my impression. Unfortunately their responses do seem to be floundering a bit.

    There’s another intriguing aspect: one of the authors is someone who’s written a book on the negative binomial, so he knows his stuff about that. But I think the problems aren’t technical ones; rather, they’re about straightforward data analysis (plot your residuals, boys and girls). As a few people have pointed out, fitting the model is straightforward in practice.

    • I’m comparing your Fig. 1 and Fig. 2: is the straight line in Fig. 2 versus the curve in Fig. 1 because you chose them a priori? Or did you model the residuals in Fig. 2 with a quadratic and it turned out to be a straight-line fit?

      • The lines on the plots are just GAMs fitted to the residuals, i.e. it’s a curve that’s adjusted to the data. The straight line is because the GAM is actually a straight line plus a curve, and the curve isn’t needed, so it gets shrunk away to nothing.
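
        For anyone wondering what that looks like in code, here is a minimal sketch assuming the mgcv package; the names m0 (a previously fitted model), dat, and log_ndam are placeholders, not the objects from the actual analysis.

        ```r
        library(mgcv)

        # Placeholder names: m0 is some previously fitted model (e.g. a negative
        # binomial GLM) and dat is the data frame used to fit it.
        dat$res <- residuals(m0, type = "deviance")

        # s() fits a penalized spline whose unpenalized part is a straight line,
        # so if the data don't support any wiggliness the penalty shrinks the
        # curved component to nothing and the plotted smooth is essentially linear.
        gam_fit <- gam(res ~ s(log_ndam), data = dat)
        plot(gam_fit, residuals = TRUE)
        ```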

    • I like your point that the most sensible model requires domain expertise input.

      Intriguing that a _well known_ statistician was involved? http://scatter.wordpress.com/2014/06/16/the-hurricane-name-people-strike-back/#comment-16188

      Andrew: This is a nice way to put it: “can be used well even by people who don’t know what they are doing”, especially as all of us at some point don’t know what we are doing and sometimes even realise it. So I agree the sad part is the dismissal of a “heads up” that can be most helpful in the long run.

      Here, according to Bob, it is really only the domain experts that are likely to know what they are doing re: getting a sensible model – but they seem to have been kept out by a technocratic statistical process.

      This is quite commonly the case in many areas, with LR testing and/or AIC/BIC measures being given the authority to identify the bestest model of all.

      • I’m the one who posted the comment on Scatterplot. Here it is, for those who don’t want to click through:

        ***

        What I find most puzzling about this episode is that one of the authors of the original article is a well-known statistician who has written on the negative binomial model. I’m assuming he is a competent statistician.

        How, then, did these authors go so wrong in generating their results, and in ways that a decent 1st-year grad methods course should have taught them to avoid? Why are they continuing to go so wrong in defending their results? Where was, and where is, the statistician?

        Maybe the statistician’s role was confined to telling the other authors how to write half a tweet’s worth of code, after which they were on their own. (That’s worth authorship?) Or, maybe his role was always purely symbolic: the authors needed to have a statistician listed on the grant proposal [paper] in order to increase their chances of funding [publication], and he agreed with the understanding that he wouldn’t actually be involved in modeling the data. Either way, it suggests there’s something more horribly awry with the organization of science practice than social psychologists — in this case — p-hacking their way into PNAS and media attention.

        ***

        Is the “well-known” label fair? I had heard his name before (I’m not a statistician), and in a previous thread, other comments suggested that he has a reasonably good reputation in the statistics community.

        But, whether he’s really “well known” or not doesn’t much matter. If statisticians are going to put their name on papers (and/or accept grant money as consultants, which may not have happened in this case), they have more than a little responsibility to make sure the models stand up to at least low-level scrutiny and the estimated coefficients — all of them, not just the ones that confirm the hypotheses — are interpreted correctly. “Buyer beware” doesn’t work well as a model for statistical consulting, at least if the goal is to advance science.

        • > If statisticians are going to put their name on papers

          No one really knows what having a statistician’s name on a paper means.

          It does suggest they _may_ have been involved in some way – but little else.

          It also suggests the journal editors might have thought a statistical review was not necessary :-(

          A quick look failed to find their name on any of the defensive responses, though the upgrading to apparently more sophisticated standard errors suggests they are involved.

          One of the problems is that when authors put your name on their papers without your permission, or more commonly get your permission and then change the statistical content without letting you know, it can be tricky career-wise to back out or do anything about it. Once I had to wait for the journal to reject the submission before I could have my name removed, another time for a funding agency to reject the grant application, and once I had to leave my name on a paper given a perceived risk of losing my job if I asked to be removed.

        • Why don’t journals insist on a signed statement (or a website form assent) from each author allowing his name to be on a paper?

  4. This kind of thing is where I don’t pay much attention to the statistics issues. I can’t get past the stupid level, which is the assertion that the gendered naming of a weather event leads to different numbers of people killed. Even if their model were rigorously correct, it would prove nothing. It doesn’t even assert anything without creating a tenuous thread. As in, let’s say that people react differently to a hurricane named ‘Soothing’ versus one named ‘Killer’. How does one get from reacting differently to not taking sufficient precautions? How does that become a civil defense failure in which somehow there is no evacuation order because the storm is ‘Soothing’, despite winds of 190 mph and a huge predicted storm surge that arrives at high tide? It almost requires a world in which labels take the place of facts: it’s a girl’s name, so we don’t need to look at barometric pressure.

    I hate when people use something as broad as “gender differences” without explaining how exactly this “difference” operates in this case. Think about “gender differences” in this case: not actual differences in how the genders do things, but how we perceive physical events because of a label that happens to have gender meaning in other circumstances. I can imagine a case where gender labeling might make a difference. Let’s say we have two unknown male fighters, one named Sue and the other Bob. Maybe people bet more on Bob because Sue is a girl’s name. I might expect something like that, but only because we have no information about whether Sue and/or Bob are good fighters. What if Sue has a record of 18-1 and Bob is 11-8 against the same competition? If there’s still more money on Bob, that would be an interesting and unexpected result, because bettors would be acting irrationally and we could perhaps see a link to expectations rooted in the name label.

    We could translate this idea into hurricanes if we asked, “Do people think Hurricane Fawn will be more destructive than Hurricane Terminator?”, but only if we don’t describe the hurricanes with actual meteorological data. Once we have pictures of the storm and data about wind intensity, etc., then the concept of gender differences falls apart. As in, I might think a dog named ‘Pussycat’ is a sweetheart, but if it lunges at me with teeth bared then I get the idea pretty darned fast. Meteorological data is like bared teeth: the storm tells you if the name “fits” or not.

  5. This himmicanes paper is pretty close to the Platonic ideal of shitty social science.

    I’m less charitable to both the authors and the reviewers than Andrew seems to be. When they ran a regression of fatalities on gender, they didn’t find an effect. When they controlled for normalized damage and minimum pressure, they still didn’t find an effect! That should have taken the wind out of their sails, but they pressed on until they found a low p-value, gave their paper the title “Female hurricanes are more deadly,” and speculated that renaming a hurricane could possibly triple the number of deaths it caused.

    That this paper was not only published but heavily publicized is a disgrace, and a sign of a fundamental lack of scientific ethics. I think the most positive spin you could apply to this is something like Sturgeon’s law, on the grounds that most work done by professional academics is complete garbage just like anything else.

    • Popeye:

      In a lot of these sorts of situations, I’m wondering when we can say that the defense of such work is unethical behavior. Just by analogy, is it unethical if Dan Brown writes poorly written novels? Presumably not: he’s giving the people what they want. Similarly, these sorts of researchers are validated by the success they get from peer-reviewed publication. They’re so far down the rabbit hole that it’s not clear at what point one should say that they are being flat-out unethical. I just don’t know what to think about these cases.

      • Well, I wasn’t just referring to the researchers when I talked about unethical — this is a social failure. The scientific community supposedly has this duty that goes beyond telling people what they want to hear, and this social norm needs to be enforced for the community to function properly. Or at least that’s what I think — one could obviously develop a more “realist” position on what scientists actually do.

        • > telling people what they want to hear

          Right, that’s the politician’s prerogative ;-)

          It is a real challenge _causing to understand_ the conceptual insights from statistics (which get conflated with technique). Most academic researchers don’t get it, and flogging them might not help much. Note Fernando’s comment that this is likely to get much worse.

        • I agree that statistics is hard. But I don’t think this particular example can be explained away by insufficient statistical understanding.

          Here are two simple rules of thumb:
          1. Start with simple models first, and then make your models more complicated as necessary. You should understand what you are gaining as the models become more complicated.
          2. When you estimate an effect size, think about how plausible your result is, and what the general implications of such an effect size would be.

          If you can’t grasp these two principles, you honestly have no business doing statistical work. And if you do understand these two principles, then you really have no excuse for writing/accepting/promoting this particular paper (and others like it)… except that the “system” liked it and it provides you publicity, career advancement, etc.

        • Popeye,

          I may be misinterpreting your comment.

          But, I don’t think your “simple rule of thumb” (1) is universally accepted among experienced, practicing statisticians. And to me, it sounds like a recipe for “forking paths”.

          In fact, I think many modelers and statisticians prefer to begin (and end) with a single rich model that is unavoidably complex enough to reflect scientific consensus on the underlying process and flexible enough to admit competing scientific hypotheses.

          JD

        • > accepted among experienced, practicing statisticians

          In agreeing with you, I would add that until we have surveys of experienced, practicing statisticians, we really don’t know what percentage would do/suggest what in what kind of setting.

          I suspect many, or at least a lot more than we would hope, would prop up and support the original kind of analysis (my biased convenience sample supports that).

          In the above, there is even the suspicion that an experienced, practicing statistician is in their corner, advising them to hold their ground.

        • Well, I could be wrong. But let’s take this particular case. Say I’m interested in the question of how the gender of a hurricane’s name affects how deadly it is. What’s the right model to start with?

          Personally, I would start very simple and look at average fatality rates between male and female names. I know this doesn’t control for the fact that male and female storms might have had meaningfully different physical characteristics but that’s still the first thing I do. Next I would run a more complete regression with some controls for physical characteristics. If at this point I still haven’t seen any gender-based patterns, given my priors on this topic I would abandon ship and work on something else.

          Isn’t this what anyone else would do? Do practicing statisticians really start with a full-fledged model in Stan? I’m not sure what’s wrong with the start-simple approach: complicated models are more prone to programming errors, and if an effect only appears in a complex model and not in a simple model there should be a reason for it and it’s the researcher’s job to understand that reason.

          I suppose for a more developed research area where there are well-developed hypotheses already in play you don’t need to start so simple — for example, if you’re researching the drivers of changes in crime rates, I presume there’s an existing universe of models that have already been tested, plus competing hypotheses are more fleshed out so researchers should be more immediately sensitive to them — but for a topic like the hurricane study, I don’t see why you wouldn’t start nearly from scratch. (And of course the researchers in play did exactly that — they even published the results of the simple regression models that showed negligible gender main effects! — they just completely downplayed this.)
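
          For concreteness, a minimal sketch of that start-simple workflow, assuming R with the MASS package; the data frame and column names below are placeholders rather than the variables in the published dataset.

          ```r
          library(MASS)

          # Placeholder data frame and column names, not the published dataset's.
          # Step 1: raw comparison of mean deaths by name gender, no controls.
          aggregate(deaths ~ female_name, data = hurricanes, FUN = mean)

          # Step 2: negative binomial regression adding physical controls.
          m1 <- glm.nb(deaths ~ female_name + log(damage) + min_pressure,
                       data = hurricanes)
          summary(m1)

          # If neither step shows a gender pattern, the start-simple rule says to
          # stop here rather than adding interactions until something reaches p < 0.05.
          ```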

  6. One way to view the problem is that people don’t know how to use statistics.

    Another is that people should not be using statistics (or only simple statistics).

    For example, perhaps another research design with some matching and a difference of means would have done the job just as well, while being more transparent and intelligible (I have not read the paper so don’t know).

    But no, we have to use “sophisticated models”. Granted, these are sometimes necessary. Yet for the most part I would argue these are simply fetishes, in part used to intimidate the public and show off.

    IMHO the main contribution of multivariate regression to the social sciences over the past 3 decades or so has been the creation of the largest published garbage patch of spurious correlations in the history of mankind. The published equivalent of the Great Pacific Garbage Patch.

  7. I have posted my results from looking at the data at:
    http://statmodeling.stat.columbia.edu/2014/06/06/hurricanes-vs-himmicanes/

    Several comments: 1) If one works with log(NDAM) as an explanatory variable, Katrina and Audrey no longer appear as outliers, at least in an analysis where Gender is treated as categorical.

    2) On the interpretation of regression coefficients, note the book:

    Tu and Gilthorpe, Statistical Thinking in Epidemiology. CRC Press, 2012.

    Note particularly its discussion of “Lord’s paradox”, the equivalent for continuous variables of the Yule-Simpson paradox. The book is not at all limited in its relevance to epidemiology. It is a pity that its title might seem to suggest that it has such a focused relevance.

    The relevance here is that Jung et al. seem to be arguing that the variables `NDAM` (property damage in 2013 monetary values) and `MinPressure_before` together account for the potential of hurricanes to cause human deaths, and that `MasFem`, in interaction with `NDAM` and `MinPressure_before`, then accounts for the difference between potential deaths and observed deaths.

    3) A side comment is that R has an astonishing number of functions that purport to fit a model with negative binomial errors. All those that I have tried have problems with the model that has a quadratic in log(NDAM) as an explanatory variable.
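
    For reference, a hedged sketch of the kind of call involved, using MASS::glm.nb as one of the many candidate functions; the column names are placeholders, and convergence or theta-estimation warnings would not be surprising once the quadratic term is included.

    ```r
    library(MASS)

    # Placeholder column names; poly(log(NDAM), 2) supplies the quadratic in
    # log(NDAM), the term that reportedly causes trouble across several fitters.
    m2 <- glm.nb(deaths ~ gender + poly(log(NDAM), 2) + min_pressure,
                 data = hurricanes,
                 control = glm.control(maxit = 200))
    summary(m2)
    ```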

