Following up on my entry the other day on post-publication peer review, Dan Kahan writes:
You give me credit, I think, for merely participating in what I think is a systemic effect in the practice of empirical inquiry that conduces to quality control & hence the advance of knowledge by such means (likely the title conveys that!). I’d say:
(a) by far the greatest weakness in the “publication regime” in social sciences today is the systematic disregard for basic principles of valid causal inference, a deficiency either in comprehension or craft that is at the root of scholars’ resort to (and journals’ tolerance for) invalid samples, the employment of designs that don’t generate observations more consistent with a hypothesis than with myriad rival ones, and the resort to deficient statistical modes of analysis that treat detection of “statistically significant difference” rather than “practical corroboration of a practically meaningful effect” as the goal of such analysis (especially for experiments). This problem is 1,000x as big as “fraud” or “nonreplication” (and is related at least to the latter, which is a predictable consequence of the substitution of NHT rituals for genuine comprehension of causal inference);
(b) the real peer review is *always* the one that happens after articles are published–how could this be otherwise? 1.5 or 3, in rare instances 4, people read a paper beforehand & many times that many after; didn’t someone write a paper recently on the need to apply knowledge on validity & reliability to the procedures we use for producing/disseminating/teaching such knowledge?!; and
(c) the “conclusion” or “finding” of a paper is never any stronger than — never anything *other than* — the weight that one can assign to the evidence it adduces in favor of a hypothesis that can never be deemed to be “proven,” so a paper can never ultimately have any more influence in a scholarly conversation than the validity and cogency of the causal inference its design genuinely supports (those determine weight).
To me this makes the impact of *bad* papers self-limiting. They will likely get published, even in good journals– b/c of (a). But because any problem genuinely worthy of being solved will always compel the sustained attention of serious people– ones who *get* that publication of a paper is merely an announcement (one that might well be wrong) that someone has generated some relevant evidence worthy of consideration & not a show-stopper “conclusive proof” of anything — (b) & (c) will inevitably blunt the impact of the bad papers that get published. So I don’t worry that much about publication of bad papers.
Note: this analysis excludes the problems w/ the “WTF” genre of studies, which aren’t about trying to solve real puzzles but instead regaling people who think psychology is just “Ripley’s Believe it or Not” w/ ANOVAs. The harm of those papers isn’t that they’ll tempt us to accept *wrong* answers to important questions (the risk of the sort of bad papers I’m describing)–because they aren’t even addressing such problems; it is that (a) they will divert space in journals, and maybe creative effort by researchers who crave attention, away from papers that address real issues; and (b) they will diminish the credibility of social science in the minds of serious people.
I don’t know if I agree with Kahan’s claim that the impact of *bad* papers will be “self-limiting.” My reason for doubt is explained well by Jeremy Fox in his discussion of some findings that, when mistakes are published in the scientific literature, they tend to persist even after being corrected:
The data show that scientists rely on pre-publication peer review, to the exclusion of post-publication review. Once something has passed pre-publication peer review, the scientific community mostly either accepts it uncritically, ignores it entirely, or else miscites it as supporting whatever conclusion the citing author prefers. . . .
For better or worse, the only time most of us read like reviewers is when we’re acting as reviewers. Plus, pre-publication is the only time authors are obliged to pay attention to criticism. . . .
It’s not easy to criticize the work of others, because that often seems like criticizing the people who did the work, and nobody but a jerk enjoys criticizing other people. Pre-publication peer review is an institutionalized practice that gets around this very human desire to want to think well of one’s peers, and to have them think well of you. That’s why, as frustrated as I (and probably all of you) often get with pre-publication peer review, I’d like to see it reformed rather than replaced. . . .
Fox is talking about biology and ecology, but I suspect these problems are going on in other scientific fields as well, and Fox’s perspective seems similar to that of Nicolas Chopin, Kerrie Mengersen, and Christian Robert in our article, In praise of the referee.
But I brought this up right now not to discuss peer review but to emphasize that once a mistake is published, it’s hard to dislodge it.
Anyway, to continue with the main thread, here’s Kahan again:
I think the “WTF” findings are more likely to get “pounded in” than bad studies on things that actually matter. The things that matter are issues of consequence for knowledge or practice that usually admit of multiple competing explanations– the ones in the EOOOYKTA —“everything-is-obvious-once-you-know-the-answer” — set, which is where you will find *serious* social scientists laboring. There I think the life of an invalid study is likely to be short, even if it starts out w/ much fanfare. It is short, moreover, b/c it *lives* in the minds of serious people, who really want to know what’s going on. “WTF” is a kind of intellectual junk food, produced for people who generally don’t think critically. And, via the sort of science journalism you criticized in your Symposium article, it gets pounded “deeply and perhaps irretrievably into the recursive pathways of knowledge transmission associated with the internet.”
Science journalism is another one of the professions — like the teaching & propagation of knowledge relating to statistics — that is dedicated to transmitting information on what science has discovered through use of its signature method of disciplined observation & inference but that doesn’t use that method to assess its own proficiency in transmitting such insight.
A useful supplemental remedy to the one you propose — calling up lots of experts to see what they think — is for journalists simply to *read* scientific studies in the way they are supposed to be *read*: not for reports of “facts” discovered or conclusively proven, but as reports of the production of valid *evidence* that a thoughtful person could assimilate to everything else he or she knows to update an assessment that is itself subject to revision upon production of yet further valid evidence — forever & ever. A journalist who just reads the “intro” & “conclusion” — or just the university press release– & says “Science proves x!” not only doesn’t *get* the study. He or she can’t possibly be *telling the story* that a person who is genuinely interested in scientific discovery cares about. That *story* necessarily identifies the problem that motivated the researcher, describes the sort of observations a researcher collected to investigate it, explains the logic of the causal inference that connected those observations to a conclusion, and the various statistical or other steps a researcher took to test and probe the strength of inference. A journalist ought to do that — just b/c he or she ought to; it’s the craft of the profession that person is in. But a journalist who does this routinely — who applies critical reasoning to a purported empirical proof — might well figure there’s a problem when a publisher’s press release announces that a researcher has “proven” that “people named Kim, Kelly, and Ken are more likely to donate to Hurricane Katrina victims than to Hurricane Rita victims”!
I’m not so optimistic as Kahan here. For one thing, when I first encountered the “dentists named Dennis and lawyers named Laura” paper, I simply took it as true. Even now, after the paper has been subject to serious criticism, I still don’t know what to believe. I’m similarly on the fence regarding the Christakis/Fowler findings on the contagion of obesity. And, of course, Freakonomics (as well as, presumably, Kanazawa himself) got fooled by the beauty-and-sex-ratio study.
I think it’s fine for science reporters to read scientific papers, but I think it’s hard for any outsider to spot the flaws. If I can’t reliably do it, and Steven Levitt can’t reliably do it, then I think journalists will have trouble with this task too. As I wrote in my article, some of the problems of hyped science arise from the narrowness of subfields, but a reporter can take advantage of this by moving to a neighboring subfield to get an enhanced perspective.
I’ll give Kahan the last word by linking to this recent post of his where he considers science communication in more detail.