What has happened down here is the winds have changed


Someone sent me this article by psychology professor Susan Fiske, scheduled to appear in the APS Observer, a magazine of the Association for Psychological Science. The article made me a little bit sad, and I was inclined to just keep my response short and sweet, but then it seemed worth the trouble to give some context.

I’ll first share the article with you, then give my take on what I see as the larger issues. The title and headings of this post allude to the fact that the replication crisis has redrawn the topography of science, especially in social psychology, and I can see that to people such as Fiske who’d adapted to the earlier lay of the land, these changes can feel catastrophic.

I will not be giving any sort of point-by-point refutation of Fiske’s piece, because it’s pretty much all about internal goings-on within the field of psychology (careers, tenure, smear tactics, people trying to protect their labs, public-speaking sponsors, career-stage vulnerability), and I don’t know anything about this, as I’m an outsider to psychology and I’ve seen very little of this sort of thing in statistics or political science. (Sure, dirty deeds get done in all academic departments but in the fields with which I’m familiar, methods critiques are pretty much out in the open and the leading figures in these fields don’t seem to have much problem with the idea that if you publish something, then others can feel free to criticize it.)

I don’t know enough about the academic politics of psychology to comment on most of what Fiske writes about, so what I’ll mostly be talking about is how her attitudes, distasteful as I find them both in substance and in expression, can be understood in light of the recent history of psychology and its replication crisis.

Here’s Fiske:



In short, Fiske doesn’t like when people use social media to publish negative comments on published research. She’s implicitly following what I’ve sometimes called the research incumbency rule: that, once an article is published in some approved venue, it should be taken as truth. I’ve written elsewhere on my problems with this attitude—in short, (a) many published papers are clearly in error, which can often be seen just by internal examination of the claims and which becomes even clearer following unsuccessful replication, and (b) publication itself is such a crapshoot that it’s a statistical error to draw a bright line between published and unpublished work.

Clouds roll in from the north and it started to rain

To understand Fiske’s attitude, it helps to realize how fast things have changed.
As of five years ago—2011—the replication crisis was barely a cloud on the horizon.

Here’s what I see as the timeline of important events:

1960s-1970s: Paul Meehl argues that the standard paradigm of experimental psychology doesn’t work, that “a zealous and clever investigator can slowly wend his way through a tenuous nomological network, performing a long series of related experiments which appear to the uncritical reader as a fine example of ‘an integrated research program,’ without ever once refuting or corroborating so much as a single strand of the network.”

Psychologists all knew who Paul Meehl was, but they pretty much ignored his warnings. For example, Robert Rosenthal wrote an influential paper on the “file drawer problem” but if anything this distracts from the larger problems of the find-statistical-significance-any-way-you-can-and-declare-victory paradigm.

1960s: Jacob Cohen studies statistical power, spreading the idea that design and data collection are central to good research in psychology, and culminating in his book, Statistical Power Analysis for the Behavioral Sciences. The research community incorporates Cohen’s methods and terminology into its practice but sidesteps the most important issue by drastically overestimating real-world effect sizes.

1971: Tversky and Kahneman write “Belief in the law of small numbers,” one of their first studies of persistent biases in human cognition. This early work focuses on researchers’ misunderstanding of uncertainty and variation (particularly but not limited to p-values and statistical significance), but they and their colleagues soon move into more general lines of inquiry and don’t fully recognize the implication of their work for research practice.

1980s-1990s: Null hypothesis significance testing becomes increasingly controversial within the world of psychology. Unfortunately this was framed more as a methods question than a research question, and I think the idea was that research protocols are just fine, all that’s needed was a tweaking of the analysis. I didn’t see general airing of Meehl-like conjectures that much published research was useless.

2006: I first hear about the work of Satoshi Kanazawa, a sociologist who published a series of papers with provocative claims (“Engineers have more sons, nurses have more daughters,” etc.), each of which turns out to be based on some statistical error. I was of course already aware that statistical errors exist, but I hadn’t fully come to terms with the idea that this particular research program, and others like it, were dead on arrival because of too low a signal-to-noise ratio. It still seemed a problem with statistical analysis, to be resolved one error at a time.

2008: Edward Vul, Christine Harris, Piotr Winkielman, and Harold Pashler write a controversial article, “Voodoo correlations in social neuroscience,” arguing not just that some published papers have technical problems but also that these statistical problems are distorting the research field, and that many prominent published claims in the area are not to be trusted. This is moving into Meehl territory.

2008 also saw the start of the blog Neuroskeptic, which started with the usual soft targets (prayer studies, vaccine deniers), then started to criticize science hype (“I’d like to make it clear that I’m not out to criticize the paper itself or the authors . . . I think the data from this study are valuable and interesting – to a specialist. What concerns me is the way in which this study and others like it are reported, and indeed the fact that they are reported as news at all”), but soon moved to larger criticisms of the field. I don’t know that the Neuroskeptic blog per se was such a big deal but it’s symptomatic of a larger shift of science-opinion blogging away from traditional political topics toward internal criticism.

2011: Joseph Simmons, Leif Nelson, and Uri Simonsohn publish a paper, “False-positive psychology,” in Psychological Science introducing the useful term “researcher degrees of freedom.” Later they come up with the term p-hacking, and Eric Loken and I speak of the garden of forking paths to describe the processes by which researcher degrees of freedom are employed to attain statistical significance. The paper by Simmons et al. is also notable in its punning title, not just questioning the claims of the subfield of positive psychology but also mocking it. (Correction: Uri emailed to inform me that their paper actually had nothing to do with the subfield of positive psychology and that they intended no such pun.)

That same year, Simonsohn also publishes a paper shooting down the dentist-named-Dennis paper, not a major moment in the history of psychology but important to me because that was a paper whose conclusions I’d uncritically accepted when it had come out. I too had been unaware of the fundamental weakness of so much empirical research.

2011: Daryl Bem publishes his article, “Feeling the future: Experimental evidence for anomalous retroactive influences on cognition and affect,” in a top journal in psychology. Not too many people thought Bem had discovered ESP but there was a general impression that his work was basically solid, and thus this was presented as a concern for psychology research. For example, the New York Times reported:

The editor of the journal, Charles Judd, a psychologist at the University of Colorado, said the paper went through the journal’s regular review process. “Four reviewers made comments on the manuscript,” he said, “and these are very trusted people.”

In retrospect, Bem’s paper had huge, obvious multiple comparisons problems—the editor and his four reviewers just didn’t know what to look for—but back in 2011 we weren’t so good at noticing this sort of thing.

At this point, certain earlier work was seen to fit into this larger pattern, that certain methodological flaws in standard statistical practice were not merely isolated mistakes or even patterns of mistakes, but that they could be doing serious damage to the scientific process. Some relevant documents here are John Ioannidis’s 2005 paper, “Why most published research findings are false,” and Nicholas Christakis’s and James Fowler’s paper from 2007 claiming that obesity is contagious. Ioannidis’s paper is now a classic, but when it came out I don’t think most of us thought through its larger implications; the paper by Christakis and Fowler is no longer being taken seriously but back in the day it was a big deal. My point is, these events from 2005 and 2007 fit into our storyline but were not fully recognized as such at the time. It was Bem, perhaps, who kicked us all into the realization that bad work could be the rule, not the exception.

So, as of early 2011, there’s a sense that something’s wrong, but it’s not so clear to people how wrong things are, and observers (myself included) remain unaware of the ubiquity, indeed the obviousness, of fatal multiple comparisons problems in so much published research. Or, I should say, the deadly combination of weak theory being supported almost entirely by statistically significant results which themselves are the product of uncontrolled researcher degrees of freedom.

2011: Various episodes of scientific misconduct hit the news. Diederik Stapel is kicked out of the psychology department at Tilburg University and Marc Hauser leaves the psychology department at Harvard. These and other episodes bring attention to the Retraction Watch blog. I see a connection between scientific fraud, sloppiness, and plain old incompetence: in all cases I see researchers who are true believers in their hypotheses, which in turn are vague enough to support any evidence thrown at them. Recall Clarke’s Law.

2012: Gregory Francis publishes “Too good to be true,” leading off a series of papers arguing that repeated statistically significant results (that is, standard practice in published psychology papers) can be a sign of selection bias. PubPeer starts up.

2013: Katherine Button, John Ioannidis, Claire Mokrysz, Brian Nosek, Jonathan Flint, Emma Robinson, and Marcus Munafo publish the article, “Power failure: Why small sample size undermines the reliability of neuroscience,” which closes the loop from Cohen’s power analysis to Meehl’s more general despair, with the connection being selection and overestimates of effect sizes.

Around this time, people start sending me bad papers that make extreme claims based on weak data. The first might have been the one on ovulation and voting, but then we get ovulation and clothing, fat arms and political attitudes, and all the rest. The term “Psychological-Science-style research” enters the lexicon.

Also, the replication movement gains steam and a series of high-profile failed replications comes out. First there’s the entirely unsurprising lack of replication of Bem’s ESP work (Bem himself wrote a paper claiming successful replication, but his meta-analysis included various studies that were not replications at all), and then came the unsuccessful replications of embodied cognition, ego depletion, and various other respected findings from social psychology.

2015: Many different concerns with research quality and the scientific publication process converge in the “power pose” research of Dana Carney, Amy Cuddy, and Andy Yap, which received adoring media coverage but which suffered from the now-familiar problems of massive uncontrolled researcher degrees of freedom (see this discussion by Uri Simonsohn), and which failed to reappear in a replication attempt by Eva Ranehill, Anna Dreber, Magnus Johannesson, Susanne Leiberg, Sunhae Sul, and Roberto Weber.

Meanwhile, the prestigious Proceedings of the National Academy of Sciences (PPNAS) gets into the game, publishing really bad, fatally flawed papers on media-friendly topics such as himmicanes, air rage, and “People search for meaning when they approach a new decade in chronological age.” These particular articles were all edited by “Susan T. Fiske, Princeton University.” Just when the news was finally getting out about researcher degrees of freedom, statistical significance, and the perils of low-power studies, PPNAS jumps in. Talk about bad timing.

2016: Brian Nosek and others organize a large collaborative replication project. Lots of prominent studies don’t replicate. The replication project gets lots of attention among scientists and in the news, moving psychology, and maybe scientific research, down a notch when it comes to public trust. There are some rearguard attempts to pooh-pooh the failed replication but they are not convincing.

Late 2016: We have now reached the “emperor has no clothes” phase. When seemingly solid findings in social psychology turn out not to replicate, we’re no longer surprised.
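One mechanism that runs through this timeline, researcher degrees of freedom turning pure noise into statistical significance, can be sketched with a short simulation. This is a deliberately crude illustration, not anyone’s actual analysis: it assumes a researcher measures several independent outcomes with no true effect, uses a z-test with known unit variance, and reports whichever comparison comes out best. The function names, sample sizes, and number of outcomes are all hypothetical choices of mine. Real forking paths are subtler, since the choices are usually made after seeing the data and no one needs to run every test explicitly, but the inflation of false positives works the same way.

```python
import math
import random


def z_test_p_value(group_a, group_b):
    """Two-sided p-value for a difference in means, assuming known unit variance."""
    n_a, n_b = len(group_a), len(group_b)
    diff = sum(group_a) / n_a - sum(group_b) / n_b
    std_error = math.sqrt(1.0 / n_a + 1.0 / n_b)
    z = diff / std_error
    # Two-sided tail area of the standard normal: 2 * (1 - Phi(|z|))
    return math.erfc(abs(z) / math.sqrt(2.0))


def false_positive_rate(n_outcomes, n_per_group=20, n_sims=2000, seed=1):
    """Share of null experiments where at least one outcome reaches p < .05."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        p_values = []
        for _ in range(n_outcomes):
            # Both groups come from the same distribution: there is no real effect.
            a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
            b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
            p_values.append(z_test_p_value(a, b))
        # The "researcher" reports only the most favorable comparison.
        if min(p_values) < 0.05:
            hits += 1
    return hits / n_sims


for k in (1, 5, 10):
    print(f"{k} outcome(s): false-positive rate {false_positive_rate(k):.3f}")
```

With independent tests the rate should land near 1 - 0.95^k, so even a handful of available comparisons pushes the chance of finding “something, somewhere” far above the nominal 5 percent.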

Rained real hard and it rained for a real long time

OK, that was a pretty detailed timeline. But here’s the point. Almost nothing was happening for a long time, and even after the first revelations and theoretical articles you could still ignore the crisis if you were focused on your research and other responsibilities. Remember, as late as 2011, even Daniel Kahneman was saying of priming studies that “disbelief is not an option. The results are not made up, nor are they statistical flukes. You have no choice but to accept that the major conclusions of these studies are true.”

Then, all of a sudden, the world turned upside down.

If you’d been deeply invested in the old system, it must be pretty upsetting to think about change. Fiske is in the position of someone who owns stock in a failing enterprise, so no wonder she wants to talk it up. The analogy’s not perfect, though, because there’s no one for her to sell her shares to. What Fiske should really do is cut her losses, admit that she and her colleagues were making a lot of mistakes, and move on. She’s got tenure and she’s got the keys to PPNAS, so she could do it. Short term, though, I guess it’s a lot more comfortable for her to rant about replication terrorists and all that.

Six feet of water in the streets of Evangeline

Who is Susan Fiske and why does she think there are methodological terrorists running around? I can’t be sure about the latter point because she declines to say who these terrorists are or point to any specific acts of terror. Her article provides exactly zero evidence but instead gives some uncheckable half-anecdotes.

I first heard of Susan Fiske because her name was attached as editor to the aforementioned PPNAS articles on himmicanes, etc. So, at least in some cases, she’s a poor judge of social science research.

Or, to put it another way, she’s living in 2016 but she’s stuck in 2006-era thinking. Back 10 years ago, maybe I would’ve fallen for the himmicanes and air rage papers too. I’d like to think not, but who knows? Following Simonsohn and others, I’ve become much more skeptical about published research than I used to be. It’s taken a lot of us a lot of time to move to the position where Meehl was standing, fifty years ago.

Fiske’s own published work has some issues too. I make no statement about her research in general, as I haven’t read most of her papers. What I do know is what Nick Brown sent me:

For an assortment of reasons, I [Brown] found myself reading this article one day: This Old Stereotype: The Pervasiveness and Persistence of the Elderly Stereotype by Amy J. C. Cuddy, Michael I. Norton, and Susan T. Fiske (Journal of Social Issues, 2005). . . .

This paper was just riddled through with errors. First off, its main claims were supported by t statistics of 5.03 and 11.14 . . . ummmmm, upon recalculation the values were actually 1.8 and 3.3. So one of the claims wasn’t even “statistically significant” (thus, under the rules, was unpublishable).

But that wasn’t the worst of it. It turns out that some of the numbers reported in that paper just couldn’t have been correct. It’s possible that the authors were doing some calculations wrong, for example by incorrectly rounding intermediate quantities. Rounding error doesn’t sound like such a big deal, but it can supply a useful set of “degrees of freedom” to allow researchers to get the results they want, out of data that aren’t readily cooperating.

There’s more at the link. The short story is that Cuddy, Norton, and Fiske made a bunch of data errors—which is too bad, but such things happen—and then when the errors were pointed out to them, they refused to reconsider anything. Their substantive theory is so open-ended that it can explain just about any result, any interaction in any direction.

And that’s why the authors’ claim that fixing the errors “does not change the conclusion of the paper” is both ridiculous and all too true. It’s ridiculous because one of the key claims is entirely based on a statistically significant p-value that is no longer there. But the claim is true because the real “conclusion of the paper” doesn’t depend on any of its details—all that matters is that there’s something, somewhere, that has p less than .05, because that’s enough to make publishable, promotable claims about “the pervasiveness and persistence of the elderly stereotype” or whatever else they want to publish that day.

When the authors protest that none of the errors really matter, it makes you realize that, in these projects, the data hardly matter at all.

Why do I go into all this detail? Is it simply mudslinging? Fiske attacks science reformers, so science reformers slam Fiske? No, that’s not the point. The issue is not Fiske’s data processing errors or her poor judgment as journal editor; rather, what’s relevant here is that she’s working within a dead paradigm. A paradigm that should’ve been dead back in the 1960s when Meehl was writing on all this, but which in the wake of Simonsohn, Button et al., Nosek et al., is certainly dead today. It’s the paradigm of the open-ended theory, of publication in top journals and promotion in the popular and business press, based on “p less than .05” results obtained using abundant researcher degrees of freedom. It’s the paradigm of the theory that in the words of sociologist Jeremy Freese, is “more vampirical than empirical—unable to be killed by mere data.” It’s the paradigm followed by Roy Baumeister and John Bargh, two prominent social psychologists who were on the wrong end of some replication failures and just can’t handle it.

I’m not saying that none of Fiske’s work would replicate or that most of it won’t replicate or even that a third of it won’t replicate. I have no idea; I’ve done no survey. I’m saying that the approach to research demonstrated by Fiske in her response to criticism of that work of hers is a style that, ten years ago, was standard in psychology but is not so much anymore. So again, her discomfort with the modern world is understandable.

Fiske’s collaborators and former students also seem to show similar research styles, favoring flexible hypotheses, proof-by-statistical-significance, and an unserious attitude toward criticism.

And let me emphasize here that, yes, statisticians can play a useful role in this discussion. If Fiske etc. really hate statistics and research methods, that’s fine; they could try to design transparent experiments that work every time. But, no, they’re the ones justifying their claims using p-values extracted from noisy data, they’re the ones rejecting submissions from PPNAS because they’re not exciting enough, they’re the ones who seem to believe just about anything (e.g., the claim that women were changing their vote preferences by 20 percentage points based on the time of the month) if it has a “p less than .05” attached to it. If that’s the game you want to play, then methods criticism is relevant, for sure.
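To see why a “p less than .05” attached to a noisy estimate deserves so little trust, here’s a minimal simulation sketch. The numbers are assumptions chosen for illustration, not taken from any of the studies discussed: a true effect of 2 percentage points, measured with a standard error of 8 points. The function name is hypothetical as well. The sketch draws many hypothetical estimates, keeps only the statistically significant ones, and asks how much they exaggerate the truth and how often they get the sign wrong.

```python
import random


def exaggeration_summary(true_effect, std_error, n_sims=20000, seed=1):
    """Simulate estimates and summarize only those reaching p < .05.

    Returns (power, exaggeration ratio, share of significant estimates
    with the wrong sign).
    """
    rng = random.Random(seed)
    significant = []
    for _ in range(n_sims):
        estimate = rng.gauss(true_effect, std_error)
        # Two-sided test at the 5% level: |estimate| beyond 1.96 standard errors
        if abs(estimate) > 1.96 * std_error:
            significant.append(estimate)
    if not significant:
        raise ValueError("no significant estimates; increase n_sims")
    power = len(significant) / n_sims
    mean_abs = sum(abs(e) for e in significant) / len(significant)
    wrong_sign = sum(1 for e in significant if e * true_effect < 0) / len(significant)
    return power, mean_abs / abs(true_effect), wrong_sign


# Assumed values for illustration only: effect of 2 points, standard error of 8.
power, ratio, wrong_sign = exaggeration_summary(true_effect=2.0, std_error=8.0)
print(f"power: {power:.3f}")
print(f"average exaggeration among significant results: {ratio:.1f}x")
print(f"share of significant results with the wrong sign: {wrong_sign:.2f}")
```

Under these assumed numbers, any estimate that clears the significance threshold has to exceed 1.96 × 8 ≈ 15.7 points in absolute value, so every “publishable” result overstates the assumed true effect by a factor of at least about 7.8. In a setting like that, a reported 20-point swing tells you about the noise level of the study, not about the size of the effect.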

The river rose all day, the river rose all night

Errors feed upon themselves. Researchers who make one error can follow up with more. Once you don’t really care about your numbers, anything can happen. Here’s a particularly horrible example from some researchers whose work was questioned:

Although 8 coding errors were discovered in Study 3 data and this particular study has been retracted from that article, as I show in this article, the arguments being put forth by the critics are untenable. . . . Regarding the apparent errors in Study 3, I find that removing the target word stems SUPP and CE do not influence findings in any way.

Hahaha, pretty funny. Results are so robust to 8 coding errors! Also amusing that they retracted Study 3 but they still can’t let it go. See also here.

I’m reminded of the notorious “gremlins” paper by Richard Tol which ended up having almost as many error corrections as data points—no kidding!—but none of these corrections was enough for him to change his conclusion. It’s almost as if he’d decided on that ahead of time. And, hey, it’s fine to do purely theoretical work, but then no need to distract us with data.

Some people got lost in the flood

Look. I’m not saying these are bad people. Sure, maybe they cut corners here or there, or make some mistakes, but those are all technicalities—at least, that’s how I’m guessing they’re thinking. For Cuddy, Norton, and Fiske to step back and think that maybe almost everything they’ve been doing for years is all a mistake . . . that’s a big jump to take. Indeed, they’ll probably never take it. All the incentives fall in the other direction.

In her article that was my excuse to write this long post, Fiske expresses concerns for the careers of her friends, careers that may have been damaged by public airing of their research mistakes. Just remember that, for each of these people, there may well be three other young researchers who were doing careful, serious work but then didn’t get picked for a plum job or promotion because it was too hard to compete with other candidates who did sloppy but flashy work that got published in Psych Science or PPNAS. It goes both ways.

Some people got away alright

The other thing that’s sad here is how Fiske seems to have felt the need to compromise her own principles here. She deplores “unfiltered trash talk,” “unmoderated attacks” and “adversarial viciousness” and insists on the importance of “editorial oversight and peer review.” According to Fiske, criticisms should be “most often in private with a chance to improve (peer review), or at least in moderated exchanges (curated comments and rebuttals).” And she writes of “scientific standards, ethical norms, and mutual respect.”

But Fiske expresses these views in an unvetted attack in an unmoderated forum with no peer review or opportunity for comments or rebuttals, meanwhile referring to her unnamed adversaries as “methodological terrorists.” Sounds like unfiltered trash talk to me. But, then again, I haven’t seen Fiske on the basketball court so I really have no idea what she sounds like when she’s really trash talkin’.

I bring this up not in the spirit of gotcha, but rather to emphasize what a difficult position Fiske is in. She’s seeing her professional world collapsing—not at a personal level, I assume she’ll keep her title as the Eugene Higgins Professor of Psychology and Professor of Public Affairs at Princeton University for as long as she wants—but her work and the work of her friends and colleagues is being questioned in a way that no one could’ve imagined ten years ago. It’s scary, and it’s gotta be a lot easier for her to blame some unnamed “terrorists” than to confront the gaps in her own understanding of research methods.

To put it another way, Fiske and her friends and students followed a certain path which has given them fame, fortune, and acclaim. Question the path, and you question the legitimacy of all that came from it. And that can’t be pleasant.

The river have busted through clear down to Plaquemines

Fiske is annoyed with social media, and I can understand that. She’s sitting at the top of traditional media. She can publish an article in the APS Observer and get all this discussion without having to go through peer review; she has the power to approve articles for the prestigious Proceedings of the National Academy of Sciences; work by herself and her colleagues is featured in national newspapers, TV, radio, and even Ted talks, or so I’ve heard. Top-down media are Susan Fiske’s friend. Social media, though, she has no control over. That must be frustrating, and as a successful practitioner of traditional media myself (yes, I too have published in scholarly journals), I too can get annoyed when newcomers circumvent the traditional channels of publication. People such as Fiske and myself spend our professional lives building up a small fortune of coin in the form of publications and citations, and it’s painful to see that devalued, or to think that there’s another sort of scrip in circulation that can buy things that our old-school money cannot.

But let’s forget about careers for a moment and instead talk science.

When it comes to pointing out errors in published work, social media have been necessary. There just has been no reasonable alternative. Yes, it’s sometimes possible to publish peer-reviewed letters in journals criticizing published work, but it can be a huge amount of effort. Journals and authors often apply massive resistance to bury criticisms.

There’s also this discussion which is kinda relevant:

What do I like about blogs compared to journal articles? First, blog space is unlimited, journal space is limited, especially in high-profile high-publicity journals such as Science, Nature, and PPNAS. Second, in a blog it’s ok to express uncertainty, in journals there’s the norm of certainty. On my blog, I was able to openly discuss various ideas of age adjustment, whereas in their journal article, Case and Deaton had nothing to say but that their numbers “are not age-adjusted within the 10-y 45-54 age group.” That’s all! I don’t blame Case and Deaton for being so terse; they were following the requirements of the journal, which is to provide minimal explanation and minimal exploration. . . . over and over again, we’re seeing journal article, or journal-article-followed-by-press-interviews, as discouraging data exploration and discouraging the expression of uncertainty. . . . The norms of peer reviewed journals such as PPNAS encourage presenting work with a facade of certainty.

Again, the goal here is to do good science. It’s hard to do good science when mistakes don’t get flagged and when you’re supposed to act as if you’ve always been right all along, that any data pattern you see is consistent with theory, etc. It’s a problem for the authors of the original work, who can waste years of effort chasing leads that have already been discredited, it’s a problem for researchers who follow up on erroneous work, and it’s a problem for other researchers who want to do careful work but find it difficult to compete in a busy publishing environment with the authors of flashy, sloppy exercises in noise mining that have made “Psychological Science” (the journal, not the scientific field) into a punch line.

It’s fine to make mistakes. I’ve published work myself that I’ve had to retract, so I’m hardly in a position to slam others for sloppy data analysis and lapses in logic. And when someone points out my mistakes, I thank them. I don’t label corrections as “ad hominem smear tactics”; rather, I take advantage of this sort of unsolicited free criticism to make my work better. (See here for an example of how I adjusted my research in response to a critique which was not fully informed and kinda rude but still offered value.) I recommend Susan Fiske do the same.

Six feet of water in the streets of Evangeline

To me, the saddest part of Fiske’s note is near the end, when she writes, “Psychological science has achieved much through collaboration but also through responding to constructive adversaries . . .” Fiske emphasizes “constructive,” which is fine. We may have different definitions of what is constructive, but I hope we can all agree that it is constructive to point out mistakes in published work and to perform replication studies.

The thing that saddens me is Fiske’s characterization of critics as “adversaries.” I’m not an adversary of psychological science! I’m not even an adversary of low-quality psychological science: we often learn from our mistakes and, indeed, in many cases it seems that we can’t really learn without first making errors of different sorts. What I am an adversary of is people not admitting error and studiously looking away from mistakes that have been pointed out to them.

If Kanazawa did his Kanazawa thing, and the power pose people did their power-pose thing, and so forth and so on, I’d say, Fine, I can see how these things were worth a shot. But when statistical design analysis shows that this research is impossible, or when replication failures show that published conclusions were mistaken, then damn right I expect you to move forward, not keep doing the same thing over and over, and insisting you were right all along. Cos that ain’t science. Or, I should say, it’s a really really inefficient way to do science, for individual researchers to devote their careers to dead ends, just cos they refuse to admit error.

We learn from our mistakes, but only if we recognize that they are mistakes. Debugging is a collaborative process. If you approve some code and I find a bug in it, I’m not an adversary, I’m a collaborator. If you try to paint me as an “adversary” in order to avoid having to correct the bug, that’s your problem.

They’re tryin’ to wash us away, they’re tryin’ to wash us away

Let me conclude with a key disagreement I have with Fiske. She prefers moderated forums where criticism is done in private. I prefer open discussion. Personally I am not a fan of Twitter, where the space limitation seems to encourage snappy, often adversarial exchanges. I like blogs, and blog comments, because we have enough space to fully explain ourselves and to give full references to what we are discussing.

Hence I am posting this on our blog, where anyone has an opportunity to respond. That’s right, anyone. Susan Fiske can respond, and so can anyone else. Including lots of people who have an interest in psychological science but don’t have the opportunity to write non-peer-reviewed articles for the APS Observer, who aren’t tenured professors at major universities, etc. This is open discussion, it’s the opposite of terrorism. And I think it’s pretty ridiculous that I even have to say such a thing which is so obvious.

P.S. More here: Why is the scientific replication crisis centered on psychology?


  1. D Kane says:

    Great summary. But sad that our friends from the Lancet Iraq death estimates did not make the timeline. That is certainly what got me involved with data sharing/replication issues.

    • I think that a long-term consideration of this line of work in conflict epidemiology will reflect quite well on this group, and perhaps a future SMCISS blog post will take this on. (I have worked with them on two more recent household surveys.)

      • D Kane says:

        The people behind the Lancet surveys, especially Les Roberts and Gilbert Burnham, still refuse to release the (anonymized) data or the computer code used to compile their results. (And kudos to Mike Spagat for continuing to work on this.) You really think that “long-term consideration” will put them in a good light? If you aren’t transparent, your results can’t be trusted and, given where social science is going, I doubt that history will judge you gently.

        • I have good news about this, which I think is exactly the kind of “changing winds” that Andrew’s blog addresses. In the first survey I worked on with this team, a 2013 update to the Iraq mortality estimates, I argued that we should release a “replication archive” containing all data and analysis code necessary for reproducing the results in the paper. Burnham and company agreed, although I think it was just to humor me:

          During the next survey I helped them with, I was pretty busy with other responsibilities and much less involved in the analysis. I did not have the time to advocate for a data release, let alone do all the work preparing an archive. So I was very happily surprised when the paper came out to see that they created a public archive of all the study data without any urging from me:

          • D Kane says:

            Kudos to you and your co-authors! This is just excellent. Any chance that you can get Burnham et al to release at least the code (if not the data) from the 2004/2006 papers? One reason why those results might be outside the confidence intervals of your more recent work might be coding errors.

  2. Keith O'Rourke says:

    > statisticians can play a useful role in this discussion.
    Before 2010 (maybe 2006) I believe most statisticians just dismissed there being much of a problem.

    One comment I remember getting at a meeting of established research statisticians was “you are painting an overly bleak picture of clinical research”.

    But it was really only in 2009, when I became aware of what extra information regulators get about studies, that I finally realized the meta-analysis of published clinical papers on randomized trials was largely hopeless for the present.

    Hey that’s what 90% of my academic work was on :-(

    • Rahul says:

      Why is it hopeless? Can you elaborate? Sounds overly pessimistic to me.

      • Keith O'Rourke says:

        You mean like I am painting an overly bleak picture of clinical research ;-)

        What you will often learn from just the published papers can be quite different than what you would learn from all the documents the regulator gets to see and even audit for authenticity.

        A public or prominent instance of this was – “The Cochrane team, led by Dr Tom Jefferson, … decided to produce a systematic review of data held in the CSRs, and to ignore published study reports – the first (and still only) time this had been undertaken within Cochrane.”

        As I wrote to a former colleague at Cochrane, “Being at a regulatory agency this makes perfect sense [to me] and Senn and you I am sure would agree, but one of my colleagues is wondering whether this will cause systematic reviews of just published data to be avoided (if at all possible) in the future”. They responded that they have to make do with what they can get.

        But until this is fixed (e.g. data held in CSRs is made available), I do think it is close to hopeless.

  3. Anonymous says:

    “Hence I am posting this on our blog, where anyone has an opportunity to respond. That’s right, anyone. Susan Fiske can respond, and so can anyone else. Including lots of people who have an interest in psychological science but don’t have the opportunity to write non-peer-reviewed articles for the APS Observer, who aren’t tenured professors at major universities, etc. This is open discussion, it’s the opposite of terrorism. And I think it’s pretty ridiculous that I even have to say such a thing which is so obvious.”

    As someone whose (in my opinion) perfectly reasonable, topical, and informational anonymous comments were recently deleted from 2 recent APS Observer pieces on pre-registration and replications, I just want to thank you for this post, and for allowing anyone to comment, even anonymously.

    Here is Neuroskeptic on why anonymity in science (for instance when commenting on APS observer pieces) might be a good thing:

    I hope someone will link this post in the comment section of Fiske’s APS piece when it is published.

    • Mark Pawelek says:

      1) Anonymity is good when you absolutely need it to keep your job, save your life, … yet still tell the truth. In most social media anonymity is terrible, because it encourages irresponsible comment. When people aren’t accountable, they too often lie, use slurs, misrepresent, …

      • Anonymous says:

        absolutely tangential to the main discussion: if you limit anonymity to only those that “need” it, the anonymity set (the group of people that one claiming anonymity is indistinguishable from) is limited to people that “need to hide” for whatever reason, and thus everyone in the set is “guilty”, so the use of anonymity techniques itself is a sign of likely “wrongdoing” in the eyes of whoever might have the power to unmask.
        you are right about the lack of social control making anonymous communication different from what we are used to, and often frustrating. but unless it is normal / normatively accepted to claim anonymity “just because” now and then, the value it can offer in a crisis is threatened.

        • Mark Pawelek says:

          Anonymity encourages bad behaviour I’ve witnessed on internet forums. A compromise bans anonymity for people unable to use it responsibly. I’m only talking about anonymity within a community. I’ve no intention of helping any state ban anonymity. Yours is a purely abstract argument for anonymity. Mine is more practical: be prepared to TOS people who misuse it. Running an internet blog/forum/community where you routinely allow idiots to post whatever they want anonymously is a sure way to get it taken down by the PTB. You can be as idealistic as you want, but without a forum or blog you’ve no one to talk to.

      • Roger Sweeny says:

        Dear Professor Goldin-Meadow,

        I am a junior professor at _______. I wish to post an anonymous comment criticizing your speech because I fear it would jeopardize my chances for tenure (several of the full professors here have worked with you or been your students). Would you please certify that I need the anonymity so the comment can be posted?

        Thank you.

  4. Shecky R says:

    Thanks for this… and the problems in social psychology (in particular) research long preceded Susan Fiske and will long follow her. The paradigm for ‘constructive’ criticism of quality long needed changing and is now evolving.

  5. Baruch says:

    I was favorable to her paper — i am sensitive to “standing on giants’ faces” as Dienes quotes Broadbent, but your post convinced me. She is wrong.

  6. Yes, Fiske weakens her argument by refusing to say what she means. Who are these “methodological terrorists”? What “smear tactics” are being employed? What does she classify as an “ad hominem” remark? If you mention the author of a particularly bad study or silly theory, are you speaking “ad hominem”?

    Also, wouldn’t any tenure committee worth its salt look into the merit of any criticisms? If a stream of nasty tweets, on its own, makes them deny someone tenure, what kind of tenure committee is this? I would hope for a little more intellectual ruggedness.

    I was struck by this part of your comment: “Just remember that, for each of these people, there may well be three other young researchers who were doing careful, serious work but then didn’t get picked for a plum job or promotion because it was too hard to compete with other candidates who did sloppy but flashy work that got published in Psych Science or PPNAS. It goes both ways.”

    There’s an awful lot of money in this, too. According to the Washington Speakers Bureau website, Amy Cuddy’s speaker fees are in tier 6–that is, $40,001 and up. I’m sure she has to pay some of it to the bureau, etc., but even so, she could pull in six figures with just a few talks. The least a speaker of this fee range (or any fee range) can do is admit to error.

  7. Ruben says:

    Some of us spoke to her at the DGPS just now. We emphasised that we want this to be constructive and about the science, that we feel lumped in with non-constructive people, that we know heaps of people whose careers flounder because of the reproducibility crisis.
    Here’s some write-up:
    She didn’t really want to talk about this, but she kind of doubled down on the metaphor (both have virtuous goals and don’t mind collateral damage apparently), but said that lots of people told her they are scared their research will be targeted.
    It didn’t feel like this conversation accomplished much.

    • Andrew says:


      The conversation may have accomplished something. My guess is that Fiske thinks that she has the support of a vast silent majority and that the people she disagrees with are a small group of malcontents and outside agitators. I have no idea what the majority of psychology researchers think about all this—I’ve seen no surveys on the topic—but maybe if Fiske sees pushback in person, she’ll have more of a sense that these are real people who disagree with her, that it’s not just a bunch of bomb-throwers or whatever it is that she was thinking. I’m sure it would take a lot for her to change her research practices or even her general attitudes about research, but a few encounters like this might at least change her perception of attitudes within the field of psychology, and she might then feel a little less confident about her ability to speak on behalf of the profession in this way.

      • Rahul says:

        In my opinion she’s probably right in thinking that she has the support of a vast silent majority.

        • Huffy says:

          No, across the field of academic psychology generally, I am sure she does NOT have much support at all, much less a vast silent majority. I say that from talking to many many people in different areas. However, in the most rotten areas like Social Cognition, she probably does have the support of a lot of nervous people with plum academic jobs, because in those fields a common path to getting a desirable job has been to p-hack the hell out of your data (or worse) so you appear to have made fascinating discoveries. No doubt about it that those people do not want all these rocks turned over through replications of their own work.

  8. Jeff Helzner says:

    Yet Fiske doesn’t seem to have any issue with fluffy TED talks. Apparently TED provides the quality control she mentions.

    “For Cuddy, Norton, and Fiske to step back and think that maybe almost everything they’ve been doing for years is all a mistake . . . that’s a big jump to take. Indeed, they’ll probably never take it. All the incentives fall in the other direction.”

    mic drop

  9. Simon Gates says:

    – Amy Cuddy’s speaker fees are in tier 6–that is, $40,001 and up.

    Yikes. Well, that would create a bit of an incentive…

  10. Frank Charles says:

    This was a great read. Thank you.

    – Student of Research methods in Psychology.

  11. numeric says:

    I’ve seen very little of this sort of thing in statistics or political science (Sure, dirty deeds get done in all academic departments but in the fields with which I’m familiar, methods critiques are pretty much out in the open and the leading figures in these fields don’t seem to have much problem with the idea that if you publish something, then others can feel free to criticize it.)

    Fisher and the fiducial theory? Also, from an NSF grant proposal (not mine),

    “Finally, a word of advise to the PI. To disparage King’s work, as you frequently do in the proposal, is stupid. You will only hurt yourself. King may very well review this proposal and he is quite likely to be asked to write a letter evaluating your work when you are considered for promotion to tenure. King’s work is hardly the last word on ecological inference. In fact, it is probably the beginning of a whole new stream of research by himself and others that generalize the field. Many smart people have worked on the problem of ecological inference and made little or no progress. King’s book is an important breakthrough. To fail to recognize this is only to provide evidence of your own immaturity.”

    [if I can figure out a way to e-mail this to you anonymously, I’ll send it to you]. The point is that it happens all the time in academic political science, but usually the transactions take place over the phone–this grant proposal was unusual in that someone put how suppression occurs in writing. But if you want a published source over what it takes to succeed in academic political science, see (synopsis–shut up and suck up). I would say that the ability of senior people to eliminate any threats to their theories is the main reason quantitative political science has not progressed in any essential manner since the American Voter. In particular, the rational choice paradigm applied to American national elections has created a nearly completely incorrect view of these elections, which have been centered around race rather than policy choices (this latest election makes it clear that most voters are not choosing based on policy positions).

    • RE: “email anonymously” aren’t there temporary file-sharing sites you can use? Things like

    • Stephen says:

      This reminds me of the time that King intimidated the editor of the American Political Science Review into retracting a paper that was critical of his ecological inference work. In fact, it wouldn’t surprise me if your quote is part of the same case.

    • Andrew says:


      I’m certainly not saying that the fields of statistics and political science are immune to backdoor politicking. Maybe a better way to say it is that in statistics and political science, I don’t see any real connection between backdoor politicking and attitudes on research controversies.

      In her article, Fiske was not just saying that there are some sleazoids who send anonymous unsolicited letters to tenure review committees; she was directly connecting this behavior to the replication-and-criticism movement. I have no idea if there is any such connection in psychology; my point in mentioning stat and poli sci was to say that I know of no such systematic behavior in those fields, behavior that goes beyond individual assholes and cliques to rise to the level of a deliberate attempt to move the direction of the field via subterfuge.

      • numeric says:

        >level of a deliberate attempt to move the direction of the field via subterfuge.

        I would say Fiske isn’t using subterfuge–she’s just incompetent (but a full professor at Princeton!). When incompetence is pointed out, she reacts like an academic–she attempts to silence the source or use ad hominem attacks. But here’s the nice thing–she has to do it publicly, rather than pick up the phone (which is the standard method in academic political science). That’s because she can’t pick up the phone to silence you.

        >Look. I’m not saying these are bad people. Sure, maybe they cut corners here or there, or make some mistakes, but those are all technicalities—at least, that’s how I’m guessing they’re thinking. For Cuddy, Norton, and Fiske to step back and think that maybe almost everything they’ve been doing for years is all a mistake . . . that’s a big jump to take. Indeed, they’ll probably never take it. All the incentives fall in the other direction.

        Solzhenitsyn says that when you have spent your life establishing a lie, what is required is not equivocation but rather a dramatic self-sacrifice (in relation to Ehrenburg). I see no chance of that happening in any social science field–tenure means Fiske will be manning (can I use that phrase?) the barricades until they cart her off in her 80s. Thinking machines will be along in 20-30 years and then universities can dismantle the social sciences and replace them with those.

        • Andrew says:


          Just to be clear, I wasn’t saying that Fiske herself was using subterfuge. I was saying that in her article, Fiske seems to be implying that the “methodological terrorists” were using subterfuge by sending unsolicited letters to committees, etc. Or maybe subterfuge isn’t even the point, as she was also expressing anger that the terrorists were acting openly by publishing criticisms on social media.

          • numeric says:

            These letters to committees happen (essentially) in academic political science, also. I was in a department where there were two types of tenure review–an “honest” review and a not so honest review. The way this was handled was on the not so honest reviews, the tenure committee members called the selected letter writers and asked what type of letter they were going to write. If the reply was negative, that individual letter writer was removed and another one was selected. On an “honest” review, the letters were just sent out without pre-check. Of course, I heard about this from those faculty who had been subjected to the honest review, as they were none too happy about being singled out for this honor.

            I guess my overall point is that it is so easy to manipulate these procedures and the people running them may not be “bad” people, but they don’t have a lot of integrity, either intellectually or morally, and it shows in the research. Who will guard these selfsame guardians? Your blog is a step in the right direction, though I think you’re too easy on political scientists because you’re a wannabe–stick to statistics.

  12. Eric Loken says:

    Wow. This will be an interesting discussion. I will say that I applaud her call for civility and fairness. That much is certainly appreciated.

    But with regard to the premises and underlying understanding of the issues, I’m also reminded of that line from The Big Short (I believe it was there) that went something like this – “Whatever that guy is buying, I want to bet against it.”

  13. Andrew, I think we need to talk about widening the picture beyond psychology, and beyond social sciences. Sure, psychology has its problems, but are these problems limited to psychology? Ioannidis’s paper and his recent open letter to a dead colleague suggest that medicine is similar. Indeed, your blog post about acupuncture indicates that medicine as a field doesn’t even know how to get anywhere on this kind of topic. As a society we spend billions on developing and marketing cancer drugs, many of which plausibly violate the basic concept of the Hippocratic oath (i.e., do no harm; many blockbuster cancer drugs plausibly offer a reliably reduced QALY count by making people live marginally longer but in very sick conditions). Ioannidis discussed how his first application for a grant was denied de facto (just ignored); it was to study in a clinical trial the role of antibiotics in chronic sinusitis. Chronic sinusitis affects something like 10% of the US in a very negative way. It turns out that current evidence suggests that it is in fact CAUSED in large part by antibiotics (specifically the depletion of beneficial bacteria). How many examples are there of medicine continuing to do things that are harmful because of poor practices and bad incentives and lack of knowledge of real scientific method, and reliance on cargo-cult statistical analyses with forking paths, and the secret datasets that Keith often talks about here?

    Outside medicine we’ve had recent examples in economics: the inference on policy implications for austerity relied on by-hand data manipulation errors in Excel, and there are Tol’s papers on global warming that you discuss in your post. People read this stuff and make legal and economic policy based on it because it meets their preconceived notions, and it appears with a big fat Academic Blue Ribbon.

    As academia and especially government funding in academia becomes more and more a “Market For Lemons” those of us who just can’t abide the thought of participating will opt out… and a feedback loop leaves us with either head-down clueless researchers with brief careers who don’t get tenure, or self-promoting puff-piece participants? How far down this path have we gone?

    In Engineering, my impression as a grad student was that for the most part the things being published were records of what people did to study the solution to problems that didn’t really exist using solution methods that were fancy sounding and difficult to explain. My dissertation and the main paper associated with it showed that the standard experimental methods to study soil liquefaction in the laboratory (the undrained cyclic triaxial compression test) were in fact ENTIRELY studying the properties of the rubber membranes used to hold the sample in the apparatus. This has been an active heavily funded area of research for around 50 years. The Wikipedia definition of liquefaction as of today includes “A typical reference strain for the approximate occurrence of zero effective stress is 5% double amplitude shear strain”

    In the ground, 5% is ridiculous. It corresponds to a 10 meter thick sand fill in the SF bay settling 50 cm *prior to the onset of liquefaction*. In fact liquefaction occurs with strains of say 3 or 4 cm in such deposits. You need such large strains in a tabletop triaxial test because what determines the liquefaction in the tabletop is the elastic properties of the rubber membrane your sample is encased in, and essentially NOTHING else. Whole careers have been spent publishing papers on tabletop studies of liquefaction and trying to reconcile these studies with the mechanics of real-ground conditions.
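The arithmetic behind the 50 cm figure is just strain times layer thickness. Here is a minimal sketch, treating the quoted 5% as a uniform strain over the full 10 m depth, as the comment does (a simplifying assumption). The function names are hypothetical and chosen only for this illustration.

```python
def displacement_cm(strain, thickness_m):
    """Displacement (in cm) of a layer of the given thickness under a uniform strain."""
    return strain * thickness_m * 100.0


def implied_strain(displacement_cm_value, thickness_m):
    """Uniform strain implied by a displacement (in cm) over the given thickness."""
    return displacement_cm_value / (thickness_m * 100.0)


THICKNESS_M = 10.0  # sand fill thickness quoted in the comment

# Displacement implied by the 5% reference strain.
print(f"5% strain over {THICKNESS_M} m: {displacement_cm(0.05, THICKNESS_M):.1f} cm")

# Strain implied by the 3 to 4 cm displacements the comment says occur in the field.
for d_cm in (3.0, 4.0):
    print(f"{d_cm} cm over {THICKNESS_M} m: {implied_strain(d_cm, THICKNESS_M):.2%} strain")
```

By the same formula, 3 to 4 cm of movement in a 10 m layer corresponds to a strain of about 0.3% to 0.4%, roughly an order of magnitude below the 5% reference value.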

    Is this a psychology and social science problem, or does it just permeate the fabric of academia?

    • Danny says:

      RE: The austerity study in economics. In defense of the field, that study was pretty loudly questioned from almost the day it was published. Published in 2010 to lots of skepticism, and by 2011 there were groups asking for their data and trying to replicate. And then failing, so R+R turned over their actual data in Excel form and the error was discovered, and this discrepancy was published in 2013.

      All in all, that’s how science should work – skepticism of claims that seem dubious, replications almost immediately, and cooperative sharing of data. It’s also worth noting that the result moved the target variable of GDP growth from -0.1 to 2.2, but only 0.3 of that movement was due to the Excel typo. The other movement was due to ‘research degrees of freedom’ – excluding certain countries for unclear reasons and a particular choice of weighting the data.
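The decomposition described above can be checked with a few lines of arithmetic. This is only an illustrative sketch using the figures quoted in the comment (-0.1, 2.2, and 0.3 percentage points). The function and key names are made up for the example and do not come from the original study or its replication.

```python
def decompose_shift(original, corrected, spreadsheet_effect):
    """Split the total change in an estimate into the part attributed to the
    spreadsheet error and the remainder (other analytic choices)."""
    total = corrected - original
    other = total - spreadsheet_effect
    return {
        "total": total,
        "spreadsheet": spreadsheet_effect,
        "other_choices": other,
        "spreadsheet_share": spreadsheet_effect / total,
    }


# Figures as quoted in the comment (percentage points of GDP growth).
result = decompose_shift(original=-0.1, corrected=2.2, spreadsheet_effect=0.3)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

On these numbers the spreadsheet error accounts for roughly 13% of the 2.3-point shift, leaving about 2.0 points to the exclusion and weighting choices.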

      Overall I think economics is reasonably good about this (although could still use much improvement) – most PHDs will have to take a large number of dense econometrics and statistics courses. Maybe that helps.

      • I am not on the inside of economics so it’s hard for me to know. Your point seems to be that econ is more skeptical and self-regulated than other fields in social science. That may be true, but nevertheless you admit “researcher degrees of freedom” was the big player in the R+R example. How much do economic conclusions rely on researcher degrees of freedom in the broad field? How methodologically rigid are economics people? My impression is that unbiased least-squares estimators and NHST and static linear regression type analyses are still pretty dominant (see Andrew’s example of the discontinuous polynomial regression on Chinese coal pollution near a river for example, I’m limited in the examples I can bring to bear since as I say, I’m not in economics)

        “most PHDs will have to take a large number of dense econometrics and statistics courses. Maybe that helps”

        but statistics as practiced in standard textbook ways needs to take a big bunch of the blame for how things work today. So, perhaps not. Certainly I think an emphasis on unbiased least-squares estimation and randomization based inference and tests of normality of residuals and so forth is the dominant paradigm, treating countries or policies or people as independent samples from random number generators, so perhaps the stats makes things worse in that it embeds everything in an apparently rigorous mathematical formalism. I’d be interested to hear what a broad range of economists think.

        • Danny says:

          I’m not entirely inside either, to be upfront. But I think we may be talking about two different problems here – RDOF and the identification problem.

          With regards to RDOF, I think it exists in economics but likely to a lesser degree than other sciences. Certainly R+R is an example of it, but it was also called out very quickly by other economists instead of festering for decades like ego depletion. As an aside, standard textbook statistics isn’t really the problem. It’s not that there’s anything wrong with doing significance tests and using BLUE estimators and linear regression. It’s when unaware researchers who are brilliant in their fields but relatively novice statisticians stumble into these tools, usually with a surface understanding but without a deep understanding of how statistics work. That’s how you get researcher degrees of freedom. If you’re rigorous in your approach, those tools can be utilized properly.

          The identification issues in macro are deeper, and that’s what I tend to worry about more. We make a lot of assumptions in economics, and normally this is fine. When you model a ball rolling across the floor in physics 101, you assume a perfect sphere and zero friction even though neither of those is true – but it helps you learn about momentum and motion and so forth. Same with econ. When we make assumptions, we can build some fairly good models of how the economy works – sometimes on the macro and sometimes on the micro level. But as the models get more and more complex (and especially as they start to integrate microfoundations into macro models), a few things happen. What happens in your model can become more and more enmeshed with the starting axioms you choose. And you end up with a lot of variables that are really the algebraic leftovers of other variables, which can then be considered as concepts and put into other models. And those variables are really sensitive to calibration in the chaotic sense – small changes in one assumption can lead to moderate changes in a variable which can lead to huge changes in a model’s results.

        • Clyde Schechter says:

          Well, I am not an economist either. But Paul Romer, a former academic who now is chief economist at the World Bank, thinks that macroeconomics is a science in failure mode, and thinks that this parallels the evolution of science in general. You can read his arguments at:

          The gist of it is that economists have cooked up fancy models involving variables that have no measurable counterpart in the real world, and then use these models to draw conclusions that reflect nothing more than the arbitrary assumptions made to identify the model. Not being familiar with the models he criticizes, I can’t assess his claims, but they sound quite plausible. He has been sounding this alarm for quite a while now, and has published numerous papers which you can easily find by Googling the term “mathiness” (which he coined.)

            • Andrew: two rather different Romer posts/articles referenced here. The “mathiness” one you link to was aimed at a specific problem that he argues is common across most of economics. The latest one (link in Clyde’s post) is a broadside aimed at a specific subfield, macroeconomics (and business cycle macro in particular).

              Great post (yours) by the way! No, the economics discipline is not immune. I think I may put this post on the reading list for a quants course I’m teaching.

          • As an economist, I’ll say that IMHO what Danny and Clyde are saying is correct, and that this points to problems with the scientific status of core economics that are quite different (one might say, the opposite) of those in social psychology or bio-medicine. The problems are different because (1) economic research tends to be more constrained by theory – often, models are identified by appealing to a theory which is conventional, even when there is no empirical evidence for it and the mechanism is implausible – and this reduces forking-paths opportunities, or at least imposes a high tariff in terms of the evidence and rhetoric required to follow a fork, and (2) much of it is done with publicly available datasets.

            Even the Tol case is best understood in this way: sure he did some bad statistics and refused to back down, and that’s what people on this site see; but the bigger picture is that he had for a time quite a good career in environmental economics by stubbornly insisting that certain conventional assumptions from economic models – essentially, short-term financial models – applied to climate change questions. With climate change there is a potentially catastrophic downside for every human being over thousands of years, and his approach gives such possible outcomes a trivial weighting, if anything at all. He’s pretty comprehensively lost that argument, but the fact that he was able to hold on as long as he did tells you where the problem is in economics: sticking with the conventional model as long as you possibly can, gets respect. The infamous (prestigious but non-peer reviewed) Journal of Economic Perspectives paper was possible because of the status he had built in the way I have just described.

            There is a lot of economics that’s not in this core – stuff on the margins of psychology, sociology, political science – and the problems encountered there can be somewhat different.

            • Peter Dorman says:

              This discussion is already history, being a day old, but I’ll put in this comment for the record. What’s interesting about an academic profession is not so much the random spray of error or blindness (which is where I would place R+R), but the patterns that structure these things in a systematic way. I see three of these (at least) in the methodology of economics.

              One is researcher degrees of freedom, which is exacerbated, not diminished, by the demand to incorporate assumptions of rationally calculating agents (with the default of no interaction effects that interfere with aggregation). For subfields of economics in which these assumptions are mandatory, like all of macroeconomics and many applied micro fields, empirical results that contradict them are difficult to publish. The idea is to produce research that “successfully” demonstrates that the presumed relationships hold. The biggest areas of RDF are in sample selection and choice of control variables. The quantity of cherry-picking that goes on in them is staggering. That was the basis of my empirical critique of Viscusi on hedonic wage analysis, for instance.

              A second is the fundamental confusion about NHST and what it does or doesn’t say about the hypothesis of interest. A high percentage of empirical studies in economics employs the strategy of rejecting the null and simply concluding that this result is “consistent with” the hypothesis generated by some theoretical model. This is how implausible theories survive over the decades and will continue to survive until the end of time unless more aggressive testing methodologies are adopted.

              The third is the high proportion of empirical work that is about calibration rather than testing, or to put it another way, work that treats the ability to calibrate a model as equivalent to testing it. This was what Romer was really going after, showing in detail how vacuous the procedure is when you look at how small a role evidence plays relative to assumption. But this is not just a problem in macro, it pervades micro as well. Look at almost any applied field and you will find a bunch of calibrated models that perform poorly out of sample because the calibration was all about fitting and not about truly understanding the process by which the data were generated.

              Not all of economics displays these patterns, of course. In every field there are researchers who really understand what it means to assess the evidence, and behavioral economics (even if it draws on that dubious enterprise, psychology), with its anomaly-hunting, is a breath of fresh air. (Don’t take what I say in parentheses too seriously.)

              Incentives are part of the story in economics, and also selection bias (who becomes an economist), and acculturation too. Even so, I think the profession is ripe for a big discussion of these issues. McCloskey showed the demand was there when she created a splash with her argument about the role of rhetoric, but it only gently touched on the methodological problems. She has gone a bit further with her crusade against the obsession with p-values, but still has not tied it to larger conceptual matters, as this blog has done. I think if someone with a bit of name recognition and a secure spot at a research university takes up this cause in economics it will get a lot of play. If anyone reading this (if anyone is reading this) is such a person, please consider taking this on.

    • Dan Simon says:

      I agree completely that the big problem in hard science is less methodological corruption than research direction corruption: researchers inventing irrelevant fake problems and building huge bodies of (often methodologically unobjectionable) research around them based on completely bogus claims of real-world relevancy or deep theoretical significance (or even both). I’ve certainly seen tons of it in my own field.

      But both forms of corruption stem from the same underlying problem: reliance on peer review, carefully insulated from all external influence, as the sole criterion for judging research. Because it’s a sacred principle that only peers may judge peers’ work, research comes to resemble a giant game of “Survivor”, in which everyone’s real goal–whatever the ostensible goals of their field may be–is to advance one’s career by winning the approval of peers by any means necessary. The result is that the production of valid and valuable scientific research takes a distant back seat to rampant log-rolling, politicking and painstaking conformity to various arbitrary collectively-formulated community norms.

      The only solution is to introduce some kind of external accountability into the process. Unless there exist measurable criteria by which outsiders can judge the value of researchers’ work–and reward or dismiss it accordingly–the research itself will inevitably devolve into self-referential uselessness.

      • Keith O'Rourke says:

        > The only solution is to introduce some kind of external accountability into the process.
        Completely agree.

        I previously suggested random outside audits of academic work on this blog and some were outraged at my lack of trust of others.

        I think the problems are variably in all disciplines except perhaps accidentally and then just temporarily.

        But without those random (representative) outside audits no one will know.

        (In mathematics and related subjects, what to audit and how will be very tricky – proofs and derivations have already mostly been reproduced by outsiders.)

        Can anyone think of a university that might want to volunteer to go first?

      • DM Berger says:

        Dan Simon +1

        Within psychology, this manifests most clearly in the overwhelming prevalence of arbitrary metrics (i.e. measures or experimental outcomes that do not correlate with any “real-world” behaviours or objective outcomes of interest), excessive focus on “basic science” investigating the “theoretical mechanisms underlying behaviour” prior to the basic thorough description of phenomena, all while abandoning prediction or even considering application and practical relevance. And of course there’s the excessive use of college students.

        Even in the clinical domain, where research has been going on for decades, you still have major outcome measures and diagnostic categories (e.g. “depression” as measured by HAM-D and BDI) that are essentially arbitrary. Like we can be reasonably certain antidepressants and psychotherapy reduce HAM-D and BDI scores by a few points, but we have basically zero understanding of what that reduction means in objective terms.

        Closed loops within closed loops.

    • Bob says:

      Daniel wrote: “In Engineering, my impression as a grad student was that for the most part the things being published were records of what people did to study the solution to problems that didn’t really exist using solution methods that were fancy sounding and difficult to explain.”

      My field is communications engineering. My perception is that there are often papers that describe “the solution to problems that don’t really exist.” But, for the most part, the papers are useful. Papers are almost always correct. I recall discussing this issue with a friend who had been president of one of the larger professional societies in the field—his view was that there were many poor papers but very few wrong ones. I think that something similar must be true in math and physics.

      Here’s an example of the first two lines of an abstract from a leading journal in the field:
      In this paper, we consider the problem of estimating the state of a dynamical system from distributed noisy measurements. Each agent constructs a local estimate based on its own measurements and on the estimates from its neighbors. Estimation is performed via a two stage strategy, the first being a Kalman-like measurement update which does not require communication, and the second being an estimate fusion using a consensus matrix.

      That sounds more like an example in the Stan documentation than a paper from APS.

      For perspective, note that a modern iPhone can download files at more than 400 Mbps—we must be doing something right.


      • Engineering of course is the application of science to resource constrained economic problems (ie. how to best achieve some business goal, such as building a bridge or supplying water to a town, or reducing battery consumption of an iPhone). In many cases the empirical science is “god given” to the engineers, and the real question is what does the previously verified theory imply practically for the purpose at hand. Many engineers just don’t have to study the world to discover much of anything (that is, everything they rely on was previously verified by others, often decades or centuries ago). In such an environment the main issues are

        1) are you following the god given model correctly (methodological rigor)
        2) did your application pay off in a way that people care about

        My impression is that academic engineers are all about (1) but (2) is swept under the rug in many cases. The result is an academic field full of extremely sophisticated methodologically impeccable solutions to problems that largely don’t exist. It’s not everything, but it’s FAR from rare. If given a choice between applying for a grant to study some very real but complex problem, such as how best to reduce the risk of mosquito-borne illness in developing countries while doing minimal damage to the local ecology, vs methodologically rigorous application of a favorite method to a non-problem such as machine learning techniques for the use of cell phone sensor data to detect the location of urban heat islands… the latter wins far far too often. (note, even if you really care about urban heat islands, you can probably find them using very simple techniques and some satellite imagery that already exists)

        Of course, in some areas there is no comprehensive god given physical model. For example my graduate studies in soil mechanics. It became clear almost immediately that very basic early assumptions in the field had never been well thought out in the first place. In areas like this, where no comprehensive god given physical model is handed down, Engineers have the same empirical bumbling around that everyone else has. Some basic examples I’m aware of

        1) “Attenuation models” for earthquake shaking. If an earthquake occurs on a fault at point X predict the statistics of the shaking at point Y (peak ground acceleration, duration, peak velocity, total energy, etc)

        2) Mechanism of soil liquefaction: “During shaking, stress is transferred from the grains to the water, since water can not drain during the time-scale of the earthquake, the water pressure increases and this causes loss of strength” (sounds plausible but is actually circular reasoning and is empirically wrong on a fundamental point: water pressure *can* diffuse due to flow during even a single cycle of shaking, and in fact this is the dominant issue)

        3) Models of the strength of materials. Far too many of these are just regression curves fit through some laboratory data. The extrapolation to real-world conditions may or may not be justified, and the models have no mechanism built in to help constrain the behavior under other conditions. Furthermore, sophisticated 3D computer models are built on this stuff giving the impression of a “solved problem”. Examples include the shear strength of concrete, and the behavior of soil in foundations.

        So, when Engineers are confronted with a problem where they need to develop a theory that has both mechanism and matches empirical reality, they seem to have the same problems everyone else does on average as a field. Empirical development of scientific models is hard. With Engineers, it doesn’t help that you can publish a lot more papers (and thus play the academic game better) solving many non-problems using god-given physical models than you can by developing a scientific research agenda to solve problems where no established model exists.

    • BenK says:

      I would like to remark that the ‘real’ reproducibility crisis was (for many of us) about the attempts by pharma to use drug lead compounds discovered in academic assays with methodological supporting data – and finding that not only did the results not generalize most of the time, but they didn’t even reproduce a fair amount of the time. This was where the rubber met the road – not in epidemiology or social cognition, but in pharma, where for better or worse, the FDA eventually stands athwart claims of progress yelling ‘Show me your data!’ Big Pharma needed to get by the FDA and can’t somehow skate by, because eventually post-market data will catch them, if the initial studies somehow pass.

      It isn’t just social science. Far from it.

      There are claims that areas close to engineering should get a pass – maybe even particle physics. Certainly mathematics.
      I’m inclined to agree. As for areas with no experimental component at all – pre-digital humanities, say – I guess they have completely different problems.

      • Mathematics should get a pass, but mathematics is not an empirical field (the philosophy of intuitionism notwithstanding). That is, whether Fermat’s Last Theorem is true or not does not rely on observations of experiments.

        I doubt particle physics is as clean and pure as you assume. I’ve been reading “Speakable and Unspeakable in Quantum Mechanics” by John Stewart Bell (of Bell’s inequalities fame). He’s pretty harsh on the fundamental philosophies embodied in QM. In particular “wave collapse” and a doubling down on some of the really odd ideas such as many worlds or Copenhagen interpretation. No one denies that QM makes accurate predictions, but Copenhagen says that this is all that is possible, there *does not exist an actual state of a particle* in between measurements. Of course, this is philosophical weirdness in which physicists tie themselves in knots in order to avoid giving up on the idea of locality. Bell’s inequalities prove *there are no local hidden variable theories* and he was actually on the side that QM is non-local, but the world has taken the approach of axiomatizing the idea that *there are no hidden variables* (i.e. particles have no inherent state that determines the outcomes of experiments).

        His book discusses how this creates a gulf between the quantum world and the classical world that isn’t bridged by the theory. It’s a bit obscure but my impression is that the way QM is taught these days is a mathematical axiomatic approach that simply takes the Copenhagen interpretation as given. You can’t make much progress on this area if you’ve been taught that the axioms are unassailable and the theorems therefore are proven facts about the universe.

    • Simon Gates says:

      Daniel – yes I believe you are right. Medicine is thoroughly infected with the same problems. There have been lots of efforts to clean up methodology of things like clinical trials and systematic reviews over the years, which have certainly improved things, though lots of issues remain. But I have real concerns over lab studies and small-scale clinical studies up to small clinical trials – the sorts of things that are usually done by individual or small groups of clinicians, and fill up lots of space in medical journals. I don’t really know this field well but I come into contact with it sometimes in planning of clinical trials, and I see many of the same things as in psychology. This worries me because a lot of the time the justification for clinical trials (taking years to run and costing millions) is based on these sorts of study – so if we’re getting these wrong, we’re probably wasting a lot of time and effort.

    • Wonks Anonymous says:

      The Excel errors actually had a very minor effect on that paper. Most of the differences with the Herndon, Ash & Pollin results were due to deliberate choices about which data to include and how to analyze it, but the Excel stuff is easier for people to understand so that’s what gets referenced. The bigger problem though with the paper is that it didn’t even attempt to establish causality rather than correlation. After all, it should seem obvious that low growth can cause debt rather than merely the other way around. Miles Kimball has written about this.

  14. zbicyclist says:

    A friendly amendment to your timeline:

    1967: Psychology Today magazine is started, to popularize psychological research. It succeeded perhaps too well.

    (from Wikipedia) “By 1976 Psychology Today sold 1,026,872 copies…. From June 2010 to June 2011, it was the second top consumer magazine by newsstand sales. In recent years, while many magazines have suffered in readership declines, Adweek, in 2013, noted Psychology Today’s 36 percent increase in number of readers.”

    It’s not just that Psychology Today could take obscure social scientists and make them semi-public figures. The success of the magazine meant that people liked to hear about this stuff, and so other news outlets started paying more attention. This was not lost on university PR departments. But whereas the real news (like that state budget proposal) might be scrutinized by experts or at least political opponents, the “gee whiz” findings of social science tended to be reported with uncritical enthusiasm.

  15. David says:

    Genuine question from a new lecturer.

    “All the incentives fall in the other direction.”

    Let’s say I publish a paper in a high impact journal showing an interesting result with a suitably small p value. But then a replication study with a large sample finds no effect; the initial effect was probably just an outlier. No scientific fraud going on – just unfortunate that the initial result was illusory.

    What are the incentives and disincentives for a tenure-track researcher to admit that their published finding is probably wrong? Would they get through tenure review? Would they be able to find another academic job?

    I would guess that the result of admitting your findings were wrong would be no tenure and possibly no future job? If so then we’re actively selecting for researchers who won’t change their minds.

    • Keith O'Rourke says:

      My guess is that until fairly recently, the tenure stuff would have been over and done before any failed (and published) replications came forward.

      Also, one needs to keep in mind that adequate replication attempts of actual true findings should fail to be significant x% of the time (i.e. one minus the unknown but true underlying power of the replication attempt). So unless some sloppiness, poor methods or misreporting is uncovered – there _should_ be little to worry about.
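To make the point about power concrete, here is a minimal sketch of how often a replication of a genuinely true effect would still come out non-significant. Everything in it is an illustrative assumption rather than anything taken from a real study: two equal groups, unit-variance outcomes, a normal (z) approximation, a two-sided 0.05 threshold, and a hypothetical true effect of d = 0.5. The function name `replication_power` is made up for this example.

```python
from statistics import NormalDist

def replication_power(true_d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group z-test (unit-variance outcomes assumed)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the z statistic when the true standardized effect is true_d.
    shift = true_d * (n_per_group / 2) ** 0.5
    # Probability that the observed z statistic falls beyond either critical value.
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)

# Hypothetical sample sizes; the true effect is held fixed at d = 0.5.
for n in (20, 64, 200):
    power = replication_power(true_d=0.5, n_per_group=n)
    print(f"n per group = {n:>3}: power = {power:.2f}, expected failure rate = {1 - power:.2f}")
```

Under these assumptions the expected failure rate is simply one minus the power, so a single non-significant replication of a modestly powered study says little by itself about the original finding.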

      Much more likely that researchers who were sloppy and flashy were being inadvertently selected for – and that’s part of the current challenge.

      (I do remember having to console a very careful researcher when they were feeling inadequate because their studies of clinical treatments – that were considered promising by their specialty – so far all ended up with no promising difference apparent. I told them it was because their studies were properly designed and carefully done that they showed no effects and at least patients and their families would appreciate this if they understood it. They might actually have been thinking of leaving research – fortunately they didn’t.)

  16. Paul Gowder says:

    Part of the problem here is that the ridiculous hyperbole (methodological terrorists? Wtf?) just ruins the potentially reasonable parts of her argument. But. But. Surely there are potentially reasonable parts? In particular, if people are choosing to critique psychological research not by publishing failed replications (even informally), publishing methodological critiques, etc., but by doing things like contacting tenure committees, trying to get people disinvited as speakers, etc., surely that *is* improper? That kind of behavior strikes me as just as much a violation of scientific norms as things like p-hacking: rather than openly criticize research in a public discussion, just taking direct aim at people’s careers in private.

    That being said, as you point out, Andrew, these are really nonspecific accusations. And it may be that there’s no actual evidence of anyone using these tactics. Still, if anyone is doing so, they should stop it.

    • Huffy says:

      The hypothetical junk you made up there, I have never heard of it happening.

    • Karl Kopiske says:

      I don’t think it is lost – people are very aware of her call for civility, just also very happy to point out the irony that her column lacks the same civility she demands. Whether the general tone will change, we will see.

      Also, I do not think that it is just her tone and the hyperbole. It is also the way she lumps together very obviously improper behaviour and very much not improper behaviour. One gets the impression that she genuinely thinks discussion without peer-review (i.e., blogging) is as bad as harassing family and colleagues of another scientist. Either that, or she uses the latter to make a case against the former.

  17. Rachael Meager says:

    Amazing post. Thanks for keeping this history alive and pushing for better science.

  18. Llewelyn Richards-Ward says:

    Andrew, this piece is simply brilliant and gives a fantastic timeline of what myself and many of my clinical colleagues have been concerned about for many years. Thank you. Science is about openness, not ego driven patch protection. Isn’t it odd really that people who might know better have such a defensive attitude to critique and questions? I worry for you in this context though. I just hope you don’t join the likes of Galileo, who once proposed that the world is round. Let me know when the heresy trial is on and I’ll pop over!!!

  19. Galen says:

    Interesting to note that Susan Fiske was also the editor for PNAS of the ill-fated Facebook emotional contagion experiment from 2014. She had only a boilerplate response to my methodological critique, which I sent to her after it was published. Given the national spotlight the piece received, I assume she got a large volume of responses that were not constructive, and I have a lot of sympathy for that. I do wish she had chosen to engage more with legitimate critics of the piece regarding its ethical and methodological problems, however.

    • Noah Motion says:

      I had dismissed that paper as little more than a clear illustration of the difference between statistical and substantive significance (and the influence of sample size on p values), but your paper is very interesting. I’m glad you dug into it in more detail. Thanks for the comment and the link.

      • Galen says:

        Yeah, they brush off the substantive significance by noting they have a bazillion users (I believe that was the exact figure), so, basically, an effect of almost any size could be considered substantively important.

        “After all, an effect size of d = 0.001 at Facebook’s scale is not negligible: In early 2013, this would have corresponded to hundreds of thousands of emotion expressions in status updates per day.”

        Thanks for reading my paper!
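The influence of sample size on p values mentioned in this exchange can be illustrated with a rough sketch. It assumes a simple two-group comparison with equal group sizes, unit-variance outcomes and a normal approximation; the sample sizes below are hypothetical round numbers, not figures from the Facebook study, and `two_sided_p` is a name invented for this example.

```python
from statistics import NormalDist

def two_sided_p(d, n_per_group):
    """Two-sided p value for an observed standardized difference d (z approximation)."""
    # Under the stated assumptions the z statistic is d * sqrt(n / 2).
    z = d * (n_per_group / 2) ** 0.5
    return 2 * NormalDist().cdf(-abs(z))

# Same tiny effect size throughout; only the (hypothetical) sample size changes.
for n in (1_000, 1_000_000, 100_000_000):
    print(f"d = 0.001, n per group = {n:>11,}: p = {two_sided_p(0.001, n):.3g}")
```

With the effect size held fixed, the p value is driven entirely by n, which is why statistical significance by itself says nothing about whether an effect this small matters substantively.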

  20. Corey says:

    I will not be giving any sort of point-by-point refutation of Fiske

    By a curious coincidence, the internet slang for the act of giving a point-by-point refutation is fisking. Different Fisk, though.

  21. Raphael says:

    Thank you for this. As a graduate student, one form of intimidation that I don’t hear talked about enough is the message things like this article send to people entering the field of psychology. My experience is that those of us who think psychology needs statistical reform are treated pretty openly as outsiders, with vague but fairly constant references to both the supposed meanness of methodological criticism and to the distinction between real psychologists and “methods people”. I have seen (and been told directly by way of warning) how psychology departments view such “methods people” in the hiring process. I can’t help but view Susan Fiske’s article through this lens. By saying something everyone should agree with (don’t bully people) she is helping silence voices that would otherwise criticise a system she is clearly benefitting from. And this is even if we ignore the terrorism part!

    • Llewelyn Richards-Ward says:

      Raphael —

      Being a methods person wins no friends. I once (in a senior govt. role) critiqued a nationwide reliance on some unhelpful practices within the government ministry I worked within. The practice was (very simplistically) using multiple measures of criminal risk on release and allowing clinicians to use them somewhat additively — overestimation of risk hence delaying release of real people. When I presented a Bayesian explanation of why this was not acceptable, the then senior advisor of research (I was a more senior operations manager), who did not even understand Bayesian thinking, had the eloquent reply to a very long critique, “Llew [me] is wrong”. This left my other manager-level colleagues a little gobsmacked. It is many years since I had that role. Since then the same department has had significant legal pressures placed on it to change the same practice, which partially it has done. Another colleague working on sex-offending risk prediction tools did use a Bayesian approach to try and tidy up the fundamental errors, with some success, but again with a lot of pushback from political-level managers. “Yes”, you may experience pressures and intimidation. But doing the right thing is even more important when it ultimately impacts on people’s lives, as any worthwhile research must do. I am sure you will find many like-minded and ethical people; they just may not be the doyens you look up to, necessarily. Change is never easy, but I still believe that good science and ethical practice win in the end — sorry if this sounds a bit like a homily — just encouraging you to persevere.

    • Neurostormer says:

      Raphael, a word of hope: I didn’t realize until reading this post how recent the “bad stats revolution” has been, especially on the timescales of academic careers. I imagine that part of the hostility you are seeing is because the “anti-method” people are feeling the squeeze, and are lashing out to protect their turf. But unless psychology departs the realm of science, the wave of progress can only roll one way and the path forward for “methods people” will only get easier in the future compared to how it is today.

      • Adversary says:

        I am also a graduate student and should graduate within a year. I work in a biomedical field. Much of the motivation for my dissertation work came from (1) the realization that there exist many “urban legends” in my field that are commonly discussed/invoked, but for which little empirical support exists (I have followed the citation trails, and they don’t end anywhere near what they are used to claim), (2) the realization that much of the empirical evidence in my field comes from underpowered studies with poor statistical practices and open-ended hypotheses, and (3) the feeling that very few people in my field want to solve problems, and would rather go in circles with open-ended experiments that produce vague results that give rise to open-ended hypotheses that inform more open-ended experiments.

        I discuss these problems in my dissertation papers, because they are truly the motivations for my project. So far, most of the feedback I have received from reviewers has consisted of (1) suggestions that I not focus so heavily on limitations and flaws in other studies, and (2) criticism that my results are not interesting or necessarily novel because they are focused on “old” questions that have already been studied. Other reviewers have written reviews that contain enough falsehoods that I am sure they did not read the paper. One review was simply an expression of disappointment at how not novel my study was.

        Colleagues tell me that I am too harsh, and that I should “think about the hard work that went into the studies I am criticizing, and how it would feel to have someone criticize your work that way”. I could not care less about how much hard work went into something if it is an obstacle to solving the problems that I care about. I could not care less about someone’s feelings being hurt because they were criticized for doing poor work. I spend time on PubPeer writing lengthy (but impersonal) reviews detailing why the desired conclusions can’t be drawn from the evidence presented in high-profile studies in my field.

        I’m coming to accept that I am an adversary, as Susan Fiske claims, because I am not in this to be successful or to gain status or to make money or to feel important. I am in this to solve problems, and I am committed to being an adversary of the culture that stands in the way of achieving that goal.

        Methods people are the martyrs of the scientific enterprise.

        • Anoneuoid says:

          >”I am in this to solve problems, and I am committed to being an adversary of the culture”

          Problem one is that they are still acting as obstacles. You are wasting all your time being “an adversary” rather than solving biomedical problems. You should just be able to ignore them.

          Problem two is that once you figure out what needs to be done to solve problems, you will discover the need to learn the skills to build the tools needed to do so. This will not happen instantaneously, so you will be seriously set back in your career, possibly even kicked out of the field, since most do not understand what type of stuff needs to be done to actually generate something useful.

          Problem three is that even if you find a way around number 2, by the time it is over your colleagues will have no idea wtf you are talking about. They are thinking in terms of significant p-values and excel spreadsheets, etc and will have no concept at all about what you are trying to explain to them.

          The culture is so inconducive to proper research that we should consider any biomedical grad student who can avoid generating an entire report of misinformation to have been a great success.

        • Keith O'Rourke says:

          I worked with this guy when he was doing his research fellowship and I did not understand why he put so much effort into placing earlier _famous_ work – that was now clearly deficient – in a good light.

          Part of it was giving a benefit of doubt that at that earlier time and context – it might well have been good work.

          If you can put the criticism in terms of what has only recently come to be understood as problematic, though understandably not appreciated at the time – you might well enable yourself to do a lot more in your career.

        • Rahul says:

          A part of this is asking what exactly is the problem you are trying to solve. Who’s your audience?

          Assuming it is other biomed researchers, are the majority just innocently clueless about the problems in these studies? Or are they intentionally choosing to ignore them? To not rock the boat etc. Or maybe they just don’t care?

          Sometimes I wonder if “getting Fiske or Cuddy” to admit they were wrong is the right goal? Are we fighting the wrong battle? i.e. All smart professionals already smell out the Bullshit and avoid it. Some of them smell the bullshit but prefer to feign ignorance because it serves their interests better that way.

          But is there really a cohort of professionals who are being innocently duped by this stuff?

          If you really care about this stuff, publishing more academic papers hardly helps. Perhaps what you should be doing is touring high schools and community halls etc. and preaching to the real victims, the ones who might actually innocently believe in power poses and get duped.

    • Anonymous says:

      Hi Raphael,

      I see where you’re coming from. I am a graduate student as well, and I spoke up about researcher degrees of freedom issues and p-hacking in a lab in my undergrad. I was promptly scolded and fired.

      There was probably a better way for me to have brought up these issues, but I was a 19 year old experiencing from a department chair a lot of what Susan Fiske claims people were doing to her.

      It can be discouraging to hear from these people warning you about how “methods people” are seen. However, there are many schools that do not hold this attitude. Despite my experience at my undergraduate institution, I worked with many other professors that would never have used these questionable methods. As well, my graduate institution has a strong focus on methodological rigor, and my supervisors at both schools have been “methods people”.

      Although it is possible a lot of questionable work also comes from lower-tier schools, a lot of the people that have a hard time accepting what’s going on are in high-profile positions at high-ranking universities. I don’t think that’s a coincidence. Now, I’m not telling you to lower your expectations, but it’s possible that the people at the top want to hold on a lot harder to the past, whereas schools just below have to be constantly improving to be recognized. Or you can join me in Canada, not that we’re devoid of all of these issues.

  22. I want to make a comment on the timeline. The Bem paper was circulating and being widely discussed in late 2010, before Simmons et al.’s false-positive psychology article (published in October 2011). I have always assumed that Bem was a major impetus for the FPP paper, though I’ve never actually asked any of the FPP authors.

    A major turning point in my own awareness was a blog post in January 2011 by Tal Yarkoni, which well predates Simmons et al. It made the argument that you could see indirect signs of little fudges all over Bem that were (are?) probably common and accepted practice. It was the first time I encountered an idea that is now in wide circulation: that Bem was probably not a product of fraud or one big error somewhere (which is what everyone was thinking at the time), but rather a demonstration of how easy it is for acceptable-seeming little fudges to have an outsized effect in combination with each other, making it fairly easy to produce an impossible result with a credible-looking analysis. False-Positive Psychology famously demonstrated the same thing, and the Garden of Forking Paths paper extended the idea. But I always remember Tal’s blog as my wake-up moment, showing me (and probably many others) the point first.

  23. H. Tailor says:

    This is an excellent ad feminem attack from the safety of male invulnerability. As a man, you may make grave statements and name names, and nothing will happen to you. But if a woman penned the very same words, knives would come out of every corner. And her career would be over.

    Surely you have read all the research concluding that women cannot afford to offend anyone. So you impugn Susan Fiske’s reputation, knowing full well that she cannot afford to use the same language and cynicism that you can. Shame on you.

    • Andrew says:


      I would have no problem if a woman were to pen the same words that I did. Or, I suppose, words with similar content. Indeed, it was Anna Dreber who told me about those failed replications of the power-pose studies, and I’ve just written an article with Hilde Geurts about the implications of the replication crisis for clinical neuropsychology. Susan Fiske is free to respond to what I wrote, either in the comments right here or in some other venue, perhaps PNAS. Really no problem at all, we should all be open. It happens that Fiske and her collaborators (one male, one female) made serious errors in their published data analysis, but I’ve made serious errors in my published data analyses too. Just as I feel free to point out where Fiske’s numbers don’t add up, she can feel free to read my research papers and point out any problems she finds with them. If you think that would end her career, then either you’re completely confused, or you know something that I don’t know about the employment contracts of tenured professors at Princeton.

      • Andrew, I think it’s not just about whether you can fire a tenured professor or not, it’s about whether a woman with the same level of criticism will be ostracized such that things will happen along the lines of:

        1) Losing prestigious editorships
        2) Being rejected for funding/grants
        3) Having bad responses from colleagues at conferences, inability to get collaborators
        4) Grad students / postdocs decide not to work for you
        5) Much more difficulty getting papers through anonymous review


        I’m not saying H Tailor is necessarily right, but more that there are legitimate potentials to end someone’s career even without them losing their job and that probably a double-standard of behavior does exist in academia.

        • Andrew says:


          I’m sure there are all sorts of double standards. What I was responding to was the specific remark by the commenter that “if a woman penned the very same words, knives would come out of every corner. And her career would be over.” The whole comment was ridiculous, but in particular it ignores all the women such as Bobbie Spellman who have been active in the psychology replication movement. Or if you want to consider a specific case of a woman who writes the same sorts of thing that I do, on a blog that’s kind of like mine, but still has a career, consider Dorothy Bishop:

          So, yes, I agree completely with you on the general point, just not on the specific claim made by the above commenter, who implies that Fiske is somehow constrained in her possible responses here. Given what we’ve already seen from Fiske about terrorism etc., I don’t think she feels very constrained at all!

          • “So, yes, I agree completely with you on the general point, just not on the specific claim made by the above commenter,”

            That seems reasonable, particularly given your links to examples of women already doing this kind of thing. It is my experience though that a double-standard of women vs men in academia is in fact a big problem generally.

            • Martha (Smith) says:

              Daniel said, “It is my experience though that a double-standard of women vs men in academia is in fact a big problem generally.”

              Yes, there is a problem here. But, like every problem, we need to consider the details: How large of a problem is it? In which contexts? Have there been changes over time in the extent of the problem?

              My take (as a woman academic a few years older than Fiske) is that H.’s comments are exaggerations, especially currently. They would have been closer to the truth a few decades ago, but today, Roger’s comment (below), “we all know full well that she will not lose her endowed chair for that language and probably will not have any problems continuing her editorship, either,” applies.

              Another relevant factor that may not be widely known is that some tension arose in the eighties between (on one side) women in mathematics and the biological and physical sciences, and (on the other side) some (not all) women in psychology and sociology. Specifically, some (not all) women in the social sciences promoted the ideas of “women’s ways of knowing” and “feminist science,” based on an “intuitive” way of knowing rather than relying on evidence and logic. Many women in math and science reacted to this with an, “And ain’t I a woman?” attitude. My impression (just based on what I have read in this blog) is that Fiske may have some attachment to the idea of “women’s science” that influences her view of science and prompts her to confuse valid criticism with sexist bullying.

        • Huffy says:

          What is your evidence for this double standard?

      • Belatedly come to this discussion. I realise that, as a woman I am an outlier in being outspoken, and also it is true that I am senior enough (and close enough to retirement) not to have to worry about consequences. But I do think sexism is a red herring here. We know women are generally less likely than men to engage in debate on social media and I don’t think this topic is any more gender-biased than any other.
        To me it is really quite simple: the most important thing is the science, not people’s careers. If we allow bad scientific practices to flourish, this has knock-on effects for those that come behind us. In the area I work in, developmental disorders, it can also affect the wellbeing of families affected by those disorders. I thank autistic researcher Michelle Dawson for continually reminding us of that on Twitter – we shouldn’t need reminding, but we do.
        I try to operate with the following rules: wherever possible, avoid getting personal; criticise the science, not the scientist. Basically we all have lots to learn and you should treat others as you’d like to be treated. But this precept takes 2nd place to precept no. 1, which is that getting the science right takes first place. If it is going to upset someone to draw attention to a serious flaw in their work, then try to do it as kindly as possible, but don’t hold back.
        But sometimes it is reasonable to get angry: that’s when you see people treating the whole thing as a game, where all that matters is publishing papers and getting grants, and you know that they are wilfully misleading others. Then I think we should call them out. But only if there is water-tight evidence that they are acting deceptively, rather than just lacking awareness or skills.
        I thank Andrew for his initiative in tackling the serious problems with reproducibility that are only now being recognised.

    • Anon– says:

      “So you impugn Susan Fiske’s reputation, knowing full well that she cannot afford to use the same language and cynicism that you can. Shame on you.”

      That strikes me as a pretty odd assertion / attempted shaming. Recall that Fiske seemed fine offending many people with her choices of language! “Methodological terrorism” etc. seem like language that at least matches Gelman’s.

      And it is pretty clear that Fiske has earned this reputation through her decisions not just as an author, but as an insider (and tenured professor!) in this field, such as being a member of the National Academy of Sciences.

    • It should be “ad feminam,” not “ad feminem.” This is a grave statement, and in making it I am probably digging my grave in terms of a career involving Latin. However, in that sense, I am not alone, as Latin has many graves, not least of which is “grave,” the neuter of “gravis.”

      More to the point: If intellectual critique had to go mum around vulnerable populations and personages, it wouldn’t be critique.

    • Roger Sweeny says:

      Andrew and Daniel, This is pretty obviously sarcasm. “She cannot afford to use the same language and cynicism that you can.” But her language was considerably more intemperate than Andrew’s. And we all know full well that she will not lose her endowed chair for that language and probably will not have any problems continuing her editorship, either.

      • Roger Sweeny says:

        I think he is also satirizing a certain style of bullying: “Historically, men have mistreated women. Therefore, in a dispute between a man and a woman, you must have a strong prior that the woman is being mistreated.”

    • Maz says:

      Surely you have read all the research concluding that women cannot afford to offend anyone.

      I’d like to see this research. What kind of study design? Sample size? Was it preregistered? Does it replicate?

      Given Fiske’s status in her field, I will respectfully argue that what you claim about her vulnerability is utter bullshit. It is precisely because Fiske is at the top of the traditional status hierarchy, with power to change the field for the better, that her defense of careerism against open science is so repugnant.

    • Bill Murdock says:

      Is no one getting the joke? (Or have I missed it in the thread that someone has?)

      “Surely you have read *all the research concluding* that women cannot afford to offend anyone.”

      What were the p-values and t-stats on those studies? ;)

      Let’s give Tailor a round of applause for slipping that one by you guys so quickly.

  24. Neuroskeptic says:

    Good post. Regarding my blog:

    “2008 also saw the start of the blog Neuroskeptic, which started with the usual soft targets (prayer studies, vaccine deniers), then started to criticize science hype (“I’d like to make it clear that I’m not out to criticize the paper itself or the authors . . . I think the data from this study are valuable and interesting – to a specialist. What concerns me is the way in which this study and others like it are reported, and indeed the fact that they are reported as news at all”), but soon moved to larger criticisms of the field. I don’t know that the Neuroskeptic blog per se was such a big deal but it’s symptomatic of a larger shift of science-opinion blogging away from traditional political topics toward internal criticism.”

    This is all true although I’d like to note that my very first post in 2008 was *both* aimed at a soft target (a downright crazy psychic lady) and also an internal criticism of science. As I wrote:

    “This sorry spectacle is more than just a source of cheap laughs for bored bloggers. Honestly, it is. It’s actually a fascinating case study in the psychology and sociology of science. McTaggart’s efforts to extract a positive result are far from unique – they are only marginally more strenuous than those of some respectable researchers.

    [the crazy psychic experiment with “positive” results] shows that if you look hard enough you can literally find any conclusion in any data set. All it takes is enough post hoc statistics and a willingness to overlook those parts of the data which don’t turn out the way you’d want. The problem is that in academic science, and especially in neuroscience and psychology, there is a strong pressure to do just that.”

  25. concerned scientist says:

    It is for reasons like those pointed out here that tenure, as we currently understand it, should be done away with or, at the very least, heavily revised toward a more sensible approach in which researchers can and should be held accountable for their repeated mistakes. The hand-wavy attitude of tenured faculty toward poor research should be inexcusable.

  26. Jochen Weber says:

    Thanks so much for this very detailed and thorough piece, Andrew!

    Almost all of the reasons for my own periodically over-boiling dissatisfaction with the output of academic science (particularly in psychology, but also other fields) are represented. Over the summer I read David Harvey’s take-down of capitalism, and the more I look around, the more often I see that financial (and related) incentives are close to the heart of why human endeavors in all sorts of areas, including science, fail:

    We have goals that are inherently incompatible with one another! For researchers those goals encompass the search for improved theories and better prediction of (or explanation of patterns in) data but also the desire for a successful career with many top-tier publications to boot…

    Unfortunately, true advances typically require much more care and deliberation, and with increased competition–a circumstance which, to this day, most people unquestioningly consider to be a guarantee for “the best will win”–people are being pushed further and further towards engaging in practices that actually bring out the worst in academia. Much as in health care, where providers are incentivized towards proposing (medically) useless procedures for the mere benefit of additional payment, researchers see themselves in a situation in which they can either quit (and keep their conscience relatively clean) or “cheat”.

    To what extent those who are “beyond reproach” and insist that their work is without flaws are engaging in this game consciously (i.e. they are actually aware of their role in pushing scientific outcomes in their respective fields to ever less reliable and replicable states) is hard to say.

    In my opinion, the incentive structure must change. No matter what methodological obstacles (multiple comparison correction, pre-registration, open-access reviews, etc.) are considered, in the end people who are willing to work less stringently or with less diligence will always be at a systematic advantage, so long as publishing bad work is rewarded rather than punished. I don’t have a good sense of how this could be achieved (other than by almost asking society at large to give up on money and the associated prestige and power as tools of differentiation).

    The one aspect in which I agree with Fiske is that social media and the speed with which (snarky, clever) messages self-replicate via “sharing” or re-tweeting seem to penalize careful consideration just as much as poorly regulated financial incentives. To the extent that attention-grabbing headlines are beneficial to readership and attention, a thorough statistical analysis of a paper will get less coverage than a “terrorist attack” with personal attacks. That is one of the reasons why Donald Trump can, without exception, rely on the media to cover each and every one of his insults: it makes for better spectacle…

  27. I would just like to say that I love this post and I want everyone to read it.

  28. Thomas Bowman says:

    For a particularly pertinent example of ‘researcher degrees of freedom’, see Paul Romer’s latest working paper on failures in macroeconomics.

    Excellent blog post. I feel the problem might be even worse in economics. When there’s no data generating experiments, there’s no emphasis on replication.

    • Danny says:

      Romer’s paper is not about reproducibility or researcher degrees of freedom, but about identification and calibration issues. Essentially, he believes certain concepts in macroeconomics are poorly defined – often as merely the algebraic leftover of other (better understood) variables, without a concrete real-world grounding. This can lead to perfectly replicable but nonsensical results, in his view.

      • Danny says:

        As an addition, macro often has the *opposite* problem from what researcher degrees of freedom would produce. Kocherlakota has made this point – given how sparse macro data is, macroeconomic researchers should have many different models that all fit the data pretty well. That’s what you’d expect from sparse data. Instead, macro has one dominant paradigm (DSGE) that doesn’t actually fit the data very well. This should lead us to believe that the field has unjustifiably strong priors about the assumptions baked into the models, as well as pointing towards the identification/calibration issues listed above.

  29. Carl Shulman says:

    I might put Feynman’s 1974 speech on cargo cult science in the timeline:

  30. Larry says:

    I have to agree that I love the post and the comments. I also have to add, extraneously, that I love the use of Randy Newman’s lyrics to segment the post. One of my favorite albums and song.

  31. ex-social psychologist says:

    Former professor of social psychology here, now happily retired after an early buyout offer. If it weren’t so painful, it would almost be funny how history repeats itself: This is not the first time there has been a “crisis” in social psychology. In the late 1960s and early 1970s there was much hand-wringing over failures of replication and the “fun and games” mentality among researchers; see, for example, Gergen’s 1973 article “Social psychology as history” in JPSP, 26, 309-320, and Ring’s (1967) JESP article, “Experimental social psychology: Some sober questions about some frivolous values.” It doesn’t appear that the field ever truly resolved those issues back when they were first raised–instead, we basically shrugged, said “oh well,” and went about with publishing by any means necessary.

    I’m glad to see the renewed scrutiny facing the field. And I agree with those who note that social psychology is not the only field confronting issues of replicability, p-hacking, and outright fraud. These problems don’t have easy solutions, but it seems blindingly obvious that transparency and open communication about the weaknesses in the field–and individual studies–are a necessary first step. Fiske’s strategy of circling the wagons and adhering to a business-as-usual model is both sad and alarming.

    I took early retirement for a number of reasons, but my growing disillusionment with my chosen field was certainly a primary one.

  32. Brad Stiritz says:

    Andrew, thanks for this brilliant piece of history / opinion. Just one minor item to perhaps reconsider:

    >Fiske is in the position of someone who owns stock in a failing enterprise, so no wonder she wants to talk it up. The analogy’s not perfect, though, because there’s no one for her to sell her shares to.

    The analogy is quite apt IMHO. She wants people to believe the company (OPC = Old Paradigm Co) is worth say $100/share. You are like a short-seller going public with an argument that OPC is effectively bankrupt. In this case, OPC shares are actually worth pennies each. So you would be precisely correct: “there’s no one for her to sell her shares to”, because presumably no one is “buying” the old paradigm anymore.

    >What Fiske should really do is cut her losses, admit that she and her colleagues were making a lot of mistakes, and move on.

    Yes, exactly. As you say, perhaps she’s just too personally “invested” in OPC. Like they say, “Get married to a person, not to an idea or a stock position”

  33. joe says:

    Methodological terrorist is not the preferred nomenclature. Methodological freedom fighter, please.

  34. Jim Cox says:

    Scrutinizing the validity, reliability and generalizability of scientific work is an inherent part of the process. We shouldn’t give up our scientific objectivity just because someone’s feelings might be hurt. Conversations in pizzerias and taverns have never been peer-reviewed. The only thing different now is that casual conversations have a wider audience.

  35. Fernando says:


    I would add Langmuir’s (1953!) talk to your timeline

    Especially the section “Characteristic Symptoms of Pathological Science”. I quote:

    Symptoms of Pathological Science:

    1. The maximum effect that is observed is produced by a causative agent of barely detectable intensity, and the magnitude of the effect is substantially independent of the intensity of the cause.
    2. The effect is of a magnitude that remains close to the limit of detectability; or, many measurements are necessary because of the very low statistical significance of the results.
    3. Claims of great accuracy.
    4. Fantastic theories contrary to experience.
    5. Criticisms are met by ad hoc excuses thought up on the spur of the moment.
    6. Ratio of supporters to critics rises up to somewhere near 50% and then falls gradually to oblivion.

    I am not as optimistic as you appear to be that the current flood will change everything. I grant you that technology does make this new crisis different, and change more likely. But it is hardly a foregone conclusion. This is how I put it during a recent interview:

    Q: When did your interests in academia transition into entrepreneurial endeavors?

    A: When I realized that being an academic is not necessary for being a scientist. Modern academia is a little bit like the pre-Reformation Catholic Church. Cut off from society through unnecessarily complex language, and too preoccupied with its own internal affairs, and individual career progression. Not surprisingly, most published research is false.

    So I decided to abandon the Church, and embrace the Reformation. As with the printing presses then, so it is with technology now. New semantics, ubiquitous software, cheap computing, and innovations like Open Source licensing and crowdfunding will enable new scientific practices that are more accessible *and* reliable.

  36. There’s also a political/ideological dimension to social psychology’s methodological problems.

    For decades, social psych advocated a particular kind of progressive, liberal, blank-slate ideology. Any new results that seemed to support this ideology were published eagerly and celebrated publicly, regardless of their empirical merit. Any results that challenged it (e.g. by showing the stability or heritability of individual differences in intelligence or personality) were rejected as ‘genetic determinism’, ‘biological reductionism’, or ‘reactionary sociobiology’.

    For decades, social psychologists were trained, hired, promoted, and tenured based on two main criteria: (1) flashy, counter-intuitive results published in certain key journals whose editors and reviewers had a poor understanding of statistical pitfalls, (2) adherence to the politically correct ideology that favored certain kinds of results consistent with a blank-slate, situationist theory of human nature, and derogation of any alternative models of human nature (see Steven Pinker’s book ‘The blank slate’).

    Meanwhile, less glamorous areas of psychology such as personality, evolutionary, and developmental psychology, intelligence research, and behavior genetics were trundling along making solid cumulative progress, often with hugely greater statistical power and replicability (e.g. many current behavior genetics studies involve tens of thousands of twin pairs across several countries). But do a search for academic positions in the APS job ads for these areas, and you’ll see that they’re not a viable career path, because most psych departments still favor the kind of vivid but unreplicable results found in social psych and cognitive neuroscience.

    So, we’re in a situation where the ideologically-driven, methodologically irresponsible field of social psychology has collapsed like a house of cards … but nobody’s changed their hiring, promotion, or tenure priorities in response. It’s still fairly easy to make a good living doing bad social psychology. It’s still very hard to make a living doing good personality, intelligence, behavior genetic, or evolutionary psychology research.

    • Carl Shulman says:

      Geoffrey Miller:

      “Meanwhile, less glamorous areas of psychology such as personality, evolutionary, and developmental psychology, intelligence research, and behavior genetics were trundling along making solid cumulative progress, often with hugely greater statistical power and replicability (e.g. many current behavior genetics studies involve tens of thousands of twin pairs across several countries).”

      My impression is that evolutionary psychology has not been a beacon of statistical power and replicability, e.g. the ovulation-and-voting and Kanazawa studies Andrew mentions above are evolutionary psychology. Also, while family studies have been large, there was an era (still ongoing, although it is now being displaced by the high quality work you mention) of underpowered and almost wholly spurious candidate gene studies in the genetics of human behavior.

      • Maz says:

        I agree that lots of shaky stuff gets published in evolutionary psychology. Maybe developmental psychology, too. As for candidate genes, it’s true that that paradigm failed, but it failed because it was found out that study results could not be replicated. The fields involved responded to this fairly rapidly by increasing sample sizes and performing GWAS’s with stringent significance thresholds, and the results have been good.

        I would definitely agree with Miller’s larger point that the fields of psychology that have contributed most to our understanding of human behavior are underfunded and underappreciated compared to the flashy fields like social psychology that have contributed little.

    • Steve Sailer says:

      IQ testing is of course quite replicable.

      And yet, I’m struck by how little follow-up there has been among mainstream left-of-center social scientists regarding perhaps the biggest, most unexpected pro-Blank Slate social science empirical discovery of the late 20th Century: political scientist James Flynn’s uncovering of the Flynn Effect of broadly rising raw test scores on IQ tests around the world.

      This remains an important and fascinating topic, yet I’m not aware of much momentum toward exploring it, much less explaining it.

      • We’ve also had broadly rising Health and Nutrition and Education in the past 100 years, so first-pass it seems like not much explanation is really needed. Second pass, it’d be interesting to see how those factors correlate in time-series, especially across countries where improvements in health and welfare may have come at different times.

  37. Justin says:


    Thank you for such an outstanding and detailed post. This is precisely why your blog is required reading for me.

    If you ever revise the timeline, one article you might consider adding is:

    Franco, Annie, Neil Malhotra, and Gabor Simonovits. (2014) Publication bias in the social sciences: Unlocking the file drawer. Science. 345:1502-05.

    They document the extent of publication bias in social science experiments from TESS.

  38. You’re doing god’s work here. I hope you’re feeling ambitious about pushing hard to change the field.

    I would guess that the field won’t end up changing very much without deliberate, strategic efforts to change it.

  39. Shravan says:

    One key problem that prevents communication between statisticians and social psychologists is that the latter don’t know anything (or know hardly anything) about statistics. They just use it, the equivalent of pushing buttons in SPSS. How can one get the criticism across in such circumstances?

    In psycholinguistics the situation is similar. In most cases, the sole purpose of fitting a statistical model to data is to confirm something they already believe. If the data don’t provide the evidence they want, it will be made to. Fiske, Cuddy, Tol, etc. suffer from the same problem. They will, magically, never publish evidence against their own theories. Any idea they ever had will only find confirmation in data.

  40. Jacobian says:

    I think that very few people are comfortable with the idea that we may actually just know jack in social psychology. Least of all tenured social psychology professors. There is no law guaranteeing humanity an encyclopedia of reliable knowledge on a subject just because we have been studying this subject for seven decades. Science is hard. Perhaps generating any reliable knowledge in social psychology requires a truly extraordinary effort: exquisitely designed studies with thousands of participants and the statistical analysis planned out months in advance by expert statisticians. Perhaps instead of 1000 social psych professors, we must spread the funds among only 100 to get any results worthy of the name.

    If that is so, then not only are most past results suspect, but people committed to the old style of research (a sample of 40 psych undergrads and 10 hypotheses tested at p=.05) are guaranteed not to discover any real results for the entire rest of their careers. It’s hard to admit a mistake when you feel there may be nothing left to rely on. I’m not excusing Fiske et al., just suggesting that we shouldn’t be so surprised at their recalcitrance. Their careers past and future may be at stake.
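    The arithmetic behind the “10 hypotheses tested at p=.05” setup is worth a quick sketch. Assuming, for illustration only, that all 10 null hypotheses are true and the tests are independent (neither is stated above), the chance of at least one “significant” result is 1 - (1 - 0.05)^10. The function name `familywise_error_rate` is made up for this example.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one false positive among n_tests
    independent tests when every null hypothesis is true."""
    return 1 - (1 - alpha) ** n_tests


rate = familywise_error_rate(10)
print(f"{rate:.3f}")  # 0.401
```

    Under those assumptions, roughly four studies in ten of this style would turn up at least one publishable-looking finding from pure noise, which is one reason such results would not be expected to replicate.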

    • Steve Sailer says:

      One possibility is that much of what social psychology attempts to study (e.g., priming) might be fundamentally ephemeral, and thus, even if the field were to massively improve its methodologies, often unlikely to replicate reliably.

      Human beings, and especially college students (the main source of subjects in Psych Dept. studies), are prone to taking up behavioral fads and then dropping them. Moreover, some people are better at eliciting faddish behaviors (e.g., rock stars) than are others.

      By way of analogy, the more I’ve looked into the Flynn Effect of rising raw scores on IQ tests (a generally much more replicable part of psychology), the more I’ve realized the past is a different country. Why were raw IQ scores lower in the past?

      I dunno. I’ve got lots of theories but not many ways to test them.

    • Nick says:

      I regularly make your suggestion (5 or 10 times bigger samples, 5 or 10 times fewer labs and, hence, PIs) to psychologists. I’m surprised how favourably most of them respond. Maybe they all imagine that they will be among the survivors.

  41. dmk38 says:

    In the end, the issues here will be resolved not by what people say but by what they do.

    For my part, I’m confident that they’ll only do more and more of the sorts of things that come under the broad umbrella of “post-publication peer review,” including blogs. Indeed, I’m confident that traditional “peer review” will become less & less central to the filtering process that empiricists, in all fields, rely on to identify the best work. I’m confident of that b/c the value of what Andrew & others are *doing* is so obvious.

    There is tremendous value in saying why these are important things to do. The value, though, lies not in the contribution such words make to winning a debate w/ Fiske or anyone else (the contest is so lopsided that calling attention to that is not worth the time) but in the role they play in forging a shared understanding among those who are acting in concert to create a better empirical scholarly culture.

    So if you agree with Andrew, for sure say so.

    But then go out & do what he is doing. And do it & do it & do it some more. In the competition of doing, who is right will be clear to anyone who is using reason to judge.

  42. Dr. Dwayne Elizondo says:

    In case you haven’t seen it, Kahneman realized something was off with priming and had stern words for the field:!/suppinfoFile/Kahneman%20Letter.pdf.

  43. Keith O'Rourke says:

    Just to stir things up (as if this is needed).

    In the Statistical Society of Canada’s Code of Ethical Statistical Practice are these two statements.

    1. While question and debate are encouraged, criticism should be directed toward procedures rather than persons.
    2. Avoid publicly casting doubt on the professional competence of others.

    The first, I believe, almost everyone would agree with.

    The second seems perhaps problematic: blogs are public, and wouldn’t valid criticisms of someone’s work and position imply lack of competence?

    • Eric Rasmusen says:

      “criticism should be directed toward procedures rather than persons.” I think I would disagree with this as a general principle. Sometimes it is very socially useful to point out that someone is a charlatan. Mistakes in one article aren’t enough to establish this, but pointing out a pattern in someone’s work is fair game. In fact, it needs to be heavily encouraged, since we academics are nice and timid people who usually aren’t willing to say the emperor has no clothes.

      • Curious says:

        The question is whether this method of public shaming advances the goals of improving research better than one that focuses on procedures. Targeted attacks result in the opposition digging in, not rolling over. That is simply human nature. Assumptions otherwise are simply naive. And while it may bring a sense of fulfillment to a feeling of righteous indignation to shame someone for their foibles, it does not actually advance the cause of improvement.

        • Eric Loken says:

          Agreed that public shaming just to feel self-righteous is bad news, and unproductive. The problem though is that there is no uniform calibration of “value” of published work. When the housing market went south, all house prices were affected in a common way (not identically of course, but the same correction mechanism was in play). As we enter a recalibration of the value of published research, it has the appearance of playing out on a case-by-case basis and therefore seems targeted and arbitrary. It’s as if the housing market corrected by first dropping the value of a few houses in the neighbourhood at a time. Hey, what do you guys have against MY house? is the natural reaction. That it plays out this way is unfortunate, and Fiske is right that it can be damaging.

          But where is the value of published research established? Fiske wants to say it was established in peer review. The house had a sale price, it was inspected….that’s the value and don’t show the poor taste of challenging that. That’s an understandable response. But we now know that the peer-reviewed literature is far shakier than presumed. It’s over-valued, and we know this from the Meehl argument, the Ioannidis argument, failed replications, and unfortunately from what come off as a few drive-by shamings that show the weaknesses of individual projects. Trust the ratings agencies (peer-review) is not a great response considering where we have come and how we got here.

    • Alex says:

      Is there a way to question someone’s procedures without questioning their professional competence? I’m trying to think of an example where you can say something like “this was done poorly” that couldn’t be taken as an indictment of the person’s competence. Does the Code give an example?

  44. Anonymous coward says:

    Do you know the Soviet definition of constructive criticism? It is criticism that does not involve tried senior executives (Solzhenitsyn, First Circle, ch. 78).

  45. Eric Rasmusen says:

    What I find surprising is the pride of some academics— their unwillingness to admit that one of their papers has a fatal flaw. It wouldn’t be surprising if that was their only publication, but why do these big-name people get so touchy? It’s like CEO’s who lie about which college they went to— they lie about things that would have trivial impact on their reputations, or whose admission might even enhance them. Prof. Gelman, for instance, admits he’s written papers which are worthless, and we don’t think the less of him. If we cut his vita in half, he’d still be considered a top scholar, but if he were caught defending the undefendable even once, he’d be a laughingstock. So he doesn’t (even aside from having good principles).

  46. Steve Morgan says:

    I am an infrequent commenter on your blog, but I do weigh in when it is important. This is important. I don’t know enough about social psychology to appreciate all of the history and infighting that is occurring, but it is absolutely clear that this replication debate has been an important episode. Even for methodologists. Lots of people have understood the problems created by data snooping and all-too-convenient choices of covariates, and of course the silliness of .05 NHST. But, for those of us who don’t typically collect our own data, we have thought too little about the havoc created by self-serving choices of when to stop collecting data.

    On the Fiske piece, I think you have said what needed to be said. I very much appreciate your work on these issues.

  47. PjoombE says:

    “I’m looking at the river but thinking of the sea…”
    great analysis with equally great randy newman headers!

  48. Manoel Galdino says:

    About the timeline, maybe Ed Leamer’s paper should also be included?
    “Let’s Take the ‘Con’ Out of Econometrics,” by Ed Leamer. The American Economic Review, Vol. 73, Issue 1, (Mar. 1983), pp. 31-43.

  49. Michael J. says:

    I’m not sure it’s significant enough to belong on your timeline, but an article which was passed around in my social circles in early 2014 and which brought the issue to my attention was Slate Star Codex’s The Control Group is Out of Control, mostly on the Bem article mentioned:

  50. Mark Fichman says:

    This has been an excellent discussion. The piece of the history that is missing from Andrew Gelman’s excellent presentation is meta analysis. The recognition of the inherent noise in individual studies that was identified tellingly by meta analysts in many fields including psychology was where I learned to suspect individual studies. The version I first was exposed to that nailed the issues being discussed here was in the first chapter of Schmidt and Hunter’s `Methods of Meta Analysis’ which demonstrated `completely convincingly’ how multiple low powered studies would generate precisely the pattern of research findings and conflicts found in applied psychology, particularly personnel psychology. This can be seen in the first chapter of their book. It is a perfect forecast of where social and applied psychology has evolved, even with no errors and no malfeasance. It is a must read. Here is the exact cite:

    Methods of Meta-Analysis: Correcting Error and Bias in Research Findings 3rd Edition
    by Frank L. Schmidt (Author), John E. Hunter (Author)

    They have used the same first chapter in all their editions. It captures much of the problem in 10 pages. I use it as a demonstration in classes. Always works.
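The point about multiple low-powered studies can be illustrated with a small simulation. This is only a sketch under assumed numbers and is not taken from Schmidt and Hunter's chapter: the true effect (0.3 standard deviations), the group size (30 per arm), the known unit variance, and the function name `simulate_literature` are all hypothetical choices made for illustration.

```python
import random

def simulate_literature(n_studies, n_per_group, true_d, seed=0):
    """Simulate many two-group studies of the SAME true effect.

    Assumes unit-variance normal data and a two-sided z-test at the
    0.05 level. Returns one boolean per study: was it 'significant'?
    """
    rng = random.Random(seed)
    se = (2.0 / n_per_group) ** 0.5  # standard error of the mean difference
    results = []
    for _ in range(n_studies):
        treat = [rng.gauss(true_d, 1.0) for _ in range(n_per_group)]
        ctrl = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        diff = sum(treat) / n_per_group - sum(ctrl) / n_per_group
        results.append(abs(diff / se) > 1.96)
    return results

# Hypothetical literature: 2000 small studies of one real but modest effect.
significant = simulate_literature(n_studies=2000, n_per_group=30, true_d=0.3)
share_significant = sum(significant) / len(significant)
print(f"share of studies reaching p < .05: {share_significant:.2f}")
```

Under these assumptions only a minority of studies reach significance even though every one of them measures the same real effect, so a reader counting "hits" and "misses" would see a literature full of apparent conflicts with no errors and no malfeasance involved.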

    • Keith O'Rourke says:


      I really doubt if meta-analysis should be in this history, which I believe is about how non-replication became understood as being a widespread and big problem amongst a large percent of those in any given discipline and even the wider public.

      Now, I have written something on the history of meta-analysis in clinical research and though meta-analysis may vary by discipline, I have met and talked with many of the early writers of the 1980s. I don’t remember any saying it’s great how most are getting what meta-analysis’ purpose really is (not just increase precision by pooling) and what it can do for disciplines (encourage and maintain quality research efforts and reporting) but usually the opposite – why don’t they get why it is important. And the last time I talked to Nan Laird she said something like meta-analysis topics are difficult to publish – not much of a market for it.

      So, a subset in a discipline do learn about meta-analysis and a smaller subset will actually do it. Of the subset doing it, some will do it well and notice lots of problems with published papers and the inexplicable variation of their findings.

      But for most in the discipline it might just suggest that weighted averages of estimates will fix everything just fine :-(

      Also for psychology this might have been true and still be?
      “These [clinical meta-analysis] publications tended to emphasize the importance of assessing the quality of the studies being considered for meta-analysis to a greater extent than the early work in social sciences had done. They also emphasized the importance of the overall scientific process (or epidemiology) involved.”

  51. Schenck says:

    ” but also through responding to constructive adversaries”

    Perhaps it’s because I’m reading Mayo’s Error & Growth, but this sounds very much in keeping with what she describes (or rather, what she sees Kuhn as describing) with non-sciences. As in Astrology having conflicting schools within it, all able to deeply criticize each other, largely on the fundamentals, but no way to learn from error and criticism. The top-down way of doing things can argue and respond to the replication crisis, but it can’t learn from it and adapt. And some of its results, like the spurious papers, read like astrology readings: believable; vague-but-somehow-personal.

  52. Frederic Bush says:

    I think you and Fiske are talking past one another here. Your methodological critique is very strong, but you basically dismiss out of hand the stronger parts of her argument: her claims that people are getting harassed in totally unacceptable ways, and that unfiltered comments can promote this harassment and in general pollute the internet. Why do you find this unlikely? You don’t have to look far to find websites whose comments are a cesspool (there is a reason why more and more sites are getting rid of them), and there are lots of people who have been forced off the internet and sometimes into hiding due to harassment and death threats from internet hordes. (I must say, so far the commentary I see here is unusually civil and informative, so perhaps you have your own small, biased sample to judge from.)

    Your argument, that this is about ethics in methodology, is true for you and other prominent bloggers and researchers, but that doesn’t mean that trolls aren’t using it as an excuse to harass people.

    • Andrew says:


      I’m as bothered as anyone by trolls etc., and I’d’ve had no problem if Fiske had written an article about trolls, abuse of communication channels, etc. (ideally with some examples). But this has nothing to do with replication. These are two unrelated topics! What Fiske seems to be doing is conflating the replication movement, which she doesn’t like, with all sorts of “terroristic” behaviors which none of us like. I’d prefer for Fiske to write two articles, then I could say I agree with her article about bad behavior and I disagree with her argument about scientific criticism.

      • Anonymouse says:

        Andrew: It’s worth noting that her article only mentions replication once, and favorably.

      • Anonymouse (2) says:

        Andrew: I would also add that, there is a risk here of what I believe you referred to as second-order availability bias — either on your part, Fiske’s, or both.

        You argue that Fiske inappropriately conflates trolls with people who consider themselves ‘data cops.’* However, there may be good reason for her to do so. If methodological rigor is independent of trolling then we would expect a similar proportion of trolls in the methodological rigor camp as in the general population. If we further intersect that with people who post online, we should expect that proportion to be larger still.

        From your perspective, people who are concerned with methodology are constructive critics and not adversarial with regards to the science. Indeed, it is likely that you surround yourself with like-minded, constructive critics. You may avoid the trolls deliberately (I would) or at least, not seek them out. Moreover, they have little reason to seek you out. Intuitively, it would be easier to underestimate the correlation (thinking, perhaps, that methodological critics are less trollish than the population at large). In contrast, we have good reason to expect that Fiske is not surrounded by people like you (otherwise, we might expect that she could have been guided to better research practices earlier in her career or have had some gentle guidance now). The methodological critics who do approach her, however, are more likely to be trolls than the population average. So people in Fiske’s camp will be likely to see a correlation between critics and trolling; whereas constructive critics are less likely to see a correlation (or are more likely to see a negative correlation).

        Another commenter made an argument about this being an ‘ad feminam’ — which I disagree with completely. On the other hand, I believe there is, perhaps, an appropriate connection to be made to the struggles that women often face. Specifically, people who complain about a problem are often viewed as doing it to stir up attention. (Some) well-meaning men in the workplace doubt that sexism exists because they and their friends personally aren’t sexist. They look at the evidence around them and it just doesn’t hold up. But they don’t realize that their sample is biased. Well-meaning methodological critics may believe that trolling doesn’t exist because they likewise do not see it in their circles. That said, I don’t think we should be so quick to dismiss it when people suggest that it might be a problem.

        *[You said ‘replication’, but, as noted previously, Fiske never explicitly makes the connection. Perhaps the juxtaposition is enough, but I’ll just report what she stated here.]

    • Simon Byrne says:

      While I agree that abuse and trolling are unacceptable, Fiske takes a much stronger stand in her letter, wanting the only public discourse to be through “monitored channels”, presumably with her and her peers doing the monitoring.

      • jrkrideau says:

        I have followed the citation trails, and they don’t end anywhere near what they are used to claim

        One of the things I have learned from reading secondary sources on historical cooking is that you should never trust a secondary source that does not include the primary, since you have no way of knowing what liberties the author may have taken in his “interpretation” of the recipe. David Friedman

        It is not clear to me that some researchers ever look at the original paper they cite, or at most, may have read (a potentially misleading) abstract.

    • Ed Snack says:

      Actually, I think that is an urban myth. There are a small number of sites perhaps that can be involved with such harassment, but generally the harassment is almost entirely the other way. The establishment (Fiske being a prime example) use their position to harass in unacceptable ways those who wish to criticize obvious methodological and data flaws.

      You want personal ad hom cesspools, try the pro-social research and politically committed websites. Example, try the foul personal attacks in the Climate Change supporting sites like “Skeptical Science” or “Realclimate” or “Tamino”, and the numerous and often successful attempts to have those who publish against the “scientific consensus” removed from positions and have their research unable to be published. The establishment is by far the largest source of such harassment, and by far the more successful at it.

      That isn’t her strongest argument, it’s her strongest method of getting her own way.

  53. Frederic Bush says:

    It really does seem to me to be an article about trolls, abuse of communication channels, etc. There are two cheap shots at “methodological terrorism” and “data police” but that is it for her engagement with the theoretical issue as far as I can tell.

    • Lee Jussim says:

      Andrew, this is a masterpiece. I am teaching an advanced undergrad course, The Psychology of Scientific Integrity, and had the class come to a screeching halt in order to read, consider, and critically evaluate both Fiske’s essay and your response here. Thank you. I recently posted the course syllabus here:

      Frederic: One of the many exasperating (to me) aspects of Fiske’s essay is that she does not provide a shred of evidence, not a single example, of any of the things she is complaining about. I realize that does not mean they do not exist, maybe there are zillions, and my not knowing what she is writing about reflects my own ignorance.

      You wrote that the internet can be abused. Of course it can. But her essay was not a critique of internet blogs writ large. It was focused on those critical of psychological research. Which leads to my questions for you (or anyone else):

      1. Can you point to a single blog by any psychological researcher, or even any blog by any scientist with a phd and at least a couple of publications, that has constituted a personal or ad hominem attack, being a “methodological terrorist,” a “destructo-critic,” used “smear tactics,” has had a “chilling effect on discourse,” or warrants being called “vicious”?

      2. Do you (or anyone else) know of a single person who left the field because they were (or felt) harassed by bloggers? Or who chose not to come up for tenure?

      Nearly all of Andrew’s points resonated with me, but this one is particularly relevant here and worth repeating:
      “Just remember that, for each of these people, there may well be three other young researchers who were doing careful, serious work but then didn’t get picked for a plum job or promotion because it was too hard to compete with other candidates who did sloppy but flashy work that got published in Psych Science or PPNAS. It goes both ways.”

      IDK how many people this describes. However, I strongly suspect that it describes far more than those driven from the field by critical bloggers.

      • Shravan says:

        Lee, if I were Amy Cuddy, I would definitely be feeling personally attacked, especially if she was denied tenure. I got the feeling that Fiske’s article was about her former student being denied tenure. It would have been good if she had been more specific, I guess.

        • Lee Jussim says:


          1. I agree that that is probably some of what Fiske is responding to.
          2. The fact that someone “feels” attacked, however, to me, provides lots of evidence of their subjective experience, and none about whether they were actually attacked.
          3. I think many of our colleagues have an extraordinary tendency to feel extremely defensive when their pet phenomena or studies are appropriately and scientifically criticized, so “feelings” of being attacked do not mean much to me.
          4. To me:
          Personal attacks are: “Smith is an ________” (fill in pejorative of choice: asshole, fraud, charlatan, incompetent).
          Scientific criticisms are: “Smith (2015) reported that hawks weigh more than geese, which does not follow from Smith’s data showing no difference” or “Smith (2015) reported that ducks weigh more than geese, but mis-coded his species; in fact, Smith’s data show that geese weigh more than ducks.”

          If Smith gets defensive about this, feels “personally attacked,” and decides to leave the field, that is Smith’s problem; it is not a problem with the critiquer.

          So, my question stands: Can you (or anyone reading this) produce one or more specific cases in the blogosphere, NOT where someone criticized “felt” attacked (yes, there are ample cases of people “feeling” attacked), but where any bona fide scientist published a personal attack (as defined above) on some other researcher? Thanks.


          • Carol says:

            From THE ODYSSEY ONLINE, March 21, 2016: “She [Cuddy] mentions in the last few minutes, while being interviewed by Will Cuddy (same last name but no relation) about her recent opportunity to accept a position with tenure at Harvard Business School. She ended up turning down the offer due to her desire and passion to reach broader audiences.”


          • Carol says:

            Hi Lee,

            The blog article by Dan Graur, who posts as “Judge Starling” and is a PhD biologist at the University of Houston, on Barbara Fredrickson’s 2.9013 positivity ratio research, could be considered an example of a personal attack. See


            The “Ma’am” was originally “Bitch.”

            • herr doktor bimler says:

              Fredrickson’s career is an interesting phenomenon. The 2.9013-positivity-ratio paper was fraudulent… Losada made it all up, Fredrickson went along with it and published a high-profile book promoting the fraudulence. Then there was her PNAS paper, “A functional genomic perspective on human well-being”, which is undiluted statistical incompetence and over-fitting. And the vagal-nerve upward-spiral stuff.
              Yet she remains paramount in the positive-psychology field.

              • Carol says:

                Hi Herr Doktor Bimler,

                Barbara Fredrickson has some very powerful people (e.g., Martin Seligman, former President of the American Psychological Association) who both support her *and* denigrate (sometimes openly, sometimes not) to the positive psychology audience the efforts of people like Nick Brown, who has been the point man for several of the critiques of her work. I applaud Nick for his courage and persistence.

            • Carol says:

              Hi Lee,

              Also, on the Facebook group PsychMeth, Ulrich (Uli) Schimmack, in a critique of an article by Wolfgang Stroebe entitled
              “Are Most Published Social Psychology Findings False?” wrote “So, Stroebe is not only a liar, he is a stupid liar, who doesn’t see ….”

              • Lee Jussim says:

                Carol et al.,

                One of my grad students just showed me the Psych Methods Facebook discussion of Fiske’s essay. Although I still think Fiske is about 95% wrong, and Andrew almost 100% right, I find myself compelled to acknowledge *some* validity to Fiske’s essay. The discussion there of controversial topics, such as the Fiske essay, gave me a general feeling of “Eeeeeewwwww.” I am not saying everything there is angry or harsh; it is not. There is lots of good stuff there. Very few posts are smoking guns of Unvarnished Evil. But the tone regarding Fiske’s essay came across to me as very unpleasant (both the pro and anti sides were this weird mix of calm, reasonable points, and stuff that I cringed at).

                I did not see or try to track down the Schimmack post, but nor do I find it hard to believe (even though I admire Schimmack’s scholarly work and his R Index blog). One-line posts lend themselves to harsh zingers.

                It felt like a rabbit hole I did not want to go down and, when I did, just a little bit, I felt like I was in a rancid swamp. Now, lots of great wildlife lives in swamps, I just do not want to spend much of my time there, even if I occasionally want to check it out from a distance.

                Bottom line: So this search for evidence has changed my view, a little bit. Even though my view of Andrew’s essay as masterful remains unchanged, I no longer think Fiske is 100% wrong. I now think more like 90% wrong. Her invective and insults are still wrong. Her allusions without a single example are problematic. Her piece remains an exquisite example of the old ways of doing psychological science crumbling. I have yet to find a single psych blogger whose work is problematic.

                But I take the Facebook Psych Methods group posts as likely just one example of other dysfunctional communications. I would now guess that Facebook and Twitter lend themselves to exchanges that are snarky, or worse. The 10% valid point includes those and similar venues.


              • Ulrich Schimmack says:

                Dear Carol,

                You quote a conclusion that is based on an extensive substantive review of the article.

                Stroebe makes two claims.

                1. The typical study in psychology has modest power (~ 50%).
                2. There is no evidence that publication bias contributes to the replication crisis.

                These two claims are inconsistent. How can we explain 95% success rates in published articles with 50% power?

                You may disagree with my tone, but substantively the claim that there is no evidence for the presence of publication bias is untrue (see e.g., Sterling et al., 1995). Moreover, Stroebe has repeatedly made similar statements that suggest we do not have a replication crisis when the success rate of replication studies in social psychology was only 25% in the OSC replication project. I think it is important to put my quote in the proper context as a conclusion about statements that are internally inconsistent and inconsistent with empirical evidence of publication bias.

                Sincerely, Ulrich Schimmack
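The arithmetic behind the power versus success-rate inconsistency can be sketched in a few lines. This is an illustration under stated assumptions, not a calculation from Schimmack's review: it treats studies as independent, assumes every tested effect is real and every significant result gets published, and uses a hypothetical literature of 100 studies. The function names are made up for this sketch.

```python
from math import comb

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def share_of_failures_missing(power, success_rate):
    """Share of non-significant studies that must go unpublished for the
    published literature to show `success_rate`, assuming every
    significant study is published (an assumption of this sketch)."""
    published_failures = power * (1 - success_rate) / success_rate
    return 1 - published_failures / (1 - power)

n_studies = 100  # hypothetical number of independent studies
power = 0.5      # the ~50% power figure quoted above

# Chance that at least 95 of 100 studies succeed with no selection at all.
p_95_or_more = prob_at_least(95, n_studies, power)
missing = share_of_failures_missing(power, 0.95)

print(f"P(>= 95 of {n_studies} significant | power = {power}): {p_95_or_more:.1e}")
print(f"share of non-significant studies that must be missing: {missing:.3f}")
```

With 50% power, a 95% success rate essentially cannot arise by chance, so under these assumptions most of the non-significant studies have to be absent from the published record, which is what publication bias means.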

            • Lee Jussim says:


              Thanks for pointing to that Judge Starling blog. It definitely meets my criteria for insulting, inappropriate and the tone is consistent with Fiske’s objections.

              However, I am about 95% sure that it is irrelevant to Fiske’s essay for several reasons:
              1. Biologists’ blogs do not usually have much traction in most of psychology; I would be very surprised if Fiske had even heard of this guy.

              2. Fiske’s essay is to be published in the flagship magazine of the Association for Psychological Science. As such, her audience is psychologists, not biologists, physicists, or historians. She refers to social media and blogs without the qualifier “psychology,” but I have little doubt that her main, and perhaps exclusive, targets are psychological blogs and perhaps some generalist/popular blogs such as Neuroskeptic.

              3. I doubt that a single whacked diatribe would inspire an essay such as Fiske’s. She repeatedly uses plural, as in “methodological terroristS” and “destructo-criticS.” Fiske is an excellent writer. I doubt the plural was used carelessly or metaphorically.

              4. All of which leads me to rephrase my question as a challenge, albeit a more narrow one:
              Can you, or anyone else out there, identify a single psychology blog, or popular generalist blog (such as Neuroskeptic, or Ed Yong) that constituted insulting, ad hominem, personal attacks?

              I regularly read the following: Srivastava’s Hardest Science, Vazire’s Sometimes I’m Wrong, Daniel Lakens, PIG-IE (Brent Roberts), Funderstorms (David Funder), and Schimmack’s blog on replication and his Rindex and Incredibility Index. Schimmack is the most pungent of the bunch, e.g., referring to phacking as “doping” but even that stops (in my view) WAY short of insults, or personal ad hominem attacks. And I have never seen any of the rest of the crew come even close.

              Now, I have to admit I do not frequent the Twitter-verse or Facebook world very much. Perhaps it is harsher and more snide out there. But most of the harshest stuff I have seen has not come from the critics, it is the growing pantheon of insulting terms being thrown about by the Defenders of Business As Usual (shameless little bullies, methodological terrorists, vicious ad hominem attackers, etc.).

              So, can you point me to any psychology or widely read generalist blogs that are comparably insulting? Can anyone? As usual, I am open to exposing my ignorance, if someone out there can show me the evidence.


              • Carol says:

                Hi Lee,

                I’ll have to defer to the other commenters. I rarely read any blog other than Andrew’s and once in a while PubPeer, so I really don’t know.

                I do recollect that on the APA Friends of Positive Psychology list serve, Martin Seligman characterized the critique of Fredrickson et al.’s (2013) well-being and gene expression PNAS article written by Nick Brown and his colleagues as a “hatchet job” when he (Seligman) could not possibly have read and understood it. Nick’s reply was a marvel; he was extremely polite but did not give an inch. Nick can provide more details.

              • Carol says:

                Continued, Lee: But Martin Seligman would be on Susan Fiske’s side, of course. My guess is that Susan Fiske has greatly overstated her claims about online vigilantes attacking individuals, research programs, and careers. There are undoubtedly a few such persons, but my guess is not many. I think that the editor of THE OBSERVER should insist that she either provide evidence or remove these accusations. Perhaps that is your goal.

              • Carol says:

                Lee, I’d really like to know what “senior faculty have retired early because of methodological terrorism.”
                Presumably Fiske is talking about full professors in social psychology or related areas.

                Few academic psychologists retire early for any reason because of the requirements of their pension plans, so this is hard to believe. It seems to me that senior social psychologists (Bargh, Baumeister, Schwarz, Fredrickson, Gilbert, etc.) have been standing up for themselves when their work has come under criticism, not throwing in the towel.

              • Carol says:

                Hi Uli and Lee,

                I said nothing about the quality of Wolfgang Stroebe’s article or your critique; indeed, I have not read them. I was simply responding to Lee Jussim’s request for examples of blogs or other social media in which a bona fide scientist made a personal attack such as “Smith is an asshole, fraud, charlatan, incompetent.” You wrote that Stroebe was “a stupid liar.” I have made no judgment — public or private — about the veracity of your description of Stroebe; indeed, I had never heard of him before.


          • Shravan says:

            Well, it’s pretty clear by now that in matters relating to statistics and statistical inference, many of these senior and not so senior scientists *are* incompetent. That’s entailed by Andrew’s criticisms. Whether one says it in so many words or not is kind of moot. Many of them have replied directly to Andrew on this blog, and it was pretty clear they have no idea what the issues are. Nobody is born knowing stuff; they can find out what’s missing in their understanding if they choose to. And yet they hunker down and focus only on the mocking they are subjected to.

            It’s a pity that Fiske and the like do not have the courage to preface their remarks by stating that, yes, there are many problems in their work, and yes, they don’t (yet) understand the statistical issues too well, but they will try to fix this in future work. I just saw that the note she wrote is not yet published; maybe she will publish a revised version after having been subjected to this vigorous non-peer review by Andrew and others on the blogosphere. If she revises it, it would be a very interesting situation indeed.

        • Carol says:

          Hi Shravan,

          We looked into this on one of Andrew’s other postings. We learned that Amy Cuddy was offered tenure at Harvard Business School, but declined it.

          It is certainly the case that Cuddy’s work has been criticized but in my opinion at least, that criticism was justified.

      • Daniel Bradford says:

        And thank you, Lee, for your work combating these issues. I am an advanced PhD student in clinical psychology. I have been seriously considering designing a course like yours and appreciate you sharing the syllabus.

        • Lee Jussim says:

          You are more than welcome. My course is an undergrad course, and you have to assume they have no idea about any of this. They have had a hard enough time understanding the logic of experimental methods, what p<.05 means, etc.; to turn around and say, basically, "Everything you have learned so far is One Great Mess," is not the easiest thing for them.

          So, the news and magazine articles, though probably below the level of most of Gelman's denizens, are, I think, crucial to this sort of course, because they are written in a style accessible to the reasonably intelligent layperson — i.e., your basic senior majoring in psychology.

          And some are just really really good on their merits. If you find something else that you end up including in a course, that you think would be terrific for undergrads, would you let me know? jussim at rci dot rutgers dot edu. Thanks.

    • Martha (Smith) says:

      This looks like a good first effort, but could use some more thought — in particular, in the section “We need to do what we can to minimize the negative aspects of the climate that lead to name calling, personal attacks, and intimidation, while promoting and encouraging the positive aspects of the climate that lead to skeptical and critical discourse, productive discussions and debates, and a better, more self-correcting science.”

      Part of the problem is that phrases such as name calling, personal attacks, and intimidation can mean different things to different people, so it’s important to give specifics or examples to clarify what is meant.

      • Bill Harris says:

        Shravan, take a look at some of Chris Argyris’ work on “action science” and the “model II” communications model. I’ve found it quite effective in communicating effectively in the presence of conflict; YMMV.

        • Shravan says:

          This is interesting, I will read up on this approach, Bill.

          On the broader issue, I wonder if Andrew will consider a shift in tone on his blog; if so, Fiske’s (very reasonable) criticism about tone will be addressed, laying bare the problems on the statistical side of things. One can certainly communicate criticisms in more than one way (as Bill just points out), and I am curious to see if Andrew will change style. I think that a lot of people are alienated by Andrew’s style, and these people would be on board otherwise. I am reminded of Andrew’s ethics article in Chance in which he aggressively attacked a statistician (who, Andrew dismissively said, only had an MS in Statistics) for doing what Andrew felt was an incorrect analysis and for not releasing data. The statistician’s response (which was also published in the same magazine) was excellent, and in the end I felt Andrew came out of that fiasco looking a bit shabby. You can revisit the story here:


          House (the statistician):

          I hope Andrew learnt something from this exchange. His criticisms could be a lot more effective if couched in a more guarded or nuanced language and without all the personal attacks on competence.

          Of course, Meehl and Cohen had made similar observations as Andrew and others do today, but they communicated through the peer-review system the way Fiske wants it. This had, it seems to me, zero impact on psychology. One could therefore make a case for trash-talkin’ Andrew kicking up some dust as a way to get things moving. What about Brian Nosek and EJ Wagenmakers, who are taking the problems on by writing articles about them? I guess they are doing what Fiske would like to see more of.

          I have also wondered if it is fair that Amy Cuddy fails to get tenure (if that is what happened, this seems to be an implication in Fiske’s text, or am I reading too much into it?) when the people who taught her to do what she does go scot free. Who is ultimately to blame for Cuddy’s overselling and statistical mistakes? She shares some of the blame, but not all of it. The problem usually is that one cannot even see what the problem is (the “you don’t know what you don’t know” problem), and someone in authority told them what the right way to do things was and what the goal was of becoming a scientist (high h-index, lots of articles, having a media link on your home page).

    • Lee Jussim says:


      This turned out much longer than I planned, so here is an abstract:
      Part I: My Reservations About that Petition
      Part II: Advantages of the Blogosphere Over Peer Reviewed Journals

      I sent this email to one of the organizers of that petition a few days ago. Here is what I wrote:

      i saw your petition, and am deeply ambivalent about signing on, and perhaps you can talk me down a bit here….
      1. There is a term common in the rightwing blogosphere and political editorials, “crybully.” It refers to people who try to stigmatize, shame, or shut down others on grounds of “look how badly you hurt me (or someone else).”
      The petition is fine as far as it goes, but one person’s legitimate criticism is another’s personal attack. In the absence of a definition/description/distinction between the two, the petition can be used as justification for Fiske-ean views.

      I find that troubling and wonder what you think about it.

      2. I would say that several of those who have signed on already fit my description above. That deeply disturbs me because I see it as increasing the risk of legitimizing attempts to shut down and shut up the (incredibly constructive) critical movement within psychology.

      IDK what to do about that, and, again, wonder what you think.

      3. There are some strong statements in the petition, including these:
      “However, science suffers when attacks become personal or individuals are targeted to the point of harassment.”
      “We need to do what we can to minimize the negative aspects of the climate that lead to name calling, personal attacks, and intimidation.”
      “Also damaging to our scientific discourse are harassment and intimidation that happen through less visible channels. When people with a lot of influence use that influence to silence others (e.g., by using their leverage to apply pressure on the target or on third parties), and especially when they do so through nontransparent channels of communication, this harms our field.”

      I apparently do not get out enough. What is this talking about? Who has been personally attacked, targeted, harassed, called names, intimidated? What in tarnation is the “back channel” thing about? I really have no idea at all, and am wondering what you think of this…

      The person I wrote to responded first with this:
      “Those are all valid concerns and ones that I share.”
      After spending a paragraph explaining why s/he signed it despite his/her own ambivalence, the response ended with this:

      “overall I agree that our field’s biggest problem is not incivility but ridiculous levels of deference to authority and fame/status.”

      Shravan, I have, so far, come up with NOTHING in psychology’s blogosphere that rises to the level of invective that:
      a. Fiske claims is out there
      b. Fiske used herself in her attempt to discredit the scientific integrity movement.

      Now, absence of evidence is not evidence of absence. Perhaps the sharpest exchanges, ones even I would consider insulting and inappropriate, occur on Twitter and Facebook. Twitter and Facebook, however interpersonally important they may be, are scientifically trivial. That is why they are called “social” media, not “scientific media.”

      Not so of the blogosphere. Some of the best work exposing dysfunctions and offering solutions is coming out of the blogosphere. Long form essays revealing bad stats, bad methods, and unjustified conclusions, and proposing or advocating for particular types of solutions are now an invaluable part of the field, and they are not going away any time soon.

      And, indeed, one can view the comments section of blogs as itself a new and emerging form of peer review, so it is, perhaps, not completely true to describe such contributions as circumventing peer review, although it is true that it is circumventing the traditional publication process.

      And one of the great advantages of the blogosphere is that the info gets out VERY fast. I recently had a paper come out in JESP, one of social psych’s top journals, that you can think of as revealing case after case after case validating Andrew’s point about how conclusions in social and cognitive psychology are often unhinged from methods and data. You can find that here, if you are interested:,%20JESP.pdf

      Thing is, that took YEARS to write, submit, revise, and get out. (not to mention the mild forms of hell we were put through, in which one of the reviewers accused us of engaging in “personal attacks,” a comment echoed by the editors; my collaborators thought resubmitting the paper was useless because getting it in was hopeless). A good blog can be written in a few days (or, apparently, for Andrew, in a day…).

      I also see no evidence that there are more ERRORS in the blogosphere than in peer reviewed journal articles. Actually, experientially, I would say there are FEWER. But whether the blogosphere or peer reviewed articles yield fewer errors is actually an empirical question, to which none of us, not me, you, Andrew, or Fiske, yet have an empirical answer. Still, I would say the bloggers I read, perhaps because they care about scientific integrity and are usually relatively statistically and methodologically sophisticated (not anyone can aspire to be a methodological terrorist!), have far fewer errors than do peer reviewed journals.


  54. Jon Frankis says:

    And yet … if the field under question were climate science and Fiske was making the same kind of complaints as above? – I’d find myself on the other side of the pleasantries sympathising with the published authors against (most of) the riff raff.

    Proud to be a skeptic of much published medical and social science, but even more skeptical of the critics of properly peer-reviewed climate science. In short, I think: modern times are complicated.

    • Martha (Smith) says:

      My understanding is that climate science publications are super-peer-reviewed — not just sent to an editor and then off to someone the editor decides, but examined carefully by “peers” in several areas of science, and revised carefully to answer any questions or lack of clarity.

      • Shravan says:

        You mean stuff like Richard Tol’s is carefully peer reviewed, right? ;)

        • Martha (Smith) says:

          I meant “science” in the strict sense, not in the loose sense — in particular, not including social “sciences” such as economics.;~)

          • The essence of science is that we submit our theories to validation by observation and experiment. So let’s call it “physical sciences” rather than “science in the strict sense”. I think it’s perfectly possible to do good *real* science studying social phenomena even if perhaps much social science falls short in practice.

            • Martha (Smith) says:

              I originally was going to say “physical sciences,” but didn’t want to exclude the biological sciences, since they are involved in climate science. I wasn’t trying to say that social science can never be “real” science — but was thinking of so many of the problematical social “science” that has been discussed on this blog.

              So to try again:

              I meant science in the strict sense, where theories are submitted to validation or refutation by careful observation, experiment, and reasoning, rather than in the loose sense practiced by those who have a title or affiliation involving the word “science”, but do not submit their theories to validation or refutation by careful observation, experimentation, and reasoning.

      • Ed Snack says:

        No, no they’re not. They are most often pal reviewed. Sent to specifically friendly reviewers and obvious errors are ignored. Prime example, Mann et al 2008. It contains gross data errors (upside-down data to original gatherers’ interpretation), inclusion of data that was designated unusable and contaminated by the original author, and inclusion of data that was recommended to be excluded (Idso & Gray Bristlecones, and it should be mentioned that a more modern set of data on the same area, the Bristlecone study by Linah Ababneh is never used though it is both more modern, comprehensive, and better modelled – but it has the “wrong answers”). When those basic outright errors are corrected the paper’s conclusions are no longer supported by the data. And yet it has not been withdrawn nor properly corrected.

        The other major issue in Climate “Science” is one that this discussion should well recognize; the “Researcher Degrees of Freedom” and in particular the post collection screening of data before inclusion. Virtually all paleo-climate papers use data that has been specifically selected from a pool of possible data because it produces the required results, usually via some pseudo-scientific screening process. Critics of this process are not just ignored but actively and personally vilified in exactly the way that Fiske here complains about.

        I suggest that you don’t just believe what you are told, but actively investigate to see whether there are exactly the same methodological issues in Climate Science as in Psychology before making such politically and socially acceptable anti-climate change criticism.

    • abner says:

      Is there something an enthusiastic layman could read that would make me certain that this kind of bad science knot isn’t possible in climate science? Even a quite long book would be fine, it’s worth the commitment.

    • AMac78 says:

      > [I’m] skeptical of the critics of properly peer-reviewed climate science

      So was I, eight years ago. Then I happened upon blog comments that claimed problems with Michael Mann’s just-published magnum opus on using tree-rings and other records to reconstruct the climate history of past centuries. Mann et al., Proxy-based reconstructions of hemispheric and global surface temperature variations over the past two millennia, PNAS 105:13252–13257, 2008.

      In line with Prof. Fiske’s essay, the complainers lacked credentials. Most were blog-based and pseudonymous. Often enough, they could be seen as contentious, sarcastic, and agenda-driven.

      Overlooked by the climate-science establishment: the skeptics’ claims of flaws were correct, and drove a stake in the heart of the paper’s methods and conclusions. Not only that: successive rounds of scrutiny uncovered additional fatal flaws. Not only that: prominent leaders of the climate-science establishment doubled down on Mann08’s methods and findings. And not only that: today, the paper remains highly-cited and unretracted.

      Some of this may sound depressingly familiar.

      To state the obvious, none of this means that any given consensus-of-experts claim on global warming is necessarily wrong. However, the field’s key opinion leaders and their followers have not lived up to the trust that policymakers and the public have placed in them, as scientists and as practitioners of the scientific method.

      • jrkrideau says:

        I could not agree with you more. Ed Wegman’s analysis showed how bad Mann’s paper was.


        • Sarcasm and irony are notoriously hard to express in blog comments.

          • AMac78 says:

            Daniel Lakeland: Yes, noted. ;-)

            If there’s interest in examining my claim about Mann08 — doubtful in jrkrideau’s case but possible in others — one can click on “AMac78”, which is the link to my old blog. The early posts are fairly self-explanatory.

            Many of the specifics of Wegman’s analysis were beyond my competence, and as you allude, the circumstances of his investigation were irregular at best. In my opinion, the most compelling criticisms are on other grounds. For example, the statistical validity of one key reconstruction disappears when one proxy (out of dozens) is removed. That “Tiljander proxy” (a) is invalid and (b) was used by Mann08’s authors in an upside-down orientation to that stipulated by the original authors.

    • Tom G says:

      The essence of why Science has more Power than Superstition is only because of this: better prediction about the future.
      Go back to the 2001 UN models on Global Warming. None of them, NONE, had a world temperature path range that included reality.

      ALL the models were wrong. See:

      “It’s impossible, as a scientist, to look at this graph and not rage at the destruction of science that is being wreaked by the inability of climatologists to look us in the eye and say perhaps the three most important words in life: we were wrong.”

      We were wrong. Scientists who can’t say this, or won’t, aren’t scientists, when their predictions go bad.

      Patrick Michaels of Cato does say: “Global warming is indeed real, and human activity has been a contributor since 1975.”
      But it’s a lukewarming world, so no need to panic.

      The desire to induce panic Big Spending by Big Government will allow those who publish Panic studies to get publicity, but to be wrong, and induce more anti-science.

      On global warming, if the CO2 increase isn’t enough to justify more nuclear power (and its risks), then it’s not such a big problem.

      Economists suffer the same science prediction problems.
      Venezuela should show conclusively that “socialism” fails — yet many social scientists remain … socialists.

  55. abner says:

    Is there something an enthusiastic layman can read that will make me certain that climate science isn’t in this kind of scientific knot? A long meticulous book would be welcome, it’s worth the commitment.

  56. MichaelP says:

    In contrast to most other commentators, I believe your post is unfair and justifies Fiske’s points.
    I completely agree with your methodological remarks and discussion about the state of the field and the need for change in the way we do science. However, you blatantly make an example of Fiske and publicly attack her as a researcher. Although you admit your ignorance of the vast majority of her work, you question the validity of all her research, and make serious allegations that her work over the years is plagued by methodological errors and she is unwilling to admit them.
    The editorial by Prof. Fiske, in my view, is pointing to the fact that public shaming of researchers and discrediting their work is easy and hard to respond to, and this has serious effects on their work opportunities and reputation. I didn’t see in her editorial any disapproval of the changes which are needed in the way we do science. Your post, however, personally attacks her and implies that much of her work is worthless, with only one example of methodological issues in her work, and several editorial decisions she made (for which you have no way of knowing the reviewers’ opinions of these works). You deduce that her whole methodology is flawed over the years and that she is desperately trying to salvage the old methods in face of needed change. This is, in my opinion, illegitimate public shaming.
    You make the point that “Susan Fiske can respond” to your blog post, just as anyone else. However, as she points out, the sheer volume of posts and comments can make it impossible to respond to criticism in a serious way. Writing and responding takes much time, and doing it for many unfiltered comments can be so time-consuming as to make it impossible. Therefore, I agree with her that the peer-reviewed and quality-controlled forum of scientific commentary on papers is better, in this respect, and should be expanded as a way to express criticisms on published works. Perhaps there is a need for better methods to express criticism and allow anonymity in published scientific comments, but blog posts and twitter posts can definitely damage careers and can be used in harmful, unfair ways, where the people under attack are in a weak position to defend themselves.
    For the record, I am not related to Prof. Fiske in any way, I don’t particularly like the tone of her editorial, and I completely agree with Andrew’s concerns regarding current scientific practices and the need to change the way we do statistics and peer-reviews. I’m just disappointed that the post makes a personal example of Fiske and thus proves her worries, instead of just focusing on these issues.

    • Andrew says:


      You write, “public shaming of researchers and discrediting their work is easy and hard to respond to, and this has serious effects on their work opportunities and reputation.” What I did with Fiske was point out that she’d made mistakes in her data analysis. That wasn’t so easy. It was only possible because someone had noticed the error and pointed it out to me. Was this hard to respond to? It depends on what you mean by “hard.” On one hand, it would’ve been very easy for Fiske to have responded, “Yes, we made mistakes in that paper, we were sloppy in our data analysis and indeed we realize that some of our conclusions were not supported by the data.” That’s easy. But it seems to be hard for people to admit that they’ve made a mistake, and particularly it seems to be hard for Fiske to admit that many of her views on research methods are statistically in error. Had she carefully read Meehl, she might have been able to come to that conclusion decades ago. But not many of us had carefully read Meehl—I know I hadn’t—and, indeed Fiske’s refusal to take the easy (and, I believe, scientifically correct) step of admitting error, that’s the subject of my post.

      Similarly, I criticized Fiske for publishing that himmicanes paper which after it came out was seriously criticized by several people on the internet (not just me) who are experts on statistical methods. Mistake #1 was accepting that paper, but, hey, we all make mistakes. Mistake #2 was not accepting the criticism. Similarly for the air rage paper and the power pose paper and all the rest.

      You write, “Although you admit your ignorance of the vast majority of her work, you question the validity of all her research, and make serious allegations that her work over the years is plagued by methodological errors and she is unwilling to admit them.” In that specific paper, yes, I question the validity of her research, but I didn’t question the validity of “all her research” or even most of it. Here’s what I actually wrote:

      I’m not saying that none of Fiske’s work would replicate or that most of it won’t replicate or even that a third of it won’t replicate. I have no idea; I’ve done no survey. I’m saying that the approach to research demonstrated by Fiske in her response to criticism of that work of hers is a style that, ten years ago, was standard in psychology but is not so much anymore. So again, her discomfort with the modern world is understandable.

      Fiske’s collaborators and former students also seem to show similar research styles, favoring flexible hypotheses, proof-by-statistical-significance, and an unserious attitude toward criticism.

      It’s kind of annoying that you criticize me for something (“question the validity of all her research”) that I specifically didn’t do.

      Fiske, Kanazawa, Bem, Bargh, etc., have been working within a research paradigm that has given them and their colleagues a lot of success over the years, but which for statistical reasons is seriously flawed, for reasons that Meehl explained fifty years ago and which Uri Simonsohn, Katherine Button, and various others have explored more recently. As I explained in my post, these changes have come pretty recently so I can see how Fiske and her colleagues can have felt that they’ve lost control of the narrative—but if we get back to the science, what we’ve learned is that (a) lots of celebrated results in social psychology have not replicated, and (b) this non-replication makes perfect sense once you start thinking about effect size and variation (in the preregistered replications) and researcher degrees of freedom (in the original studies).

      It’s not personal and I don’t see my post being a personal attack. But I guess that’s a difference of opinion. I do speculate on the reasons that people (including myself) have been slow to change attitudes, because I think that’s important. I’ve also published many articles (even in peer reviewed psychology!) laying out the technical issues here. Ranehill et al. published a peer-reviewed non replication of power pose. Etc. We can proceed on multiple channels.

      • MichaelP says:


        First, I would like to emphasize again I agree with your points regarding the necessary paradigm shift in science, and this post marks them out remarkably well and is important in this regard. I also appreciate the time you put into writing this blog and trying to make a change in the way we do science.

        I also made general allegations that you questioned the validity of all of Fiske’s research, in which I am mistaken. However, the post does seem to make an example of Fiske specifically (“Who is Susan Fiske?”) and implies very strongly that Fiske is using flawed methodology throughout her career (“Fiske… owns stock in a falling enterprise”, “what Fiske should really do is… to admit she and her colleagues were making a lot of mistakes”, “Fiske… is working within a dead paradigm”, “her discomfort with the modern world is understandable”, “if Fiske etc. really hate statistics and research methods”, “they seem to believe just about anything”, “for Cuddy, Norton and Fiske to step back and think that maybe almost everything they’ve been doing for years is all a mistake”, “the gaps in her own understanding of research methods”.)

        Perhaps I’m mistaken, but the implicit/subconscious effect of reading the post seems to be for readers to become skeptical of all of Fiske’s work. Of course I may be wrong here, but I feel more comfortable with critiques pointing out major limitations in current research (as is done in the Ioannidis paper) without focusing on specific researchers. And again, I agree with your critiques of the “old” methodology, but it seems that Fiske’s editorial did not refer to scientific methodology but to the way scientific critiques should be presented and discussed.

        I think at least some of the points Fiske made are valid – there is a gap that blogs and twitter fill, of commenting anonymously or openly on methodological issues in published peer-reviewed papers. This is obviously important, but irrespective of this specific post, the problem is that unfiltered / uncurated content can be used to attack researchers and put their work in question with them being in a problematic position to respond (at least to all of the comments, if not to the many dispersed blog and twitter posts aimed at them). Bloggers and commentators are becoming an authority because of the commenting gap, and this may affect researchers’ careers in problematic ways. One solution may be that journals open a paper-specific forum for comments on the paper’s methodology and conclusions, with validation that the posts/comments regard the specific paper’s problems and not the author. I believe this is an important issue to discuss – how to develop open commenting while addressing the potential for “shaming”, in addition to the excellent discussion developing here on research methodologies in general, and in this regard Fiske’s editorial may be important and justified.

        • One potential problem with criticizing the “old” methodology without calling out particular examples is that everyone fully agrees, and assumes that they are not the ones…

          i.e., Fiske reads “the old paradigm of social science research is flawed and needs attending to immediately” and says “Gee, that’s right, all those old methods people used to use before I came along… I’m so glad I am not a part of that”

          so, I think ultimately specific examples are necessary. Perhaps focusing on the papers more than the authors would be helpful, but it’s standard practice to cite papers as “Fiske et al. 2005” etc so in academia papers are immediately identified with authors as standard practice.

        • Maz says:

          If social psychology (among other disciplines) is to have a real reformation, then there are going to be casualties. The reputations of people who have built their careers under shoddy standards of research will have to suffer. Otherwise the whole movement for better practices is a failure.

        • Costa Vakalopoulos says:


          A number of points raised ought to be addressed:

          1. it’s the filtered and curated culture that is entrenching narrow interests, is anathema to progress in science and is leaving distortions in the literature unchecked.
          2. Discomfiture at singling out is immaterial to the scientific endeavour; indeed, papers need to be singled out, otherwise I could be citing or building my own work on erroneous conclusions. Science shouldn’t be about individual sensibilities
          3. Where there is smoke there is bound to be fire. A natural response to one poor paper is how endemic is the practice
          4. To expect journals to voluntarily allow criticism or retract the work they publish is rather naive.
          5. No academic, editorial or grant generating position should be considered sacrosanct.
          6. Opening up discussion can only benefit scientific altruism, which appears to have been marginalized in the wake of careerist networking

  57. Thomas says:

    I am surprised no one has mentioned the replication efforts in experimental economics.

    Seems as though they are doing a bit better than experimental psychologists…

  58. Tim says:

    On the issue of academic peer-review versus amateur blog-comments, in the domain of James Joyce studies (alas) the academics want nothing to do with the blogs. This is especially unfortunate because the Internet has simplified the enormous challenge of annotating Ulysses and Finnegans Wake into a vast playground of low-hanging fruit, that can only be attacked via community collaboration, but evidently without the least assistance from anyone preoccupied with tenure. Their journals have those familiar attitudes of fancy language, obscure content, paywalls and no live commenting.

    • Chris says:

      You say “preoccupied with tenure” as though it’s a moral failing, but that seems to me like a pretty good thing to be preoccupied about! Who should get the blame if academics in James Joyce studies are failing to contribute to crowdsourced projects that “don’t count” in the eyes of Promotion & Tenure committees? I would say principally those committees, which aren’t taking a suitably big-picture attitude toward how humanities scholars can make their work accessible to the public. If you really must blame the scholars themselves, I’d reserve my ire for the tenured scholars who could presumably contribute to such projects at least without jeopardizing their careers (although even there, it’s probably nice to get a raise once in a while). Plenty more could be said, with blame all around, but I resist the implication that it’s chiefly a matter of snobbery.

  59. Dears.

    The crisis will, God willing, spill into economics. I was woken up to the problem of Null Hypothesis Significance Testing (namely, the notion that numbers contain their own “significance”: never mind oomph or meaning or the scientific context) by reading Meehl and his small company, and later Cohen, in psychology. That was in the early 1980s. Since the 1990s Stephen Ziliak has joined me in producing one article and book after another trying without much success to get economists to ask How Big is Big, Substantively. In our book of 2010, The Cult of Statistical Significance, we laid into economics, but Steve discovered that even Gosset (“Student” of Student’s t) tried to get R. A. Fisher to realize that loss functions matter. Gosset worked for Guinness, and felt economic losses on his pulse.

    In short, I predict, and hope, that the pretend science of loss-functionless “tests” in my own field comes tumbling down. I love economics, but I want it to decide on factual matters factually, not by mumbo jumbo.

    Deirdre Nansen McCloskey
    University of Illinois at Chicago

  60. In “Dialectic of Enlightenment” lefties Horkheimer and Adorno realize that the whole enlightenment project is an effort to understand nature in order to dominate it and other men.

    But my reading inspired me to this catchphrase: System is Domination.

    There is only one reason to construct a system, whether scientific or administrative or military or corporate or religious: Domination.

    When Germans like Wilhelm von Humboldt invented the research university during the Napoleonic wars, the idea was to strengthen the Prussian state to fight against and solve the 200-year-old French Problem. Ever since, universities have been creatures of the state and supporters of state power.

    The problem with dominatory systems is that they are very bad at dealing with mistakes. Yet the essence of human action is learning from your mistakes.

    So the best research model in the world would be a model that enabled the swift discovery of inevitable mistakes. If you wanted to design a model that would punish mistakes you would choose the model of the current administrative academic system where the discovery of a mistake is a scandal to be covered up rather than a one-day wonder to be forgotten in a month.

    Paul Samuelson once sneered that the stock market had predicted nine of the past five recessions. Of course it did. Humans get to knowledge by powering through mistakes.

    I understand Susan T. Fiske’s sorrow at the collapse of the old order. But the new world of blogs and social media will root out inevitable errors much faster than under the top-down system of the old regime. And that will be good for science — and for life, the universe, and everything.

    • Neale Martin says:

      You make an interesting and important point about the creation of systems, but I don’t think the intent is always domination but preservation. Once the system is in place, the future of those in the system requires them to protect the system. Whether that’s the Catholic church covering up child molestation by priests, or government bureaucrats destroying hard drives, the impulse is to protect the system that provides members’ incomes and place in society. The scientific method is not tenure, nor is it getting five publications in peer reviewed journals. Academia is destroying the scientific method, and this increasingly includes the hard sciences where cosmologists and theoretical physicists are questioning Popper’s requirement that a theory be falsifiable. Thus the need to preserve the system typically devolves into the need to dominate.

  61. dan pinchuk says:

    Is there a good popular book that addresses these issues? It seems the issue is what Feynman called cargo cult science…people that are just gussying up question begging with statistical lipstick on a pig…

  62. KM says:

    The crisis has been in economics (econometrics) for a long time. It has simply been buried.

    The long list of articles published on the fairly jaw-dropping rates of (internal) replication failure in economics and econometrics is a major lacuna in Andrew’s list.

    One early article that really crystallised the crisis and drew attention to it was Dewald, Thursby and Anderson’s “Replication in Empirical Economics: The Journal of Money, Credit and Banking Project”, published in no less visible a spot than the American Economic Review in 1986.

    For just a few of the other examples, see Hamermesh, “Replication in Economics” (2007), Anderson et al., “The Role of Data/Code Archives in the Future of Economic Research” (2006, 2008), Burman, Reed and Alm, “A Call for Replication Studies” (2010), Hubbard and Armstrong, “Replications and Extensions in Marketing: Rarely Published but Quite Contrary” (1994), Hubbard and Vetter, “An Empirical Comparison of Published Replication Research in Accounting, Economics, Finance, Management and Marketing” (1996), and a series of studies conducted by the invaluable B.D. McCullough and his collaborators. If you want economics-style “explanations”/models of the replication troubles framed in terms of researchers’ incentives, etc., there is Feigenbaum and Levy, “An Economic Model of the Econometrics Industry” (1989) and “The Market for (Ir)Reproducible Economics” (1993), Mirowski and Sklivas, “Why Econometricians Don’t Replicate (Although They Do Reproduce)” (1991), and so forth.

    The hue and cry prompted by the Dewald/Thursby/Anderson article, and later studies showing little had changed, gave rise to a rash of “replication policies” being adopted by prominent economics journals, to much fanfare. Such policies have since been shown to be abject failures, largely because of their weakness (compliance with many being purely voluntary) or because of deliberate non-enforcement by editors.

    It is worth quoting McCullough (2009), who has considerable experience in this vein: “There are many economists who do not want their work checked. To verify this, simply start contacting authors of applied economics articles at random, tell them that you want to check to make sure that their published results are reproducible, ask them for their data and code, and see how many are forthcoming…. There is a good reason that many economists do not want their work checked: all available evidence indicates that replicable economic research is the exception and not the rule.”

  63. AndyC says:

    I work in the tech industry, where a lot of focus is on the “digital transformation.” It looks to me that Susan Fiske, Jeff Drazen at NEJM, and old guard physical anthropologists like Tim White, who won’t release fossils or all of the critical source data for independent inspection, are fighting against the transformation.

    The digital transformation makes it far simpler for “outsiders” to critically examine your work and point out the mistakes. So, reactions range from, “you can’t see my data” (NEJM) to “comments must be subject to editorial oversight and peer review” (Fiske).

    Yet, Fiske makes some good points. All of us have encountered internet trolls who delight in creating controversy for controversy’s sake. Some here have commented that forward-thinking journals will promote, even advertise their data-sharing policies and commenting forums. A good forum can be moderated without stifling scientific debate.

    The truth is, Fiske should be happy that someone cares enough about her work that they spend the time to see if her claims are warranted. Loads of academic papers are never read by more than an audience of 20.

  64. Terry says:

    You left out perhaps the most prolific and persistent investigator of statistical and data sleight of hand: Steve McIntyre at Climate Audit.

    He has been at it for almost two decades now, tirelessly auditing academic climate-science publications and finding many of the same weaknesses highlighted on this blog.

  65. Toshiaki says:

    I was expecting to find “On the emptiness of failed replications”[0] in the timeline and was disappointed that it was missing. I thought it was one of the things that started to bring wider attention to these issues.


  66. herr doktor bimler says:

    The recent critique by Newell & Shanks is a worthwhile addition to the timeline:
    Unconscious influences on decision making: A critical review

  67. david black says:

    Thanks for that; it really made me think about how other professional groups near to science (e.g. health promotion) deal with this issue. I will look forward to other Randy Newman themed articles; how about “Davy the fat boy” or “So long dad”?

  68. Kevin O'Neill says:

    Andrew Gelman writes: “She’s [Fiske] implicitly following what I’ve sometimes called the research incumbency rule: that, once an article is published in some approved venue, it should be taken as truth.”

    I don’t believe this is true and is in fact contradicted by the quoted article. Dr Fiske writes “Trust but verify.” She also writes of the need for an adversarial process and transparency in research, including making data freely available. All of these point towards the opposite conclusion, that Dr Fiske believes all research should be questioned.

    One has to wonder if acknowledgement of this error would change the author’s conclusions …. not likely.

    • Andrew says:


      Fiske says “Trust but verify,” but I don’t agree with her on the “Trust” part. Unlike Fiske, I don’t think we should trust the claims of the himmicanes study, the air rage study, etc., and then try to “verify” them. Rather, I think these are claims that are not supported by data. I feel the same way about the ovulation-and-voting study, the beauty-and-sex-ratio study, and various others discussed here. One can use statistical reasoning to understand how such studies do not present strong evidence in favor of their claims, even when there are “p less than .05” results.

      There is also the evidence of Fiske’s own actions in regard to the criticisms of published work by herself and her colleagues. Their theories are so open-ended they can explain any result, hence even when big problems are found in the statistics, there is no recognition that something might be wrong.

      These problems are not unique to Fiske, not at all. As I wrote in my above post, as recently as 5 or 10 years ago, almost all of us were routinely trusting the results of published studies. That was the whole point of my post, that things have changed and it can be hard for people to adjust.

    • Martha (Smith) says:


      I agree with Andrew, but would put it slightly differently:

      With some helpful nudging from Daniel Lakeland, I gave upthread a characterization of science as “where theories are submitted to validation or refutation by careful observation, experiment, and reasoning”

      Fiske’s “Trust and verify” omits the refutation part — and her “trust” in fact conflicts with the possibility of refutation.

  69. balazs aczel says:

    The crises of our time, it becomes increasingly clear, are the necessary impetus for the revolution now under way. And once we understand nature’s transformative powers, we see that it is our powerful ally, not a force to be feared or subdued. – Thomas Kuhn

  70. ezra abrams says:

    Andrew: what is the base rate for “what % of work in a given field is bad”?
    Didn’t Salk say something to the effect that you can’t reform a bad institution, you have to start with something new and good?

  71. Marc B says:

    I may have missed this in more recent responses, so I apologize if this is repetitious.
    Relevant to the general issue of p values in science, the American Statistical Association recently released a statement about p values, mainly to counter the widespread misunderstanding of what a p-value actually is. It can be found at

  72. ezra abrams says:

    I think the problem is that we have unrealistic expectations of scientists (as our new religion)
    we expect them to be honest, etc
    however, we all know that most of the people who get to be a prof at Princeton are very bright, very hardworking, and, in my experience, extremely competitive – with an emphasis on the ‘extremely’

    I was just reading in PNAS (not PPNAS) evolutionary theory on why, iirc, braggarts will win out

    The question isn’t, why is so much science so bad, the question should be, given that science is a highly competitive game with good rewards (tenure at Princeton vs flipping burgers), why isn’t it worse

  73. ezra abrams says:

    one more thought
    I’m 60 years old, and my sense of post WWII history is that in the 50s and 60s there was an explosion of funding; people didn’t have to be competitive with each other – you could afford to be honest (pace the lambda letter)
    as funding began to dry up in the 70s, people got nastier; there was a smaller pot for more people
    I think keeping my job and grant in an era of declining funding/person, is behind a lot of this

  74. I suspect the winds are just picking up. The solution to science’s replication crisis — outlined at — looks very different from science as it is currently practiced.

  75. Kevin O'Neill says:

    Perhaps it’s my age, but the phrase ‘trust but verify’ has a particular meaning. As wiki says, “Trust, but verify is a form of advice given which recommends that while a source of information might be considered reliable, one should perform additional research to verify that such information is accurate, or trustworthy.” In the USA Ronald Reagan put it into common parlance in dealing with the Soviet Union. The emphasis is on ‘verify’ — not ‘trust.’

    I mentioned two other lines of thought that Fiske put forward, the contributions of the adversarial process and the need for transparency in methods and data. One has to ignore what she explicitly wrote to arrive at AG’s conclusion that “She’s implicitly following what I’ve sometimes called the research incumbency rule…”

    • Andrew says:


      1. Unfortunately I think Fiske has not applied a suitably Reaganesque skepticism to iffy research claims. She okayed that himmicanes paper and some other really bad ones for PPNAS, and when flaws were found in her own work, she was in my opinion all too quick to conclude that none of her conclusions should change.

      2. I think it’s unfortunate that Fiske views research criticism as an adversarial process. We’re all on the same side here. When I tell a researcher that I think his or her study is hopeless, I say this not as an adversary but as a fellow scientist: I happen to have some statistical understanding (for example, this, expressed more formally here) that is relevant to the problem at hand and which I’d like to share. Sure, sometimes I get frustrated or annoyed—don’t we all?—but ultimately I see us as having a common goal, it’s not ultimately adversarial at all.

  76. Plucky says:

    The key problem with Fiske is this sentence: “Ultimately, Science is a community, and we are all in it together.”

    That is just flat-out wrong, and wrong in the ways that result in all that you have criticized. Science is not a community, it is a method.

    I don’t know your field well, but I do know politics well. An overriding problem reading Fiske’s letter is that, unless one was told the context and background, it could easily be mistaken for the sort of weak, political defenses of interest groups of the op-eds-written-by-lobbyists variety. The verbiage, tone, responsible-adult-versus-unruly-child characterization of a dispute, unnamed strawmen bad-guys, gratuitous comparison to terrorism, and the false claim of lacking a personal stake in the fight are all standard issue rhetorical crutches of political hacks, and it ought to be embarrassing for a scientific field that someone who is apparently quite eminent in it would be talking this way. That’s really not a good look for Science.

    Obviously, Science is something practiced by people in collaboration, around which a community will naturally form, but the critical element to the authority and legitimacy of the enterprise is and has to be that all institutional arrangements be designed to counter-act the natural human tendencies of clique-ishness, careerism, etc and be continuously scrutinized for their efficacy in doing so. The entire point of granting tenure, for example, is precisely to shield the practice of science from the machinations of politics, and to allow people to do things like admit that they’ve wasted 10+ years chasing a phantom without losing their employment. Some concessions to human nature have to be made, but Science and its practitioners must always take the position that truth is more important than any scientist’s reputation or career.

    The fundamental problem with defining Science as “ultimately… a community” is that communities do not generally value the truth over their members’ well-being (in either the material or emotional sense). The functional purpose of a community is precisely to provide support, protection, and validation for its members. Communities whose stated purpose is altruistic need very strong commitments both individually and institutionally to avoid corruption, and even with those commitments the struggle to avoid corruption is eternal and never fully successful. Given the description of Dr. Fiske’s research interests she ought to know this better than just about anyone.

    My main criticism of this post is the stages of the metaphor- you’re nowhere near six feet of water in Evangeline. If Science devolves into merely a community, then it’s just another political interest group which will be treated as such. If you think the replication crisis is a big problem now, you have no idea what you’re in for when actual politicians start getting involved, which sooner or later they will. Academic research rests on a substantial amount of direct and indirect taxpayer largesse, and what doesn’t rest on outright subsidies rests on being attached to institutions (Universities) whose perceived efficacy as providers of undergraduate education is eroding. Pretty much every level of government is headed for some very nasty and very painful financial fights in the upcoming years, and academia is not well prepared to argue why it should be spared the pain. There’s a reason universities now have more lobbyists than anyone except pharma and electronics, and that reason is that you hire armies of lobbyists when you know your public support is shaky.

    • Tom G says:

      +1 … +10?
      “The fundamental problem with defining Science as “ultimately… a community” is that communities do not generally value the truth over their members’ well-being (in either the material or emotional sense). The functional purpose of a community is precisely to provide support, protection, and validation for its members. “
      Truth vs communal solidarity.
      For most people in most aspects of their lives, community ties are more important than "Truth".

      This is not quite the Agent problem in economics, but has similar implications.

    • Superb comment. I have been bothered by this “science is a community” business for a while (not just in Fiske’s words, but in many contexts). Thank you for explaining so thoroughly what is wrong with it.

      Similarly, people often speak of “the consensus of the scientific community” as though science were a consensus-building activity. They quote Kuhn out of context: “What better criterion could there be … than the decision of the scientific group?” (I don’t excuse Kuhn for putting it in that way–the wording is problematic at best–but others have taken it and hotfooted it.)

    • Lee Jussim says:

      Hey Plucky,

      Your first point (method, not community) is killer. The political point strikes me as secondary, but certainly something to be concerned about for those of us in academics…

      I hereby officially invite you to do a minor revision to this and post as a stand alone guest blog on my Psych Today website.

      And if you prefer to stay out of the limelight, I am hereby requesting that you allow me to post the comment as a stand alone.

      You can contact me by replying here, or directly at jussim at rci dot rutgers dot edu. If you send to Rutgers, I might not reply right away because it might get caught in my spam filter…


  77. Marvin C says:

    As a physicist who specializes in quantum mechanics, I consider statistics the lifeblood of experimental design. I designed electromagnetic apparatus using these designs and had some breakthroughs in a world established by electrical engineers. I found the breakthroughs came where I understood the statistics before I built the physical apparatus. One particular design had been kicking around in the IEEE Microwave Journal in various forms for forty years. When I was assigned to make a working test instrument I followed the derivations and found a whole host of wrong assumptions because of bad modeling. The interactions not being understood led to overly complex models that could not be tested adequately (give the researcher too many degrees of freedom and the problem gets lost in an untestable mess). I recognized the problem as one of the first problems one is given in quantum mechanics, changed the variables to electromagnetics, and built a working test apparatus within 6 months.
    If you do not understand how the statistics actually work, and why degrees of freedom are used, and not used, in model building, you will be able to write mountains of paper and never do anything useful.

  78. Boris Barbour says:

    Keep up the good work, Andrew.

    The “establishment” in several fields has already gone through the painful process of losing control. A couple of examples we at PubPeer have responded to: (PubPeer response to ACSNano editors) (Vigilant scientists)

    The bottom line should be this: if you don’t feel you can defend your work against criticisms from other scientists, don’t publish it.

    • Martha (Smith) says:

      +1. Thanks for the links. I especially recommend the second one, which makes the case that, especially in medical research, withholding criticism for fear of offending those criticized can have serious consequences for the public.

    • Keith O'Rourke says:

      Definitely liked this “suggestion to “draw the author aside” for a quiet chat “after a seminar” is certainly a good way to minimize hurt feelings, but it is totally ineffective as a strategy for disseminating information. We believe that making any relevant information rapidly available to readers should trump concerns about the authors’ feelings,”

      By _thoughtfully_ protecting authors’ feelings, one dismisses, or even damages, the interests of other researchers and of those who will be affected by the research.

      • Andrew says:


        Yes, we had such a situation on this blog recently, where I criticized the work of some people I know personally. My criticism was much more mild than it would’ve been had there been no personal connection.

        • Rahul says:

          I think this is a HUGE problem in general: We have made peer review a pillar of academia. Most peer review happens within a very tiny cohort of people.

          Almost anyone on review committees, referees, etc. falls in the class of “people I know personally”.

          Ergo, we very rarely get true opinions or frank criticism. Mutual back scratching is what has become the norm.

  79. skk says:

    thanks much for the link to Paul Meehl. So the scepticism about scientific papers goes back a long way then, not just the Ioannidis paper – all the way back to 1989 (and more). I’m really enjoying his 12 lectures.

  80. Dear Dr. Gelman:
    I would like to share a website with you where I’ve been accumulating my experiences over the past two years. I had politely inquired about a study from the authors, and they refused to respond. They then called the University Police to threaten me with criminal harassment charges if I asked them again (and I am emeritus in the same university department they are!). And secret communications with the journal thwarted attempts to address the problem there. The University administration saw no evil, no scientific or ethical misconduct, not deans, not vice chancellors, not the provost. I read your post and every word described the situation that I’ve been battling for two years, right down to two flashy (but untrue) papers delivered at a scientific conference, two students of the same professor winning the only two best student paper awards, my attempt with the societies to get those awards retracted because of the injustice to the real graduate student scientists at the meeting, and the injustice to science in general. I finally have a major review of this entire subfield of animal behavior that will soon appear, but there are a lot of dead bodies along the way. A wonderfully attractive idea, verified repeatedly by the perceived leaders in this field of study (3 presidents of the society, all major award winners), swallowed hook line and sinker by everyone, the citations accumulating exponentially, and so on, but not a word of it true.

    I would like to share this website with you. It is password protected, at least for now. But I cannot (yet) risk that this email to you is posted live. For now, this case needs to be kept quiet, at least until the original authors of whom I inquired have a chance to reply. It seems only fair. So far, they’re having difficulty finding what they can say and may delay indefinitely.

    If I can receive an email from you assuring me that this reply and website will not be shared publicly, I’d like to send it to you.
    Here’s the website:
    I’ll send the password if you request it.
    Thank you for your efforts to improve the quality of science. It is hugely important.
    Kind regards . . . Don Kroodsma

  81. Oleh says:

    100% reproducible significance
    Using gretl, it’s free, so everyone really
    can reproduce

               coefficient   std. error   t-ratio   p-value
    const       0,132476     0,0966686     1,370    0,1739
    x37         0,502034     0,102374      4,904    3,98e-06  ***
    x51        −0,359254     0,105221     −3,414    0,0009    ***
    x58         0,446923     0,0978524     4,567    1,51e-05  ***
    x62         0,323837     0,103650      3,124    0,0024    ***
    x73         0,350484     0,0929507     3,771    0,0003    ***
    x75        −0,264395     0,0910856    −2,903    0,0046    ***

    R squared 0,396214
    F(6, 93) 10,17133 p-value 1,25e-08

    Recall, all the variables were generated as independent.
    This is what people had to do to survive in
    a publish-or-die world.
    And this is a case without biases!

  82. Oleh says:

    nulldata 100
    set seed 14

    loop i=1..90 -q


    list xlist = 0 x*
    ols y xlist
    omit --auto=0.01
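
For readers without gretl, the same experiment can be sketched in Python. This is only a rough analogue under stated assumptions, not a port of the script above. The draws are assumed to be standard normal, because the body of the loop is not shown. The function names `ols_t_stats`, `approx_p` and `backward_eliminate` are hypothetical. The p-values use a normal approximation to the t distribution so that the only dependency is NumPy. NumPy's generator seeded with 14 will not reproduce gretl's numbers, so the surviving regressors will differ from the table in the previous comment.

```python
import math
import numpy as np


def ols_t_stats(X, y):
    """Fit y on X by least squares; return coefficients and t statistics."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)          # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)     # coefficient covariance
    return beta, beta / np.sqrt(np.diag(cov))


def approx_p(t):
    """Two-sided p-value from a normal approximation (an assumption:
    the exact t distribution would give somewhat larger p-values)."""
    return math.erfc(abs(t) / math.sqrt(2))


def backward_eliminate(X, y, alpha=0.01):
    """Repeatedly drop the regressor with the largest p-value until every
    remaining one is below alpha. Column 0 (the constant) is always kept."""
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        _, t = ols_t_stats(X[:, keep], y)
        pvals = [approx_p(ti) for ti in t[1:]]
        worst = int(np.argmax(pvals))
        if pvals[worst] < alpha:
            break
        del keep[worst + 1]                   # +1 skips the constant
    return keep


rng = np.random.default_rng(14)
n, k = 100, 90
# A constant plus 90 regressors, all independent of y by construction.
X = np.column_stack([np.ones(n), rng.standard_normal((n, k))])
y = rng.standard_normal(n)

kept = backward_eliminate(X, y)
print(f"{len(kept) - 1} of {k} pure-noise regressors survive at alpha = 0.01")
```

Any regressor that survives is "significant" only because the search ran over 90 candidates: the stopping rule guarantees that whatever remains passes the threshold, whether or not a real relationship exists.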

  83. Neale Martin says:

    I came back to academia in my 30s to get a PhD and was surprised that the entire process was so incredibly biased to get positive results. You get one chance in your dissertation, and your academic career is often launched by the publications you can generate off of that work, yet you are the one designing the experiment, conducting the research, collecting the data, and then interpreting the results. And not only is the PhD candidate heavily invested in getting positive results, so is his/her advisor. This is then multiplied when a few academics become media celebrities and are able to make hundreds of thousands of dollars off of books and speaking engagements. And of course the journals only publish positive results and don’t publish replications. This is anti-science. It creates an inordinate push in one direction. And it undermines the value of negative results. How many researchers have conducted the exact same research as someone whose negative results never saw the light of day? This also sets up an unhealthy dynamic where careers are built upon one research stream that might have begun on a questionable foundation.
    It seemed to me at the time (early 1990s) that it would make more sense to have the results of research analyzed by dedicated statisticians who would be both dispassionate about the results and also able to help the researcher understand the potential and limitations of their research. As I’ve observed so much obvious hokum get published and make it into the popular culture over the last three decades, I’m more convinced that there is a need to fundamentally rethink the methodology and assumptions that have led to what can reasonably be described as a crisis of confidence across numerous disciplines.

    • Shravan says:

      I keep thinking that the number one scientific problem to be solved today is how to give a stats education to non-statisticians who want to use statistics. Everything else is just a symptom of this disorder.

      One thing I know is that psych* type people are like alcoholics who don’t know they have a problem. We need a Scientists Anonymous for people to come out and start their ten step recovery process.

  84. Carol says:

    Hi Neale,

    I think it would be better to consult the statistician at the time the experiment or study is being designed, not after the data are collected.

  85. John Jackson says:

    @BenK: “As for areas with no experimental component at all – […] – I guess they have completely different problems.”

    Yeah. Palaeontology for example: “Completely different” as in completely ruined, as opposed to somewhat compromised.

  86. bul says:

    hey, you have a spelling error: “…point is, these events from 2005 and 1007 fit into”. Just letting you know

  87. Thaniell says:

    I’m totally with you that there is a general problem, but I also think you overstate Fiske from what you quote, I would not read from her original text that published work is automatically true. But I think the underlying problem is that we see a rise – in general society – of public outcries over single cases perceived as misbehaviour that also reaches into the scientific area and “distracts” from the actual science (think of Matt Taylor’s shirt for example). The public often mixes opinion with personal criticism of people that provide scientific support to a contrary opinion – not sure if it’s better or worse if their science is not well-founded. And sure, this is not helping science – and yes, if for every unsound accusation for opinionated reasons an inquiry is launched it is an utter waste of time.
    On the other hand, the big problem in science, even beyond what you specifically address in the psychology field is the inherent feeling in most professional scientist fields that you need to publish, publish, publish and even publish something more. As a PhD student you best have published several papers at world leading conferences before you get your PhD. So paper after paper is spit out, well intended, but often done with the bare minimum – and once that topic is dead you couldn’t care less about inquiries about it, because you are busy with the next paper. I’m overstating a bit, but my point is that we provide the wrong incentive and environment for scientists in academic careers. I think it’s funny and sad that the aim to make things more objective (base your decision who to hire in academia on publications in peer-reviewed conferences) makes matters worse. So, basically my point is, you both have a point^^ And the first is only made worse by the second point – the more science is considered a career and pressure is shoving people in that machinery into the wrong direction the lower the trust into scientific publications the lower the trust of the public into “scientific facts”.

  88. Albert Lash says:

    I enjoyed this page. It reminds me of what has happened in nutrition science over the past 50+ years. Bad science continually covered up or left unnoticed by unethical, amoral, ignorant, or cowardly individuals protecting or singularly focused on their careers. What a disappointment.

  89. Mark says:

    Thank you so much for addressing this so completely and so eloquently. Faulty evidence is fueling the revolution of cynicism. People turn away from science and back towards ideology. A reasonable response in the face of conflicting ‘facts’.

    If I can donate to your cause somehow, please let me know how by email.

  90. Titanium Dragon says:

    I went to Vanderbilt University.

    In 2006/2007, I did some Masters level classes. We read a lot of scientific papers, and one of our hobbies at the time was tearing them apart. Our professors showed a lot of skepticism and encouraged this practice. I had assumed it was widespread even at the time.

    That was in biological sciences/bioengineering stuff, but we all didn’t really believe in a lot of papers.

    I’m kind of surprised that people elsewhere did, frankly. They always seemed very methodologically shaky to me with a lot of “I wanted to publish something!”

    • Keith O'Rourke says:


      I did that with rehab medicine undergraduates in 1986 at Faculty of Medicine, University of Toronto using study quality scoring guidelines. The students really enjoyed it (they evaluated their prof’s research papers) and the highest score out of 10 was a 3.

      So I have been surprised over the years how neglected the quality issues in research production and reporting have been. For instance, in 1996 at a meeting of statistical researchers the reaction to my talk was “you are painting an overly bleak picture of clinical research”.

      Some of this might have to do with the “I can do better than that” wishful thinking of those entering the field of research, who do not revise it when they experience the constraints, cut corners and publish questionable stuff for career survival.

      Statistical methods and statisticians often provided an enticing _advertisement_ that if you use these methods, take this one term course or work with me – your research will be good (seldom if ever discussing previous work, selection and recasting that happened before the first meeting, nor auditing the data they were given or comparing what other publications found).

      Also forking paths is subtle and likely was underappreciated by almost everyone.

      Not me – I did better than that! ;-)

  91. concerned says:

    I heard the results for the trolley problem are bunk. Someone requested the data and it turns out the fat man is an outlier; in reality he was normal size.

  92. Phil Goetz says:

    I would suspect the replication problem has become a big issue in psychology because psychology is open enough to acknowledge the problem. My experience is that most journals and magazines don’t print letters to the editor about existing articles unless they arrive within a few weeks of the articles’ publication, and decide which letters to print based on the reputation or institution of the letter-writer, not on the validity of its critique. If a bad article isn’t refuted immediately on coming out, by someone from a famous institution, no letter will ever be printed.

    There are some famous examples of this, like the 2007 article “Aspartame: a safety evaluation based on current use levels, regulations, and toxicological and epidemiological studies,” which was the main reference used in an FDA review of Aspartame, and was later revealed to have been funded by Monsanto (who owned NutraSweet at that time) and written by consultants with long-standing relationships with companies with a financial interest in Aspartame and/or formaldehyde. AFAIK, no notice was ever published in /Critical Reviews in Toxicology/ that Monsanto had funded the study, nor of the many conflicts of interest of the authors, despite widespread criticism in the field.

  93. iqvoice says:

    In the interests of correcting errors in your post, please search your text for the following.

    ” har ” (presumably meant “her”)

    “1007” (presumably meant “2007”)
