Gremlins in the work of Amy J. C. Cuddy, Michael I. Norton, and Susan T. Fiske

Remember that “gremlins” paper by environmental economist Richard Tol? The one that had almost as many errors as data points? The one where, each time a correction was issued, more problems would spring up? (I’d say “hydra-like” but I’d rather not mix my mythical-beast metaphors.)

Well, we’ve got another one. This time, nothing to do with the environment or economics; rather, it’s from some familiar names in social psychology.

Nick Brown tells the story:

For an assortment of reasons, I [Brown] found myself reading this article one day: This Old Stereotype: The Pervasiveness and Persistence of the Elderly Stereotype by Amy J. C. Cuddy, Michael I. Norton, and Susan T. Fiske (Journal of Social Issues, 2005). . . .

This paper was just riddled with errors. First off, its main claims were supported by t statistics of 5.03 and 11.14 . . . ummmmm, upon recalculation the values were actually 1.8 and 3.3. So one of the claims wasn’t even “statistically significant” (thus, under the rules, was unpublishable).

But that wasn’t the worst of it. It turns out that some of the numbers reported in that paper just couldn’t have been correct. It’s possible that the authors were doing some calculations wrong, for example by incorrectly rounding intermediate quantities. Rounding error doesn’t sound like such a big deal, but it can supply a useful set of “degrees of freedom” to allow researchers to get the results they want, out of data that aren’t readily cooperating.
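
To see how this can play out, here is a minimal sketch (with made-up summary statistics, nothing from the paper itself) of how rounding an intermediate quantity such as a pooled standard error can push a comparison across the .05 threshold:

```python
# Minimal sketch with hypothetical numbers: rounding an intermediate
# quantity (here the pooled standard error) nudges a t statistic and
# its p-value across the .05 line.
import math
from scipy import stats

def pooled_t(m1, s1, n1, m2, s2, n2, round_se_to=None):
    """Two-sample pooled-variance t test from summary statistics."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    if round_se_to is not None:
        se = round(se, round_se_to)  # the "harmless" intermediate rounding
    t = (m1 - m2) / se
    df = n1 + n2 - 2
    return t, 2 * stats.t.sf(abs(t), df)

# Hypothetical summaries: means 7.45 vs. 6.80, SDs 1.0 and 1.1, n = 20 per group
print(pooled_t(7.45, 1.0, 20, 6.80, 1.1, 20))                 # t ~ 1.96, p ~ .058
print(pooled_t(7.45, 1.0, 20, 6.80, 1.1, 20, round_se_to=1))  # t ~ 2.17, p ~ .037
```

Round in the other direction and the “significance” disappears instead; either way, a few hundredths of slack in the standard error is a lot of freedom when the t statistic is hovering near 2.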

Here’s how Brown puts it:

To summarise, either:
/a/ Both of the t statistics, both of the p values, and one of the dfs in the sentence about paired comparisons are wrong;
or
/b/ “only” the t statistics and p values in that sentence are wrong, and the means on which they are based are wrong.

And yet, the sentence about paired comparisons is pretty much the only evidence for the authors’ purported effect. Try removing that sentence from the Results section and see if you’re impressed by their findings, especially if you know that the means that went into the first ANOVA are possibly wrong too.

OK, everybody makes mistakes. These people are psychologists, not statisticians, so maybe we shouldn’t fault them for making some errors in calculation, working as they were in a pre-Markdown era.

The way that this falls into “gremlins” territory is how the mistakes fit together: The claims in this paper are part of an open-ended theory that can explain just about any result, any interaction in any direction. Publication’s all about finding something statistically significant and wrapping it in a story. So if it’s not one thing that’s significant, it’s something else.

And that’s why the authors’ claim that fixing the errors “does not change the conclusion of the paper” is both ridiculous and all too true. It’s ridiculous because one of the key claims is entirely based on a statistically significant p-value that is no longer there. But the claim is true because the real “conclusion of the paper” doesn’t depend on any of its details—all that matters is that there’s something, somewhere, that has p less than .05, because that’s enough to make publishable, promotable claims about “the pervasiveness and persistence of the elderly stereotype” or whatever else they want to publish that day.

As with Richard Tol’s notorious paper, the gremlins feed upon themselves, as each revelation of error reveals the rot beneath the original analysis, and when the authors protest that none of the errors really matter, it makes you realize that, in these projects, the data hardly matter at all.

We’ve encountered all three of these authors before.

Amy Cuddy is a co-author and principal promoter of the so-called power pose, and she notoriously reacted to an unsuccessful outside replication of that study by going into deep denial. The power pose papers were based on “p less than .05” comparisons constructed from analyses with many forking paths, including various miscalculations which brought some p-values below that magic cutoff.

Michael Norton is a coauthor of that horrible air-rage paper that got so much press a few months ago, and even appeared on NPR. It was in a discussion thread on that air-rage paper that the problems of the Cuddy, Norton, and Fiske paper came out. Norton also is on record recommending that you buy bullfight tickets for that “dream vacation in Spain.” (When I mocked Norton and his coauthor for sending people to bullfights, a commenter mocked me right back by recommending “a ticket to a factory farm slaughterhouse” instead. I had to admit that this would be an even worse vacation destination!)

And, as an extra bonus, when I just googled Michael Norton, I came across this radio show in which Norton plugs “tech giant Peter Diamandis,” who’s famous in these parts for promulgating one of the worst graphs we’ve ever seen. These people are all connected. I keep expecting to come across Ed Wegman or Marc Hauser.

Finally, Susan Fiske seems to have been doing her very best to wreck the reputation of the prestigious Proceedings of the National Academy of Sciences (PPNAS) by publishing papers on himmicanes, power pose, and “People search for meaning when they approach a new decade in chronological age.” In googling Fiske, I was amused to come across this press release entitled, “Scientists Seen as Competent But Not Trusted by Americans.”

A whole fleet of gremlins

This is really bad. We have interlocking research teams making fundamental statistical errors over and over again, publishing bad work in well-respected journals, and promoting bad work in the news media. Really the best thing you can say about this work is that maybe it’s harmless: no relevant policymaker will take the claims about himmicanes seriously, no airline executive or transportation regulator would be foolish enough to believe the claims from those air-rage regressions, and, hey, even if power pose doesn’t work, it’s not hurting anybody, right? On the other hand, those of us who really do care about social psychology are concerned about the resources and attention that are devoted to this sort of cargo-cult science. And, as a statistician, I feel disgust at a purely aesthetic level toward these fundamental errors of inference. Wrapping it all up are the attitudes of certainty and defensiveness exhibited by the authors and editors of these papers, who never want to admit that they could be wrong and who continue to promote and promote and promote their mistakes.

A whole fleet of gremlins, indeed. In some ways, Richard Tol is more impressive in that he can do it all on his own, whereas these psychology researchers work in teams. But the end result is the same. Error piled upon error piled upon error piled on refusal to admit that their conclusions could be completely mistaken.

P.S. Look. I’m not saying these are bad people. I’m guessing that from their point of view, they’re doing science, they have good theories, their data support their theories, and “p less than .05” is just a silly rule they have to follow, a bit of paperwork that needs to be stamped on their findings to get them published. Sure, maybe they cut corners here or there, or make some mistakes, but those are all technicalities—at least, that’s how I’m guessing they’re thinking. For Cuddy, Norton, and Fiske to step back and think that maybe almost everything they’ve been doing for years is all a mistake . . . that’s a big jump to take. Indeed, they’ll probably never take it. All the incentives fall in the other direction. So that’s the real point of this post: the incentives. Forget about these three particular professionals, and consider the larger problem, which is that errors get published and promoted and hyped and Gladwell’d and Freakonomics’d and NPR’d, whereas when Nick Brown and his colleagues do the grubby work of checking the details, you barely hear about it. That bugs me, hence this post.

P.P.S. Putting this in perspective, this is about the mildest bit of scientific misconduct out there. No suppression of data on side effects from dangerous drugs, no million-dollar payoffs, no $228,364.83 in missing funds, no dangerous policy implications, no mistreatment of cancer patients, no monkeys harmed by any of these experiments. It’s just bad statistics and bad science, simple as that. Really the worst thing about it is the way in which respected institutions such as the Association for Psychological Science, National Academy of Sciences, and National Public Radio have been sucked into this mess.

118 thoughts on “Gremlins in the work of Amy J. C. Cuddy, Michael I. Norton, and Susan T. Fiske”

  1. “Putting this in perspective”

    These are standard errors in the field.

    These people get promoted to tenure at the top universities.

    These people are the ones that NEVER have any critique of ANY of my statistical methods in any of my talks, papers, dissertation, etc.

    These people are the ones that only critique that my STORY isn’t exciting enough. That my story isn’t clever enough. That I haven’t gone far enough in extending and interpreting my results.

    These people are why the entire business is gross.

    • must do better: For what it is worth, I believe that Amy Cuddy recently was denied tenure at Harvard Business School and that she will be spending the next academic year at Harvard’s T. H. Chan School of Public Health.

      • She’s an associate professor. Unless Harvard is different in their titles, that means she has tenure. Perhaps she was denied full professorship.

        • So I see. I don’t know, then. The information about being denied tenure was passed on to me by a colleague. It could be wrong. She definitely will be at the School of Public Health next year; she herself tweeted that information.

          Some colleges and universities have the rank “associate professor without tenure” but I do not know if this is true of Harvard.

        • From a FINANCIAL POST article on April 7, 2016: “Although her book hit No. 3 on the New York Times best-seller list, and her “presence” exercises are being adopted by companies, sales departments, and sports teams, Cuddy says she doesn’t want her work to become a business. Having just stepped off the tenure track at Harvard Business School, her interest now is in starting a research centre….”

          “Just stepped off the tenure track” may be a euphemism for “didn’t get tenure.”

        • From THE ODYSSEY ONLINE on March 21, 2016:

          “She [Cuddy] mentions in the last few minutes, while being interviewed by Will Cuddy (same last name but no relation) about her recent opportunity to accept a position with tenure at Harvard Business School. She ended up turning down the offer due to her desire and passion to reach broader audiences.”

        • Harvard and Yale have always been different in this. Letting some people be promoted to associate was the way they would keep really good junior faculty for 3 extra years. Was depressing as anything to watch as a graduate student though I guess it promotes a useful kind of fatalism about academic careers.

        • I just looked at Cuddy’s CV:

          Princeton, PhD 2005

          Rutgers, Assistant Professor, 2005-2006

          Northwestern, Assistant Professor, 2006-2008

          Harvard Business School, Assistant Professor, 2008-2012

          Harvard Business School, Associate Professor, 2012-present

          Yes, I think that she must have tenure and that my colleague was wrong.

        • Carol:

          I have no idea how they do things at the business school, but at other places I know at Harvard, the associate professors are untenured.

        • Thanks to everyone for the responses. Apparently Cuddy has been “up for tenure” as they say but I have not been able to determine whether she was awarded it. Here is a quotation from “axon” on April 8, 2015, on arstechnica.com/civis/viewtopic.php?f=2&t=1278095:

          “I stupidly asked Amy Cuddy to be my dissertation advisor when I was at Harvard, and for the ENTIRE year, she couldn’t be bothered to meet with me once or reply to a single email…. She apparently just wanted to write on her CV that she had an advisee when she was up for tenure….”

    • Agreed.
      The lesson from the Cuddys and Duckworths of this world is that if you want to get famous or influential you need to follow a simple recipe: 1) make incredibly strong and ideally surprising claims based on completely unsupportive data combined with gross statistical errors, 2) publicize those claims through TED talks, popular books, and via your Ivy League association, 3) never admit an error when people catch you out, 4) move on to the next big “thing” and repeat the entire process.

    • Absolutely agree.

      It’s hard enough for early-career researchers to make stable careers out of research, but when people can make their way to positions of power on questionable research practices, this will only encourage further corruption in search of “good stories”.

      We need more investigation of fraud, misrepresentation, and misconduct to support psychology research.

  2. The fact that the psychological “theory” is so malleable that you can concoct a story to support literally anything seems like a big part of the reason this keeps happening. Until the field can build a more consistent and universal theoretical basis that creates testable, unambiguous hypotheses, it’s going to be easy to do this. Yes, this is (bad) empirical work, but part of the reason it’s bad is not because the authors are bad at stats but because there are no constraints on what they can push as a story. Finding a weird counter-theoretical result in something like economics is not a sign that you are on the cutting edge of a hitherto undiscovered fact, but that you probably messed up and will need a huge amount of supporting evidence to justify your results. I’m not sure what a single, unified theory of psychology would look like and I don’t think that’s necessarily a good end goal, but some progress in that direction is crucial for the field to gain credibility.

    • Ajg:

      Good point. Consider, for example, Richard Tol, the original gremlin. Tol’s an economist, and he did manage to get his papers published in respected econ journals—but his conclusions contradicted most existing theory, his work was controversial, and people pointed out lots of serious errors in his work. In contrast, Amy Cuddy is a culture hero, Michael Norton and Susan Fiske are bigshots in their field, and until very recently none of their work was seriously questioned.

      So it does seem that you’re right: in economics, if you present a wacky idea, you’re required to defend it, whereas in social psychology, you can just present any wacky idea as correct, and just about nobody will care.

      Things may be changing, though. We haven’t yet reached TED and NPR, but the leading science journalists have twigged to the problem. We’ve been having a bit of a “moneyball” revolution in social psychology in the past few years, thanks in large part to people such as Uri Simonsohn and Nick Brown who do the hard work of looking at the data in these published studies.

      • See also Peabody’s Outlier Gang Couldn’t Shoot Straight In Minnesota…, where they brought Tol in as well, to affirm the idea that the Social Cost of Carbon should be negative (i.e., CO2 should be subsidized).

        This chart shows where that fits among Federal SCCs, other economists (like William Nordhaus) and shadow carbon prices for oil companies. Of course, he just ran his model with the desired parameters, didn’t claim that it actually made sense, but did somewhat match the gremlin world.

        The judge didn’t buy Peabody’s claims; the company’s bankruptcy was filed the same week as the judge’s findings.

        • While I guess that Tol is indifferent enough to let such things pass uncommented or even encourages them, this seems somewhat strange to me. Tol himself has pretty consistently argued that the SCC(O2) are positive and CO2 emissions should be taxed. But that is not even the topic here: The gremlin paper’s screw-up is about total impacts, not marginal cost.

          It should be noted though that even with his (wrong) positive impact in the near future, he made the point that whatever benefit might result is sunk and thus to be disregarded in terms of marginal costs related to CO2 emissions (I think this is wrong, as you have to “integrate and discount” over the curve, so a high discount rate might give you negative SCCs; but note that Tol himself never specified it that way, though I think Heritage did). So, whoever argued for negative SCC was looking in the wrong place here.

          Also, as Andrew pointed out, the single (wrong) “benefit” estimate has a spectacular impact on the curvature of the quadratic-linear damage function. So if anybody thinks that the evil is this (erroneous) benefit, I am not even sure they are right: shouldn’t it be possible that a flattened damage function in a “low impact” or “no impact” scenario for the near future could give you even lower SCC estimates – especially with a low discount rate – than the erroneous Tol curve?

          My point is: Pulling this into the SCC estimate discussion gets you nowhere – this is another topic (total impacts, not marginal costs), Tol argues (and always has argued) for positive SCCs anyway (the problem at least in this regard is the bad company he is keeping), and as far as people think undoing the mess implies higher SCC estimates by making the positive impact disappear, I am not even sure they are right.

          This whole thing is about extreme sloppiness and the unwillingness to admit a grave mistake (to a point where he made everything considerably worse) – aggravated by the fact that this is a topic of enormous policy as well as academic interest and thus created quite some damage in related debates. Tol deserves all kinds of scorn and derision for his behavior, but let’s not lower our own intellectual standards by willfully mixing this up with conceptually different issues.

        • “(the problem at least in this regard is the bad company he is keeping)”
          Sorry, the real point was that in this case there was an interlocking team (that being one of Andrew’s points) that produced outlier results:

          1) Happer arguing that CO2 fertilization was “greening the Earth”, and that this provided such great benefits for agriculture as to nullify any downsides AND

          2) Mendelsohn used DICE with parameters that assumed no negative effects until 1.5C or 2.0C temperature rise, at least somewhat based on 1).

          3) And then Tol was asked to run FUND with those parameters.

          All this yielded a set of outlier results.

      • So it does seem that you’re right: in economics, if you present a wacky idea, you’re required to defend it, whereas in social psychology, you can just present any wacky idea as correct, and just about nobody will care.

        I think you may be misinterpreting the Tol versus the Cuddy, Norton & Fiske issue as a disciplinary difference between economics and social psychology.

        I suspect not a lot of people care about the Cuddy, Norton & Fiske results as they stand (stagger?), though if they really were true they might have some interesting theoretical value.

        Tol, on the other hand, publishes a paper that has immense practical implications for policy on a key global issue, that of climate change.

        Thousands of people concerned with climate change are going to scrutinize the paper, not just a few economists but almost anyone with an interest in the policy implications of the paper. It’s a bit like the Bjørn Lomborg situation in miniature.

        • Jkrideau:

          Sure, but isn’t this a typical difference between social psychology and economics? The social psychology research that gets publicized tends to be on trivial topics—or, more precisely, results that would be somewhat interesting if true but are nearly irrelevant if false. In contrast, economics research that gets publicized is often on important topics. Not always—we can consider the example of the freakonomics of sumo wrestling—but usually. So this does seem to be a disciplinary difference, at least on average.

        • I agree that economics research that gets publicized is often on important topics… So this does seem to be a disciplinary difference,

          Well, that may be more a matter of economists being better at self-promotion than those shy, retiring social psychologists?

          There are quite a number of areas in which social psychology (and a few other areas of psych) has considerable real-world applications, but certainly not at the level of national or international policy that some economics research reaches, so, good or bad, the economics research is more likely to get publicity, in some topic areas.

          I’d suggest that the publicity that goes to trivial social psychology research is not really the fault of social psychology. Some social psychologists may be doing trivial research (well, they are), but it is the scientific illiteracy of the press, looking for a quick and cute story, that gets the stuff publicized.

          Remember the chocolate study from a year or so ago? At a guess, no reporter actually read the paper, and if they did, they did not understand the glaring methodological problems.

          When was the last time you saw a good mainstream media article or radio/TV presentation on the issue of inducing false memories to get false confessions from accused felons? Or on the totally mad use of polygraphs? Both are good social psych subjects but apparently not glitzy enough.

          Definitely economics is going to get more publicity, but after the Reinhart & Rogoff paper I question just how much more detailed scrutiny economics papers are subjected to unless, as in the Tol case, the paper is goring someone’s ox, and in Tol’s case it was not just a few economists. He gored oxen across the spectrum.

  3. Has anybody written anything critical about the new(ish) NPR morning segment on “social science?” I can’t remember the name of the segment, but it (not surprisingly) specializes in credulity about silly sounding nonsense, generally prefaced with the mandatory, “It turns out that…”. Very cringe-worthy, and I presume, the latest high-dollar real estate that’s fueling this dumb discipline-wide race to the bottom in search of “interesting” findings. I heard a piece a couple of weeks ago about some studies that purported to show that playing golf with “Nike” clubs improved your putting success. When I went to find the published articles to back up such a preposterous seeming claim, all I could find was an extended abstract from a marketing conference that didn’t even include any actual data. When I hear that segment come on the radio now, I use it as a cue to turn off the radio and finish getting ready for work.

  4. It doesn’t exactly help that psychology as a field has a much more complex subject matter than economics. Human cognition and behavior could easily be so convoluted as to routinely require 15 interactions to explain it. Then it becomes very tempting to use this complexity to fill in the gaps in your analysis by story-telling.

  5. Andrew,

    about your Ed Yong bubble: in this (http://www.theatlantic.com/science/archive/2016/04/this-technology-will-allow-anyone-to-sequence-dna-anywhere/479625/) piece about a new pocket DNA sequencer, Yong seems to fall very hard for the hype. I commented on this on FB:

    “The genomics world is almost falling into an orgasmic coma over this new pocket DNA sequencer. It sounds, once again, as if human medicine is going to be revolutionized. But what is the first application scenario science journalist Ed Yong can come up with?

    »If an astronaut could decipher the full genetic code of whatever’s plaguing her, she could identify the offending bug and work out if it’s vulnerable to any drugs.«

    The thing is, not only is the health of astronauts not the most pressing health concern humanity faces, the scenario is also ridiculously inefficient as a general strategy to diagnose their ailments. There’s a reason why we’re not generally diagnosing diseases using genetic sequencing on Earth, so why should this be a better idea in outer space?”

  6. I like this sentence in this post: “These people are psychologists, not statisticians, so maybe we shouldn’t fault them for making some errors in calculation”. That’s gotta hurt!

    Andrew, I just read about six or seven papers in major journals over the last six weeks, in the general area of cognitive psych of the type Mark Seidenberg would consider a field of inquiry he approximately belongs to, and all papers were massively p-value hacked. One common trick we use in psycholinguistics is to compute F1 and F2 scores in repeated measures ANOVA (F1 is by subjects and F2 by items). There is a more conservative estimate called minF. If the minF is significant, we report that. If it is not, we only report the F1 and F2 scores, implying that the effect was significant. A variant is to report minF but not say that it was not significant, just rely on the fact that *some* p-value showed up as less than 0.05. The third recourse, when p>0.05 in all cases, is to use more delicate language to describe the result, and declare victory anyway. This is standard operating procedure. A more recent trick I have seen is to first compute a linear mixed models fit, and if it doesn’t come out significant, back off to repeated measures ANOVA (better chance of getting p<0.05 due to aggregation, see McElreath's comment in his book) and then follow the procedure above.
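
    For readers outside psycholinguistics, here is a rough sketch of the minF′ calculation (the Clark, 1973, formula as I understand it, with invented F values rather than numbers from any of the papers I mentioned), showing how both component tests can clear p < .05 while the more conservative minF′ does not:

```python
# Rough sketch with invented numbers: by-subjects (F1) and by-items (F2)
# ANOVAs can each come out "significant" while the more conservative
# minF' (Clark, 1973) does not.  Formulas as I understand them.
from scipy import stats

def min_f_prime(f1, df1_err, f2, df2_err, df_num=1):
    """minF' and its p-value, given F1, F2, and their error dfs."""
    minf = (f1 * f2) / (f1 + f2)
    df_err = (f1 + f2) ** 2 / (f1 ** 2 / df2_err + f2 ** 2 / df1_err)
    return minf, df_err, stats.f.sf(minf, df_num, df_err)

f1, df1_err = 4.4, 31   # by-subjects test
f2, df2_err = 4.6, 23   # by-items test
print(stats.f.sf(f1, 1, df1_err), stats.f.sf(f2, 1, df2_err))  # each ~ .04
print(min_f_prime(f1, df1_err, f2, df2_err))                   # minF' ~ 2.2, p ~ .14
```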

    I also found several t-value calculation errors in the articles of a linguist with an ERC Advanced Grant (millions of euros).

  7. BTW, I think it is not true that Cuddy’s work is harmless. Someone should fact-check this, but when I looked at an interview of hers, she said she goes to Africa to propagate her message, something like that. If she’s selling the power posing idea there as the cure for underperformance or whatever, that is now entering the territory of inflicting real harm, not to mention climate damage caused by flying her over uselessly to Africa, which should make Tol unhappy too (or not).

    • “Helping him build the _posture_ of a proud horse helped him _become_ a proud horse!” If it weren’t hosted on Amy Cuddy’s site I’d assume that video was a parody from The Onion.

    • Shravan:

      She does seem to be jumping the shark with this one. I’m looking forward to the NYC version of this research in which she explains that roaches can benefit from power poses as well.

  8. “Amy Cuddy is a co-author and principal promoter of the so-called power pose, and she notoriously reacted to an unsuccessful outside replication of that study by going into deep denial.”

    Should readers not be aware of it already, here is another recent (pre-registered) replication: http://spp.sagepub.com/content/early/2016/05/30/1948550616652209.abstract

    “Adopting expansive (vs. contractive) body postures may influence psychological states associated with power. The current experiment sought to replicate and extend research on the power pose effect by adding another manipulation that embodies power—eye gaze. Participants (N = 305) adopted expansive (high power) or contractive (low power) poses while gazing ahead (i.e., dominantly) or down at the ground (i.e., submissively). Afterward, participants played a hypothetical ultimatum game, made a gambling decision, and reported how powerful and in charge they felt. Neither body posture nor eye gaze influenced the gambling decision, and contrary to the predictions, adopting an expansive pose reduced feelings of power. We also found that holding a direct gaze increased the probability of rejecting a low offer on the ultimatum game. We consider why power posing did not have the predicted effects.”

    Maybe it’s time for another TED-talk?

  9. Andrew wrote: “these people are psychologists, not statisticians.” True, but the analysis that Nick is describing — a one-way between-subjects analysis of variance with three groups, followed by multiple group comparisons — is a very simple analysis. It is described in undergraduate psychology statistics books and taught in undergraduate psychology statistics courses, so there really is no excuse for someone with a PhD to screw it up.

    Nick (on his blog) called the multiple comparison t tests “post hoc.” They are “tests after anova” but they are not necessarily post hoc. If Cuddy intended to do them from the beginning based on her theory or hypotheses, then they are a priori.

    I’m puzzled at blog discussions about statistical problems that are either simple computational mistakes or reporting mistakes. It seems to me that it would be better to contact the author(s) about mistakes of this sort and allow them to issue a corrigendum.

    • Carol,
      I already informed the editor of the journal of these issues some months ago. She seems perfectly happy with the corrigendum that corrects the error but that also (bizarrely) claims that the conclusions do not change. Cuddy, Norton, and Fiske completely screwed up the analyses but the editor (Ann Bettencourt) is the one responsible for not correcting the record. I wonder if less well-known authors would have been given the same courtesy.

    • Carol:

      You write, “I’m puzzled at blog discussions about statistical problems that are either simple computational mistakes or reporting mistakes. It seems to me that it would be better to contact the author(s) about mistakes of this sort and allow them to issue a corrigendum.”

      I post on blog because it’s easy. Publishing corrections in a journal can be extremely difficult.

      I suppose I could contact the authors directly. But, given how they’ve responded so far, I can’t imagine this would have any useful effect.

      Ultimately, I don’t care so much about this particular case, and I’d rather use it as a discussion of how rotten things are, to motivate other researchers to do better.

      • Andrew: I agree that it is difficult to get comments published but corrections of mistakes like these are easier (although still not easy).

        The problem is that, in regards to the specific case, the authors are unlikely to see the discussion here, and in regards to the general case, you/we are preaching to the choir. So these discussions are unlikely to have any effect.

        • Carol:

          I doubt that Norton, Cuddy, Fiske, etc. would get much out of reading this blog, as they already have a track record of screwing up their analysis and not caring. If they want to read this all, fine, but I’m more interested in the larger issues. And I disagree with your statement that these discussions are unlikely to have any effect. Lots of young researchers read this blog, and I think it’s good for them to know that there are other paths for them, that they don’t have to follow the noise-mining approach to scientific research, also good for them to realize that there’s a broad research community out there of people who do not respect that kind of work.

          I don’t think it’s a matter of good guys and bad guys. I think that generations of scientists have been trained and rewarded to use bad statistical methods that can produce statistical significance, publication, and fame out of pure noise. It’s important to understand where this is happening, at each stage of the process, from the manipulation of the raw data (as discussed by Nick Brown) to the desperate attempts made by the original researchers to try to preserve their published findings. The “choir” should understand this, also young researchers should understand this, also people who work in other fields such as economics, political science, medicine, etc., should understand what is happening in the field of psychology. Remember, PPNAS is still considered to be a big deal by a lot of people! Some of the readers of this blog are science journalists, and when they come across this post they might think twice about promoting the next PPNAS paper reviewed by Susan T. Fiske. You also have to remember that not every reader reads every post. So even though you might be thoroughly familiar with this particular case and this cast of characters, other readers will be coming across it for the first time. Finally, I think Nick’s work is valuable, and I feel I’m making a contribution by broadcasting it to a larger audience.

        • I want to confirm Andrew’s comments. His criticisms have strongly affected my lab members’ lives, and for the better. A downside is we get rejected now and then from top journals because our papers “don’t provide closure” (I quote an actual editorial comment).

        • I received this comment from an editor just last week: “Pre-registration does not ensure that the best use is made of the data. How a turn away from the garden of forking paths is managed so that we do not walk instead into the garden of lost opportunities has been the piece I have been processing.” Garden of lost opportunities! Priceless.

        • Andrew: I wasn’t criticizing either you or Nick. I appreciate your passion and all the work that you put into this blog. I was just wishing that there were a more effective way to correct simple mistakes and to help psychologists do better research.

  10. Mark: Uh oh. Ann Bettencourt at the University of Missouri? This is someone of whom I have heard nothing good.

    I’ll take a look at the corrigendum. The version of the article that I pulled up earlier did not have a corrigendum attached.

    • It is not up yet but I’ve been told what the text will be. It will apparently read as follows:

      Results on p. 275 should read (changes in bold): “A one-way ANOVA revealed the predicted main effect on this score, F(2,52) = 3.93, p <.03, such that participants rated the high-incompetence elderly person as warmer (M = 7.47, SD = .73) than the low-incompetence (M = 6.85, SD = 1.28) and control (M = 6.59, SD = .87) elderly targets. Paired comparisons supported these findings, that the high-incompetence elderly person was rated as warmer than both the low-incompetence elderly target (marginally) and control elderly target (significantly), t(35) = 1.79, p=.08, and t(34) = 3.29, p<.01, respectively."

      This does not change the conclusion of the paper.

      • Mark: Thanks. I just spotted this on PubPeer also.

        So the authors claim that the change from p < .01 (significant) to a p = .08 (marginal) for the comparison of interest has no effect on the conclusion of the paper. In other words, "marginally significant" is just as good as "significant"? Bizarre.

        Also, with an adjustment for multiple comparisons, that p = .08 would be larger.
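
        A quick sketch of what that adjustment looks like, using the two paired-comparison p-values reported in the corrigendum (the choice of a Bonferroni correction here is mine, for illustration, not anything the authors did):

```python
# Bonferroni adjustment of the two paired-comparison p-values quoted in
# the corrigendum; the choice of correction is mine, for illustration.
p_values = [0.08, 0.01]                     # t(35) = 1.79 and t(34) = 3.29
k = len(p_values)
print([min(1.0, p * k) for p in p_values])  # [0.16, 0.02]
```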

      • The statistics from the main effect ANOVA also do not seem to match the reported means and standard deviations; although the result is still significant. Using Nick’s calculations of the sample sizes for each condition, one can compute MSb = 3.686 and MSw = 0.989, so F=MSb/MSw = 3.728, which gives p=0.031. The original paper reported F=3.93.

        A quick check on the numbers does not suggest that this could be a rounding error issue (although maybe I was not creative enough). The correction that mark described, above, only changes the t-statistics, but leaves the reported means and standard deviations unchanged, so there is still a discrepancy. These kinds of errors make it difficult to have much confidence in what is reported.
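
        Here is a short script that reproduces those numbers from the means and SDs quoted in the corrigendum; the per-condition sample sizes (18, 19, and 18) are inferred from the reported degrees of freedom rather than stated in the paper, so treat them as an assumption:

```python
# Sketch: recompute the one-way ANOVA from the summary statistics in the
# corrigendum.  The per-condition ns (18, 19, 18) are inferred from the
# reported dfs (t(35), t(34), F(2,52)), not stated in the paper.
from scipy import stats

means = [7.47, 6.85, 6.59]   # high-incompetence, low-incompetence, control
sds   = [0.73, 1.28, 0.87]
ns    = [18, 19, 18]         # assumed

N = sum(ns)
grand = sum(n * m for n, m in zip(ns, means)) / N
ss_b = sum(n * (m - grand) ** 2 for n, m in zip(ns, means))
ss_w = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))
df_b, df_w = len(ns) - 1, N - len(ns)
ms_b, ms_w = ss_b / df_b, ss_w / df_w
F = ms_b / ms_w
print(ms_b, ms_w, F, stats.f.sf(F, df_b, df_w))
# roughly 3.69, 0.99, F = 3.73, p = .031 (vs. the reported F(2,52) = 3.93)
```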

        • Does any one have the data? Has anyone asked Cuddy for the data?

          A while back I asked Cuddy for some information on a different article and she responded promptly, helpfully, and cordially.

        • We need to change the status quo from having to ask a researcher for data to having them automatically post the data online prior to publication.

          Is that ever going to happen?! Why don’t editors and reviewers put their foot down?

        • I’d just like to note that we do use such a data sharing policy at our open science journals. Authors must publish the data at submission. Not just so that others can check stuff later without having to ask, but so reviewers can do it and verify the analyses.

          For instance, see this thread of a paper (I was a reviewer). http://openpsych.net/forum/showthread.php?tid=275

          I think something like this is how pre-reviewing papers should work.

  11. I think that with this post and some other recent posts about social psych researchers, the more appropriate small, mythical creature that should be invoked is TROLL. Seriously. There is a very fine line between criticism and trolling, and lately it feels like you’ve been falling on the trolling side of that line more often than not.

    I’ve been following this blog for many, many years, long before it became the well-known blog that it is today. I am completely sympathetic to your views on all the problems in social psych, and psychology in general, and have learned a great deal following the conversation about all the problems in the field. But lately you’ve started to sound more and more like nothing but a self-congratulatory asshole.

    It’s really ironic how much your blog has started to read like the type of tabloid scientific journalism that you criticize so much. You really must have better things to do with your time. Here is one idea: Write a stats book for psychological researchers. If you are so concerned with the methods in the field then do something constructive about it. And saying that you already wrote a book that psychologists are welcome to read and use is a total cop-out. The fact is your book is useless to psych researchers. It’s based on software and research problems that very few people in psychology use or can relate to.

    If you want people to adopt new methods you have to make it as easy and accessible as possible for them to do so. You can blame researchers all you want for being too lazy to learn new software or methods but it doesn’t change the basic fact that people are generally resistant to change and are much less likely to take action when faced with hassles like having to learn a new coding language. Kristopher Preacher knows this. That’s why when he proposes a new mediation method he often writes macros for software that people in the field use and makes them freely available.

    In the words of your good buddy EJ Wagenmakers, nut up or shut up. Your old man waving a stick at the stupid people across the street routine is getting old and sad.

    • Sentinel:

      You write, “If you want people to adopt new methods you have to make it as easy and accessible as possible for them to do so.” I have done so, in many ways, with many collaborators. We organized the development of Stan. We wrote a book on applied regression and multilevel modeling. We wrote a book on Bayesian statistics. We have recently written two articles for a psychology journal on ways of understanding and resolving problems of forking paths. We wrote the “garden of forking paths” article, which explains to researchers how their work can be subject to multiple comparisons issues, even if they perform only one analysis of their dataset. We’re also writing a stats book for psychological researchers; it just takes time to write a book. And we’re doing lots more. My colleagues and I also are interested in the psychological or sociological problem that people refuse to admit errors, that journals don’t like to publish corrections, etc. I think it’s important to supply useful methods to researchers, and it’s also useful to explore how respected news organizations such as NPR can repeatedly get things wrong. I even think it can be useful to respond to blog comments. I take it that you would prefer that I allocate my work time differently. And this is so important to you that you feel like insulting me about it. All I can say in response is: I’m glad you’re not my boss!

      • It’s great to hear that you are writing a stats book for psychologists! As your new self-proclaimed boss, that’s all I could ask for. Good job! Three gold stars! In all seriousness, this will almost certainly be an important contribution to the literature and go a long way in rescuing the sinking ship that is psychological science.

        The core point is that, for better or worse, pretty much no one in psychology reads your regression book. (By the way, the preferred regression book for many is Cohen & Cohen). No one reads your Bayesian stats book. Pretty much no one in psychology uses Stan. More people are using R, but the numbers are still low compared to SAS and SPSS. You can point to those examples all you want but it doesn’t change the fact that the people who should really be reading your work are the people least likely to. Writing a book for psychologists is how you talk to psychologists.

        • Sentinel:

          I had an idea for writing such a book for a long time, but it was just a couple months ago that the structure of the book came to me. So now it shouldn’t be too hard to write. It’s on line behind about 4 other books, though, so you might have to wait awhile…

          On the plus side, Kruschke read our BDA book and wrote a Bayesian book that some psychology researchers use. It’s possible to have an indirect effect.

        • This fall, I am teaching a course on Bayesian statistics for psychologists. I’m assigning Kruschke’s book, but I also plan to teach from BDA. My intent is to use Stan and R. I think Andrew is having both a direct and indirect positive effect on psychology.

        • So, you’ll have a class of about 10 grad students, 15 if you’re lucky who will learn something about Bayesian analysis. Maybe you’ll get really lucky and one of them will actually be from the social psych program! They’ll go back to their labs and perhaps one or two will disobey the advice of their adviser and submit a paper using a Bayes Factor. After numerous rejections of the Bayesian and various other papers they’ll graduate into a bleak job market and somehow manage to land a postdoc where they’ll toil for a year or two doing someone else’s research. After several adjunct positions and a couple of pubs in some mid-tier journals–the only ones willing to accept papers using Bayesian methods that tell honest but unsatisfying stories–they’ll eventually realize that the field is a joke and get jobs at hedge funds or social media mega-corporations analyzing data using all the wonderful methods and programs they learned in Psych 646.

          As Simonsohn would say, it really just does not follow.

        • “Live by your principles!”, says the tenured professor who will never have to look for a job to the former student who can’t find one in her chosen profession.

          The substantive problems with methods and statistics in psychology need to be solved from the top down. You can teach Bayesian stats to psych grad students all you want but as long as they are submitting papers to journals edited by the likes of Fiske, Baumeister, Gilbert, etc., it’s not going to do a whole heck of a lot of good. If you want to change the field, get editorships and institute new policies and standards. Of course, that could be tough given how many people you’ve alienated with your cherry picking and ‘gotcha!’ attacks on other researchers. Good luck with that!

        • Sentinel Chicken has a good point. Most psychology graduate students don’t take more than the few statistics courses required — the standard two-semester introductory graduate statistics course and perhaps a course in multivariate statistics. And the quality of these courses can vary both within a psychology department (depending on who the instructor is) and across psychology departments (at different universities). I think the courses are usually better at psychology departments that have a quantitative psychology area/division, but relatively few do. Purdue, where Greg Francis works, does. I don’t think Princeton (where Amy Cuddy received her PhD) does, though.

          And then one must remember that the pool of reviewers is drawn from the same group of people as the authors.

    • >”The fact is your book is useless to psych researchers. It’s based on software and research problems that very few people in psychology use or can relates to.”

      Isn’t the underlying problem that psych researchers design all their research around doing NHST? That needs to change first before they have any use for Python/R, parameter estimation, the scientific method, etc. It is true that there is little need for Stan if the ultimate goal is to estimate the means of two groups.

      However, if a well-respected and highly cited psychology researcher like Meehl had no effect despite harping on this for 30 years, I doubt Andrew can get psychologists to stop doing it no matter how much effort he puts forth. To be fair, the presence of the internet could help a lot, so maybe that is too pessimistic.

      • Anon:

        Yes, the Meehl thing is disheartening. But I’m hoping that the combination of publicity and new research tools will help move things forward. The Fiskes, Nortons, and Cuddys of the world are not going away, but at least maybe we can supply some better models for future social science researchers.

        • For any established field like psychology, progress is always going to be incremental, but that does not mean there is no reason to occasionally administer a kick to hurry the progress along a bit. I can see how some see discussions like these as being “cherry-picking” and “gotcha attacks,” but examples are a wonderful way to communicate to an audience that is less sophisticated than we might hope for. There must also be negative consequences for some of the shameful behavior we see from researchers who draw substantial salaries from often publicly funded universities and who, without any sense of irony, describe themselves as “scientists” while refusing to behave in a way that is in concordance with the spirit of science.

          I wonder if sentinel chicken can comment on how he/she thinks that we should act when we observe the kind of behavior that the Cuddys and Fiskes of this world engage in. Many of us have engaged authors and editors in discussions, often at great personal and professional cost to ourselves, without any real changes being made to the literature. The paper that led to this thread is a great example. The paper has been cited over 500 times, the authors admit a major mistake but refuse to change their interpretation of their data and the journal is going along with it. What should we do now? The Committee on Publication Ethics is a joke, the institutions won’t act if the journals don’t do something and in the meantime researchers continue to read papers like this and base their own research activities on the findings reported in this paper.
          Is this not a massive problem? Would we tolerate this in another field, say medicine?

        • “The paper that led to this thread is a great example. The paper has been cited over 500 times, the authors admit a major mistake but refuse to change their interpretation of their data and the journal is going along with it. What should we do now?”

          What should we do now?

          Blame the incentives, always blame the incentives.

          Thank goodness for A. K. Bones (http://www.projectimplicit.net/arina/B2012.pdf):

          “Finally, a skeptic might counter that the JPSP authors could have conducted the studies, found results, dismissed inconsistent data, and then written the paper as if those were the results that they had anticipated all along.
          However, orchestrating such a large-scale hoax would require the coordination and involvement of thousands of researchers, reviewers, and editors. Researchers would have to selectively report those that “worked.” Reviewers and editors would have to selectively accept positive, confirmatory results and reject any norm violating researchers that submitted negative results. The possibility that an entire field could be perpetrating such a scam is so counter-intuitive that only a social psychologist could predict it if it were actually true”.

        • mark:

          Medicine, unfortunately, is not much better, and when it is, I believe that is due to regulatory laws and actions, or to medical bodies occasionally shining a light on a problem.

          The Cochrane Collaboration _has_ made do with its risk-of-bias tool – similar to treating machine-gun shooting victims with bandages of this sort: https://www.band-aid.com/products/for-kids

        • I’d suggest that we should act like firefighters instead of acting like cops. Right now, the Gotcha Gang patrols the streets of psychology land looking for law breakers to make an example out of. They accuse people of wrong doing and beat them down in the street in front of their community members, as they see fit. This approach is only going to make people angry, defensive and cling to their community identity that much more. Why is anyone surprised when Cuddy, Bargh, or Gilbert react defensively and stand their ground? THIS is pretty basic psychology. Unfortunately, these basic principles are lost on the Gotcha Gang, who are psychologists in name but not in practice.

          An alternative approach would be to conceptualize the problem as a burning building filled with lots of people who need to be helped and saved. In this model, we are less concerned with who started the fire, why they didn’t see it coming and why they didn’t put it out and more concerned with helping people get out alive so that we can learn from our mistakes and build a new building. The key to this approach is understanding that many of the people now trapped in the building are victims. Cuddy is just doing research the way she was taught and *rewarded* for doing research. Psychology, like many academic disciplines, is basically a cult. You conform or get kicked out. (Maybe Andrew can talk about his experience with the cult of statistics at Berkeley.) We can look back at all the mistakes people have made by conforming to corrupt norms, but that’s just blaming the victim. What’s done is done. Perspective taking and empathy combined with truth and reconciliation are the way forward. Hanging people in the town square will only sow seeds of anger and bitterness that fester and prolong this horrible situation.

        • I don’t think that these people are victims. NHST is silly but I cannot believe that she was ever taught that it is okay to take a p-value of .052 and claim that it was <.05 or that claiming support for a finding when the correct analysis finds no support is okay. Similarly, I don't think that the Stapels of this world and all those who have clearly fabricated fit indexes for the models in the management literature are victims.
          Perhaps we should think of them as arsonists – they are not simply setting fire to their own homes but by their example are also burning down entire fields.

          In the past when I've spotted errors in papers I've jumped through all the hoops: contact the authors to make them aware of the issues and, then contact the editors. My reward for this has been threats of lawsuits, defamation, threats, and attempted bribes with almost no action by editors or the responsible authors. Are you seriously asking me to engage in perspective taking and empathy with these people?
          Why are we letting people get away with what is often research fraud? Honest researchers whose work does not end up in Psych Science and who don't get grant money thrown their way because they don't report staggering effects are hurt by this and public money gets squandered on this stuff. That is, these are not private mistakes that only hurt a narrow few, and right now there is almost no downside for misrepresenting your data. There is only a very low chance that you'll get caught, and a very low chance that a journal will do anything to correct the record even when you are caught. On the upside you get publications, tenure, grants, and book contracts. We need some disincentives for this behavior and a little public shaming seems entirely appropriate.

        • Mark: I doubt that she was *taught* to do the things you mentioned. But remember that faculty members serve as role models for graduate students. She may have followed what her mentors were doing.

          Case in point: some time ago Nick and I wrote a comment pointing out the p-hacking in an article. The article was based on a graduate student’s master’s thesis, which we were able to obtain. The master’s thesis was a bit muddled, as master’s theses tend to be, but the graduate student seemed to be trying to do a good, honest job. The published article, co-authored with the graduate student’s mentor, clearly was p-hacked. Our guess was that this p-hacking was suggested by the mentor, not by the reviewers/editor.

        • “I cannot believe that she was ever taught that it is okay…”
          The rounding of p-values and then interpretation at the rounded value seems to be an actual standard for some journals and is in fact taught.

          Informal specification search is also definitely taught. I took the stats series for Psych PhD students at a top school and trying out various “plausibly causal” models was definitely taught circa 2009.

          On the other hand, the disastrous consequences of underpowered studies, optional stopping rules, and multiple testing/modeling were largely not taught.

        • Thanks, Carol. You are SPOT ON. Grad school is an apprenticeship system. Most teaching and learning happens informally through observation of practice. No one is explicitly or formally taught to p-hack. It happens through the process of turning a thesis or dissertation into a publishable paper, talking about studies in lab meetings and so forth. Like most evil, it comes wrapped in the shroud of the banal.

          Mark: Here is an opportunity for you to practice your perspective taking skills. Imagine being that graduate student in the situation Carol described above. Do you follow your adviser’s guidance and ‘clean’ this up here and ‘tighten’ that up there to get the thesis ‘in shape’ for a journal, making her proud and happy? Or do you refuse, questioning your adviser’s judgment, potentially souring one of the most important relationships in your academic career, the relationship that could determine your future as researcher? Of course, you would refuse to engage in such unethical practices because you are clearly a much better and morally fortified person than most academic psychologists. Unfortunately, most people are not so strong and clear-eyed as you. Does that mean they are bad people, deserving of ridicule and derision? Or are they just normal people doing the best they can in a bad situation?

        • Sentinel, I was in that situation as a student and I did refuse the faculty member. QRPs are rampant in the literature and I agree that most of these are the result of poor training and modeling of advisor behavior. These we should best counter by education rather than by attacks on individuals – I agree with you there. Other behaviors rise to another level of dishonesty and I get particularly riled up when the offenders are the ones who hold themselves up as a model of how scientists should conduct themselves.

        • Sentinel Chicken and Mark: I congratulate you, Mark, for standing up to your advisor. But my observation is that most graduate students don’t. In fact, many of them do not recognize that what their advisors are telling them to do is wrong. They simply trust in the experience and expertise of the advisor.

        • A few thoughts from the trenches.

          1) the state of quantitative training in psychology is pretty limited.

          “Most training supported laboratory and not field research. The median of 1.6 years of training in statistics and measurement was mainly devoted to the modally 1-year introductory statistics course, leaving little room for advanced study. Curricular enhancements were noted in statistics and to a minor degree in measurement. Additional coverage of both fundamental and innovative quantitative methodology is needed. The research design curriculum has largely stagnated, a cause for great concern. Elite programs showed no overall advantage in quantitative training. Forces that support curricular innovation are characterized. Human capital challenges to quantitative training, including recruiting and supporting young quantitative faculty, are discussed. Steps must be taken to bring innovations in quantitative methodology into the curriculum of PhD programs in psychology.”

          Aiken, L. S., West, S. G., & Millsap, R. E. (2008). Doctoral training in statistics, measurement, and methodology in psychology: replication and extension of Aiken, West, Sechrest, and Reno’s (1990) survey of PhD programs in North America. American Psychologist, 63(1), 32. http://www.ncbi.nlm.nih.gov/pubmed/18193979

          2) Even with the benefit of the kind of quantitative training Aiken, West, and Millsap describe, my experience dealing with faculty while a graduate student was mixed. Some faculty were receptive to graduate students importing what they had learned in quantitative methods courses and wanted to learn along with the student, while others were suspicious of bootstrapping, and others still only wanted results presented as ANOVAs.

          3) On the bright side, having substantive faculty and quantitative faculty working in the same department can alleviate some of the tension between a graduate student and advisor by forcing the advisor to deal with a peer rather than a peer in potentia.

          4) Lastly, I know of too many psychology PhDs who are all too willing to act as experts when they are not. Outside of the simplest experimental designs that are genuinely amenable to ANOVA, statistical analysis is much harder than many PhDs appreciate. This “modally 1-year introductory statistics course” typically involves one semester of ANOVA and then one semester of Multiple Regression with the big reveal that they are unified by the general linear model under the hood, so it’s a-ok to analyze experimental data with Multiple Regression.
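          A quick way to see that “big reveal” for yourself: a one-way ANOVA and a regression on dummy-coded group indicators are the same linear model, so they produce the same F test. Here is a minimal sketch in Python (my own toy example with made-up groups and data, not anything from an actual course):

          ```python
          # Toy demonstration that one-way ANOVA and dummy-coded regression
          # are the same linear model (made-up data, three arbitrary groups).
          import numpy as np
          import pandas as pd
          import statsmodels.formula.api as smf
          from scipy import stats

          rng = np.random.default_rng(0)
          df = pd.DataFrame({
              "group": np.repeat(["a", "b", "c"], 20),
              "y": np.concatenate([rng.normal(m, 1, 20) for m in (0.0, 0.3, 0.6)]),
          })

          # Classical one-way ANOVA on the three groups
          f_anova, p_anova = stats.f_oneway(*[g["y"].values for _, g in df.groupby("group")])

          # The same comparison as a regression on dummy-coded group indicators
          fit = smf.ols("y ~ C(group)", data=df).fit()

          print(f"ANOVA:      F = {f_anova:.3f}, p = {p_anova:.4f}")
          print(f"Regression: F = {fit.fvalue:.3f}, p = {fit.f_pvalue:.4f}")  # same up to rounding
          ```

          The equivalence itself is real; the problem is that the course usually stops there, leaving students with the impression that one of those two buttons is all the methodology they will ever need.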

        • Sentinel Chicken, I like your burning building analogy, but I think you have mischaracterized the various roles. The researchers “trapped” in the burning building are asleep. What you call the “Gotcha Gang” is (in my view) akin to a fire alarm. The defensive reactions from some researchers are people who (perhaps understandably) do not realize there is a fire and just want to go back to sleep.

          To continue the analogy, the unpleasant current situation is that the fire alarm is going off, but some people are sleeping through it; and many people who have woken up do not know what to do.

        • To harken back to another recent post, community policing and the use of informal social control to change norms is really the key. Instead of responding to crimes or fires, try to get people thinking differently (stop smoking in bed, label flammable materials) or even do harm reduction (slow burning sheet rock, smoke detectors, find my iPhone, reduce lethality of violent encounters ) and make other changes that make crime or fires less likely to happen in the first place. But really I think the thing is to change norms and be aware that a lot of people really have not had great training.

        • Also need change in stat courses – some good ideas here from Deborah Nolan https://mediastream.microsoft.com/events/2016/1606/User2016/player/User2016.html (at 1:16)

          > no matter how much effort
          Change in stat courses is another insurmountable opportunity, given the current university context and the career pressures on faculty.

          Terry Speed once said that about 5% of faculty teaching stats actually had hands-on experience analyzing real data from scratch.

          Today, I am guessing that it is higher than 5% but I am unsure how much higher.

          Insurmountable but very worthwhile opportunity.

        • I really have been thinking about the fact that a lot of basic stats courses are still modeled on the courses that my dissertation advisors would have taken in the late 1950s through the 1970s. I very much want a different course, but a lot of book publishers have told me that the basic issue is that people don’t want to re-do their courses, so they don’t want to change.

        • > don’t want to re-do … don’t want to change.
          Agree, inertia reinforced by university expectations/constraints.

          Similarly, once someone has figured out how to publish a few gremlin papers and has succeeded at it, they likely don’t want to change or to learn how to do better (a lot of work), since they are doing well enough as it is (and might do worse in tangible ways if they changed).

          We have to help them avoid bad habits early in their careers.

    • But why stop at providing “macros” for lazy psychologists who want to use statistical methods but find it distressing to have to learn anything about them? I guess the psychologists should also be provided with software that generates the predictions from the various theories in their field, so that they don’t need to master the literature. In fact, in the old days there used to be an automated Chomsky text generator on the internet that wrote Chomskyan articles by stringing together words that were syntactically well-formed; why not also provide software to save the psychologist from learning to write coherently? We could write some macros such that you type some data and design details into an SPSS-like interface, and the paper is generated automatically, maybe with some minimal post-processing.

      For linguistics, I have been thinking of a master’s thesis project for writing software to generate predictions from syntactic theories, so that linguistics students can just plug in sentences and get a syntactic analysis printed out without knowing anything about the theory. I mean, I don’t really need to know the internal structure of a CP. I think that Stefan Mueller has something like this for HPSG already at Humboldt Uni Berlin (although his goal is not to make syntactic knowledge redundant; he’s just implementing grammars in HPSG). We could mainstream this approach, making syntax courses redundant in linguistics programs, or maybe we could just teach one or two courses that show the student how to use the software, i.e., which button to press. The time saved from not teaching syntax could be redirected to writing similar software packages for semantic analyses, and so on. Eventually we could market a super-SPSS type program to do analyses automatically: just choose your field (Psychology or Linguistics) from the dropdown menu, then choose sub area (Working Memory, Motivational Theory, Syntax, Semantics, Morphology, etc.), plug in the data you want analyzed, and enjoy.

  12. Whenever I hear the phrase ‘cargo cult science,’ I go back and re-read Feynman’s 1974 commencement address to Caltech grads. Not surprisingly, he uses psychology research in his examples, but the chief message is: “Learn how not to fool yourself!”

    When I end up on a faculty somewhere, this will be required reading for my students (that and Casella-Berger).

    http://calteches.library.caltech.edu/51/2/CargoCult.htm

  13. As a clinical psychologist (aka someone who works for a living providing a service to people, not a true academic), I find the preponderance of psychological nonsense studies really concerning. For clinical work we need to refer back to neuroscience, psychiatry, and other clinical sources. The combination of these, plus the liability clinical psychologists face for providing non-empirically supported treatments, holds us (reasonably) accountable to ‘get it right’. Academics (and I have taught at a tertiary level) not only appear to have no accountability for the equivalent of ‘empirically validated’ (meaning multiple trials and review by other professionals and experts), but are oftentimes viewed as the people who hold the keys to the kingdom of research (come to one of our conferences!).

    Statistical ignorance is no excuse; that is what a consultant is for. I think, in addition to the many ideas regularly noted here, that the lack of a specific model of prediction lies at the heart of these stupid errors. Measuring gravity requires a model of physics, and if the data really do not conform to the model, the physics community will either require better data or (almost never) change the model. Psychology is no different. In nearly all of the papers critiqued on this blog, there has been no model and, by definition, not really a theory of any substance (IMHO). A (relatively trivial) example is Cognitive Behaviour Therapy. The model (CBT) predicts that thoughts (that one is unloved) will elicit negative emotion (sadness) and therefore behaviour (reducing social contact). If the data don’t fit on a measure of negative emotion or of thoughts, the first suspect is the measurement itself. Even if they do fit, this does not prove that the underlying construct is correct. Psychological research can and does work; it is just a slow process with a lot of blind alleys.
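    To make the “specific model of prediction” point concrete, here is a toy sketch (my own invented coefficients and variable names, not anything drawn from the CBT literature) of the thought -> emotion -> behaviour chain written out as an explicit statistical model, so that it commits to checkable quantitative patterns rather than a vague directional hunch:

    ```python
    # Toy mediation-style sketch of the CBT chain:
    #   negative thought -> negative emotion -> reduced social contact
    # All coefficients and noise levels are invented for illustration.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    n = 500
    thoughts = rng.normal(size=n)                               # e.g. a "feels unloved" score
    emotion = 0.6 * thoughts + rng.normal(scale=0.8, size=n)    # sadness
    behaviour = 0.5 * emotion + rng.normal(scale=0.8, size=n)   # reduced social contact

    # If the chain is right, thoughts should predict behaviour mostly *through*
    # emotion: the direct coefficient should shrink once emotion enters the model.
    total = sm.OLS(behaviour, sm.add_constant(thoughts)).fit()
    adjusted = sm.OLS(behaviour, sm.add_constant(np.column_stack([thoughts, emotion]))).fit()
    print("thoughts -> behaviour, total effect:  %.2f" % total.params[1])
    print("thoughts -> behaviour, given emotion: %.2f" % adjusted.params[1])
    ```

    Even a toy model at this level of detail makes predictions that can fail (the direct path should shrink, the coefficients should be stable across samples), which is exactly the kind of accountability that a bare “something, somewhere, has p < .05” claim never provides.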

  14. Sentinel chicken: I think you are being too pessimistic about the ability of psychologists to change. The past five years have seen a veritable academic revolution (see Bobby Spellman’s summary). Journals are changing their policies, organizations such as the Center for Open Science are starting to direct the discussion, replication attempts are published at a higher rate, and young researchers especially are keen to improve their methodology — after all, they are the ones who are planning to spend a lifetime in the field.

    As far as Andrew’s blog is concerned, I know for a fact that it is having a positive impact. Journal editors do read the posts, or have them brought to their attention. Young people in the field also read Andrew’s posts. It is true that few psychologists report Bayesian analyses, but the momentum is shifting — five years ago, many psychologists would not have known what “Bayesian analysis” means, whereas now at least they realize it is something that they ought to read up on. I do agree that (unfortunately perhaps) the material needs to be made more accessible, SPSS-style [this is what we seek to accomplish with JASP, jasp-stats.org. We are in the process of writing manuals and course books]. At the University of Amsterdam, I teach Bayesian statistics to our undergraduates (several hundred of them) and they respond very positively.

    More generally, psychologists have a moral responsibility to act. You may recall Neuroskeptic’s 2012 article “The Nine Circles of Scientific Hell” (http://pps.sagepub.com/content/7/6/643.short?rss=1&ssource=mfr). The first circle is limbo. I quote in full: “The uppermost circle is not a place of punishment so much as regret. Those who have committed no scientific sins per se, but who have turned a blind eye to them, or encouraged sinners through the awarding of grants, spend eternity on top of this barren mountain, watching the carnage below and reflecting on how they are partially responsible…”

    Will you go to scientific hell, sentinel chicken?

    Cheers,
    E.J.

    • EJ: Yes, I agree that things have improved. It used to be next to impossible to get a commentary published. I have a file drawer full of rejected commentaries stretching back over many years. It was also next to impossible to obtain someone’s data set. Now I sometimes — not always — get my commentaries published, and sometimes — not always — obtain data sets.

      On the other hand, a few years ago I was working in the psychology department at a well-regarded research university. One of the professors in clinical psychology complained to the professors in quantitative psychology that graduate students who had completed the required two-semester introductory graduate statistics course were unable to analyze data. To me, this would have been a serious concern. But the professors in quantitative psychology (who teach this course) just brushed off the professor in clinical psychology, who became persona non grata. This was a missed opportunity for the professors in the substantive psychology areas and the professors in quantitative psychology to work together to design a good, useful statistics course. Too bad!

    • E.J. Thanks, the Neuroskeptic’s 2012 article made my day.

      It is my impression as well that there have been recent dramatic changes in awareness and concern.

      Maybe starting a bit more than 5 years ago, though – using these personal markers:

      http://statmodeling.stat.columbia.edu/2007/07/09/how_should_unpr/#comment-42903

      http://statmodeling.stat.columbia.edu/2008/07/28/an_ounce_of_rep/#comment-45916

      My guess is that sentinel chicken has not had much experience reforming researchers. I think a better analogy would be a safe injection site for drug addicts (motivated to make things safer for both the addicts and the population at large): perhaps researchers’ career paths could be guaranteed to continue as before – they would keep getting funding and doing new research, but it would be embargoed from everyone except journal referees and editors for 20 years?

  15. Cuddy’s next project is apparently going to be about bullies. Chapter 1 will probably be called: The Disturbing Case of Andrew Gelman, Statistical Troll Extraordinaire.

    • Shravan:

      Maybe she can write this book jointly with fellow Harvard refugee Mark “Evilicious” Hauser, who reports that he left the ivory tower to work with at-risk teens. In all seriousness, though, I hope she doesn’t train her guns on me. Enough people hate me already; the last thing I need is a million NPR listeners thinking I’m a bad guy. Couldn’t Cuddy just attack Ranehill et al. instead? They’re the ones whose unsuccessful replication was the breath of fresh air that brought the whole house of cards down.

    • I’ve wondered whether some of Amy Cuddy’s mistakes are due to the fact that she suffered severe head trauma as the result of a car accident some years ago.

        • No, they didn’t, Andrew. In any case, I put more blame on Susan Fiske than on the other two. Fiske was the professor and the work was done under her auspices. Amy Cuddy was a graduate student, and Michael Norton was a post doc, at the time that this article was written.

          In general, I am inclined to be helpful to graduate students, post docs, and junior faculty, rather than critical.

        • Carol:

          I’m inclined to be helpful to all these people. Sometimes criticism is the best way of being helpful. When I make a mistake and someone points it out, I realize they’re doing me a favor and I thank them. I think it would behoove Cuddy etc. to behave this way too, instead of going to such ridiculous efforts to hold onto their theories.

        • Andrew: Well, you are a marvel and most researchers don’t behave the way that you do. Yes, I agree that Cuddy and all the rest should admit their mistakes. In a recent presentation to the British Psychological Society (it’s on YouTube for those who are interested; look for “Replicability and Reproducibility Debate”), Nick blamed incentives. I agree that incentives are part of the problem.

  16. Oof, lots of criticism directed at Andrew. Not sure it’s deserved – this is his blog, while these researchers published work that is crap by most measures and made money from it (my take: it feels as though capitalism is prying at the edges of academia; re: is science really a salable skill set?). As an undergraduate working through Sheldon Ross and ET Jaynes, I find these lessons valuable.

  17. > all that matters is that there’s something, somewhere, that has p less than .05, because that’s enough to make publishable

    Haha that’s great, I think it could be formulated as a deductive rule:

    Modus Psychologens:
    Claim: If a particular H_0 can be rejected, then this is support for theory X
    Result: something, somewhere has p less than .05
    ————————————————
    Conclusion: theory X is proven

  18. We’re coming up on the 25th anniversary of the U.S. Supreme Court’s decision in Daubert v. Merrell Dow Pharmaceuticals, and I’m writing a paper for a legal rag about it, the focus being on the sequelae of the Court’s embrace of peer review, statistical methods, low rates of methodological “error,” and general acceptance in the relevant scientific community as indicia of “reliable” science. In my walk through the garden of forking law review articles on Daubert and statistical methods, I came across one co-authored by Susan Fiske.

    At first I sort of liked it: “At a minimum, as suggested by the Supreme Court’s decision in Daubert v. Merrell Dow Pharmaceuticals, the law should not incorporate a psychological theory into one of its many doctrines unless it has been empirically tested, has been subject to peer review and publication, has garnered widespread acceptance within the relevant scientific community, and, where applicable, has a known and acceptable error rate. In connection with factual adjudication, a trial court, applying Daubert, plays a gatekeeping role; it screens proffered scientific evidence to determine whether it satisfies applicable scientific standards, and whether it would be useful to the trier of fact.”

    But then came the p-value talk: “An observed difference or relationship among variables is said to be “statistically significant” if the odds of that difference or relationship occurring by chance are less than 5% (in which case it is significant at the .05 level) or less than 1% (in which case it is significant at the .01 level).”; and praise for the wonders of meta-analysis: “Fortunately, through the use of meta-analysis, which allows the quantitative summary of results across multiple studies, social scientists can sometimes identify field effects that are larger than their laboratory-bound counterparts.” (File drawers FTW). At least: “Empirical social psychologists, unlike judges and litigants, have powerful incentives to challenge the established theoretical models that constitute their discipline”. I was of course sad to learn I’m trapped in a faulty paradigm lacking any incentives needed to change it.

    Anyway, while I certainly praise her cause (which has been discovering and rooting out (implicit) anti-female bias in the workplace) and don’t begrudge her her expert-witness consulting fees, I do wonder whether the p-value crystal ball through which she discerned the hidden cognitive biases in others, on which she now opines, might have had some unsuspected refractive/reflective defect. https://scholarship.law.berkeley.edu/cgi/viewcontent.cgi?article=1252&context=californialawreview

    And as for where the courts are today, more than 1.5 yrs after the ASA statement on p-values, a Federal Court a couple of weeks back wrote this in a footnote about how p ought to be interpreted: “Probability levels (also called “p-values”) are simply the probability that the observed disparity is random—the result of chance fluctuation or distribution. For example, a 0.05 probability level means that one would expect to see the observed disparity occur by chance only one time in twenty cases—there is only a five percent chance that the disparity is random.” Oh well.
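    For what it’s worth, the footnote’s reading is easy to check by simulation. Here is a minimal sketch (the base rate, effect size, and sample sizes are made-up, purely illustrative numbers of my own) showing that among comparisons reaching p < .05, the share in which the disparity really is “random” is not 5 percent; it depends on how often real effects occur and on the power to detect them:

    ```python
    # Simulate many two-group comparisons, some with a real effect and some without,
    # and ask: among the "significant" ones, how often was the disparity just random?
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n_tests, n_per_group = 10_000, 50
    prop_real, effect = 0.2, 0.3                 # assumed base rate of real effects, and effect size

    null_true = rng.random(n_tests) > prop_real  # True when the disparity is purely "random"
    pvals = np.empty(n_tests)
    for i in range(n_tests):
        shift = 0.0 if null_true[i] else effect
        x = rng.normal(0.0, 1.0, n_per_group)
        y = rng.normal(shift, 1.0, n_per_group)
        pvals[i] = stats.ttest_ind(x, y).pvalue

    sig = pvals < 0.05
    print("Share of significant results where the disparity was in fact random:",
          round(null_true[sig].mean(), 2))       # typically far above 0.05
    ```

    Under these made-up but not outlandish numbers, a large share of the “significant” disparities (far more than five percent) are in fact random, which is rather different from “there is only a five percent chance that the disparity is random.”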

    • Thanatos:

      Regarding the quote in your last paragraph: There is a problem when people get used to making pronouncements without being criticized. I can well imagine this would happen with judges: Everyone treats them like experts on all topics, and they start to believe the flattery.
