Remember that “gremlins” paper by environmental economist Richard Tol? The one that had almost as many errors as data points? The one where, each time a correction was issued, more problems would spring up? (I’d say “hydra-like” but I’d rather not mix my mythical-beast metaphors.)
Well, we’ve got another one. This time, nothing to do with the environment or economics; rather, it’s from some familiar names in social psychology.
Nick Brown tells the story:
For an assortment of reasons, I [Brown] found myself reading this article one day: This Old Stereotype: The Pervasiveness and Persistence of the Elderly Stereotype by Amy J. C. Cuddy, Michael I. Norton, and Susan T. Fiske (Journal of Social Issues, 2005). . . .
This paper was just riddled with errors. First off, its main claims were supported by t statistics of 5.03 and 11.14 . . . ummmmm, upon recalculation the values were actually 1.8 and 3.3. So one of the claims wasn’t even “statistically significant” (thus, under the rules, was unpublishable).
But that wasn’t the worst of it. It turns out that some of the numbers reported in that paper just couldn’t have been correct. It’s possible that the authors were doing some calculations wrong, for example by incorrectly rounding intermediate quantities. Rounding error doesn’t sound like such a big deal, but it can supply a useful set of “degrees of freedom” to allow researchers to get the results they want, out of data that aren’t readily cooperating.
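To see how this can play out, here’s a toy sketch in Python with made-up summary statistics (these are illustrative numbers, not the data from the Cuddy, Norton, and Fiske paper): rounding the group means before computing a two-sample t statistic can push an otherwise unremarkable comparison across the magic p-less-than-.05 line.

```python
# Hypothetical illustration (invented numbers, not the paper's data):
# rounding intermediate quantities can move a t statistic across the
# conventional significance threshold.
import math

def two_sample_t(mean1, mean2, sd1, sd2, n1, n2):
    """Two-sample t statistic using the equal-variance pooled form."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se

# "True" (unrounded) summary statistics, invented for illustration.
m1, m2 = 3.36, 3.24
s1 = s2 = 0.35
n1 = n2 = 25

t_exact = two_sample_t(m1, m2, s1, s2, n1, n2)

# Same calculation after rounding the means to one decimal place, as
# might happen when intermediate values are transcribed from a table.
t_rounded = two_sample_t(round(m1, 1), round(m2, 1), s1, s2, n1, n2)

print(f"t from exact means:   {t_exact:.2f}")    # about 1.21, not significant
print(f"t from rounded means: {t_rounded:.2f}")  # about 2.02, just past the
                                                 # critical value of ~2.01
                                                 # at df = 48
```

The point is not that anyone rounds maliciously; it’s that with enough intermediate quantities floating around, each one rounded a little, a researcher who wants a significant result has many small knobs that can happen to turn in the right direction.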
Here’s how Brown puts it:
To summarise, either:
/a/ Both of the t statistics, both of the p values, and one of the dfs in the sentence about paired comparisons are wrong;
/b/ “only” the t statistics and p values in that sentence are wrong, and the means on which they are based are wrong.
And yet, the sentence about paired comparisons is pretty much the only evidence for the authors’ purported effect. Try removing that sentence from the Results section and see if you’re impressed by their findings, especially if you know that the means that went into the first ANOVA are possibly wrong too.
OK, everybody makes mistakes. These people are psychologists, not statisticians, so maybe we shouldn’t fault them for making some errors in calculation, working as they were in a pre-Markdown era.
The way that this falls into “gremlins” territory is how the mistakes fit together: The claims in this paper are part of an open-ended theory that can explain just about any result, any interaction in any direction. Publication’s all about finding something statistically significant and wrapping it in a story. So if it’s not one thing that’s significant, it’s something else.
And that’s why the authors’ claim that fixing the errors “does not change the conclusion of the paper” is both ridiculous and all too true. It’s ridiculous because one of the key claims is entirely based on a statistically significant p-value that is no longer there. But the claim is true because the real “conclusion of the paper” doesn’t depend on any of its details—all that matters is that there’s something, somewhere, that has p less than .05, because that’s enough to make publishable, promotable claims about “the pervasiveness and persistence of the elderly stereotype” or whatever else they want to publish that day.
As with Richard Tol’s notorious paper, the gremlins feed upon themselves, as each revelation of error reveals the rot beneath the original analysis, and when the authors protest that none of the errors really matter, it makes you realize that, in these projects, the data hardly matter at all.
We’ve encountered all three of these authors before.
Amy Cuddy is a co-author and principal promoter of the so-called power pose, and she notoriously reacted to an unsuccessful outside replication of that study by going into deep denial. The power pose papers were based on “p less than .05” comparisons constructed from analyses with many forking paths, including various miscalculations which brought some p-values below that magic cutoff.
Michael Norton is a coauthor of that horrible air-rage paper that got so much press a few months ago, and even appeared on NPR. It was in a discussion thread on that air-rage paper that the problems of the Cuddy, Norton, and Fiske paper came out. Norton also is on record recommending that you buy bullfight tickets for that “dream vacation in Spain.” (When I mocked Norton and his coauthor for sending people to bullfights, a commenter mocked me right back by recommending “a ticket to a factory farm slaughterhouse” instead. I had to admit that this would be an even worse vacation destination!)
And, as an extra bonus, when I just googled Michael Norton, I came across this radio show in which Norton plugs “tech giant Peter Diamandis,” who’s famous in these parts for promulgating one of the worst graphs we’ve ever seen. These people are all connected. I keep expecting to come across Ed Wegman or Marc Hauser.
Finally, Susan Fiske seems to have been doing her very best to wreck the reputation of the prestigious Proceedings of the National Academy of Sciences (PPNAS) by publishing papers on himmicanes, power pose, and “People search for meaning when they approach a new decade in chronological age.” In googling Fiske, I was amused to come across this press release entitled, “Scientists Seen as Competent But Not Trusted by Americans.”
A whole fleet of gremlins
This is really bad. We have interlocking research teams making fundamental statistical errors over and over again, publishing bad work in well-respected journals, promoting bad work in the news media. Really the best thing you can say about this work is maybe it’s harmless because no relevant policymaker will take the claims about himmicanes seriously, no airline executive or transportation regulator would be foolish enough to believe the claims from those air rage regressions, and, hey, even if power pose doesn’t work, it’s not hurting anybody, right? On the other hand, those of us who really do care about social psychology are concerned about the resources and attention that are devoted to this sort of cargo-cult science. And, as a statistician, I feel disgust, at a purely aesthetic level, at these fundamental errors of inference. Wrapping it all up are the attitudes of certainty and defensiveness exhibited by the authors and editors of these papers, never wanting to admit that they could be wrong and continuing to promote and promote and promote their mistakes.
A whole fleet of gremlins, indeed. In some ways, Richard Tol is more impressive in that he can do it all on his own, whereas these psychology researchers work in teams. But the end result is the same. Error piled upon error piled upon error, capped by a refusal to admit that their conclusions could be completely mistaken.
P.S. Look. I’m not saying these are bad people. I’m guessing that from their point of view, they’re doing science, they have good theories, their data support their theories, and “p less than .05” is just a silly rule they have to follow, a bit of paperwork that needs to be stamped on their findings to get them published. Sure, maybe they cut corners here or there, or make some mistakes, but those are all technicalities—at least, that’s how I’m guessing they’re thinking. For Cuddy, Norton, and Fiske to step back and think that maybe almost everything they’ve been doing for years is all a mistake . . . that’s a big jump to take. Indeed, they’ll probably never take it. All the incentives fall in the other direction. So that’s the real point of this post: the incentives. Forget about these three particular professionals, and consider the larger problem, which is that errors get published and promoted and hyped and Gladwell’d and Freakonomics’d and NPR’d, whereas when Nick Brown and his colleagues do the grubby work of checking the details, you barely hear about it. That bugs me, hence this post.
P.P.S. Putting this in perspective, this is about the mildest bit of scientific misconduct out there. No suppression of data on side effects from dangerous drugs, no million-dollar payoffs, no $228,364.83 in missing funds, no dangerous policy implications, no mistreatment of cancer patients, no monkeys harmed by any of these experiments. It’s just bad statistics and bad science, simple as that. Really the worst thing about it is the way in which respected institutions such as the Association for Psychological Science, National Academy of Sciences, and National Public Radio have been sucked into this mess.