This April Fools post is dead serious

Usually for April 1st I schedule a joke post, something like: Why I don’t like Bayesian statistics, or Enough with the replication police, or Why tables are really much better than graphs, or Move along, nothing to see here, or A randomized trial of the set-point diet, etc.

But today I have something so ridiculous that it made sense to just post it straight up.

It came up a few days ago, when I was googling the name of a researcher who, with a colleague, had published two papers that were near exact duplicates, two years apart and in the very same journal. It turns out this researcher has had various data problems with his published work (see here from Retraction Watch and here from Malte Elson, a ridiculously drawn-out story of bad data) and, according to Elson, is “one of the most frequent users of the Competitive Reaction Time Task,” a true nest of forking paths (this last bit is relevant to understanding how this researcher, and others like him, manage to consistently find stunning, statistically significant, and publishable findings from their data).

But that’s all background. What happened was that I was googling this guy and came up with what may possibly be the most ridiculous scientific article I’ve ever seen.

The title is “Low glucose relates to greater aggression in married couples,” but things really get going in the abstract:

People are often the most aggressive against the people to whom they are closest—intimate partners. Intimate partner violence might be partly a result of poor self-control. Self-control of aggressive impulses requires energy, and much of this energy is provided by glucose derived from the food we eat. We measured glucose levels in 107 married couples over 21 days. To measure aggressive impulses, participants stuck 0–51 pins into a voodoo doll that represented their spouse each night, depending how angry they were with their spouse. To measure aggression, participants blasted their spouse with loud noise through headphones. Participants who had lower glucose levels stuck more pins into the voodoo doll and blasted their spouse with louder and longer noise blasts.

Sticking 0-51 pins into a voodoo doll, huh? I could see sticking 1 or 2 pins into the doll, but 51?! That’s a bit outta control, no? Is it a voodoo doll or a pincushion?

The paper carefully follows Rolf Zwaan’s 18 rules for writing a successful PNAS paper, even going to the trouble of leading off with a celebrity quote (#12 on Zwaan’s list).

I still can’t believe there were people who’d go to the trouble of sticking 51 pins into a voodoo doll. 51, that’s such a high number—where did it come from? What the heck, why not go all the way up to 100?

Also this bit:

To measure aggression, participants competed against their spouse on a 25-trial task in which the winner blasted the loser with loud noise through headphones.

Whaaa?

OK, here are some further details:

Participants were told that they would compete with their spouse to see who could press a button faster when a target square turned red on the computer, and that the winner on each trial could blast the loser with loud noise through headphones. The noise was a mixture of sounds that most people hate (e.g., fingernails scratching on a chalkboard, dentist drills, ambulance sirens). The noise levels ranged from level 1 (60 dB) to level 10 (105 dB; approximately the same level as a fire alarm). The winner could also determine the duration of the loser’s suffering by controlling the noise duration [from level 1 (0.5 s) to level 10 (5 s)].

Wow, that sounds like a fun game.

The “voodoo doll” thing still seems like the weirdest part. But . . .

Previous research has shown that this procedure is a valid way to measure aggressive inclinations in couples (17).

OK, let’s look up the reference:

17. Dewall CN, et al. (2013) The voodoo doll task: Introducing and validating a novel method for studying aggressive inclinations. Aggress Behav 39(6):419–439.

“Aggress Behav,” indeed. I still can’t figure out how they came up with the number 51. This just seems like a lot of pins to me. What with the pins and the blasting of loud noise, it’s kind of amazing these people are still married!

I was talking about the “Low glucose relates to greater aggression in married couples” paper with someone I know who does social work research, and she assured me that it must be some sort of April Fool’s joke: the voodoo dolls, the story about the glucose, the trivialization of the serious problem of intimate partner violence. She assumed this was all a parody of silly psychology research.

So I checked some more and, no, the paper seems to be real. For example, here’s a press release dated April 14, 2014, from Ohio State University, which includes the following image:

So I think the study really happened! The press release also featured this quote from one of the authors of the study:

“It’s simple advice but it works: Before you have a difficult conversation with your spouse, make sure you’re not hungry.”

You probably don’t need me to tell you this, but . . . the paper had no data at all on conversations, let alone “difficult conversations,” nor was there any data on hunger, or any evidence that any intervention “works.”

So, par for the course: a one-sentence statement that packs in three different claims, none of which is supported by data.

The study was also featured uncritically by NPR. Of course. No preregistered replications that I’ve seen, but, hey, that’s not a problem in the field of ego depletion, right? Right?

Voodoo correlations, indeed.

P.S. One interesting question is why it is that various problems go together: In this case we have duplicate publications, disregard of the welfare of students, reluctance to share data, p-values obtained via forking paths, NPR-bait research published in PNAS, ridiculous measurements, the claim that one simple trick can change your life, and a set of specific claims that are not addressed in any way by the published research.

There perhaps are some logical reasons for this co-morbidity.

Let’s work backward. To get NPR-bait research published in PNAS, you need some combination of (a) originality and (b) major claims, along with (c) statistical significance or the equivalent. (We actually saw a PNAS paper recently that got by on a “p less than 0.10” result that went in the opposite direction from the preregistered hypothesis, but that’s unusual; I still can’t figure out how that one got through.)

So here’s the problem:
(a) Originality is tough. It’s hard to come up with original ideas, and the easiest way to do so is to go wacky (voodoo dolls!).
(b) If your ideas are original, they’re unlikely to work the first time, or even the second or third. Hence the need to massage the data, which selects for unethical behavior (hence the possible correlation with duplicate publication, disregard for the welfare of students, reluctance to share data, and general suppression of dissent).
(c) And the easiest way to get statistical significance is to keep shaking your data till something comes up, then cover your tracks with story time.
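To make point (c) concrete, here is a minimal simulation sketch of one kind of forking path: scoring the same null experiment several ways and reporting whichever version clears p < 0.05. Everything in it is an illustrative assumption rather than anything taken from the paper discussed above: the function names, the group size of 20, the use of independent outcomes, and the simple z-test with known unit variance are all made up for the example.

```python
import random
from statistics import NormalDist, fmean


def two_sided_p(group_a, group_b):
    """Two-sample z-test p-value, assuming (hypothetically) unit variance in both groups."""
    n_a, n_b = len(group_a), len(group_b)
    z = (fmean(group_a) - fmean(group_b)) / (1 / n_a + 1 / n_b) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(n_outcomes, n_per_group=20, n_sims=2000, alpha=0.05, seed=1):
    """Share of pure-noise experiments where at least one outcome reaches p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        p_values = []
        for _ in range(n_outcomes):
            # Both groups come from the same distribution: there is no true effect.
            a = [rng.gauss(0, 1) for _ in range(n_per_group)]
            b = [rng.gauss(0, 1) for _ in range(n_per_group)]
            p_values.append(two_sided_p(a, b))
        # The "forking" step: keep the analysis only if some outcome looks significant.
        if min(p_values) < alpha:
            hits += 1
    return hits / n_sims


for k in (1, 5, 10):
    # Simulated rate next to the theoretical value for k independent tests.
    print(k, false_positive_rate(k), round(1 - 0.95 ** k, 3))
```

With independent outcomes, the chance of at least one p < 0.05 under the null is 1 - 0.95^k, which is about 0.40 for ten ways of scoring the data. Real alternative scorings of a single task are correlated, so the inflation would be smaller than in this sketch, but the direction is the same.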

That pizzagate guy was just the most extreme example.

On the other hand, I don’t really know how much the above behaviors go together in general. I’ve never done anything like a systematic or representative survey of research misconduct, so these are all speculations. Also, I’m making no claim that any of the authors of the above-discussed paper have engaged in unethical behavior. I have no idea. They may just have all been in the wrong place at the wrong time. Nor am I saying that PNAS should not be publishing a paper on voodoo dolls. It’s their call: PNAS gets to publish the paper, Ohio State and NPR get to publicize it, and outsiders such as myself get to share our takes. Fair all around.

P.P.S. See here for more (reference from some comments below), where Florian Lange and Robert Kurzban write:

As researchers in the field of self-control, we read the recent publication by Bushman et al. (2014) with great interest. Using creative measures of aggressive tendencies, the authors examined the relationship between blood glucose levels and proxies for intimate partner violence. . . .

From their results, Bushman et al. (2014) concluded that glucose “influences aggressive tendencies and behaviors” (p. 3) within couples. They regarded their findings as implying that “interventions designed to provide individuals with metabolic energy might foster more harmonious couple interactions” (p. 3). While there is obvious appeal to the notion that glucose can increase self-control and thus prevent aggressive impulses from being expressed, this study does not provide evidence supporting this idea.

Exactly! Who knows? Their theory and proposed interventions might be correct, they might be wrong, they might be counterproductive, or, more generally, their recommendations might make sense in some settings and be counterproductive in others—but the published results do not provide good evidence.

Lange and Kurzban continue:

The work by Bushman et al. draws on the proposal that “self-control requires brain food in the form of glucose” (p. 3). However, the glucose model of self-control (Gailliot et al., 2007) suffers from both conceptual shortcomings and empirical falsification (Kurzban et al., 2013). Not only has the proposal that glucose fuels the part of the brain needed to exert self-control been shown to be inconsistent with what is known about brain metabolism (Kurzban, 2010), but the empirical evidence reported in support of the proposal has been demonstrated to be implausible from a statistical perspective (Schimmack, 2012). . . . This conclusion is further corroborated by replication studies that did not find the originally reported effect . . .

In view of these issues, self-control and blood glucose levels cannot simply be equated. As a consequence, when relating their outcome measure to blood sugar concentrations, Bushman et al. (2014) did not test, as they claim, “the effects of self-control on aggression” (p. 3). What they did test was the size of the relationship between daily fluctuations in blood glucose levels and a measure of aggressive impulse. Importantly, the authors did not record any self-control data and assuming that the number of pins stuck in a doll varies according to individuals’ ability to exert self-control is conceptually problematic. For the daily assessment of aggressive tendencies, participants were simply asked to indicate how angry they were with their partner. They were not required to inhibit or override their aggressive thoughts, emotions, or urges. Hence, the only conclusion licensed by the findings reported by Bushman et al. is that blood glucose relates to a single-item self-report measure of aggressive impulse, not to the ability to control these impulses.

We do not doubt that hungrier organisms are more aggressive. This accords with our everyday experience, the animal literature (e.g., Cook et al., 2000), and the Snickers ad campaign, “You’re Not You When You’re Hungry.” However, this observation does not imply that glucose reflects the fuel necessary to muster the willpower not to harm one’s partner.

For their second analysis, mean blood glucose levels across 3 weeks were related to aggressive behavior toward the partner. Analyzed in this way, glucose levels do not indicate the current state of a fluctuating self-control resource, but are rather a trait variable. This has important implications for the authors’ conclusions. The more aggressive participants on the laboratory task were not those who were ego-depleted or hungry in that particular moment. They had low blood sugar concentration in general, a trait that can be linked to aggression via numerous third variables. . . . Whereas the reported correlation might provide information about the biology of individual differences in aggression, it does not support the glucose model of self-control. . . .

33 thoughts on “This April Fools post is dead serious”

  1. I was yards away from Comet Pizza Rest when the police apprehended the dude. All I could think: Why is it that I’m always in some proximity to some scandalous ridiculous event in progress. This has been a pattern. Must be my lavender/citron perfume.

  2. It’s easy to scoff at the voodoo doll task, but validation work from independent researchers suggests that it does indeed measure something related to aggressive behavior. See http://psycnet.apa.org/buy/2014-49432-001. Note that this research group has previously published failures to replicate, so I think they can be trusted to announce the bad news when the data don’t support the theory.

    Personally, I don’t doubt the empirical phenomenon observed by Bushman et al., but I’m less sure of the interpretation. People could be angrier or more aggressive when they are hungry for reasons that have little to do with a blood-glucose ego-depletion account of self-control over aggression. See Lange & Kurzban, 2014: https://www.frontiersin.org/articles/10.3389/fpsyg.2014.00572/full

    • Interesting validation study – certainly more thorough than I’m used to. But when your target measure produces zeros reliably, 80% of the time, I’d be suspicious about its utility.

      I checked if the negative binomial model used fits the data regarding all those zeros (perks of an open dataset), and it seems OK (at least for study 1). But those 80% are quite heterogeneous in terms of their scores in other measures. Given the inability of the Voodoo Doll to differentiate most respondents, I would rather use another scale as my predictor.

    • Neuroskeptic posted about this validation work for the Voodoo Doll task, a few years ago:
      http://blogs.discovermagazine.com/neuroskeptic/2016/02/05/stick-pins-in-voodoo-doll-of-child/#.WsrPky5ubtQ

      In the studies reported there, the voodoo dolls and pins were virtual… either non-existent (participants were asked to imagine that they were sticking pins in a soft-toy stand-in for their child), or pictorial (generic crime-scene outlines, with stickers for pins).

      The original paper with the VDT (DeWall et al.) alternated between virtual voodoo dolls (pictures on a computer screen) and actual soft toys. That paper specified 51 as the maximum number of pins to be imagined or actually inserted. I guess everyone subsequently has stuck to that number because one must follow the incantation precisely.

  3. Some meandering googling uncovered a pretty thorough critique of that study in Frontiers in Psychology in June 2014 by Florian Lange and Robert Kurzban:
    https://doi.org/10.3389/fpsyg.2014.00572

    On the statistical perspective, they cite Ulrich Schimmack’s 2012 Psychological Methods paper “The ironic effect of significant results on the credibility of multiple-study articles”:
    http://psycnet.apa.org/doi/10.1037/a0029487
    To quote from its abstract:
    “The problem of low power in multiple-study articles is illustrated using Bem’s (2011) article on extrasensory perception and Gailliot et al.’s (2007) article on glucose and self-regulation.”

  4. Roy Baumeister is listed as editor on this paper.

    He is also a frequent collaborator of Brad Bushman’s, and was Nathan DeWall’s grad advisor. How is there not a conflict of interest here?

    • “Roy Baumeister is listed as editor on this paper. He is also a frequent collaborator of Brad Bushman’s, and was Nathan DeWall’s grad advisor. How is there not a conflict of interest here?”

      I also noticed this, and did some subsequent searching. If I am not mistaken, it looks like this paper was a “direct submission,” after which PNAS “identifies an appropriate NAS member to serve as the editor. The Member Editor may choose reviewers, guide modifications and revisions to the text, and decide whether the paper should be recommended for publication” (http://www.pnas.org/page/authors/direct-submission).

      I still remain puzzled about how “peer-review” has been set up, and how much power journals, editors, and reviewers have. This example to me highlights many of the problems with traditional “peer-review” (and editors/journals in general), and I reason a strong case can actually be made that journals and traditional “peer-review” can be seen as anti-scientific and directly responsible for many of the problems with science (e.g., because there seems to me to be way too much power in the hands of a few people to decide whether something is a useful scientific contribution).

  5. There seems to be a consistent theme in Bushman’s varied research oeuvre, i.e. an assumption that our minds are constantly taking metaphors seriously. Specifically, the metaphors of English. So raising our blood-sugar level will literally sweeten our behaviour. “Strength of will” behaves just like physical strength. Induce people to clean their hands, and they also lose feelings of guilt and social obligation. This may also happen when you give them a slate and ask them to wipe it clean, but I don’t know if he’s tried that.

    This is Bushman’s take on ’embodied cognition’. It reminds me of David Lodge’s observation, in ‘Mensonge’, that Lacan proved that the Unconscious Mind speaks a language, and also that the language is French. In short, he assumes that subjects follow a process of magical thinking and mental voodoo. So it makes perfect sense to give them voodoo dolls to express their level of aggression.

    Now if you’ll excuse me, I have a roomful of subjects to supervise while they assemble lengths of bamboo and string into replicas of aircraft, as a test of their desire for consumer goods.

  6. And it’s interesting to note that this specific PNAS paper was edited by Roy Baumeister who, as far as I know, was the PhD advisor of at least one of the authors (DeWall).

    • Zbicyclist:

      What would be really cool is if, having lost the bout, Bushman had to hand over his Margaret Hall and Robert Randal Rinehart Chair of Mass Communication to Lange and Kurzban. They could split it: Lange could take the Margaret Hall Chair and Kurzban could take the Randal Rinehart Chair.

  7. I don’t get the joke in the set-point diet April Fools post, which makes me feel a lot dumber than when it’s just statistics going over my head.

    • Wonks:

      The joke was that I was always trying to convince Seth to do a controlled trial of his diet, and he was always saying he had no interest in doing so because he already knew it worked. So I just decided to make a study up from scratch!

  8. From what I can see, the study doesn’t consider the participants’ alcohol intake at all. That’s an odd omission, since heavy alcohol intake can reduce inhibition and glucose at once. The sheer decrease in inhibition could account for some of those voodoo pins.
