
Death of the Party


Under the subject line, “Example of a classy response to someone pointing out an error,” Charles Jackson writes:

In their recent book, Mazur and Stein describe the discovery of an error that one of them had made in a recent paper, writing: “Happily, Bartosz Naskreki spotted this error . . .” See below for full context.

That is from page 129 of Prime Numbers and the Riemann Hypothesis, by Barry Mazur and William Stein.

See how easy that was?

Are you listening, himmicanes people? fat-arms-and-voting people? ovulation-and-clothing people? ovulation-and-voting people? air-rage people?
Satoshi? Daryl? John? Roy? Marc? David? Amy? Andy?
Bueller? Bueller?


    • Andrew says:


      Oh, if we get started on politicians we’ll never stop. In politics and business you’re supposed to lie and cheat where appropriate; that’s part of the game. Science and journalism are supposed to be a different story.

  1. Jordan Anaya says:

    My blog post detailing more errors is taking a while since there are a lot of errors to wade through, but I wanted to give you a small taste (pun intended again).

    Take a look at Table 2 from “Bad Popcorn in Big Buckets,” the publication that seems to have started this whole container-size movement thing.

    You can view the table here:

    An entire column is mislabeled!

    And no, I’m not talking about the fact that the “Freshness” label is missing; the “Container Size” label should be moved one column over to the left. The “Container Size” column actually shows the “Freshness” effects. The way the table is presented makes it appear that the size of the container matters more than the freshness of the popcorn when it comes to taste and quality.

    Interestingly, if you go to this version of the paper, the table is correctly labeled.

    In addition, some of the test statistics are wrong, and the degrees of freedom seem to be wrong too.

    I don’t know what I’m seeing, but I can’t look away.

    • Carol says:

      Jordan Anaya: I was able to pick up a preprint version of the “Bad Popcorn in Big Buckets” article. I note that there are suppression effects in Table 3; these would not be errors.

      Note that in the regression of consumption on container size, freshness of popcorn, taste, and quality, the standardized regression coefficient for freshness is larger than it is in the regression of consumption on container size and freshness. Ordinarily, it would be smaller.

      Also note that in the regression of consumption on container size, freshness of popcorn, taste, and quality, the standardized regression coefficient for taste is larger than it is in the regression of consumption on taste and quality, and the standardized regression coefficient for quality has reversed sign.

      If you want to know more, get my e-mail address from Andrew or from Nick Brown.
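Carol’s point about suppression can be seen in the simplest two-predictor case, where the standardized regression coefficients follow directly from three correlations: beta1 = (r_y1 - r_y2 * r_12) / (1 - r_12^2) and beta2 = (r_y2 - r_y1 * r_12) / (1 - r_12^2). This is a minimal sketch under that two-predictor assumption; the function name and the correlations are hypothetical and are not taken from the popcorn paper, whose regressions have more predictors (there the coefficients come from solving the full predictor correlation matrix instead).

```python
from typing import Tuple


def standardized_betas(r_y1: float, r_y2: float, r_12: float) -> Tuple[float, float]:
    """Standardized OLS coefficients for y ~ x1 + x2, computed from correlations alone.

    Hypothetical helper for illustration; handles exactly two predictors.
    """
    denom = 1.0 - r_12 ** 2
    if denom <= 0.0:
        raise ValueError("x1 and x2 are perfectly correlated; coefficients are not identified")
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    return beta1, beta2


# Hypothetical correlations: x2 is unrelated to y but correlated with x1.
r_y1, r_y2, r_12 = 0.5, 0.0, 0.6

# With x1 alone, its standardized coefficient is just r_y1 = 0.5.
beta1, beta2 = standardized_betas(r_y1, r_y2, r_12)
print(round(beta1, 3), round(beta2, 3))  # 0.781 -0.469
```

Adding x2 pushes x1’s coefficient above its zero-order correlation (0.781 vs. 0.5) and gives x2 a negative coefficient even though it is uncorrelated with y. A coefficient that grows, or one that flips sign, when predictors are added is therefore not an error by itself.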

    • Carol says:

      Jordan Anaya: Re Table 2, it looks as though the published version dropped a column that was in the preprint version, which could be due to a journal copyediting problem that the authors did not catch when they read the proofs.

    • Carol says:

      Jordan Anaya: Table 2: ANOVA F values of 101, 2012, and 194. Really?

      • Carol says:

        I meant 201, not 2012!

      • Jordan Anaya says:

        Yes, those values are very large, but they did give people either fresh or 14-day-old popcorn, so observing an extremely large effect is understandable. Nick and I have noticed various problems with the regression models that they use, but I try to limit myself to values that are mathematically impossible.

        As for these ANOVA values, 2 out of the 3 are mathematically consistent with the means and SDs they provide.
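Jordan’s consistency check can be sketched for the simplest design, a one-way between-subjects ANOVA, where F can be rebuilt from the per-group means, SDs, and sample sizes alone. This is a minimal sketch assuming the reported SDs are sample SDs (n - 1 denominator); the function name and the numbers are hypothetical, and a factorial design such as container size × freshness would need cell-level summaries and a different partition of the sums of squares.

```python
from typing import Sequence


def anova_f_from_summary(means: Sequence[float], sds: Sequence[float], ns: Sequence[int]) -> float:
    """One-way between-subjects ANOVA F statistic from per-group summary statistics.

    Hypothetical helper; assumes sds are sample standard deviations (n - 1 denominator).
    """
    if not (len(means) == len(sds) == len(ns)) or len(means) < 2:
        raise ValueError("need at least two groups with matching means, sds, and ns")
    k = len(means)
    n_total = sum(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / n_total
    # Between-group sum of squares: group size times squared distance of each group mean from the grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    # Within-group sum of squares recovered from each group's sample variance.
    ss_within = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))
    ms_between = ss_between / (k - 1)      # df = k - 1
    ms_within = ss_within / (n_total - k)  # df = N - k
    return ms_between / ms_within


# Hypothetical numbers for illustration only (not taken from the paper).
print(anova_f_from_summary(means=[0.0, 1.0], sds=[1.0, 1.0], ns=[10, 10]))  # 5.0
```

The degrees of freedom for this F are (k - 1, N - k), which also gives a quick check on reported df. Because published means and SDs are rounded, a recomputed F will rarely match to the last digit; a gap too large to be explained by rounding is what signals a problem.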

  2. Martha (Smith) says:

    I think I’ve mentioned before on this blog the response I got when I pointed out to a well-known mathematician a mistake in a book he had written:

    “Oh, is my face red!”

  3. Carol says:

    Andrew: Who’s Bueller?

  4. Ben Prytherch says:

    Now let’s suppose that there was no logically deductive way of distinguishing prime from not prime numbers, and the most popular way of attempting this distinction was to calculate an easily manipulated probability that nearly everyone misinterprets, and then just for fun let’s say Mazur had gained popular notoriety from credulous journalists who found the implications of this number being prime to be exciting and newsworthy…

  5. Jordan Anaya says:

    Brian Wansink has updated his Addendum II, making this the third addendum to his post:

    “In the end, I think the biggest contribution of bringing this to attention (van der Zee, Anaya, and Brown 2017) will be in improving data collection, analysis and reporting procedures across many behavioral fields. With our Lab, a rapidly revolving set of researchers, lab and field studies, and ongoing analyses led us to be sloppier on the reporting of some studies (such as these) than we should have been. This past Thursday we met to start developing new standard operating procedures (SOPs) that tighten up field study data collection (e.g., registering on, analysis (e.g., saving analysis scripts), reporting (e.g., specifying hypo testing vs. exploration), and data sharing (e.g., writing consent forms less absolutely). When we finish these new SOPs (and test them and revise them), I hope to publish them (along with implementation tips) as an editorial in a journal so that they can also help other research groups. Again, in the end, the lessons learned here should raise us all to a higher level of efficiency, transparency, and cooperation.”

    I’m not sure what to think, but I do find it interesting that he’s convinced his lab will soon be proficient enough in these methods to write a publication about them!

    One possibility is that he met with some statisticians and truly had a come-to-Jesus moment and will now be a model scientist from here on out.

    But if this were the case he would realize most of the work he has published is likely wrong and he would warn people about trusting the results.

    My suspicion is he realized the errors we found in his pizza publications are not limited to those papers, but also occur throughout his publications (he notes being “sloppier on the reporting of some studies”) and he is hoping if he finally acknowledges there is a problem people won’t do any more snooping. It’s too late, I’ve already snooped.

    • Martha (Smith) says:

      In any event, this is a whole lot better than doing nothing — and hopefully will be a lesson to (at least some) other researchers to do better to begin with.

      • Andrew says:


        Wansink seems to be following a strategy of getting ahead of the criticism while minimizing any acknowledgment that (1) many of his published findings could be pure noise and (2) his research methods are pretty much guaranteed to come up with meaningless yet statistically significant patterns, over and over again. For him to frame this problem as “sloppy reporting” is not quite right. Had those four studies been reported in a non-sloppy way, they still would be presentations of noise.

        The interesting decision point will come a couple years in the future. If Wansink’s lab members continue to design noisy studies, but now move to preregister all their research hypotheses, I think the stream of easy publications will dry up. It will no longer be possible for an eager student to walk in the door and squeeze out four papers from a failed study. This will really change everything. I doubt Wansink quite realizes this yet—I expect that he sees preregistration etc. as a bit of red tape that he can follow; he doesn’t catch that this has the potential to destroy the workflow which he has found so successful over the years. My guess is that when this finally hits home, he’ll try to find some way to continue with the p-hacking, perhaps by writing the preregistration plans vaguely enough, with enough researcher degrees of freedom, that they still will be able to find statistical significance from any experiment with enough effort. We’ll see.

        In any case, I agree with you (Martha) that this is a whole lot better than doing nothing.

    • Carol says:

      I met Brian Wansink years ago, when we were both associated with UIUC. I was even a subject in an experiment that he ran at a mini-conference there. (If memory serves, I ate too many M&Ms.) My impression then was that he was a creative, energetic, outgoing, and likable person, but not very detail-oriented — not the sort who’d spend the evening checking the accuracy of his stats, for instance.

    • Carol says:

      I note that Wansink has responded today (2/5/2017) on his website to van der Zee’s (2/1/2017) point-by-point reasons why the data should be anonymized and released. See the comments section.

  6. A bloke for Finland says:

    This reminds me of a case in Finland. Some researchers had got their interpretation of odds ratios wrong. This prompted Pertti Töttö and his friends to write about it in a journal, and they also analyzed some other incorrect interpretations of ORs. Some of those critiqued were really apprehensive and tried to brush off the critique by practically talking shite that had nothing to do with the actual problem, but I remember that at least one fella, Lauri Nummenmaa, just admitted that he got it wrong and fixed the mistake in his book. I remember him because I used his book to study classical statistics, and I think it is “the” book for many other Finnish people too.

    A random fact: a few years ago there was a murder case in Finland that got lots of attention. At some point people were saying that the murdered man (and his wife, who was widely thought to be the culprit) was somehow entangled in satanism, because people had seen him borrow a library book with a pentagram on the cover. What it actually was, was Pertti Töttö’s book “The return of the devilish positivism” (2000), which examines such problems as (if I remember correctly) how to put numerical differences in context and whether measures such as statistical significance are enough to determine what is practically significant.

    I couldn’t find all the relevant texts, but here is at least the initial critical paper:

  7. Jordan Anaya says:

    I finished my blog post:

    It’s a little long, I couldn’t help but wax philosophical a bit. I hope you don’t mind me quoting you.

  8. eric robinson says:

    I spotted something that I can’t get my head around and posted it to Brian’s blog. Maybe someone here can help too. Post below:


    I’m worried. This doesn’t add up –

    Brian says ‘a non-coauthor Stats Pro is redoing the analyses’,

    But Brian also said he couldn’t share the data because of his consent forms….

    “The records of this study will be kept private. In any sort of report we make public we will not include any information that will make it possible to identify you. Research records will be kept in a locked file; only the researchers will have access to the records.”

    If something along this line changes in the future, I will let you know.

    So this non-coauthor Stats Pro isn’t one of the original researchers…. presumably he/she must have access to the data to redo the analyses. So… why not share whatever you share with the non-coauthor stats pro with Tim et al. and whoever else wants it?

    Here is the solution. You can share the data.

    Alternatively, your actions with the stats pro have invalidated your consent agreement with the participants in this study.

    Which one is it? I’m confused about all of this.

    Eric Robinson
    University of Liverpool

    • Andrew says:


      It is possible that the researchers are not supposed to post the data publicly but can share it with individuals, one at a time, if they gain permission. Such rules exist. For example, I remember that to get access to state identifiers from the General Social Survey, we had to get individual permission and then the data could only be accessed in some sort of “clean room”; they would not give us a file.

      Beyond this, I can understand Wansink’s reluctance to share whatever data he has. Given the outrageous number of errors in his published papers, I’d guess that things would be even more embarrassing for him if those tables were compared to the raw data.

      By sharing the data only with someone who he is contacting personally, Wansink can attempt to control the damage to his reputation by telling that statistician his side of the story and giving that statistician an incentive to match that story as closely as possible to the data. The mission, I assume, is for the statistician to eventually report that there were issues in record-keeping but once the data were analyzed correctly, none of the substantive conclusions changed, thus the papers of Wansink and his colleagues still stand. The model might be the statistician hired by Gilbert et al. in the notorious “the replication rate in psychology is quite high—indeed, it is statistically indistinguishable from 100%” episode.

      Given everything we’ve heard so far, I have little expectation that I would trust a report written by someone chosen and hired by Wansink, and I doubt this “Stats Pro” strategy will advance research. The very term “Stats Pro” is a bit of a joke and sounds more like public relations than science.

      If Wansink really did want to figure this out, he’d share lots of his data with some outside group—not chosen by, paid by, or associated with Wansink, or Cornell, or his close colleagues—and let them see what’s going on. But, again, at this point his incentives would seem to point him toward minimizing the release of any more information.

      From my perspective, the problem is that forking paths invalidates all the claims in those four papers, even if there hadn’t been 150+ errors. This was the “xkcd jelly bean” conversation that Wansink non-responded to on his blog. So even if the Stats Pro recalculates all the t-statistics etc. and the significance levels don’t change much, the study is still dead on arrival.

      The only thing that could really work at this point, I think, would be a series of preregistered replications. But I really really really don’t think anyone should bother wasting their time on this, unless they truly have nothing better to do.
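The “xkcd jelly bean” point, in its simplest form, is arithmetic: if m independent tests are run at level alpha and every null hypothesis is true, the chance of at least one p < alpha is 1 - (1 - alpha)^m. The sketch below computes that and checks it by simulation, assuming independent tests with uniformly distributed null p-values; the function names are hypothetical. Forking paths is a broader problem than this, since the researcher need not actually run all m tests for data-dependent analysis choices to inflate the error rate, so treat the number as intuition rather than as a model of the pizza studies.

```python
import random


def familywise_error_rate(alpha: float, m: int) -> float:
    """Probability of at least one p < alpha among m independent tests when all nulls are true."""
    return 1.0 - (1.0 - alpha) ** m


def simulate_familywise_error_rate(alpha: float, m: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo check: under the null, each p-value is uniform on [0, 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # One simulated "study": m independent null tests; count it if any comes out significant.
        if any(rng.random() < alpha for _ in range(m)):
            hits += 1
    return hits / trials


# 20 jelly-bean colors, each tested at alpha = 0.05.
print(round(familywise_error_rate(0.05, 20), 3))  # 0.642
print(simulate_familywise_error_rate(0.05, 20, trials=20_000))
```

Roughly two times out of three, at least one color “works” even when none does.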

  9. BoSelecta says:

    Am I the only one struck by how perfect Wansink’s responses are? I mean, I find the whole situation infuriating and I can’t wrap my head around how and why stuff like this happens, but at the same time the wording of his responses is so perfect… It’s as if a perfect response-generating AI was behind them.

  10. Carol says:

    BoSelecta: Along that line, I wonder why there has been dead silence from Wansink’s co-authors? Perhaps one would not expect a response from a student visiting from Turkey. But two of the pizzagate articles were first-authored by David R. Just, who is a full professor at Cornell.

  11. Jordan Anaya says:

    We received a response to our email to the Office of Research Integrity and Assurance. Tim van der Zee has shared our email and their response on his blog:

    I also hope you guys got a chance to read my post:

    • Andrew says:


      Interesting story, and I can feel your frustration. But my response to Tim is the same as what I wrote earlier on this blog: I see no reason why Wansink should feel compelled to share whatever data he has from that experiment. I also see no reason why any of us should believe anything Wansink writes about it.

      These are data from a study which Wansink himself described as “flawed,” and the only research products from these studies are four hopelessly p-hacked papers, which would be essentially useless (except, I suppose, for teaching purposes) even had it not been discovered that they had over 150 errors. There’s nothing there. Release of the data (or a discovery that there actually is no dataset) would, presumably, reveal even more errors.

      • Jordan Anaya says:

        We suspect that there is indeed a data set because there is a video showing them at the restaurant:

        The date of the video didn’t seem to correspond to the timeline provided by Wansink, so we actually called the restaurant, and they told us Cornell had made multiple visits. I think the lab also attempted to run even more studies at the restaurant–I guess they really like doing studies there for some reason.

        I agree with everything you are saying, but Tim, Nick, and our other confidantes think we should continue to do our due diligence. If anything, it doesn’t look good for them to refuse to share the data.

        Another reason we believe the data exists is that they were perfectly happy to share it with us until we mentioned we wanted to use it to confirm some errors we had found.

        It looks like New York Magazine is going to publish a long post about the story tomorrow, so keep an eye out for that.

        • Andrew says:


          It makes sense that if they’d done successful (by their standards) experiments at this restaurant before, that they’d want to do more. It can be hard to set up good working relationships, so once you have one, you might as well keep using it.

          Also, it’s possible that there’s no single dataset. There could be lots of data scrawled on all sorts of forms; maybe they weren’t entered correctly into the computer, who knows? It does seem to be a mystery how they could’ve had 150 errors. So I can understand your desire to get to the bottom of things. It’s a puzzle, kind of the opposite of the Michael LaCour story. LaCour had an entire dataset that was consistent with his published paper, but the data were faked. In contrast, Wansink and his colleagues seem to have actually done their experiment, but it’s mathematically impossible for there to be a dataset that’s consistent with what they published.
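One standard way to show that no dataset can match a set of published numbers is a granularity check such as the GRIM test (Brown and Heathers): if n people give integer responses, their mean must equal some integer total divided by n, so many two-decimal means are simply unreachable. This is a minimal sketch under those assumptions (integer-valued responses, half-up rounding, the mean passed as a string so trailing zeros survive); the function name and the `items` parameter for multi-item scales are hypothetical, and it is not necessarily the exact procedure applied to the pizza papers.

```python
from decimal import Decimal, ROUND_HALF_UP


def grim_consistent(reported_mean: str, n: int, items: int = 1) -> bool:
    """Return True if some integer total over n * items responses rounds to reported_mean.

    Hypothetical helper: assumes integer-valued responses and half-up rounding.
    """
    mean = Decimal(reported_mean)
    places = -mean.as_tuple().exponent   # decimal places as printed, e.g. "3.40" -> 2
    quantum = Decimal(1).scaleb(-places)  # smallest reported step, e.g. 0.01
    denom = n * items
    # Any achievable mean that rounds to the reported value has a total within one step of mean * denom.
    center = int((mean * denom).to_integral_value(rounding=ROUND_HALF_UP))
    for total in (center - 1, center, center + 1):
        candidate = (Decimal(total) / denom).quantize(quantum, rounding=ROUND_HALF_UP)
        if candidate == mean:
            return True
    return False


print(grim_consistent("3.45", n=20))  # True  (69 / 20 = 3.45)
print(grim_consistent("3.44", n=20))  # False (68 / 20 = 3.40, 69 / 20 = 3.45)
```

A failed check means no set of integer responses from n people could have produced that mean; it says the number is impossible, not why.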

        • Jack says:

          It doesn’t look good for you to do what you’re doing. I read your post and it’s just silly. If those are the errors you found, please, this is not worth anybody’s time… and you clearly show a personal agenda because of your story. There’s no way you are being professional about this; your silly post title speaks for itself.

          • Andrew says:


            The whole Wansink thing’s a waste of time all around, yet we’ve all collectively spent many many hours reading those papers, staring at the numbers, and writing about it. In that sense it makes none of us look good! As scientists we should be out there making new discoveries, right? Or at least developing tools to allow others to learn about the world or improve their lives. Or, failing that, we should be out in the world enjoying ourselves, taking our families to delicious meals at Taco Bell or saving up our money for that dream bullfight-centered vacation. But instead here I am responding to a blog comment! My only justification is that the careful study of individual examples of junk science can, we hope, help us better understand the larger issues of science research and science communication.

            So, yes, I agree that Jordan’s post is over-the-top in a bit of an embarrassing way, and that his post title is silly—but maybe the reason this is embarrassing to you and to me is that here we are spending time commenting on these threads.

            Also it’s my impression that Wansink has had influence on policy—his recommendations are widely publicized in the news media, and he had a government appointment a few years ago—so purely from the standpoint of the public good on food/nutrition research and policy, Jordan could be doing a service. Yes, his post is a bit personal and his title is silly, but that’s part of the whole package, as the personal interest is a big part of what motivated him to dig into this case in the first place.

            Hey—I just spent 10 minutes writing this comment. Thus demonstrating my point. In this case, though, don’t take it as a waste of time on my part but as a desperate attempt to avoid doing my real work.

        • Carol says:

          Jesse Singal’s article “A popular diet-science lab has been publishing really shoddy research” is up now on New York Magazine’s website:

        • Carol says:

          And now see this:

          for another person’s experience with the Wansink lab.

      • Nick says:

        Andrew, you wrote:

        >>I see no reason why Wansink should feel compelled to share whatever data he has from that experiment.
        >>I also see no reason why any of us should believe anything Wansink writes about it.

        That’s great at the level of the skeptical scientist (which is, by coincidence I’m sure, the title of Tim van der Zee’s blog). The problem is that 99.99% of people out there typically believe what (the media tells them that) “scientists have discovered” — as long, of course, as it fits into their personal prejudices and doesn’t seem to involve them having to question the merits of their last $1000+ purchase.

        This means that in practice, society has a problem with junk science, whatever lab it comes from and on whatever topic it focuses, because the 0.01% of people who read and question academic journal articles don’t get a look-in when it comes to national TV coverage.

        I think it’s correct to say that historically, monks and nuns got a free pass out of some of the things that society expected people to do because they were sacrificing a bunch of other stuff in order to live their monastic lives. Perhaps we need to consider ways to stop scientists double-dipping (i.e., mostly-government funding on the one hand, and personal fame and fortune via mass-market books and other products and services on the other). I am constantly amazed that psychologists working on Effect X can publish books entitled “Effect X: Ten Weird Tricks To Happiness” without having to declare their obvious massive disincentive to disprove Effect X (cf. Feynman) as a conflict of interest.
