Pizzagate, or the curious incident of the researcher in response to people pointing out 150 errors in four of his papers

There are a bunch of things about this story that just don’t make a lot of sense to me.

For those who haven’t been following the blog recently, here’s the quick backstory: Brian Wansink is a Cornell University business school professor and self-described “world-renowned eating behavior expert for over 25 years.” It’s come out that four of his recent papers—all of them derived from a single experiment which Wansink himself described as a “failed study which had null results”—were hopelessly flawed. An outside research team (Tim van der Zee, Jordan Anaya, and Nicholas Brown) looked at the papers and found over 150 errors. Earlier, I’d looked at the papers and found that they sliced and diced their data in different ways to come up with statistical significance. The data were all from the same experiment but different analyses used different data-exclusion rules and adjusted for different variables.
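
To give a sense of how much room that kind of flexibility provides, here is a minimal simulation sketch. It is my own made-up setup, not anything reconstructed from the actual papers: generate data with no real effect, then run a handful of analysis variants (different exclusion rules, with and without a covariate adjustment) and count how often at least one variant comes out below p = 0.05.

```python
# Minimal sketch (made-up setup, not the actual analyses): null data run
# through a few arbitrary exclusion rules, with and without an adjustment.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims, n, hits = 2000, 100, 0

for _ in range(n_sims):
    group = rng.integers(0, 2, n)              # "treatment" indicator, no true effect
    age = rng.normal(40, 10, n)                # a covariate one might "adjust" for
    y = rng.normal(0, 1, n)                    # outcome: pure noise
    found = False
    rules = [np.ones(n, dtype=bool),           # exclusion rule 1: keep everyone
             age < 55,                         # rule 2: drop older participants
             y > np.quantile(y, 0.05)]         # rule 3: drop low "outliers"
    for keep in rules:
        for adjust in (False, True):
            yy, gg, aa = y[keep], group[keep], age[keep]
            if adjust:                         # crude adjustment: residualize on age
                yy = yy - np.polyval(np.polyfit(aa, yy, 1), aa)
            p = stats.ttest_ind(yy[gg == 1], yy[gg == 0]).pvalue
            found = found or p < 0.05
    hits += found

print(f"datasets with at least one p < 0.05 variant: {hits / n_sims:.0%}")
```

Even with only six variants per dataset, the chance of at least one “significant” result comes out noticeably above the nominal 5 percent, and a real analysis typically has far more than six forks.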

All this led me to disagree with Wansink’s assertion that publishing that sort of work was a better use of one’s time than watching Game of Thrones.

Since then I’ve noticed a few weird things in this case:

1. Some people seem to be upset that Wansink isn’t sharing his data. If he doesn’t want to share the data, there’s no rule that he has to, right? It seems pretty simple to me: Wansink has no obligation whatsoever to share his data, and we have no obligation to believe anything in his papers. No data, no problem, right?

2. Wansink’s easygoing reactions seem to me to be dissociated from the seriousness of the problems that people have found with his work. A bunch of commenters on his blog have pointed out the obvious problems with his research methods, but he has just responded blandly in an in-one-ear-and-out-the-other kind of way.

Here’s a representative example. Anthony St. John writes:

With field studies, hypotheses usually don’t “come out” on the first data run. But instead of dropping the study, a person contributes more to science by figuring out when the hypo worked and when it didn’t.” [quoting Wansink]

I suggest you read this xkcd comic carefully: https://xkcd.com/882/

It provides a great example of learning from a “deep dive”.

Brian Wansink replies:

Hi Anthony,

I like it. Thanks for the link. (Makes me grateful I’m more of a purple jelly bean guy).

Best,

Brian

Anyone who looks at that famous xkcd jelly-bean cartoon will immediately realize that it’s slamming the “deep dive and look for statistical significance” approach to research. But Wansink follows the link and . . . doesn’t get the point? Doesn’t realize that St. John, like most of the other commenters on the blog, is saying he’s doing everything exactly wrong?

And there are lots more exchanges on that post that have the same flavor, people commenting that the work is “salami slicing null-results . . . worthless, p-hacked publications . . . junk science,” and Wansink giving mild, agreeable responses like, “I understand the good points you make.” Just a complete disconnect. The guy really does seem to be a living embodiment of that jelly bean cartoon.

3. But the weirdest thing of all is Wansink’s reaction to the three outside researchers finding 150 errors in his papers. Who has 150 errors in four papers? When does that ever happen?

Of course Wansink doesn’t want to share his data—that much is obvious. Van der Zee et al. found those errors without even seeing the original data—these were just inconsistencies in the published tables. It’s hard to imagine what could’ve happened to get that many errors out of a single dataset, but whatever did occur must be a bit embarrassing to the people concerned.
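
For readers wondering how anyone finds errors from the published tables alone: a lot of such checks are just arithmetic on the reported summary statistics. Here’s a hypothetical sketch of two checks of that general kind, with made-up numbers; I’m not claiming these are the specific tests the team ran.

```python
# Hypothetical sketch of table-only consistency checks (made-up numbers,
# not figures from the papers in question).
import math


def grim_consistent(reported_mean, n, decimals=2):
    """With n integer-valued responses, the true mean must be k/n for some
    integer k; check whether any such value rounds to the reported mean."""
    k = round(reported_mean * n)
    return any(round(kk / n, decimals) == round(reported_mean, decimals)
               for kk in (k - 1, k, k + 1))


def t_from_summary(m1, sd1, n1, m2, sd2, n2):
    """Two-sample t statistic (pooled variance) recomputed from reported
    means, SDs, and group sizes, for comparison with the published value."""
    sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))


print(grim_consistent(3.47, 21))                             # can 21 integer responses average 3.47?
print(round(t_from_summary(3.4, 1.2, 30, 2.9, 1.1, 32), 2))  # does this match the reported t?
```

A failed check of either kind doesn’t tell you what went wrong, only that the numbers in the table can’t all be right as printed.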

What stuns me is Wansink’s attitude! When you publish four papers from a “failed study,” and the statistical methods in those papers are criticized by experts, and when an outside team finds 150 errors in the papers, the appropriate response is not to say you’re gonna go fix some little things and “correct some of these oversights.” No. The appropriate response is to consider that maybe, just maybe, the data in those papers don’t support your claims.

Let me put it this way. At some point, there must be some threshold where even Brian Wansink might think that a published paper of his might be in error—by which I mean wrong, really wrong, not science, data not providing evidence for the conclusions. What I want to know is, what is this threshold? We already know that it’s not enough to have 15 or 20 comments on Wansink’s own blog slamming him for using bad methods, and that it’s not enough when a careful outside research team finds 150 errors in the papers. So what would it take? 50 negative blog comments? An outside team finding 300 errors? What about 400? Would that be enough? If the outsiders had found 400 errors in Wansink’s papers, then would he think that maybe he’d made some serious errors?

The whole thing just baffles me. On one hand, Wansink seems so naive about statistics and research methods. But on the other hand, who could be so clueless as to not suspect a problem when hundreds of errors have been found in these papers? Most scientists I know would get concerned if someone found one error. Even from a purely strategic standpoint, if you’re only concerned about your reputation, wouldn’t it make sense to cut your losses and accept that these particular papers are hopeless messes?

And what can Wansink possibly mean when he writes, “We’ve always been pleased to be a group that’s accurate to the 3rd decimal point”? That makes no sense given the incredible density of errors in those four papers.

As I said, the whole thing just seems weird to me. I just can’t understand Wansink’s serene response. If you publish empirical work and someone finds 150 errors in your papers, that’s a concern, no?

To paraphrase the famous spiritualist Arthur Conan Doyle:

Detective: “Is there any other point to which you would wish to draw my attention?”

Blogger: “To the curious incident of the researcher in response to people pointing out 150 errors in four of his papers.”

Detective: “The researcher did almost nothing in response to people pointing out 150 errors in four of his papers.”

Blogger: “That was the curious incident.”

I perhaps thought of this because Wansink has been called “the Sherlock Holmes of food” by the American Psychological Association.

Who cares?

The question naturally arises, why keep writing about this dude? (I think this is my fourth post on the topic.) Busy prof runs a science factory, whips out zillions of papers each year, very little quality control, students and postdocs in the lab are under huge pressure to come up with publications, they use every trick they can think of to come up with statistically significant results, some people take a careful look and find inconsistencies and errors in some of the papers, this is awkward because prof has described the student who did the work as a “hero,” prof tries to affably sweep the whole event away, questions are raised by bloggers and journalists who’d never even heard of the prof until this controversy, etc.

Same old same old, the kind of thing we hear about in Retraction Watch every day. The only new thing about it is the 150 errors—when does that ever happen?—but, still, maybe that’s not enough to make the incident worthy of four separate posts.

I continue writing about this story because of the insight it gives into the inner workings of the famous self-correcting nature of science. The process of self-correction is much more involved than people seem to realize. Sometimes people demand retractions, but as I’ve written before, I don’t see retraction as a serious solution for reform of poor research and publication practices, or as a way of cleaning up the public record. The numbers just don’t add up: there are just too many hopelessly flawed papers, and retraction is done so rarely.

It’s impossible to solve problems such as Wansink’s “deep data dives” (actually I assume these dives were done by his student based on the encouragement and incentives provided by Wansink) on a case-by-case basis. There are just too many cases.

Paradoxically, this motivates me to examine certain individual cases, like this one, in detail, to look at how people at different stages of their careers react to the realization that they’ve been doing junk science. This can help the many thousands of researchers out there who aren’t personally and professionally invested in discredited work, and who want to use scientific methods to learn about the world.

Not many of us have published multiple papers based on a “failed study which had null results,” and not many of us have had our names attached to papers with 150 errors, but we’ve all had research setbacks, ideas that didn’t pan out, and the excitement of major discovery—followed by the realization that we did something stupid and didn’t make that discovery after all. How to handle these disappointments? That’s not something covered in the usual course on research methods.

In judo, before you learn the cool moves, you first have to learn how to fall. Maybe we should be training researchers, journalists, and public relations professionals the same way. First learn about Judith Miller and Thomas Friedman, and only when you get that lesson down do you get to learn about Woodward and Bernstein.

P.S. Lots of great comments here. I just want to point out this one from Mark Palko:

When this guy finds an effect, he by-god finds an effect:

http://www.cbsnews.com/news/slim-by-design-author-brian-wansink-gives-tips-on-avoiding-bad-food/

In a new book, “Slim by Design: Mindless Eating Solutions for Everyday Life,” food psychologist and director of the Cornell University Food and Brand Lab Brian Wansink says you don’t need willpower to shed the pounds but to change your surroundings instead.

“You have a messy kitchen, a cluttered desk, you end up eating 44 percent more snacks than if the same kitchen is clear,” Wansink said on “CBS This Morning.”

In fact, people who leave cereal boxes on the counter are more likely to be heavier.

“Mainly women,” he added. “About 21 pounds heavier than the neighbor next door that doesn’t have any cereal visible at all.”

Those findings are based off of observational studies that Wansink performed. He investigated 230 homes in Syracuse, New York, measured the women’s weight and took pictures of their kitchens.

“If you’re serving white rice on a white plate, you don’t really see the difference, so you tend to put about 18 percent more on,” Wansink said. “If you put that on a darker plate or a colored plate, you automatically serve less and eat less.”

“We’ve analyzed lots of orders and restaurants. What we find is that if you sit near a window, you’re about 80 percent more likely to order salad; you sit in that dark corner booth, you’re about 80 percent more likely to order dessert,” Wansink said.

I’d say this is disgraceful and counter to all principles of quantitative science—except that this sort of ridiculous hype is standard operating procedure among celebrated and leading researchers in psychology and economics. So, although I can blame Wansink for publishing papers with 150 errors and then not seeming to really catch that this might be a problem, I can hardly single him out for publishing and publicizing ludicrously high effect size estimates that, to a trained eye, are the obvious product of the statistical significance filter: Take a small sample size, add noise, pick at the data until you find a pattern that fits your story and is more than 2 standard errors away from zero, then publish the paper and go on TV advertising your stunning results. 80 percent more likely, indeed. I believe that about as much as I believe that early childhood intervention increases people’s wages by 40% when they grow up. Or that there’s a province in China where the life expectancy would’ve been 96 in the absence of indoor coal heating.

All these things could be possible—ok, not the 96-year life expectancy, but all the others—but I have no reason to believe them, as they’re super-biased estimates. I don’t usually make a practice of scaling my estimates up by a factor of 10 or whatever, just for the hell of it, but that’s what these researchers are doing when they report a selection of raw and noisy estimates that happen to be at least 2 standard errors away from zero. Type M error is not just a slogan. It’s a way of life for much of the research community. And CBS News, NPR, etc. fall for it, every time.
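
If you want to see what that filter does in numbers, here is a minimal sketch with made-up values (a small true effect and a much larger standard error, not estimates from any of these studies): simulate a pile of noisy studies and look only at the ones that clear the 2-standard-error bar.

```python
# Minimal sketch of the statistical significance filter (Type M error),
# using made-up numbers: the estimate is much noisier than the true effect.
import numpy as np

rng = np.random.default_rng(1)
true_effect, se = 0.1, 0.5                        # small true effect, large standard error
estimates = rng.normal(true_effect, se, 100_000)  # one noisy estimate per hypothetical study
sig = estimates[np.abs(estimates) > 2 * se]       # keep only the "significant" ones

print(f"share of studies reaching significance: {sig.size / estimates.size:.1%}")
print(f"average size of a significant estimate: {np.abs(sig).mean():.2f}, "
      f"vs. a true effect of {true_effect}")
```

In this made-up setup only about 1 in 20 studies clears the bar, and the ones that do overstate the true effect by an order of magnitude. That’s the Type M error, and it’s why I don’t take the published point estimates at face value.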

115 thoughts on “Pizzagate, or the curious incident of the researcher in response to people pointing out 150 errors in four of his papers”

  1. Great stuff, but, again, we need some material — simple, accessible, brief — for “training researchers, journalists, and public relations.” What article(s) would people recommend for use in a STAT 101 class? I can’t think of a good one, which is why I hope Andrew writes it.

  2. Andrew – I think you were friends with Seth Roberts. You might be interested in his interview with Brian Wansink, which captures a lot of what I find/found appealing about both of them – http://blog.sethroberts.net/2006/11/26/brian-wansink-on-research-design/

    I would also recommend Wansink’s book “Mindless Eating.”

    I am certainly less credulous about Wansink’s conclusions after reading the comments, but I think he has a rare and really important skill in coming up with interesting phenomena. I really wish he had a partner who could play a more judicial role in evaluating the research, although it sounds like that has been crowd-sourced.

    A similar situation occurs in the teaching literature, where you can find a host of recommendations that sound reasonable but haven’t been evaluated. Testing all of them in a controlled field experiment would never happen, and finding a proxy that’s complex enough to be realistic but controlled enough to be feasible is a challenge. But not an insurmountable one.

    I’ll be sad if the ideas put forth in Wansink’s book turn out to be wrong, but it would be a wonderful exercise for an undergrad methods course (or the STAT 101 class mentioned in the prior comment) to replicate those studies – either way they’d learn a lot. The phenomena are clever, accessible, and interesting, so it would be great for someone unconnected with him to replicate them. I believe that the supply of stale popcorn is unlikely to be depleted in our lifetimes, so they wouldn’t be resource-constrained, either.

    • I understand this sentiment. I do educational research myself, and often I find myself testing very straightforward hypotheses and in hindsight everybody will think they would have predicted the outcome. The thing is, we use the scientific method to separate obvious things that are wrong from obvious things that are right. Many people think that ‘learning styles’ is a very intuitive idea (which it is) but the evidence is very firmly against it.

      Even if all the results are very sensible and maybe even all correct, it is absolutely paramount that we *know* that they are correct. It’s about the evidence.

      Papers which contain little evidential value (or, you know, 50+ errors) should be called out *even if the results happen to be true*. We don’t want to be right by accident; we have to employ a research methodology which is reliable and consistent. Having an absurdly high number of reporting inconsistencies does not constitute a reliable method, and should thus be called out.

    • Kevin:
      > has a rare and really important skill in coming up with interesting phenomena
      That could be: that is, Wansink could be great at coming up with hypotheses (or, when made formal, priors).

      The concern here, I believe, is that his methods for learning more about those hypotheses from observations can be expected (indeed shown) to diminish and ruin the ones that are good. The chance that they will luck out and later be replicated by others is (I believe) way too low to justify the approach.

      This seems to arise in any criticism of how someone learns from observations: “Hey, they might be very insightful and actually right”.
      True enough, but their design and analysis of observations does not help and perhaps even _suggests_ otherwise.

    • Kevin:

      Replication is great. But rather than having students replicate Wansink’s useless papers from the pizza restaurant, I think they should replicate some of his early papers that are more respected.

    • Pedagogy and graphs seem like the two areas where everyone has a strong position on how it *ought* to be done and very little hard evidence to back things up.

      The pop wisdom on graphs is even more ironic since a lot of “experts” come from the stat domain.

    • Curious:

      Thomas Friedman != Judith Miller but they both represent examples of journalistic failure. I’d have mentioned Janet Cooke or Stephen Glass, but it would be too easy for students to think of Cooke and Glass as cheats, pure and simple, without getting the larger point. With Friedman and Miller it seems clearer how good intentions can lead to the fall, and how, if you’re not prepared for failure, you can make things worse.

      That’s the message I want to send. If you’re doing research (or journalism, or medicine, or taxi driving, or welding, or teaching, or almost anything), failure is not just an option, it’s a necessity. Unless you quit early, failure will happen. Failure. Will. Happen. And researchers are not trained to deal with failure. So you get people like Wansink or Hauser or Fiske, who reach the dizzy heights of professional success, then fail, and don’t know how to handle it. Or you get people like the ovulation-and-clothing researchers, or the power-pose researchers, or the beauty-and-sex-ratio researcher, who early in their careers have big successes that turn into big failures, and, again, they have no plan, no template for what to do when it turns out you messed up.

      Our textbooks and lectures and inspirational stories are full of successes, with very little on failure, and even less on how to handle it.

      • Remember back in the day when failed scientists just destroyed themselves and not the whole profession? Those were the days….

        http://www.goodreads.com/book/show/1643643.The_Quest_Of_The_Absolute

        I guess the invention of small-N experiments that never get replicated means the Balthazar Claes of the world are now superstars instead of degenerate and misguided idealists destroying themselves in their endless pursuit of the Absolute. Because now they would find It, with p<0.05, and so they could stop fussing about with all the expensive experimenting and start right in on the lucrative fame enhancing.

        … ok wow, that comment comes off as unnecessarily cynical… too strong jrc. I really just wanted to point to the Balzac, because I think it is an interesting book to think about in relation to the idea of scientific failure and how that concept may have changed in the last 50 or 100 years. But I'll leave the comment, because it is also kinda funny.

        More relevant, I agree that students are not trained to understand the regularity of failure. They seem to think it is a moral and intellectual failure on their part if they don't get *** on their regression results. And that people who do get *** are somehow, by definition, "better" empiricists. You know that combination of feelings you get when a grad student says "But that student from [Other School] got it to work!"? And you have to tell them a) that *** does not mean it "worked"; b) that student probably p-hacked the hell out of that result, possibly with the full support of their adviser; c) they could cheat if they had a different adviser and it would probably pay off for them in terms of career advancement; d) actually finding real things in the world that are robust and meaningful and consistent is very hard and they will probably not find one that interests their profession in the two year window they have before they need to find a job.

        Ugh… maybe the lesson is just that I'm a terrible adviser to choose. The good news is that this advice will probably harm my career too, so at least I won't have to ruin too many grad students before they deny my tenure… and wow jrc, right back to too cynical. I'll just see myself back to my boring research on careful measurement and interpretation of non-surprising effects. You know, because that is my job, and I like my job.

  3. So, as of this moment, there are two articles with the same title and the same cat picture and 80% of the same content, back to back… Will there be 2 more to make it an obvious joke about getting 4 blog posts out of the same content, or is this a bug in WordPress?

  4. “If the outsiders had found 400 errors in Wansink’s papers, then would he think that maybe he’d made some serious errors?”

    I decided to take a closer look at his other work, and I’m finding the exact same types of errors in these other papers as well (impossible means/SDs, incorrect test statistics, incorrect degrees of freedom, etc.). Although it should be noted once in a while I do find a paper that doesn’t seem to contain any errors, and I say “doesn’t seem” because the papers never contain any raw data so it is impossible to say for sure.

    We may just get to 400 errors, and I’m curious to see where this story will go from there.

    P.S. This is your fifth post on this topic and I apologize if my further investigation results in a sixth, seventh, or more…

  5. One of his abstracts begins:

    “The appearance of being scientific can increase persuasiveness. Even trivial cues can create such an appearance of a scientific basis.”

    11/10 for self-awareness

  6. “There isn’t always a quantity and quality trade-off, but it’s just really important to make hay while the sun shines.”

    An amusing interpretation of this is that he fully understands all the problems AND realizes that the academic community is starting to as well. So he’s got to publish as many of these papers as he can before standards rise.

  7. I know a few people who are so sure of their thinking that they just don’t care about getting the numbers right. They collect data only for reassuring themselves and getting published. They think that correct Tables, p-values, etc, are just nuisances that distract from the pure facts as they perceive them. “Oh well, but the results would be more or less the same.”

  8. Perhaps the possibility needs to be confronted that in some areas the quality of a researcher’s work has little to do with a researcher’s professional success. In mathematics, physics, chemistry, biochemistry, computing – the so-called “hard” sciences and their underpinnings – one can still reasonably well trust that a full professor at a major US university has at least published some solid to excellent work at some point in his/her career. In engineering, softer biology (animal behavior, psychology, some neurobiology), economics, etc. one will find that this is no longer necessarily the case, although it is generally the case. As one moves into areas where there is more commercial activity, one will find a higher and higher density of low quality and wrong work (I don’t know where to put statistics and medicine). The current university dynamic makes it easy. Getting a lot of citations guarantees professional success, notwithstanding that things that are deep and profound rarely get a lot of cites in an absolute sense – because not many people are capable of understanding them – and lots of cites may just indicate that something is silly and sounds good to the poorly educated …

    A lot of these people actually think they are good researchers, and that’s probably the case with the guy being discussed. Nothing in their professional lives has told them otherwise. On the contrary, their silly ideas and insubstantial empirical investigations, sloppy data collection and incorrect data analysis have all led to publications, citations, promotions, consulting contracts, book deals, etc. Apparently they are doing it right. If that’s coupled with some charisma and the ability to teach, the sky’s the limit. The degree mill aspect of the educational system means that one can make it at least through the master’s level without understanding much of anything, and in some areas what passes for a doctorate is laughable.

    Another problem is the tyranny of mathematics. In many “softer” areas things are felt to be more credible if they are dressed up in mathematical/statistical language. The old style of just explaining a thing is insufficient, or seems silly to people who rely on the authority of what they don’t understand. One sees this very acutely in economics (and maybe statistics too) where many practically important foundational aspects are still poorly understood. People would rather not think and invoke a metric or a decision rule preordained by someone else. It’s worse still when they don’t actually understand the metrics and decision rules, but feel obliged to use them. It seems like a game. A bit of fudging is ok in most games, particularly if everyone else does it. That line of thinking leads to all sorts of bad behavior, particularly when one runs a research group.

  9. I think the problem is fundamentally about how these researchers interpret their findings vs how the public & journalists interpret the findings.

    Researcher finds evidence supporting Claim Y. Concludes, this appears to be true, but others should check it out.
    Media and the public conclude, This is absolutely and without any doubt true! Change your life based on it!

    That said, there are researchers who think their finding Y is The Absolute Truth, and then give Ted talks encouraging people to change their lives based on this.

    • Jack:

      What you write sounds reasonable, but with low-quality research I don’t think it’s appropriate to even say that the paper is “evidence supporting Claim Y,” let alone that Claim Y “appears to be true.” With papers such as Wansink’s, or power pose, or fat-arms-and-voting, or beauty-and-sex-ratio, or himmicanes, . . ., in all these examples, I don’t think the papers are providing any relevant information at all!

      • Chris, Martha:

        Sure, I guess you’re probably right. I was just disturbed that people didn’t seem to recognize how wrong this behavior was. All this focus on whether Wansink should share his data—as I wrote above, I think it’s up to him, he can share the data if he wants—and not enough screaming at the idea that a guy could be confronted with 150 errors in his papers and barely bat an eye. As if this just happens every day to him, no big deal.

  10. This story will make a lot more sense to people who accept that Wansink’s behaviors and attitudes are the norm in many areas of research. That is what you are taught is “science” at the highest levels of education. Attempts to deviate will meet with insane resistance. As a result, someone actually doing a good job is a very, very rare exception. And this is nothing new; Meehl wasn’t the first, but he was complaining about it in the 1960s. The stacking of BS upon BS in the name of science has been going on for a long time.

    • This has been my conclusion as well. People such as Dan F above (http://statmodeling.stat.columbia.edu/2017/02/03/pizzagate-curious-incident-researcher-response-people-pointing-150-errors-four-papers-2/#comment-408835) give passes to “hard” sciences, but if you’re in those areas you see the same scat, different animal.

      It’s time to stop thinking in binary (this area is ok, but that area isn’t), every area has this issue, the key is to quantify what fraction of the “output” is good+useful+correct vs what isn’t. Also, how easily can one identify the good+useful+correct stuff, and which stuff is it? What are the costs of getting the good stuff, and what are the benefits?

      My impression is that the signals used in the past (prestige of journal, institution that the author works at, number of citations, etc) are all played out. The crisis goes deep into the outfield, possibly out into the parking lot or beyond the parking lot into the corn-fields (for those who like baseball analogies).

      It’s fairly possible that there are whole areas of Psychology where nothing that could really be construed as a stylized fact that we didn’t know beforehand has ever been published (like for example, how much do we really know about the psychological effect of divorce on later-life behavior of the children? I just pulled that out of thin air, but it’s an interesting and important topic, and if it’s been dominated by p-value searching like power-pose has… we probably know very little). The scary thing is that when you look deeply at other areas like various portions of pharmacology or orthopedics or banking policy or the ecology of south american rainforests or whatever…. it might not be that different. My wife works on bone healing in a biology lab. She asked orthopedists how do they cope with bone fractures that don’t heal. They came back with essentially “PLEASE TELL ME WHAT TO DO, I JUST SHOVE RANDOM STUFF INTO THE GAP AND HOPE SOME BONE GROWS” every single surgeon she talked to was DESPERATE to be told what to do because it’s a debilitating issue and very little is known.

      Of course, there are WHOLE ACADEMIC SOCIETIES with tens of thousands of members each running labs or treating patients, all devoted to studying bone and there have been for decades.

      • Daniel,

        If, as you suggest, the signals used in the past to identify quality research output are all played out, are you then suggesting that the scientific endeavour itself is bogus? That the entire foundation on which science rests has weakened to the point that all published research is suspect (even research that has passed peer review)?

      • I only know about stats, computer science, psychology, and linguistics firsthand, and things are definitely the worst in linguistics and next worse in psychology. But as Daniel Lakeland says, it’s a matter of degree. The only measure of what makes a paper good is that other professors think it’s good. Thus a clique of people reinforcing one another can last a long time and do a lot of lousy science as long as there’s an audience; in fact, the existence of such an audience is an excuse to hire a person in an area, so it’s all very self referential. It’s thus very easy to fall into Emperor’s New Clothes problems where everyone follows the leader (basically all of theoretical linguistics), flavor-of-the-week problems (e.g., I have to do deep learning if I’m going to get a job in machine learning), or the opposite kind of not-the-way-we-do-things problems (e.g., if there’s not a p-value, we’re not publishing it).

        The other problem is that while professors might have done cutting-edge work before they get tenure, they often then sit in their chairs for another 25 years enforcing whatever standards they used in grad school and used to get tenure. This is a huge drag on progress in stats, in my opinion, whereas machine learning doesn’t have the same degree of baggage and inertia (maybe it could use a little— the problem can swing the other way, too).

        After sitting on both sides of the academic fence for many years (trying to do research myself, while also being on editorial boards, grant panels, hiring committees, admissions committees, etc.), all I can say is that seeing how the sausage is made left me sick of science. Not science itself. But how it’s practiced. I got out and went to industry for 15 years. Now I’m back in academia, but I’m not playing the traditional tenure-track publication and fame game any more.

        It’s hard to see a way to play the game objectively, so it’s not like I have any great ideas to move forward. I’m a pragmatist in the philosophical sense, after all. And remember, “politics” is just the word for more than one person trying to make a collective decision.

        • My impression in Engineering was that we sort of solved some of the major problems in civil engineering back in the 1950s and 1960s, and we used some big factors of safety, and few things fell down. A lot of what’s gone on since then is wanking. Although it would be trivially easy to improve the way we design buildings for earthquakes and wind loads and so forth, it would ultimately require so much retooling of the knowledge base of the workforce that it will be resisted. So, academia has spent a lot of time solving “cool” but ultimately low-importance problems like active damping of floor vibrations or harvesting energy from the surf, or detecting cracks in bridges using vibration sensors, or doing inspections via autonomous remote drones or whatever. They’re not economically viable in most cases, they require advanced dynamical models and lots of fancy mathematics, they make for lots of great “holier than thou” papers, they let you pretend that you’re at the cutting edge of our abilities… but in the end they have little hope of improving many of the big civil issues we face such as here in CA we have a terrible water rights system that wastes enormous quantities of a very precious resource and the problem is almost entirely political and Engineers have no hope of solving it. We also had a terrible time with electricity generation and distribution back in the early 2000s and again, ultimately should be a civil engineering issue and should be something we could help with, but ultimately it’s all politics and there’s no hope.

          I guess what I’m trying to say is that even in those “hard” fields where perhaps the correctness of the results is better, there is a tendency towards irrelevance that is also a major problem.

        • Bob:

          You write, “they often then sit in their chairs for another 25 years enforcing whatever standards they used in grad school and used to get tenure.” It’s been almost 25 years since I got tenure, so maybe in a few years I’ll be able to get out of that damn chair!

        • @Andrew: Yikes, me too. Then my department exploded before I ever got to sit down. I should also point out that the problem with tenure isn’t that the tenured profs never do any more work. Most of the tenured profs I know don’t know any other way to live or they would’ve been something other than profs to begin with.

          @Daniel: Agree about the relevance. But it’s hard to know what’s relevant and we need to report some kind of incremental progress. Science is very collective and not so much bolt-from-the-heavens as it’s portrayed in fiction and the minds of grad students. When academics whine about publish or perish, I never quite understand what they think their output should be. Now how that should be evaluated is another matter. But c’mon, at least show your work. And that leads to a vast scientific literature.

        • If you want to make a difference in the world as an applied science or engineering person, you have to work on problems that actually matter. Sure, I agree with you that it is fine even much preferred to publish a bunch of papers that say “we tried X and it had a kind of marginal effect, wasn’t all that good, don’t bother trying that again” but if “X” is “using an autonomous robot to pick up trash at construction sites” vs “X” is “taking X-Ray pictures of cast concrete joints in overpasses and trying to detect changes through time that might predict failure of the overpass” … it makes a big difference for society.

          However, in the first case, you can play around with computer vision and robotic coordination, and you can do it all in a fake construction site set up in the parking lot of your university’s stadium, and you can do it all on a shoestring budget, and can publish all kinds of papers involving obscure mathematical stuff related to discriminating between trash and tools… and you don’t need to coordinate with the department of transportation for access to overpasses, etc etc.

          But, fundamentally, you’re working very hard spending millions in grant money over decades to put unskilled immigrants out of a job going around construction sites picking up trash at $11/hr for 15 minutes a day, a total of 0.0003% of the US construction budget or something.

          Or, there were people I knew whose plan was to laser-scan construction sites daily producing a millimeter resolution point-cloud of every bump and ridge on every piece of rebar, gigabytes per day. And then what? Maybe if you’re lucky the best that would come out of it is decades from now an automatic quality control system for rebar ties or something. Yet, it’s a regular problem that concrete suppliers pour concrete that is too wet and a decade later it’s all cracked and deteriorated (I’ve seen it in investigations of parking garage decks often enough). Where’s the research on the $50 electronic probe that detects this before pouring the concrete?

          I’ll tell you where, it’s not being done, because it would require you to build a $50 prototype, and then mix up 50 small batches of concrete in your parking lot, and then run some regressions, and it’s not sexy enough, you can’t get tenure by publishing that kind of stuff.

        • The military put out some kind of request for proposals to build aircraft with built in sensors that would record the flight trajectories at microsecond sampling and then feed them into a dedicated supercomputer, one for each airframe, and simulate the loads on the airplane in detail to detect damage to the airframes. Of course this would involve writing all kinds of tremendous finite element codes that could model cracks and fatigue, laser scanning of the aircraft to produce the meshes, whatever…

          I suggested they just fly drones around and crash them into enemies, it’d be like 2 orders of magnitude cheaper. Unsurprisingly I didn’t get a grant from the DOD.

      • “People such as Dan F above give passes to “hard” sciences, but if you’re in those areas you see the same scat different animal.”

        I can only speak to mathematics of the fields Dan F listed, but I promise you, in math, we don’t see anything like this. We do sometimes have controversies of a very different nature, but basically never this scenario where respected and prestigious researchers’ work is revealed to be a house of cards. That would probably be impossible to get away with in mathematics, because *everything* is in the paper (there’s no experiment and no unseen data, just the statements and demonstrations in the paper), and for any paper of great significance, there are going to be people eager to understand every detail.

        • Mike:

          The closest thing I can think of in the math field is little ideas that get hyped in the media. For example there was “catastrophe theory” several decades ago, “fuzzy sets” sometime after that, and, several years later, that grandiose book by Stephen Wolfram. But . . . (a) these were just three episodes in fifty years; and (b) the underlying work was not wrong, it was just not quite up to the hype in each case.

  11. Speaking as a left-leaning person, I think liberals really need to start paying attention to stories like Wansink’s. It’s important for them to realize that there’s a difference between adhering to policies aligned with scientific consensuses and those aligned with academic consensuses. I’d argue that Wansink’s unwillingness to explicitly admit that his real-world ready studies have been fatally contested makes him only marginally more reliable than the right wing ideologues that left wingers love to lampoon. As other commenters have noted, the “softer” social sciences are filled with people like Wansink, so in the interest of finding common ground, liberals might want to be more careful about the reliability of the knowledge on which they’ve based their opinions.

    • Ben,

      I think this is a problem with liberals and conservatives both, although in different ways. You can see this with Wansink, in that I think his “mindful eating” paradigm (which, as I discussed in a previous post, might well be a good idea!) fits in with liberal ideas of empowerment of ordinary people, while his corporate consulting (which, again, might be very effective!) fits in with a conservative perspective. Anyway, the point is that there’s a market for easy answers among liberals and conservatives, in different ways.

      Similarly with Amy Cuddy of power pose fame. Power pose is a message of female empowerment and also a great topic for corporate seminars.

      Thomas Basbøll and I earlier discussed this juxtaposition regarding Karl Weick, the business school professor who became notorious for copying another’s work without attribution. On one hand, Weick was a sociologist who trafficked in that field’s politically liberal buzzwords. On the other hand, he took the story that he copied and transmuted it to a business-friendly message regarding the need for strong “leaders,” a story that worked well for his audience of Wall Street executives. Happy-talk B.S. can be popular on both the left and right.

      P.S. I do nonprofit work myself and also corporate consulting, so I’m not slamming Wansink or Cuddy or Weick for doing any of these things!

    • I suggest the following edits.

      so in the interest of finding common ground, ~~liberals~~ everyone might want to be more careful about the reliability of the knowledge on which they’ve based their opinions.

      Bob

  12. Andrew said, “In judo, before you learn the cool moves, you first have to learn how to fall. Maybe we should be training researchers, journalists, and public relations professionals the same way.”

    Yes! But I’m not convinced that “First learn about Judith Miller and Thomas Friedman, and only when you get that lesson down do you get to learn about Woodward and Bernstein” or otherwise learning about people is the way to go. What is needed is teaching that involves lots of critiquing (especially by other students), with the teacher providing guidance (e.g., criticize the work or the action, not the person; no name calling; etc.) so students learn to give and accept criticism as a normal part of learning and working.

    PS Learning to fall in one context can carry over to another. I’ve never done Judo, but did do Aikido for a few years (many years ago), where you also start by learning to fall. Came in handy a couple of years ago when I was digging out a shrub in my yard and all of a sudden the tough root I was hacking at snapped, and I started to fall. I hadn’t done Aikido in decades, but the learning had become instinctive, so I automatically did a nice Aikido roll-back-then-forward, ending up sitting cross-legged on the ground with only a little bruising on one hip.

    • Martha:

      Interesting points. This topic’s worth a thread of its own.

      To respond to your second paragraph:

      Hmmm, thinking about it, yes, learning in school involves lots of failure, getting stuck on homeworks, getting the wrong answer on tests, or (in grad school) having your advisor gently tone down some of your wild research ideas. Or, in journalism school, I assume that students get lots of practice in calling people and getting hung up on.

      So, yes, students get the experience of failure over and over. But the message we send, I think, is that once you’re a professional it’s just a series of successes.

      Consider Wansink’s horrible but inadvertently instructive blog post that got this whole thing going (and I bet he wishes he never wrote that!). His message was, pretty clearly, that you can have success if you work hard enough, but you won’t have success if you won’t try, if you’re not willing to pay the price (which in this case seems to have included intensive p-hacking and whatever data manipulations it took to create 150 errors). It was very much in the modern gospel of success.

      Although we give our students a lot of experience of failure as students, I don’t think we train them to fail as researchers. When they fail, it feels like . . . failure. Like something they shouldn’t do, something they should be embarrassed by, and maybe hide.

      So, I do think we need to balance the success stories with the failure stories, and I do think it’s good to talk about people. So that when students grow up in the world and have their “Judith Miller” moments, they’ll have the sense to cut their losses and move on. I guess maybe Thomas Friedman isn’t such a good example because he’s still a “success” in the Gladwell/NPR/NYT/Ted sense of the word. Miller’s a better example because she’s widely viewed as a failure, not just in journalistic terms, but also in her career. So she’s more of a cautionary example of someone who didn’t know how to fall.

      Now, I do understand how it’s hard for Wansink or Fiske or Cuddy or Bargh or any of those others to admit their failures, as most or all of their careers and reputations are on the line. But that’s part of the point, that they would’ve been better off admitting the small failures earlier. When Fiske et al. had that paper with p=0.052 or whatever it was, instead of using creative rounding to get to p=0.048, they could’ve taken a deep breath and considered the possibility that they were wrong, the possibility that the stats weren’t just a bit of “red tape” to overcome, but a useful warning. (For an automotive analogy: the warning light on their dashboard started flashing but they were in a hurry so, instead of stopping and checking to see the problem, they unscrewed the warning light and sped on.) They had no plan for how to deal with failure.

      • >”the stats weren’t just a bit of “red tape” to overcome”

        Whether or not you get p < 0.05 really is just red tape, though. If they had kept collecting data it was bound to happen anyway.
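
        A quick simulation sketch of that point, with made-up numbers: test a nonexistent effect after every new batch of data and stop at the first p < 0.05.

```python
# Sketch of optional stopping (made-up batch size and number of looks):
# a null effect, re-tested after every new batch, stopping at p < 0.05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_sims, batch, max_looks, hits = 2000, 10, 50, 0

for _ in range(n_sims):
    a = np.empty(0)
    b = np.empty(0)
    for _ in range(max_looks):
        a = np.concatenate([a, rng.normal(0, 1, batch)])  # group A, no real difference
        b = np.concatenate([b, rng.normal(0, 1, batch)])  # group B, no real difference
        if stats.ttest_ind(a, b).pvalue < 0.05:
            hits += 1                                     # declare "significance" and stop
            break

print(f"null comparisons reaching p < 0.05 at some point: {hits / n_sims:.0%}")
```

        Even with only fifty looks the rate comes out several times the nominal 5 percent, and it keeps creeping up the longer you are willing to keep collecting.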

      • “they would’ve been better off admitting the small failures earlier”

        It would have been easier to admit the small failure earlier, but I don’t think they would have been better off. Not admitting to the error gave them their ‘success’ by which I mean professional (tenure, editorial positions etc.) and material privilege. I think this shows that the problem is an incentive problem as well as an attitude problem. There are many unscrupulous people who would willingly publicise these sorts of unsupported or even completely false claims for that reward. So even if you teach people to keep calm and carry on when they ‘fail’, unless they are also disincentivised to do so, there are still those that will follow in the path of Cuddy, Wansink et al. That’s why I think this criticism is valuable on the whole, even if the individual cases exposed are individually somewhat insignificant. We need to ensure that this path doesn’t lead to success by adding as much negative utility to it as we can. If that means being methodological terrorists, well, so be it.

        • Kit:

          Yes, good point. I think another factor is that Wansink, Cuddy, etc., have worked very hard to get to where they are. All the hard work in the world doesn’t make this into good science—there must be hardworking astrologers too, after all—but I’m guessing that this effort has given them a sense of entitlement, that they’ve earned everything they’ve gotten, and they don’t want anyone to take it away from them.

  13. [At least Wansink didn’t accuse his critics of dissuading scientists from pursuing innovative research.]

    http://gizmodo.com/five-major-cancer-studies-are-proving-difficult-to-repr-1791379638

    Scientists involved with the original papers have begun reacting to the replications’ outcomes. “Our original research has been reproduced in at least 13 peer reviewed articles by independent research groups,” Erkki Ruoslahti of the Sanford Burnham Prebys Medical Research Institute in California, behind this replicated study, told Gizmodo in an email. Ruoslahti’s study found that a certain molecule could increase the efficacy of certain cancer drugs and help shrink tumors, but the replicated study didn’t notice a statistically significant effect on tumor weights.

    “We are aware of two additional groups in the US that are in the process of publishing results that confirm and extend [our] results,” Ruoslahti continued. “Science is self-correcting—if a finding can’t be repeated it will vanish—and that hasn’t happened to our technology. Our current focus is on moving this promising technology forward to the clinic.” Ruoslahti hopes efforts like that of the Center for Open Science won’t dissuade scientists from “pursuing innovative research that has the potential to benefit patients.”

    • Thanks, this is the project where they had to drop ~25% before starting because no one could figure out what the methods were? In that case, there should be about 30 more of these forthcoming, which should be interesting.

      It is too bad they are still using “significant or not” as the criterion for whether a finding was reproduced; that really makes it difficult to tell what happened here without looking closer at the individual papers.

  14. Are you quoted accurately here: http://www.slate.com/articles/health_and_science/science/2017/02/stop_getting_diet_advice_from_the_news.html

    “Slate contributor Andrew Gelman, a statistician at Columbia University, was one of the first to raise alarm bells after encountering Wansink’s post—he blogged about it back in December. He has since followed up with a nuanced assessment on the likelihood that the results, even if they maintain statistical significance, indicate something true.”

    • Jordan:

      No, this does not capture what I told the reporter. What I said, or wanted to say, was that the purported statistical significance is irrelevant, that these papers seem to contain no useful empirical information—but this does not mean that Wansink’s substantive theories are wrong. Wansink might well have a deep qualitative understanding of eating behavior, so that his substantive claims might be mostly correct and valuable. That is, he could be contributing usefully to the world even if the quantitative work in these papers is useless.

      Wansink and his colleagues have so many degrees of freedom in these studies that they can use them to prove anything they want. As science, that makes these papers useless. However, suppose that, based on qualitative observations over the years, Wansink has a very good sense of what works in the aggregate. Then, yes, he can bend the data to show whatever he wants the data to show—but this could be a good thing, if he’s using this to support things that happen to be true. Bad science can be used to support true theories.

      What I said to the reporter was that I had no idea, but that the above scenario was possible.

  15. I posted a comment this morning but it has not appeared. Perhaps it was lost when one of Andrew’s two nearly identical posts was dropped. I will try again now.

    To those who doubt Brian Wansink’s claims that the dataset cannot be released because of agreements with the restaurant and Cornell’s IRB:

    I won’t make any guesses as to whether Wansink’s claims are true. But I have worked with many different datasets over the years and sometimes datasets *are* confidential, proprietary, or restricted in some way, even if the identifying information has been removed. This is more often the case in medicine, business, and some government situations than it is in psychology or similar disciplines. I once worked with a dataset that had a very strict legal confidentiality agreement many pages long, with serious monetary penalties if we broke that agreement. It was a rich and interesting dataset, and I had many requests from colleagues to share it or parts of it. I always explained politely that I could not do so, even showing the legal agreement as proof. Some colleagues put me through the wringer on this anyway; more than one suggested that I share the data “under the table,” so to speak. I never did so, however. It was an unpleasant experience.

    Reading the point-by-point refutation of Wansink’s claims posted on Wansink’s website and duplicated on PubPeer by one of the graduate-student authors of the “pizzagate” critique, who surely cannot have much experience in these matters, brought back this unpleasant experience. If Wansink agreed not to release the data way back when, he may be legally bound not to do so even now (even if the data are “anonymized”) without the permission of the restaurant or the appropriate offices at Cornell.

    • Carol:

      As I wrote on my post (see point 1), I don’t think Wansink has any obligation to share his data. Likewise we have no reason to believe that a dataset even exists. If I’d asked Wansink to share data and he said no, I’d just give up.

      • Andrew:
        That is my reaction as well; however, my coauthors like to do their due diligence (how responsible of them) and have contacted Cornell.

        I know we aren’t getting the data. If the data set does actually exist I suspect we would find every single number in the papers is wrong.

        Not sharing the data set is just a small part of this story, and is consistent with everything else:
        -I’m not p-hacking, I’m deep data diving
        -There may be some minor (150) errors, but the results won’t change
        -The data is “tremendously proprietary”, we can’t possibly let anyone look at it, it contains diner names!

        I should note that when we first contacted the lab requesting the data we didn’t mention the errors, and they did offer to have us enter into an agreement with them to get and use the data, but once we mentioned we wanted to check some errors we never heard back from them again.

        Is that how science is supposed to be done? Respond to emails when you think you might get another publication out of it, but ignore anything critical of your work?

        This article was a very interesting read: http://www.motherjones.com/environment/2015/03/brian-wansink-cornell-junk-food-health

        It describes some of his most influential work. I’ve found errors in all of those studies.

        • I see from the new Slate article that one of the authors (Nick Brown) believes all these errors will not change the results.

        • Jordan:

          1. No, what Wansink is doing is not how I think science is supposed to be done. Even if the positive scenario I’ve suggested is completely true—so that Wansink’s savvy qualitative insights are correct, and all the junk science is in the service of a positive, useful message—even then, I think it would be better for him to ditch the small-N quantitative studies and instead just be more explicit about the sources of his qualitative insight.

          2. Here’s my favorite quote from the article you link to:

          Wansink gushed so much about his favorite brands—McDonald’s, Taco Bell, Coke—that it triggered my conflict-of-interest detector. But he’s no shill. He just genuinely believes that corporations can be the most powerful instruments of change, and well-meaning too. “More so than a lot of family restaurants,” he told me in all earnestness, “McDonald’s wants to do the right thing.”

          Pretty cool how the reporter could figure out what Wansink genuinely believes! I guess that’s what makes someone a real class-A journalist: the power to read minds.

        • Jordan:

          I grew up in the suburbs, 2 blocks from McDonald’s. Every once in awhile (maybe every couple weeks or so, I can’t remember now), we’d go there for dinner or one of us would walk over and get a bunch of burgers and fries and bring it back home for all of us to eat. It was such a treat. We were upper-middle-class, not poor, but back then it was a big deal to go to any restaurant, and we never ever went anywhere expensive. Of course, now that I’m financially comfortable I never go to McDonald’s cos there are other places where the food is much more delicious.

          I’m surprised that Wansink with all his riches goes to McDonald’s to eat. I guess tastes differ. Or maybe it’s ideology on his part. Or it could make professional sense, for him to have an eating lifestyle similar to that of the people he studies.

        • Gee, for me it was the part in the previous paragraph about taking money from McDonald’s that set off my conflict-of-interest detector.

          (and if Mother Jones is buying this…)

  16. I have never seen “Game of Thrones” but maybe it is time to defend Wansink. Unlike the usual suspects taken to task on this blog, Wansink’s crime is rather benign. If nothing else, he has succeeded in uniting the disparate points of contention customarily seen here. In contrast to the statistical crimes of Big Pharma or the current regime in Washington, no one’s health or well-being is likely to suffer. A mistake here, a mistake there, indeed, a mistake almost everywhere, yet he remains congenial and he doesn’t respond to criticism by sending out defensive tweets that end in “SAD!!”

    • Paul:

      Think again. I bet this won’t make you happy.

      P.S. I watched 5 straight episodes of Game of Thrones (I think it was Season 5) on a flight to Europe when for child-care reasons I was unable to go to sleep. They were pretty good so on the return flight I watched the other 5 episodes. It was kinda trashy but still excellent, like a good entertaining movie or BD. But I never watched the other seasons. Now I wish I’d started with Season 1. On the other hand, probably one reason why I enjoyed Season 5 so much was that I had to figure out everything that was going on; decoding all the references midstream was half the fun.

  17. Thanks. This post is so incredibly funny. It’s hilarious. I will not mind more papers with > hundreds errors, if you keep writing posts like these.

    • Marcel:

      Thanks. Paradoxically, although it’s hilarious, I’m entirely sincere. This is Jay Leno / Stewart Lee style. You say what you actually believe, and it’s funny cos it’s true.

  18. Next blog post:

    *Youtube link*
    Blurry video. The person recording turns the lens around to face themselves. It’s Andrew Gelman. “Hey guys, shh, I’m standing in the Cornell library. Brian comes in here after class.”
    *Door opens*
    Lens faces Brian. Gelman’s voice from the background: “Hey, dweeb, your science is TERRIBLE. Just admit it, admit your science is terrible.”
    “w… what? w who are you? leave me alone.”
    “You know what we do to scientists who won’t admit their methodology is flawed?”
    *Gelman moves forward, grabbing Brian’s arm*.
    “l-leave me alone”
    *Gelman begins to make Brian hit himself*
    “Stop hitting yourself, nerd. Stop hitting yourself.”

  19. When this guy finds an effect, he by-god finds an effect:

    http://www.cbsnews.com/news/slim-by-design-author-brian-wansink-gives-tips-on-avoiding-bad-food/

    In a new book, “Slim by Design: Mindless Eating Solutions for Everyday Life,” food psychologist and director of the Cornell University Food and Brand Lab Brian Wansink says you don’t need willpower to shed the pounds but to change your surroundings instead.

    “You have a messy kitchen, a cluttered desk, you end up eating 44 percent more snacks than if the same kitchen is clear,” Wansink said on “CBS This Morning.”

    In fact, people who leave cereal boxes on the counter are more likely to be heavier.

    “Mainly women,” he added. “About 21 pounds heavier than the neighbor next door that doesn’t have any cereal visible at all.”

    Those findings are based off of observational studies that Wansink performed. He investigated 230 homes in Syracuse, New York, measured the women’s weight and took pictures of their kitchens.

    “If you’re serving white rice on a white plate, you don’t really see the difference, so you tend to put about 18 percent more on,” Wansink said. “If you put that on a darker plate or a colored plate, you automatically serve less and eat less.”

    “We’ve analyzed lots of orders and restaurants. What we find is that if you sit near a window, you’re about 80 percent more likely to order salad; you sit in that dark corner booth, you’re about 80 percent more likely to order dessert,” Wansink said.

    • Mark:

      Ugh, I hadn’t noticed that. 44%, huh? This guy should team up with Carol Dweck and Paul Gertler.

      I guess the positive spin on this is: suppose the effect is not 44% but 4%. 4% still isn’t nothing. Of course 4% is undetectable in this guy’s experiments (see the rough simulation at the end of this comment). But, again, if he has some savvy intuition, then he’s just running experiments to get compelling stories to go with his messages. So we can (and should) ignore all the numbers.

      CBS News, though, they have no excuse. They’re stone cold suckers. They might as well work at NPR or write press releases for the Association for Psychological Science, that’s how credulous they are.
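
      To put rough numbers on that, here’s a quick simulation with entirely made-up means, standard deviations, and sample sizes (nothing from Wansink’s actual studies), just to show how hopeless it is to detect a 4% effect with small, noisy samples:

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(0)

        def power(effect_frac, n_per_group=20, baseline=100.0, sd=50.0,
                  sims=5000, alpha=0.05):
            """Fraction of simulated two-group experiments with p < alpha."""
            hits = 0
            for _ in range(sims):
                control = rng.normal(baseline, sd, n_per_group)  # e.g., snack calories
                treated = rng.normal(baseline * (1 + effect_frac), sd, n_per_group)
                _, p = stats.ttest_ind(treated, control)
                hits += p < alpha
            return hits / sims

        print("power to detect a 4% effect: ", power(0.04))   # barely above the 5% false-positive rate
        print("power to detect a 44% effect:", power(0.44))

      With these hypothetical numbers, the 4% effect gets flagged as “significant” only slightly more often than pure noise would, so any “finding” of that size in such a study tells you essentially nothing.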

  20. Another point that is worth noting, though I wouldn’t make too much of it…

    Wansink often reaches conclusions that support his longstanding conservative/libertarian beliefs.

    http://www.motherjones.com/environment/2015/03/brian-wansink-cornell-junk-food-health
    As a teen, he became fascinated with the life of Herbert Hoover, especially his work on improving food access for Americans. “He kept people from starving,” Wansink says. “I said to myself, ‘If I can ever do a fraction of what he did for food aid, I’ll be the luckiest guy in the world.'”

    A lifelong libertarian, he also opposes soda taxes and laws that require fast-food restaurants to post nutritional information. He considers such tactics elitist, and he hates nothing more than elitism. You might think of him as the anti-Alice Waters. When I told him I was hoping to go to the Moosewood Restaurant, Ithaca’s renowned temple of vegetarian hippie food, he winced. “The waiters and waitresses there seem really snooty,” he said. “And it is so expensive.” He prefers Taco Bell. “Where else can you feed a family of five for under $10?”

    • Mark:

      Yes, although you could imagine the same message being given with a politically liberal spin.

      Why this rich guy wants to take his family to dinner for under $10, though, that baffles me. Is it so bad to spread your wealth to the cooks, waiters, and dishwashers of the world?

      • Andrew,

        I was somewhat reluctant to bring this point up, partly for the reasons you mentioned and partly because “he’s just saying that because he’s a liberal/conservative/whatever” arguments should be avoided, but given the terrible quality of his work and the enormous magnitude of the effects he claims to find, it is worth noting that almost everything he says either helps sell his books, lines up with his political ideology, and/or serves the interests of companies paying him money.

        (And as for the $10 dinner for five: few people consider one Taco Bell taco a meal. Order enough food and drinks to keep the family happy and you’re looking at $20 to $30. You could do it for ten at Little Caesar’s, but I guess they aren’t a client.)

        • Mark:

          I googled *wansink taco bell* and found this article where Wansink says his 3 kids eat 5 tacos between them. It’s not really taking the family to dinner if Mom and Dad don’t eat anything at all!

          Also this: “Wansink has also served as the executive director of the U.S.D.A.’s Center for Nutrition Policy and Promotion. (In addition, he is a member of McDonald’s Global Advisory Council, which makes nutritional recommendations to the company’s leadership.)” He plays both sides of the street.

          I also came across this blog from 2007, it looks like, entitled “Bringing a wet noodle to a gun fight: The USDA’s Brian Wansink vs. Big Food’s ads,” which as the title suggests presents Wansink as a foe of Big Food.

          I wonder if Wansink’s attitude has changed over the years as he’s received more corporate funding? I guess one could say that he has a consistent view: he strongly recommends that people eat out at fast food places, but then he recommends they go and order very small amounts of food. But that’s not what the fast food restaurants want, so there seems to be an inherent conflict in his recommendations.

        • Clearly his advice is not working:

          “According to the most recent data released September 2015, rates of obesity now exceed 35 percent in three states (Arkansas, West Virginia and Mississippi), 22 states have rates above 30 percent, 45 states are above 25 percent, and every state is above 20 percent.”

          Is it because people are not following his advice, or because they are?

        • It would be cool if we could run an experiment: two worlds, one with and one without Wansink, and run the experiment a thousand times to see the frequentist properties of the Wansink intervention.

  21. What a wonderful set of comments. My two cents’ worth is that I am sure all you ever really have is your character, in life and in work. Hence dishonesty, intentional or otherwise, invites a response that demonstrates that character. Maybe graduate programs should include seminars in ethics, as I had in my clinical training. These teach humility, since it rapidly becomes clear that the answer one thought was so clear has unseen ambiguities. Cultivate a separation of one’s science from one’s ego by being aware of and responsive to both through critique (not criticism), by engaging regularly in professional supervision and accountability, and by building a culture in which professional advancement is less reliant on publication quantity. As I implied before, I would far rather be one of a thousand authors on the gravitational-waves discovery paper than have a thousand papers where I appeared to be ego-driven.

    • Llewelyn,

      I dunno, ego can be a powerful motivator, even if not for you. Feynman had a large and obnoxious ego, but he also figured out one or two important things when he wasn’t going around telling stories about how great he was. On the other hand, it’s all relative. My guess is that Feynman felt pretty puny compared to Dirac, and so he (Feynman) might’ve felt that it was ok to boast because deep down he was humble about his abilities.

      • Fair point — I was more suggesting keeping the two apart than that egoism should be reduced — probably somewhat unrealistic as you point out. I’m an idealist… didn’t know that about Feynman…

  22. How accurate are the “time stamps” for this blog? I note, for example,

    Mark Palko says:
    February 4, 2017 at 5:48 am

    Mark Palko says:
    February 3, 2017 at 11:01 pm

    Mark Palko says:
    February 3, 2017 at 3:31 pm

    Andrew says:
    February 4, 2017 at 8:43 am

    Andrew says:
    February 3, 2017 at 11:15 pm

    Andrew says:
    February 3, 2017 at 9:40 pm

    With this sort of data, Wansink has ammunition for an entirely new set of investigations such as the (celery) eating/sleeping habits of statisticians.

  23. he’s being polite, why does that baffle you? he’s acknowledging that people’s concerns are valid and he seems to understand the need to change some of his practices.

    it’s worth saying that you’re also overselling the importance of the statistics they do in some of this kind of research. seriously, for a lot of this experimental/behavioral stuff, statistics is a mere formality driven by the flawed incentives of the publishing system; nobody cares about it, they are just playing the NHST game. but the substantive experiments do reveal interesting things if you simply ignore the stats mumbo jumbo they are forced to produce in order to publish.

    even if they did it “right”, the stats in most of these cases would be meaningless.

    • Jack:

      1. Polite is fine, but in that comment thread, Wansink seems to be stepping over the line from politeness into obliviousness. When people describe his work as “worthless, p-hacked publications . . . junk science,” and he gives mild, agreeable responses like, “I understand the good points you make,” that’s just non-responsive.

      2. You write, “even if they did it ‘right’, the stats in most of these cases would be meaningless.” Indeed! The point is that random numbers can “reveal interesting things” too. It’s not clear why anyone would want to read such papers. As I wrote in one of my posts, Wansink may well have valid qualitative insights, but in that case he should publish them as such and not muddy the waters by drawing lessons from tarot cards, tea leaves, or random numbers.
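
      To make the point about random numbers concrete, here’s a toy simulation (pure noise, nothing to do with Wansink’s actual data): generate an outcome with no real effects anywhere, then go “deep diving” for subgroup differences.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(42)

        n = 100                                   # diners in a hypothetical null experiment
        outcome = rng.normal(size=n)              # pure noise: no true effects anywhere
        subgroups = rng.integers(0, 2, size=(20, n)).astype(bool)  # 20 arbitrary ways to split the sample

        findings = [(k, stats.ttest_ind(outcome[m], outcome[~m]).pvalue)
                    for k, m in enumerate(subgroups)]
        significant = [(k, p) for k, p in findings if p < 0.05]

        print(f"{len(significant)} 'significant' subgroup effects found in pure noise")

      With 20 looks at the same noise you expect about one “significant” finding per dataset, and with flexible exclusion rules and covariates you can do far better than that. It’s the jelly-bean cartoon in miniature.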

  24. In the Slate article there is this: “Wansink also confessed to mistakenly using the same data set for four different papers without declaration”

    I find this hard to believe. I have been doing experiments for 15 or so years, and it just cannot happen that one sits down to write and ends up using the same data set for four different papers by mistake.

  25. more from him:

    ‘In the end, I think the biggest contribution of bringing this to attention (van der Zee, Anaya, and Brown 2017) will be in improving data collection, analysis and reporting procedures across many behavioral fields. With our Lab, a rapidly revolving set of researchers, lab and field studies, and ongoing analyses led us to be sloppier on the reporting of some studies (such as these) than we should have been. This past Thursday we met to start developing new standard operating procedures (SOPs) that tighten up field study data collection (e.g., registering on trials.gov), analysis (e.g., saving analysis scripts), reporting (e.g., specifying hypo testing vs. exploration), and data sharing (e.g., writing consent forms less absolutely). When we finish these new SOPs (and test them and revise them), I hope to publish them (along with implementation tips) as an editorial in a journal so that they can also help other research groups. Again, in the end, the lessons learned here should raise us all to a higher level of efficiency, transparency, and cooperation’

    i see this as positive. it is about as positive a reaction as i have seen; compare it to Cuddy or Gilbert.

    • Shravan: I agree, and had just sent the link to Andrew via e-mail. Wansink’s response is much better than the responses from other persons whose research has been criticized. I hope Tim van der Zee, Jordan Anaya, and Nick Brown will give him a chance now to “make it right.”

      • Until Wansink specifies a date for when he will share the data and explains how over 150 errors somehow sneaked into four of his publications, I wouldn’t hold my breath.

        There is also the matter of 7 other papers that contain errors, and the fact that his p-hacking makes even the results that are mathematically possible uninterpretable. The Cornell Food and Brand Lab continues to promote all of its work (an article about Wansink’s work just appeared in the Washington Post), even though we don’t have any reason to believe that any results from that lab can be trusted, have any chance of being replicated, or even matter in the first place.

        I don’t personally have any more blog posts planned, but my colleagues do. I think you will be interested to see what we are finding.

        • I’m not at all convinced that specifying “a date for when he will share the data” and explaining “how over 150 errors somehow sneaked into four of his publications” are the best things to focus on. I’d say wait and see if what he provides is indeed a worthwhile contribution to learning from mistakes, and bear in mind that it’s not something that can be expected quickly — doing it right will necessarily take time and may need to be done in “installments”.

        • I’m giving him the benefit of doubt. I see no reason so far to doubt his sincerity. Let’s be a bit magnanimous here.

          Sure, if, say, three months down the line we see no progress, I’ll change my opinion.

      • Rahul, Jordan, Martha:

        I think there are two issues here which are somewhat in conflict:

        1. Wansink’s latest reply is indeed nearly the best that one could imagine. If he is planning to clean up his act, this is an excellent start; his attitude is commendably non-defensive, and he is not lashing out at all. I think it’s great to see someone respond to criticism in this way.

        2. Given the contents of Jordan’s new post, it seems clear that Wansink’s entire workflow has major flaws. It’s not just a problem with those four papers discussed earlier, or a problem with that one student, or that one postdoc from a few years ago who replied so emptily to that other critic. The published work shows a stunning level of sloppiness.

        Put items 1 and 2 together and this suggests further difficulties, in that Wansink could end up retracting dozens or even hundreds of published papers. It’s hard for me to imagine that happening, though, so I’m not sure where things will go from here.

    • Did anyone think that maybe this Wansink guy is in a position where he’s forced to write papers to exist?

      They always are “forced.” It gets justified as “doing what you need to survive” rather than as a “method of protest,” though.

  26. Unfortunately, I am reviewing a paper (a psychology MBCT non-randomised trial) that runs 48+ significance tests as evidence of ‘mechanisms of action’ in SAD symptom reduction. When I apply the false discovery rate approach (Curran-Everett, 2000), I find that most, if not all, of the null hypotheses relating to ‘mechanisms of action’ were likely falsely rejected. I’m about to report this in the hope that the authors take it on board and consider providing that information to readers, given that their study’s results are exploratory. I’d rather give the authors an option to go with than just say: “Your study design was flawed, and I have no suggestion for presenting your results so that the reader understands the ‘impact’ of this limitation.” I’m still thinking it through; if someone has a suggestion, please message.
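
    For concreteness, here is roughly the kind of adjustment I mean, a Benjamini-Hochberg correction; the p-values below are made up for illustration, not taken from the paper under review:

      import numpy as np

      def benjamini_hochberg(p_values, q=0.05):
          """Boolean mask of hypotheses that survive FDR control at level q."""
          p = np.asarray(p_values)
          m = len(p)
          order = np.argsort(p)
          ranked = p[order]
          below = ranked <= (np.arange(1, m + 1) / m) * q    # BH step-up criterion
          keep = np.zeros(m, dtype=bool)
          if below.any():
              last = np.max(np.nonzero(below)[0])            # largest rank meeting the criterion
              keep[order[: last + 1]] = True
          return keep

      # 48 hypothetical p-values: a few small ones mixed in with noise
      rng = np.random.default_rng(1)
      p_values = np.concatenate([[0.0005, 0.003, 0.02], rng.uniform(0.01, 1.0, 45)])
      surviving = benjamini_hochberg(p_values, q=0.05)
      print(f"{surviving.sum()} of {len(p_values)} tests survive FDR control at q = 0.05")

    Reporting which of the 48 ‘mechanisms’ survive that kind of correction, alongside the raw results, might be a constructive option to offer the authors.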
