Three different people pointed me to this post, in which food researcher and business school professor Brian Wansink advises Ph.D. students to “never say no”: When a research idea comes up, check it out, put some time into it and you might get some success.
I like that advice and I agree with it. Or, at least, this approach worked for me when I was a student, it continues to work for me now, and my favorite students are those who follow it. That said, there could be some selection bias here, in that the students who say Yes to new projects are the ones who are more likely to be able to make use of such opportunities. Maybe the students who say No would just end up getting distracted and making no progress, were they to follow this advice. I’m not sure. As an advisor myself, I recommend saying Yes to everything, but in part I’m using this advice to take advantage of the selection process, in that students who don’t like this advice might decide not to work with me.
Wansink’s post is dated 21 Nov but it’s only today, 15 Dec, that three people told me about it, so it must just have hit social media in some way.
The controversial and share-worthy aspect of the post is not the advice for students to be open to new research projects, but rather some of the specifics. Here’s Wansink:
A PhD student from a Turkish university called to interview to be a visiting scholar for 6 months. . . . When she arrived, I gave her a data set of a self-funded, failed study which had null results (it was a one month study in an all-you-can-eat Italian restaurant buffet where we had charged some people ½ as much as others). I said, “This cost us a lot of time and our own money to collect. There’s got to be something here we can salvage because it’s a cool (rich & unique) data set.” I had three ideas for potential Plan B, C, & D directions (since Plan A had failed). . . .
Every day she came back with puzzling new results, and every day we would scratch our heads, ask “Why,” and come up with another way to reanalyze the data with yet another set of plausible hypotheses. Eventually we started discovering solutions that held up regardless of how we pressure-tested them. . . .
There seems to be some selection bias here, as Wansink shares four papers from this study—these must be the results of Plans B, C, D, and E, or something like that—but we never hear about failed Plan A.
That’s important, right? Sure, the published results might be fine, but when Plan A fails—and, remember, this was an idea from a “world-renowned eating behavior expert”—that’s news, no? I’d think there’d be room for just one more paper, and at least one more press release and media appearance, about the idea that didn’t work.
OK, so what was happening at that all-you-can-eat buffet?
I googled one of the listed articles, “Lower Buffet Prices Lead to Less Taste Satisfaction,” by David Just, Özge Sığırcı, and Brian Wansink. From the abstract:
Diners at an AYCE restaurant were either charged $4 or $8 for an Italian lunch buffet. Their taste evaluation of each piece of pizza consumed was taken along with other measures of behavior and self-perceptions. . . . Diners who paid $4 for their buffet rated their initial piece of pizza as less tasty, less satisfactory and less enjoyable. A downward trend was exhibited for each of these measures with each additional piece (P = 0.02). Those who paid $8 did not experience the same decrement in taste, satisfaction and enjoyment.
This should not be confused with the paper, “Peak-end pizza: prices delay evaluations of quality,” which reports:
For the diners who paid $4 for their buffet, overall taste, satisfaction and enjoyment evaluation depend on the taste of the last piece of the pizza and the peak taste consistent with prior findings. For those paying $8 for the buffet, the first piece of pizza is more important in predicting the overall taste, satisfaction and enjoyment ratings.
Or the unforgettable “Low prices and high regret: how pricing influences regret at all-you-can-eat buffets,” from which we learn:
139 total individuals who came to the restaurant alone (n = 8), in groups of two (n = 52) and in groups of three or four (n = 43) and five and over (n = 30) are participated to the study. Out of participants who ate at least one piece of pizza and were included to our analysis (n = 95), 49 of them were male and 46 of them were female, the mean age was 44.11, the mean height was 67.58 in., and the mean weight was 181.61 lb. The results were analyzed using a 2×3 between groups ANOVA. Diners who paid $4 for their buffet rated themselves as physically more uncomfortable and had eaten more than they should have compared to the diners who paid $8 for the buffet (p < 0.05). However, diners who paid $4 for their buffet gave higher ratings to overeating, feelings of guilt and physical discomfort than the diners who paid $8 for the buffet, even if they ate the exact same number of pieces.
That “n = 95” looked odd to me, because the first paper above reported:
Of the 139 participants (72 groups), 6 people who were younger than 18 years old were eliminated. Eleven other participants did not complete the relevant questions on the survey. Thus, usable and complete data were collected from 122 people.
So I don’t know why this other study only included 95 people. Maybe that’s what it took to get p less than 0.05. In any case, it’s good to know that “the mean height was 67.58 in.”
I don’t know how many people were included in the analysis for “Eating Heavily: Men Eat More in the Company of Women.” I’m sure this information is in the published article, but it’s paywalled.
I couldn’t bring myself to pay $16 just for the privilege of reading this paper. I guess if they’d charged $32 I’d value it more, ha ha ha.
I googled the title to see if I could find an un-paywalled preprint, but I found no full paper: all that turned up was the abstract and about a zillion press releases, including a twitter post by Brian Wansink. I followed the link, and this guy tweets every day. He has 1687 tweets! That’s fine, I’m hardly one to criticize given that I’ve published 7000 posts on this blog, but it is kinda funny coming from someone who wrote, “Yet most of us will never remember what we read or posted on Twitter or Facebook yesterday.” I have a feeling that a lot more people will read Wansink’s blog post than will ever read “Peak-end Pizza: Prices Delay Evaluations of Quality” all the way to the end.
I did, however, turn up this description of the sample:
One hundred and thirty three adults (74 males and 59 females) were recruited to participate in a study of eating at an Italian restaurant in Northeastern USA where customers paid a fixed price for “all you can eat” pizza, salad, and side dishes. Our analyses are based on a sample of 105 respondents because we discarded responses from eight recruits who were eating alone, and 20 recruits provided incomplete survey responses.
OK, the 133 adults are the 139 participants minus the 6 kids. So far, so good. And I can see that for the purposes of this study they removed the solo eaters, although this concerns me a bit—they compared same-sex to mixed-sex groups, but they could also have thrown the singles into the comparison groups; and since they studied both sexes, it’s kind of iffy that this article is only about men. All these papers are full of the “difference between significant and non-significant” thing. But then they also excluded 20 people who “provided incomplete survey responses.” The last time they did this, they only excluded 11 people! I guess which responses get excluded depends on which questions they’re studying. But then this raises some concerns about all the “digging through the data.”
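Just to spell out that “difference between significant and non-significant” point, here’s a quick simulation sketch. Everything in it (the effect size, the group sizes, the random seed) is invented for illustration and has nothing to do with the actual pizza data:

# Two hypothetical subgroups with the same true effect: one can clear
# p < 0.05 while the other doesn't, even though the difference between
# the two subgroups is itself nowhere near significant.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.3, scale=1.0, size=40)   # invented subgroup data
group_b = rng.normal(loc=0.3, scale=1.0, size=40)   # same true effect, different noise

t_a, p_a = stats.ttest_1samp(group_a, 0.0)
t_b, p_b = stats.ttest_1samp(group_b, 0.0)
print(f"subgroup A vs. zero: p = {p_a:.3f}")   # may land under 0.05
print(f"subgroup B vs. zero: p = {p_b:.3f}")   # may not

t_ab, p_ab = stats.ttest_ind(group_a, group_b)
print(f"A vs. B directly:    p = {p_ab:.3f}")  # typically not significant

That’s the comparison these papers keep skipping: one subgroup reaching p < 0.05 while another doesn’t tells you almost nothing about whether the subgroups actually differ.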
Here’s how Wansink concludes his post:
Facebook, Twitter, Game of Thrones, Starbucks, spinning class . . . time management is tough when there’s so many other shiny alternatives that are more inviting than writing the background section or doing the analyses for a paper.
Yet most of us will never remember what we read or posted on Twitter or Facebook yesterday. In the meantime, this Turkish woman’s resume will always have the five papers below.
I have two objections to this attitude.
First, enjoyment is a worthy goal in itself, no? In all seriousness, I think that watching a season’s worth of episodes of Game of Thrones is more valuable than writing a paper such as “Eating Heavily: Men Eat More in the Company of Women.” After all, I read that psychologists have found that it is experiences, not possessions, that make people happy. So why not recommend that your grad students spend more time going to bullfights?
Second, I’m bothered by that last sentence that the resume “will always have the five papers.” The end state of research is not the resume. Nor is it the tenured job, the press release, the Ted talk, or the appearances on Oprah and Dr. Oz. Just ask Roy Baumeister or John Bargh.
I really don’t like the message that Wansink is sending to his students, that a paper on your resume lasts forever. It lasts forever if it’s a real finding, or if it leads to progress. But it doesn’t last forever if it can’t replicate (except in the indirect way that certain papers on ESP, sex ratio, himmicanes, power pose, cold fusion, etc., will last forever as warnings of scientific overconfidence).
To his credit, Wansink has a comment section on his blog. And most of the comments are pretty harsh; for example:
You’re pushing an unpaid PhD student into salami-slicing null results into 5 p-hacked papers, and you shame a paid postdoc for saying ‘no’ to doing the same.
Because more worthless, p-hacked publications = obviously better….? The quantity of publications is the key indicator of an academic’s value to you?
I sincerely hope this is satire because otherwise it is disturbing.
This is a great piece that perfectly sums up the perverse incentives that create bad science. I’d eat my hat if any of those findings could be reproduced in preregistered replication studies. The quality of the literature takes another hit, but at least your lab got 5 papers out.
What you describe, Brian, does sound like p-hacking and HARKing. The problem is that you probably would not have done all these sub-group analyses and deep data dives if your original hypothesis had p < .05. . . . it is a bit difficult to end on a positive note. I have always been a big fan of your research and reading this blog post was like a major punch in the gut.
But I strongly disagreed with this comment:
If a hypothesis is sound, you should be able to predict the result of an experiment. Predict as in beforehand.
That sounds good, but in most of my applied work I learn so much from the data analysis, and I can almost never predict beforehand what I’ll find. It’s important to get good data, though, and I have doubts about the quality of the data in that all-you-can-eat-restaurant experiment. To me, the key problem here is that the theory is weak to nonexistent, and there are so many different ways you can look at this small dataset. I’m not surprised that with intense effort they were able to find many different statistically significant comparisons—who knows, maybe a few more papers are forthcoming on the speed at which different people ate their salads, the positioning of men, women, and children at the restaurant tables, the relationship between how much people ate and how far away their cars were parked, etc. The possibilities are endless.
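To make that concrete, here’s a tiny simulation sketch of the general point. The sample size roughly echoes the buffet study, but everything else (the number of slices through the data, the seed) is invented:

# With a small dataset and enough subgroups/outcomes to try, "significant"
# comparisons show up reliably even when there is nothing there at all.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 122            # roughly the buffet sample size
n_studies = 1000   # simulated all-noise studies
n_slices = 20      # analyses a motivated researcher might try per study

hits = 0
for _ in range(n_studies):
    price = rng.integers(0, 2, size=n)        # fake $4 vs. $8 condition
    p_values = []
    for _ in range(n_slices):
        outcome = rng.normal(size=n)          # pure noise, no real effect
        _, p = stats.ttest_ind(outcome[price == 0], outcome[price == 1])
        p_values.append(p)
    if min(p_values) < 0.05:
        hits += 1

print(f"all-noise studies yielding at least one p < 0.05: {hits / n_studies:.0%}")
# With 20 tries at the 5% level you expect roughly 1 - 0.95**20, or about 64%.

Nothing deep here, just the arithmetic of taking multiple looks at noisy data.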
P.S. I just wasted an hour writing this. Ugh. I wish I’d watched an episode of Game of Thrones instead. My CV is ephemeral; the High Sparrow is eternal.
P.P.S. Wansink added an addendum to the beginning of his post. My take on the addendum is that Wansink is an open person who read all the comments but, unfortunately, doesn’t seem to understand the key statistical or methodological point, which is that he and his student could well have been sifting through noise, and that there’s no real reason to believe most of the claims published in those papers. But openness is a good start; I’m hoping that Wansink and others like him will continue reading the relevant literature and at some point will realize that failure is an acceptable option with noisy, poorly-motivated studies.
Here’s what Wansink wrote in his addendum:
With field studies, hypotheses usually don’t “come out” on the first data run. But instead of dropping the study, a person contributes more to science by figuring out when the hypo worked and when it didn’t. This is Plan B. Perhaps your hypo worked during lunches but not dinners, or with small groups but not large groups. You don’t change your hypothesis, but you figure out where it worked and where it didn’t. Cool data contains cool discoveries. If a pilot study didn’t precede the field study, a lab study can follow — either we do it or someone else does.
The problem is that these “deep data dives” can often tell you nothing more than meaningless patterns, idiosyncratic to the data at hand. The statement “cool data contains cool discoveries” can be flat-out wrong. It doesn’t matter how “cool” your data are: if the noise is much higher than the signal, forget about it.
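To see why, consider a quick sketch where the signal is real but small relative to the noise. Again, all the numbers here are invented:

# When noise swamps the signal, the estimates that happen to reach p < 0.05
# are the ones that overshoot: "significant" results systematically
# exaggerate the true effect.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
true_effect = 0.1      # small real difference between conditions
sigma = 1.0            # much larger noise
n_per_group = 50

exaggeration = []
for _ in range(5000):
    a = rng.normal(true_effect, sigma, n_per_group)
    b = rng.normal(0.0, sigma, n_per_group)
    _, p = stats.ttest_ind(a, b)
    if p < 0.05:
        exaggeration.append((a.mean() - b.mean()) / true_effect)

print(f"significant in {len(exaggeration) / 5000:.1%} of simulations")
print(f"average estimate, as a multiple of the true effect: {np.mean(exaggeration):.1f}x")

So even when one of the Plan B, C, or D analyses stumbles onto something real, the published estimate from a study like this is likely to be a gross overestimate—one more reason not to expect the finding to replicate.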
Brian Nosek learned this himself when he and his colleagues tried, and failed, to replicate their “50 shades of gray” study.
Brian Wansink refuses to let failure be an option. If he has cool data, he keeps going at it until he finds something, then he publishes, publishes, publishes. Brian Nosek recognizes that his research can fail. He realizes that when Plan A fails, the best Plan B may be to simply write the paper explaining that Plan A failed, to accept the limitations of his data. I hope that, someday, Brian Wansink learns this lesson too.
P.P.P.S. I did some more searching and found someone pointing out that the “Low prices and high regret” article credits “OS” with collecting the data.
“OS” is Ozge Sigirci, the Turkish graduate student mentioned above. But then something’s really wrong here. Wansink clearly stated in his blog post that the study had been designed and the data collected before Sigirci arrived. So how could it possibly be that she collected the data? I also continue to wonder about the original “failed study which had null results” from which all the rest of this flowed. How can you possibly think you’re making research progress if you publish your noise-mined successes but keep your failures hidden? Sure, I understand the bias: I’ve had a lot of failed ideas myself and I don’t usually get around to writing them up and publishing them; it takes work that always seems like it could be better spent elsewhere. But if you’re going to publish four separate papers on different aspects of a “failed study which had null results,” wouldn’t it be a good idea to also make clear what your original hypotheses were and how they weren’t borne out? Cos it might be that those hypotheses are actually fine, and that your data were just too noisy for them to show up in your sample.
P.P.P.P.S. See here for a similar reaction from Ana Todorović.