
(People are missing the point on Wansink, so) what’s the lesson we should be drawing from this story?

People pointed me to various recent news articles on the retirement from the Cornell University business school of eating-behavior researcher and retraction king Brian Wansink.

I particularly liked this article by David Randall—not because he quoted me, but because he crisply laid out the key issues:

The irreproducibility crisis cost Brian Wansink his job. Over a 25-year career, Mr. Wansink developed an international reputation as an expert on eating behavior. He was the main popularizer of the notion that large portions lead inevitably to overeating. But Mr. Wansink resigned last week . . . after an investigative faculty committee found he had committed a litany of academic breaches: “misreporting of research data, problematic statistical techniques, failure to properly document and preserve research results” and more. . . .

Mr. Wansink’s fall from grace began with a 2016 blog post . . . [which] prompted a small group of skeptics to take a hard look at Mr. Wansink’s past scholarship. Their analysis, published in January 2017, turned up an astonishing variety and quantity of errors in his statistical procedures and data. . . .

A generation of Mr. Wansink’s journal editors and fellow scientists failed to notice anything wrong with his research—a powerful indictment of the current system of academic peer review, in which only subject-matter experts are invited to comment on a paper before publication. . . .

P-hacking, cherry-picking data and other arbitrary techniques have sadly become standard practices for scientists seeking publishable results. Many scientists do these things inadvertently [emphasis added], not realizing that the way they work is likely to lead to irreplicable results. Let something good come from Mr. Wansink’s downfall.

But some other reports missed the point, in a way that I’ve discussed before: they’re focusing on “p-hacking” and bad behavior rather than the larger problem of researchers expecting routine discovery.

Consider, for example, this news article by Brett Dahlberg:

The fall of a prominent food and marketing researcher may be a cautionary tale for scientists who are tempted to manipulate data and chase headlines.

I mean, sure, manipulating data is bad. But (a) there’s nothing wrong with chasing headlines, if you think you have an important message to share, and (b) I think the big problem is not overt “manipulation” but, rather, researchers fooling themselves.

Indeed, I fear that a focus on misdeeds will have two negative consequences: first, when people point out flawed statistical practices being done by honest and well-meaning scientists, critics might associate this with “p-hacking” and “cheating,” unfairly implying that these errors are the result of bad intent; and, conversely, various researchers who are using poor scientific practices but are not cheating will think that, just because they’re not falsifying data or “p-hacking,” their research is just fine.

So, for both these reasons, I like the framing of the garden of forking paths: Why multiple comparisons, or multiple potential comparisons can be a problem, even when there is no “fishing expedition” or “p-hacking” and the research hypothesis was posited ahead of time.
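To make the forking-paths point concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration: a two-arm study with two subgroups, no true effect anywhere, outcomes with known standard deviation 1 (so a simple z-test applies), and a hypothetical function name. Each simulated researcher runs exactly one test; the only difference is whether the subgroup was fixed in advance or chosen after seeing which difference looked larger.

```python
import math
import numpy as np

Z_CRIT = 1.96  # two-sided 5% threshold for a z-test


def simulate_forking_paths(n_sims=10_000, n=50, seed=0):
    """Simulate a two-arm study with two subgroups and NO true effect.

    Returns (prereg_rate, forking_rate): the share of simulated studies
    reaching p < 0.05 when the subgroup is fixed in advance, versus when
    it is chosen after looking at the data.
    """
    rng = np.random.default_rng(seed)
    # Shape: (study, subgroup, arm, person); outcomes are pure noise, sd = 1.
    y = rng.normal(size=(n_sims, 2, 2, n))
    arm_means = y.mean(axis=-1)
    diff = arm_means[:, :, 1] - arm_means[:, :, 0]  # treatment minus control
    z = diff / math.sqrt(2.0 / n)  # z statistic, variance assumed known

    # Path fixed ahead of time: always test subgroup 0.
    prereg_rate = float(np.mean(np.abs(z[:, 0]) > Z_CRIT))

    # Forking path: test only the subgroup whose difference looks larger.
    chosen = np.abs(z).max(axis=1)
    forking_rate = float(np.mean(chosen > Z_CRIT))
    return prereg_rate, forking_rate


if __name__ == "__main__":
    prereg, forking = simulate_forking_paths()
    print(f"subgroup fixed in advance:         {prereg:.3f}")
    print(f"subgroup chosen after seeing data: {forking:.3f}")
```

Under these assumptions the first rate sits near the nominal 5%, while the second is close to 1 − 0.95² ≈ 10%, even though nobody went on a fishing expedition and only one test was ever reported.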

Let me now continue with the news article under discussion, where Dahlberg writes:

The gold standard of scientific studies is to make a single hypothesis, gather data to test it, and analyze the results to see if it holds up. By Wansink’s own admission in the blog post, that’s not what happened in his lab.

Nononononono. Sometimes it’s fine to make a single hypothesis and test it. But work in a field such as eating behavior is inherently speculative. We’re not talking about general relativity or other theories of physics that make precise predictions that can be tested; this is inherently a field where the theories are fuzzy and we can learn from data. This is not a criticism of behavioral research; it’s just the way it is.

Indeed, I’d say that one problem with various flawed statistical methods associated with hypothesis testing is the mistaken identification of vague hypotheses such as “people eat more when they’re served in large bowls” with specific statistical models of particular experiments.

Please please please please please let’s move beyond the middle-school science-lab story of the precise hypothesis and accept that, except in rare instances, a single study in social science will not be definitive. The gold standard is not a test of a single hypothesis; the gold standard is a clearly defined and executed study with high-quality data that leads to an increase in our understanding.

Dahlberg continues:

To understand p-hacking, you need to understand p-values. P-values are how researchers measure the likelihood that a result in an experiment did not happen due to random chance. They’re the odds, for example, that your new diet is what caused you to lose weight, as opposed to natural background fluctuations in myriad bodily functions.

Look. If you don’t understand p-values, that’s fine. Lots of people don’t understand p-values. I think p-values are pretty much a waste of time. But for chrissake, if you don’t know what a p-value is, don’t try to explain it!
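For the record: a p-value is the probability, computed assuming the null hypothesis of no effect is true, of seeing a result at least as extreme as the one observed. It is not the odds that your diet caused the weight loss. Here is a minimal sketch with made-up numbers; the weight changes and the function name are hypothetical, and a permutation test is just one convenient way to show the idea.

```python
import numpy as np


def permutation_p_value(treated, control, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    Under the null hypothesis the group labels are arbitrary, so we
    reshuffle them and count how often the reshuffled difference is at
    least as large as the one actually observed.
    """
    rng = np.random.default_rng(seed)
    treated = np.asarray(treated, dtype=float)
    control = np.asarray(control, dtype=float)
    observed = abs(treated.mean() - control.mean())

    pooled = np.concatenate([treated, control])
    n_treated = len(treated)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)
        diff = abs(shuffled[:n_treated].mean() - shuffled[n_treated:].mean())
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_perm


if __name__ == "__main__":
    # Hypothetical weight changes in kg (negative = weight lost).
    diet = [-2.1, -0.4, -3.0, 0.8, -1.5, -2.6, 0.3, -1.9]
    no_diet = [-0.2, 1.1, -1.3, 0.6, -0.8, 0.4, -0.5, 0.9]
    print("p-value:", permutation_p_value(diet, no_diet))
```

Whatever number comes out, it is a statement about how surprising the data would be if the diet did nothing. It says nothing directly about the probability that the diet works.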

And then:

P-hacking is when researchers play with the data, often using complex statistical models, to arrive at results that look like they’re not random.

Again, my problem here is with the implication of intentionality. Also, what’s with the “often using complex statistical models”? Most of the examples of unreplicable research I’ve seen have used simple statistical comparisons and tests, nothing complex at all. Maybe the occasional regression model. Wansink was mostly t-tests and chi-squared tests, right?

The article continues, quoting a university administrator as saying:

We believe that the overwhelming majority of scientists are committed to rigorous and transparent work of the highest caliber.

Hmmmm. I think two things are being conflated here: procedural characteristics (rigor and transparency) and quality of research (work of the highest caliber). Unfortunately, honesty and transparency are not enough. And, again, I think there’s a problem when scientific errors are framed as moral errors. Sure, Wansink’s work had tons of problems, including a nearly complete lack of rigor and transparency. But lots of people do rigorous (in the sense of being controlled experiments) and transparent studies, but still are doing low-quality research because they have noisy data and bad theories.

It’s fine to encourage good behavior and slam bad behavior—but let’s remember that lots of bad work is being done by good people.

Anyway, my point here is not to bang on the author of this news article, who I’m sure is doing his best. I’m writing this post because I keep seeing this moralistic framing of the replication crisis which I think is unhelpful.

Whassup?

People just love a story with good guys and bad guys, I guess.

Thinking of Wansink in particular: At some point, he must either have realized he’s doing something wrong, or else he’s worked really hard to avoid confronting his errors. He’s made lots and lots of claims where he has no data, and he continues to drastically minimize the problems that have been found with his work.

But Wansink’s an unusual case. Lots of people out there are trying their best but still are doing junk science. And even Wansink might feel that his sloppiness and minimization of errors are in the cause of a greater good of improving public health.

Fundamentally I do think the problem is lack of understanding. Yes, cheating occurs, but the cheating is justified by the lack of understanding. People are taught that they are doing studies with 80% power (see here and here), so they think they should be routinely achieving success, and they do what it takes–and what they’ve seen other people do–to get that success.
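Here is a rough sketch of what that fantasy looks like in numbers. All the figures are hypothetical: a two-group comparison with 64 people per group, outcomes with standard deviation 1, an assumed effect of 0.5 standard deviations (which is what gives the nominal 80% power), and a true effect of 0.1. The function name is made up, and the estimate is simulated directly from its sampling distribution rather than from raw data.

```python
import math
import numpy as np

Z_CRIT = 1.96  # two-sided 5% threshold


def power_and_exaggeration(true_effect, n_per_group, n_sims=100_000, seed=0):
    """Simulate many replications of a two-group study.

    Returns (power, exaggeration): the share of replications reaching
    p < 0.05, and the average size of the significant estimates relative
    to the true effect. true_effect must be nonzero.
    """
    rng = np.random.default_rng(seed)
    se = math.sqrt(2.0 / n_per_group)  # outcome sd assumed to be 1
    estimates = rng.normal(true_effect, se, size=n_sims)
    significant = np.abs(estimates / se) > Z_CRIT
    power = float(significant.mean())
    exaggeration = float(np.abs(estimates[significant]).mean() / abs(true_effect))
    return power, exaggeration


if __name__ == "__main__":
    for effect in (0.5, 0.1):
        power, ratio = power_and_exaggeration(effect, n_per_group=64)
        print(f"true effect {effect}: power {power:.2f}, exaggeration {ratio:.1f}x")
```

With these made-up numbers the design has roughly 80% power if the effect really is 0.5 standard deviations, but under 10% power if the effect is 0.1, and in that case the estimates that do reach significance overstate the effect several times over. A researcher who believes the first scenario while living in the second will expect routine success and will keep finding ways to get it.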

Now, don’t get me wrong, I’m frustrated as hell when researchers hype their claims, dodge criticism, and even attack their critics, but I think this is all coming from these researchers living in a statistical fantasy world.

I was corresponding with Matthew Poes about this, and he responded:

I agree with you that the action is not typically intentional, or not done with ill-intent. I started to take issue with this around 2010 when I took over the analysts team at a lab at University of Illinois. It had become trendy to practically torture this dataset known as the Illinois Youth Survey (IYS) for some findings. Everything was correlated with everything and then grand theory-less models were run. While I had plenty to complain about here, my biggest fight was over the building of theory from what I saw as spurious correlations. Bizarre findings likely driven by a handful of cases or even possibly random chance were causing the chaining of characteristics into policy to reduce the drug problem among youth. I’ve discovered you as a result of that. I made a hallmark of my current research model that we expect model developers (of interventions) to start with measurable fundamental active ingredients, and that the outcomes they choose to impact can clearly and logically be linked to the active ingredients through a sensible theory of change.

My most recent ranting issue has been effectiveness trials like QEDs making use of propensity score matching, worse yet using them to further derive “unbiased” subgroup estimates. I don’t actually like that method much to begin with, but it’s being terribly abused right now. I don’t know your feelings on the matter. I’m currently trying to work with ACF to develop some standards around the use of statistical control group balancing methods as well as consider pulling together a committee to even decide if they have merit. There are other matching methods I think hold promise so I’m not totally against them, but at a minimum people need to use them appropriately or risk junk science.

As David Randall wrote, let something good come from Mr. Wansink’s downfall. Let’s hope that Wansink too can use his energy and creativity in a way that can benefit society. And, sure, it’s good for researchers to know that if you publish papers where the numbers don’t show up and you can’t produce your data, that eventually your career may suffer for it. But what I’d really like is for researchers, and decision makers, to recognize some fundamental difficulties of science, to realize that statistics is not just a bunch of paperwork, that following the conventional steps of research but without sensible theory or good measurement is a recipe for disaster.

With clear enough thinking, the replication crisis never needed to have happened, because people would’ve realized that so many of these studies would be hopeless. But in the world as it is, we’ve learned a lot from failed replication attempts and careful examinations of particular research projects. So let that be the lesson, that even if you are an honest and well-meaning researcher who’s never “p-hacked,” you can fool yourself, along with thousands of colleagues, news reporters, and funders, into believing things that aren’t so.

24 Comments

  1. Ben Hanowell says:

    What’s wrong with propensity score matching in quasi-experimental design?

    • Jim Vine says:

      Gary King and Richard Nielsen have a manuscript titled “Why Propensity Scores Should Not Be Used for Matching”, which I found to be quite persuasive.

      Link:

      https://gking.harvard.edu/files/gking/files/psnot.pdf

      Abstract:

      “We show that propensity score matching (PSM), an enormously popular method of preprocessing data for causal inference, often accomplishes the opposite of its intended goal — thus increasing imbalance, inefficiency, model dependence, and bias. PSM supposedly makes it easier to find matches by projecting a large number of covariates to a scalar propensity score and applying a single model to produce an unbiased estimate. However, in observational analysis the data generation process is rarely known and so users typically try many models before choosing one to present. The weakness of PSM comes from its attempts to approximate a completely randomized experiment, rather than, as with other matching methods, a more efficient fully blocked randomized experiment. PSM is thus uniquely blind to the often large portion of imbalance that can be eliminated by approximating full blocking with other matching methods. Moreover, in data balanced enough to approximate complete randomization, either to begin with or after pruning some observations, PSM approximates random matching which, we show, increases imbalance even relative to the original data. Although these results suggest researchers replace PSM with one of the other available matching methods, propensity scores have many other productive uses.”

      Also, this one by William Shadish that concludes “The use of propensity score analysis has proliferated exponentially, especially in the last decade, but careful attention to its assumptions seems to be very rare in practice. Researchers and policymakers who rely on these extensive propensity score applications may be using evidence of largely unknown validity.”

      https://link.springer.com/article/10.1007/s11292-012-9166-8

    • Matthew Poes says:

      The problem is that it is possible, even likely, that you will derive a biased control group. Done wrong, it has been shown to make things worse. From a more theoretical standpoint, for such a method to even work, you must have the right balancing data, and unfortunately that is quite rare. Instead we are often forced to make do. Those balancing variables are critical. The quality of the measures, completeness of data, etc. all matter. For more casual exploratory studies trying to help shape future research, maybe this is an acceptable first step, especially when all we have is such observational data. However, it is certainly no replacement for experimental designs with true control groups.

      My main issue with it is that there are better balancing approaches, for example Gary King’s work on coarsened exact matching or the like.

      Further, I get nervous when studies based on PSM are used to drive policy decisions. Done wrong and we’ve introduced more bias than existed had it never been done, yet nobody is the wiser.

      In some of my own work around this, I had run simulation models comparing the bias introduced to a data set when PSM was used as compared to a baseline covariate approach. There was less bias in the baseline covariate approach. In my case I knew this because the raw data I started with came from an actual RCT. I used that to fake an observational study since that was what I had to do next.

  2. Nick says:

    I’ve been torn about the focus on p-hacking ever since Stephanie Lee’s excellent piece that revealed the extent of how much of it was going on. On the one hand it turned up the heat several notches, and tied the Wansink story into the ongoing replication crisis (as opposed to “this one guy is doing totally weird stuff, his basic numbers don’t add up”), but on the other hand it risked distracting from the other problems in his work, many of which are (hopefully) not as widespread or as well-documented in other labs as p-hacking is. It was a relief to me that Cornell found that there were multiple grounds to find that Wansink had committed academic misconduct. (But it was nice that the rampant p-hacking, which is presumably what is meant by “problematic statistical techniques” in Cornell’s public statement, was included. Up to now, p-hacking has been treated a bit like a professor having a relationship with an undergraduate was in the 1970s: Most people sorta vaguely knew that it wasn’t right, but you would have been surprised back then if anyone was formally called out for it.)

  3. Ian Fellows says:

    Thank you for your continued focus that much of the problem is not that mustache twirling scientists are faking data left and right, it is that well intentioned exploration leads to them fooling themselves.

    I think this is very good news! There is very little the scientific community can do against credibly generated fake data, but we can change the behavior of honest researchers because, fundamentally, they want to get these things right.

  4. anon_007 says:

    Andrew, I am a researcher and I need to publish papers. You say lots of things about p-hacking, replication, association is not causation, small sample and big effect sizes, but guess what, it is useless because it does not help me to publish papers, much to the contrary. So, I have to keep myself as far as possible from your blog, from statisticians; I have to pretend I know stats enough and not worry about it.

    Here is an incorrect statement of yours:

    “And, sure, it’s good for researchers to know that if you publish papers where the numbers don’t show up and you can’t produce your data, that eventually your career may suffer for it.”

    No, it may not. Actually, as a new researcher, if I don’t want my career to suffer I have to find publishable results and not worry much about the quality of the data or the analysis. Why should I worry about stats? It will only make my life more difficult. The likelihood that people will check my data and catch some issues with it is much smaller than the probability that a new publication in my CV will open the doors to a better job, or keep me in my current job. And IF anything ever comes up, it will not be the end of the world, we will cross that bridge IF and when we get to it – let’s not get stuck because of unlikely IFs.

    Let’s not be blind, everybody is talking about the problems with science and yet everybody behaves the way I behave, because everybody needs to publish. One considers one’s moral values only when one’s belly is full and one’s rent is paid off. It is not like few people do this. When people start learning more stats and realize the crap they are doing they say “whoa, wrong turn, I don’t need to learn statistical inference or causality, I just need to know how to run the analysis, that is all…”. So that is what I do too, I focus on learning how to run the analysis, like everyone else, and we all can claim we have good intentions.

    Am I a cheater?

    • Andrew says:

      Anon:

      It may be that my advice is useless to you. But my advice could well be useful to the rest of us when we want to interpret your published papers! I have no doubt that there are people out there who just care about publication without regard to “the quality of the data or the analysis,” as you put it. There’s lots of crap out there, even in high-ranking journals. Thus it’s good for the rest of us to have tools to help us sift through the crap. So you go and build your brilliant career, and the rest of us can try our best to steer clear of it.

      • Pretty sure 007 was being devil’s advocate or something here. No one with the expressed attitude would stick around and read the blog.

        The point was, IF the expressed attitude was the attitude of a real researcher, would the researcher be a cheater, or something else?

        My answer is definitely a cheater. Imagine I hire a mechanic to fix my car, he knows I won’t be crawling under the chassis to examine things, so he calls out a bunch of worn out parts and replaces them. He knows they were actually fine but he needs to feed himself and pay his rent. He’s not doing his real job, he’s doing a sham version of it. This is fraud, and so is writing grants claiming you are doing real research but actually just doing the kinds of things 007 said. Intentionally not reading the repair manual of the car so everything takes longer and you can charge more is fraud… Same with intentionally avoiding real experimental design and analysis issues.

        • yyw says:

          +1. The current structure of academia (publications/grants/promotion) selects people that follow the prevailing unsound methodology, either knowingly or unknowingly. For example, grant reviewers expect to see power analysis based on noisy estimates from pilot studies. If you don’t do that, you are not helping your odds of success.

        • anon_007 says:

          Yes, and my point is that there are many more cheaters out there than Andrew thinks… I think the publication system we have forces people to cheat if they are not really, really good. These issues are everywhere, more and more papers are published about these problems in science. But everybody needs to publish, so they pretend they don’t know stats, they turn a blind eye, they don’t try to know, they prefer not to know and not to talk about it, they pretend it is not a big issue.

          It is strange to me that it is possible that a researcher at the level of Wansink would not be aware of p-hacking, all the issues with p-values and so on. Of course they know. If they don’t know, such incompetence would be worse than cheating.

          • Dale Lehman says:

            anon_007
            I can appreciate your message about cheating being quite common – I can’t speak to whether it is more or less common than Andrew believes, but I certainly think it is fairly common. So, I can accept the spirit of your comment as bringing the focus to the real world where incentives, payoffs, and penalties are realistic. But, one sentence in your comment bears more attention:

            “One considers one’s moral values only when one’s belly is full and one’s rent is paid off.”

            This cannot go unquestioned. Morality is certainly impacted by basic needs but that is not the same thing as saying that morality only applies when basic needs are met. There are plenty of people with strong moral standards who lack basic nutrition, health, etc. And, there are plenty of people with ample endowments who are unethical. It is simply not adequate to dispense with ethics because one’s “belly is not full.” For those of us that work in the real world of academia, with all of its perverse incentives, the need to have and keep a job can never excuse us from having to answer to ourselves (and others) ethically.

            • esse_email_vai_voltar@yahoo.com.br says:

              Point taken Dale, you are 100% correct in my view; I was more extreme than I should have been.

              Moral values are different. One should not kill. One should strive to do science correctly (not really ethically, I don’t think people see their lack of knowledge in stats and still going ahead with it anyways as unethical, but they surely know they don’t know stats and they surely go ahead anyways). It is very, very easy for one to willingly dismiss the latter under a bit of pressure, or just some fear, let alone the real survival needs. It is infinitely easier for Andrew to do the right thing than for most of us.

              So to me this kind of cheating (willful blindness) is the rule out there, really. But I guess I also want to say with this that we should not have to rely on the researchers doing the right thing because they won’t, they will just keep surviving.

              • anonymice says:

                You’re not anon007, right?

                Anyway I agree that survival considerations can force or at least incentivize pragmatic compromises even when they don’t feel right.

        • Anoop says:

          Hello Daniel,

          A few questions:

          So should all authors be called cheaters now who haven’t pre-registered their study in recent years?
          Should they be cheaters when they know they don’t have 80% power and still go ahead and conduct a study and make conclusions? Almost all research doesn’t even come close to it.
          Should all statistics be done by external statisticians blind to groups?

          I would call these suboptimal practices and not outright fraud or cheaters.

    • Anonymous says:

      “Let’s not be blind, everybody is talking about the problems with science and yet everybody behaves the way I behave, because everybody needs to publish.”

      Have you ever asked yourself why everybody needs to publish? Why would a university care about the no. of publications their staff produces?

      I have asked myself these types of questions, also on this blog, and from the information others gave I reasoned it could be that universities and other research institutes do not care so much about publications per se, but more about the money that they in turn could bring into the institution via grants, etc.

      If this is the case, publishing lots of papers is perhaps “so 2011”. The new possible way to waste the taxpayers’ money, and grab yourself that sweet tenure spot, is to come up with some giant project to “improve science”, and promote “open science” and “collaboration”.

      You could effectively, if you play your cards right, use up tons and tons of funds that could otherwise be nicely dispersed among researchers equally and hog it all up for your “collaborative” project. Or you could receive private funding, from parties that of course will not ask for anything in return, to pay for your nice treadmill-desks and salary in your new office.

      Any possible overhead “costs”, or other money transfers, from these types of massive “collaborative” and “let’s improve science”- projects could then effectively pay your sweet salary (you’ve earned it, after all you’re “improving science”!), thereby definitely giving you a high chance to get that sweet tenure spot.

      Get with the times dude/dudette!

  5. Andrew Althouse says:

    Good post, Andrew. I was quoted in the NPR piece, but I share your sentiment that we should be talking a little bit less about p-hacking and a little bit more about the other issues in Wansink’s work. It’s also unfortunate that the poor description of a p-value and the statement about “often using complex statistical models” appeared – I agree, most people engaging in this type of stuff are using simple t-tests and chi-squared tests. If anything, we should be talking about how many inappropriate comparisons / tests are performed using testing procedures that are not appropriate for the data that they have.

    • Matthew Poes says:

      Excellent post and great point. I wrote Andrew because I was so bothered by the description. I think it leads lay readers to be further distrusting of complex statistical models. The complexity or sophistication has nothing to do with the problem.

      I’m constantly amazed how many well meaning researchers don’t realize that (for example) breaking up a sample into a set of subgroups and running t-tests for each one is in fact a multiple comparison. The logic seems to be that if they were each run separately and the sub-grouping is not directly related, then it’s not a multiple comparison. If you break up the data in lots of different ways and run a bunch of t-tests, it still capitalizes on chance. You also see this done whereby the results are published across different articles, but it’s the same data and same . . . Often a more sophisticated method might help us examine the same variables configured in a different way. More sophisticated methods can actually be better/safer.

  6. Matthew Poes says:

    I think most researchers I know are well intentioned, even when they make these errors. I’m extraordinarily conservative in my analytic methods and research design, preferring to get it right (and have null results) than to chase publications. I have the honor of working with many researchers across my field of study and reviewing their proposals and final reports/publications. Many of them have committed the crimes that Andrew speaks of. When I question it, none have ever responded in a manner that makes me question their morality. Are they chasing publications and tenure? Probably. Are they doing work they know is wrong? I don’t think so.

    I think many people view a statistic without interpretation as acceptable. In other words, if they run the model and get a result, publish the result, but don’t draw any strong conclusions, then it is ok. The audience can draw their own conclusions and the error is theirs. What tends to happen instead is that bad statistics are put out which bias the available publications with spurious results. Conclusions are drawn, even if inadvertently. If theory building is fully data driven with no rhyme or reason, there is great risk in finding a big great nothing. This is the sin I think is being committed so often. Andrew talks about the noisy data as well, and at least in my world, it’s all a little noisy. Everything we study is abstract and hard to measure. I think the data is made noisy by virtue of the same problem that arises out of a lack of theory. Often we don’t measure school readiness (for example) by what it is (and honestly I don’t know what it is) but by what we can measure. If those aren’t the same (and I think they aren’t) we end up with noisy data.

  7. Keith O'Rourke says:

    > problem of researchers expecting routine discovery.
    Yup, that is the primary problem that is clearly seen in “if they are good enough to be with our university” or if they work with a “statistician”, or if they preregistered the study, or if they are a genius, etc. – then they surely could (if they only would, they surely could)

    The only guarantee of good scientific research is that although it will mislead us (hopefully not too often), if it is _persisted_ in, _eventually_ how we were misled will become clear. No closure, just places and times it makes sense to pause. There the materials necessary to restart should be given/archived. The only reward should be for those doing things that disrupt the pauses and force everyone to do more research!

  8. Dale Lehman wrote: “For those of us that work in the real world of academia, with all of its perverse incentives, the need to have and keep a job can never excuse us from having to answer to ourselves (and others) ethically.”

    Yup & see below.

    I recently got an invitation to join a symposium in The Netherlands which is related to the implementation of a new version of the new Netherlands Code of Conduct for Research Integrity. This symposium will take place on 2 October and it is hosted by the Royal Netherlands Academy of Arts and Sciences (KNAW). I have accepted the invitation, as this symposium will offer great opportunities to have discussions with various parties about our long-term efforts to get retracted a fraudulent study on the breeding biology of the Basra Reed Warbler in a Taylor & Francis journal (background at https://osf.io/5pnk7/). The same day I got a confirmation e-mail, which was also the admission ticket. A few days ago I got a reminder with the final version of the programme for that day.

    I received today a peculiar e-mail from Royal KNAW in which it was stated that I was no longer allowed to join this symposium and that Royal KNAW would call the police if I showed up.

    I am a biologist with extensive experience in the field of nature conservation. I am therefore used to various threats from people who are involved in illegally killing raptors, etc. I am working together with lots of others in this field who have similar experiences.

    It is however very peculiar that Royal KNAW wants to prohibit me from discussing with others at this symposium our efforts to retract this fraudulent study on the breeding biology of the Basra Reed Warbler.

    Anyone any idea about what to do now?

    https://www.knaw.nl/en/news/calendar/the-new-code-of-conduct-whats-next?set_language=en

  9. Koray says:

    > Please please please please please let’s move beyond the middle-school science-lab story of the precise hypothesis and accept that, except in rare instances, a single study in social science will not be definitive. The gold standard is not a test of a single hypothesis; the gold standard is a clearly defined and executed study with high-quality data that leads to an increase in our understanding.

    Is that how researchers advance their careers, or how the government/public reacts to scientific publishing?

    Even if social scientists agree with your gold standard, they *seem* to be speaking very definitely. They demand credit for their conclusions, funding for further research and policy change based on very definitive statements.

    What percentage of practicing scientists actually increased our understanding in their field in the previous decade to meet that gold standard? If we go by Ioannidis’ results in medicine, probably most scientists haven’t done such a thing.

  10. Patrick Handcock says:

    Hello Andrew. As an early career researcher moving between clinical RCT work (during PhD) and now trying to become a better epidemiologist (and as good as I can reasonably be at statistical methods in the process while juggling all these things) – your writing has been a really important base for me – so thanks! Reading Anon’s comment above struck a bit of a chord and is consistent with my observations so far as well about how the ‘system’ seems to work (even in high tier Universities…). Hypotheses and narratives get promoted over data, data gets tortured and the priority/currency is papers (ideally in high impact journals) and more grants to secure programs and careers. Often when the statistics ‘get in the way’, these get simplified down in the interest of a ‘coherent and clear’ story (and if lucky the limitations are mentioned, usually peripherally, at end of paper).

    I’ve been in this situation as a co-author a couple of times recently and wondered – “should we actually be publishing this paper” (i.e. is it really robust enough and helping to move the science forward). Of course that question doesn’t go down very well (particularly if coming from an early career researcher and after a lot of work is put into the paper) and makes one feel a bit powerless to effect change. Also often PIs are so detached these days (since they are often juggling and far removed) and know so little about stats, that the pressure is all put on the lead author (these days likely a PhD student/ECR) who is still very much in training. PI is looking for more papers that talk to their broader narrative/story, PhD student/ECR need papers and need to impress the PI to secure next career stage, not a good recipe for sound or replicable science… The incentives are just all wrong, and as Anon says, we all morally know what we need to or should be doing, but we all also need to put food/papers on the table (or else find another career or just teach). I’m being a bit dramatic of course and this is not always the case, but I think it is probably increasingly the norm and the current research climate perpetuates it….

    Sorry, I’m rambling and keyboard bashing… This will be nothing new to many who have been at my stage, but I guess the main question I’ve been having is – “what can we/I do to make things better” – and I’m not really sure who ‘we’ actually is or if this is just an idealistic pipe dream… we all talk about it, but changing the system and culture is hard! I’d like to try and make a meaningful difference (but also ideally keep a job in the process – I really do love research), but where to start?

    What seems most obvious to me (as sad as it is) is that “money talks”. Perhaps Fellowship/grant funders and employers need to place far less emphasis on headlines (often perpetuated by social media and media departments in research institutes needing to look like they actually need to be there) and number of papers in high impact journals. For example, a lot of hypothesis generating epidemiology ends up becoming ‘the answer’ and ‘pivotal work’ in media articles. If there was more emphasis on doing ‘high quality’ and replicable work and in teams (which would take the pressure off the individual) – perhaps there would be less incentive to ‘cheat’ or plead ignorance for the sake of being ‘productive’ and keeping careers afloat. If we had security – then there might be a greater incentive to do the research right to get closer to the truth.

    Other thoughts:
    – Allow researchers to come up with unique ‘theories’ and hypotheses (since they may know the area/physiology/ground better), but only allow well-trained statisticians (ideally independent if possible from the narrative being pushed) to do the stats to see if the theories and hypotheses actually hold, or if the study is even worthwhile doing until data is more reliable? If not, don’t force the publication and find better data that is more conclusive. Current research climate clearly wouldn’t allow this…
    – contradictory to last point. Less emphasis on unique/flashy ‘theories’ and hypotheses (which is often just a sales exercise for grant panels), and more emphasis on important questions that need answering, and systematically building towards the truth. Only publishing work when we feel like we actually have something robust to say (to avoid media and public confusion)… seems a little idealistic, I know…
    – Hope that in the broader scheme of things… science will be ‘self-correcting’…hmmm
    – Completely remove conflict of interest in terms of outsider funding, but also in terms of career building… not sure if that is even possible to be honest… even statisticians ‘need’ to at least keep their careers afloat and probably many want to make a name for themselves or be heard (otherwise they’d just get a non-academic job that pays better) – unless they got lucky with some good/interesting data.

    I’ve rushed this a bit and feel it might come off as a bit cynical when I come back to it… will need to think about it more – but would love to hear your thoughts. In particular – how can the system be improved? And what can people like me do to help (while I have the luxury of a 4 year fellowship and some academic freedom – which will very quickly need to be renewed…)? Thanks again for sharing your thoughts.

  11. Brian Reddy says:

    Good ole Feynman reminds us, “The first principle is that you must not fool yourself – and you are the easiest person to fool.”

    But Feynman was also criticizing social / nutrition science 35 years ago: https://www.youtube.com/watch?v=tWr39Q9vBgo

    The argument of expecting routine discovery makes sense…but then why isn’t the nutrition world (and so much health research) increasing their standards? At what point do they realize current practices haven’t been working? As a personal trainer I’d say my clients are as confused as they’ve ever been about nutrition. Today meat is fine but tomorrow it’ll cause cancer.

    I think you’re painting a bit too rosy of a view here, at least when it comes to the nutrition research world. It’s routinely either small sample sized or if it’s large, it inevitably uses surveys asking people to remember how they ate in the 90s. When we finally have a guy doing credible, lab controlled studies, from a prestigious school, we find he was throwing numbers around like paintballs. The PhDs in nutrition aren’t tricking themselves that easily. The everyday person on the street knows asking people to describe how they eat is bogus, yet it’s standard in nutrition papers.

    And sure, studying humans is very hard, probably the hardest thing to study, but quantum physics ain’t easy. Thus, physicists raised their significance threshold to five sigma, built the Large Hadron Collider, etc. (And for the omnipresent “you don’t get it, I have to publish for my career,” somehow publish or perish isn’t much, if any, of an issue for them. We don’t have new laws of motion popping up every week.) Maybe we can’t get to that level of precision with humans, but we have a lot more tools we aren’t even using. Have we not seen enough flaws with .05 p-values?? If you’re not going to go with fraud, then laziness is the most operative word.
