What should this student do? His bosses want him to p-hack and they don’t even know it!

Someone writes:

I’m currently a PhD student in the social sciences department of a university. I recently got involved with a group of professors working on a project which involved some costly data-collection. None of them have any real statistical prowess, so they came to me to perform their analyses, which I was happy to do. The problem? They want me to p-hack it, and they don’t even know it.

The project reads like one of your blog posts. The professors want to send this to a high-impact journal (they said Science, Nature, and The Lancet were their first three). There is no research question, and very little underlying theory. They essentially dumped the data on me and told me to email them when “you find something significant.” The worst part is, there is no malicious intent here, and I don’t think they even know that they’re just fishing for p < .05. These are genuinely good, smart people who just want to do a cool study and get some recognition. I don’t know if you have any advice on handling this sort of situation.

My recommendation is to do the best analysis you can, given your time constraints. If there are many potential things to look at, you might want to fit a multilevel model.
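For concreteness, here is a minimal sketch of that multilevel idea in Python (statsmodels). The file name and the columns “outcome”, “x”, and “group” are hypothetical stand-ins for whatever the data actually contain; the point is that one model with partial pooling replaces a pile of separate significance tests.

```python
# Minimal multilevel-model sketch (hypothetical data layout, not a recipe).
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("study_data.csv")  # hypothetical long-format dataset

# Varying intercepts and slopes by group: estimates from noisy groups are
# shrunk toward the overall mean instead of being tested, and selected,
# one at a time.
model = smf.mixedlm("outcome ~ x", df, groups=df["group"], re_formula="~x")
fit = model.fit()
print(fit.summary())
```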

In any case, write up what you did, make graphs of data and fitted model, give the manuscript to the professors and let them decide where to submit it.

You’ll have a lot more control over the project if you write up your findings as a real paper, with a title, abstract, paragraphs, data and methods section, results, conclusions, and graphs. Don’t just send them a bunch of printouts as if you’re some kind of cog in the machine. Write something up.

My guess is that your colleagues/supervisors will appreciate this: Writing up results is a lot of work, and a student who can write is valuable. Here are some tips on writing research articles.

It’s fine if these profs want to change your paper, or rewrite it, or incorporate what you wrote into their own paper (as long as they give you appropriate coauthorship). If in all this manipulation they want to submit something you don’t like, for example if they start pulling out p-values and telling bogus stories, then tell them you’re not happy with this! Explain your problems forthrightly. Ultimately it might come to a breakup, but give these colleagues of yours a chance to do things right, and give yourself a chance to make a contribution. And if it doesn’t work out, walk away: at least you got some practice with data analysis and writing.

76 thoughts on “What should this student do? His bosses want him to p-hack and they don’t even know it!”

  1. Exhibit A for putting more pressure on stat textbook authors to thoroughly cover all the material, e.g., here:

    https://hardsci.wordpress.com/2016/08/11/everything-is-fucked-the-syllabus/

    and here:

    http://statmodeling.stat.columbia.edu/2016/09/21/what-has-happened-down-here-is-the-winds-have-changed/

    Most students and professors aren’t learning from Cuddy et al. to figure out how to conduct and analyze their studies. Instead, they are learning from textbooks. The more stats textbooks that cover this stuff, and cover it thoroughly, the easier it will be for everyone to do better. Wouldn’t it be nice if students could point professors to Chapter X of the standard textbook for their stats 101 course and say, see, if we want to report p-values, we need to preregister our study. Wouldn’t it be nice if the professors already knew that because the textbooks they assign actually covered all these issues in depth?

    As far as I can tell, however, few, if any, textbooks cover any of this.

    • Ed:

      I agree, and many times I’ve criticized statistics textbook writers—including myself!—for presenting a misleading story in which statistics is all about getting the data, running the analysis, getting that confidence interval that excludes zero, and declaring victory. See section 2 of this paper for an example of me saying that statisticians have not been doing a good job here.

      We have a bit on these issues in Regression and Other Stories. Not enough, I’m sure, but it’s a start. As for Statistics 101, yes, I’d like to do that too. I’m not quite ready yet but maybe in a couple years. In the meantime I hope other authors can include this material in their intro stat books. The trouble is that what’s really needed is not just to add a section or even a chapter, but to rip out much of what’s already there and start over.

      • Here are some practical suggestions:

        1. I’m sure many readers of this blog are asked to review data analysis textbooks, either before or after publication. Reviewers should use this opportunity to push for inclusion of one or more chapters on the replication crisis.

        2. Many readers probably sit on curriculum committees of one sort or another where, again, they could highlight the need for inclusion of material on the replication crisis. Typing this out, I realized that a committee I’m on just approved a new data science major, and it didn’t even occur to me to ensure the curriculum included stuff on replicability. Bad!

    • I think more to the point: write textbooks so that they don’t mention p-values and statistical significance outside of the very narrow proper application (basically filtering unusual events out of a large sequence of measurements).

      Instead, start with the basic concept of Bayesian inference as constructing a joint distribution over known measurements and unknown parameter values, and then move on to describing how to build mathematical models of phenomena: ODEs, deterministic regression models, discrete Markov models, etc.
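      As a toy illustration of that joint-distribution view, here is a sketch with made-up numbers, estimating a binomial proportion on a grid:

```python
# Bayesian inference as a joint distribution p(theta, y) = p(theta) p(y|theta),
# conditioned on the observed y. The data are invented for illustration.
import numpy as np
from scipy.stats import binom

theta = np.linspace(0, 1, 1001)       # grid over the unknown parameter
prior = np.full_like(theta, 1.0)      # flat prior, for simplicity
prior /= prior.sum()

y, n = 7, 20                          # observed: 7 successes in 20 trials
likelihood = binom.pmf(y, n, theta)   # p(y | theta) along the grid

joint = prior * likelihood            # joint distribution on the grid
posterior = joint / joint.sum()       # condition on y by normalizing

print(theta[np.argmax(posterior)])    # posterior mode: 0.35
```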

      • Why do I have the feeling that if, back in the early 20th century, Bayesian methods had become the standard instead of frequentist methods, we would today still find ourselves in almost exactly the same position, with journal editors setting “standards” for publication, and hence authors engaging in rampant hacking of priors and likelihoods?

        • Because you (and Daniel, I feel sure) have a thorough understanding of the incentives and resource constraints at play here. I think Daniel’s suggestion is more about simply relegating p-values to their proper place than it is about Bayes being a panacea.

    • Ed,

      Of possible interest as a resource for people wanting to point out problems in use of statistics: https://www.ma.utexas.edu/users/parker/cm/index.html

      The course might be called “Things you were not taught in statistics but should have been taught”. The materials are open-source; Mary has put Word files for the class notes on her page above, to make it easier for others to use or adapt them. There are also other relevant links from http://www.ma.utexas.edu/users/mks/CommonMistakes2016/commonmistakeshome2016.html

    • “Wouldn’t it be nice if students could point professors to Chapter X of the standard textbook for their stats 101 course and say, see, if we want to report p-values, we need to preregister our study.”

      I agree with (the gist of) what you write!

      Just for possible consideration, I believe students (and other co-authors) can already point professors (or other researchers) to sources that indicate “good” practices. For instance, if I am not mistaken, the APA Manual states the following: “mention all relevant results, including those that run counter to expectation” (APA Manual 6th Edition, 2010, p. 32).

      In addition, I assume (some) universities have a code of conduct for performing and publishing research, which could also be used by researchers to point to how scientific work should be conducted and reported.

    • I never understood this. Why can’t the student walk away? These types of statements also remind me of “the reviewer made me do it”-type statements. I don’t get those either…

      When I was a student trying to publish my 1st paper, I was told by a reviewer, and subsequently by my professor (who said something like “the reviewer is always right”), to leave out certain analyses.

      I explained why I wanted to keep them in the manuscript, kept them in, and was prepared to stand firm on that, even to the point of withdrawing the submission from that particular journal if the editor and/or reviewer would not agree with it.

      You always have an option, I reason. I am glad Prof. Gelman stated the following:

      “Ultimately it might come to a breakup, but give these colleagues of yours a chance to do things right, and give yourself a chance to make a contribution. And if it doesn’t work out, walk away: at least you got some practice with data analysis and writing.” Key things for me: 1) try to work together, and 2) make and own your decisions.

      Also think about the Wansink case, where the student may now regret not walking out…

      • I was thinking about the issue of student responsibility in relation to Brian Wansink. If his students’ theses or dissertations contain a similarly large number of errors as Wansink’s papers, should the students’ degrees be revoked?

        • “If his students’ theses or dissertations contain a similarly large number of errors as Wansink’s papers, should the students’ degrees be revoked?”

          No, I don’t think that would make sense. Errors can happen, especially when you don’t know better and are not yet supposed to know better.

          If anything should be revoked here, I reason it’s Wansink’s “mentor” role and function.

    • I disagree. Science requires martyrdom from time to time. You always have the option of walking away, as a last resort. However, there are usually creative alternatives that don’t require sacrificing one’s integrity.

      • Directed at both JG and Anonymous, above.

        My initial thought on reading that was “Them are some big words.” But I can do better.

        Yes, we students always have the option to walk away. But consider that our student status [almost universally] indicates we haven’t yet put in the requisite study or hours of in-the-trenches work to have developed the wisdom to know when and where to draw the line. We are supposed to get some of that wisdom from our [often methodologically old-fashioned] faculty mentors.

        “Why can’t the student walk away?” also reminds me of the same question occasionally posed of survivors of domestic abuse. The reason we don’t walk away is fear. Let’s cut to it. Some advisors are manipulative, hateful, and abusive people. Students who don’t speak up or walk away are sometimes just trying to get through so they can start fresh outside of grad school. So, sure, we methodologically inclined students can make martyrs of ourselves. But wouldn’t it be better if there were resources in place to ensure we didn’t have to?

        This is why Andrew’s “rip out much of what’s already there and start over” comment, above, is so needed. Although the tides are rapidly changing, citing blog posts for our methods crusades won’t fly in many of the grad departments within the applied sciences. We inexperienced grad students will be on more stable ground armed with textbooks explicating these methods and with tutorial papers published in respected applied methods journals. Not only will these give us some semblance of authority if/when we fight such crusades, but they’ll help ensure we’re building the wisdom necessary to know when and how to fight.

        • “But consider that our student status [almost universally] indicates we haven’t yet put in the requisite study or hours of in-the-trenches work to have developed the wisdom to know when and where to draw the line. We are supposed to get some of that wisdom from our [often methodologically old-fashioned] faculty mentors.”

          I don’t understand your reasoning: hours of work result in developing wisdom regarding when to draw the line? I don’t necessarily see that connection at all. Given the state of psychology, perhaps you could even state that “hours of in-the-trenches work” has proven not to lead to “knowing when to draw the line”…

          Perhaps the type of reasoning I extract from your writing may be directly connected to the state of psychological science in the past 20 years or so. You cut a corner here, you think that’s not that bad to do, you just want to get through your PhD. But then you just want to get tenure, so you’re cutting a few more corners here and there, because once you get tenure you can really “make a difference”. Before you know it, everyone does it, and nothing makes sense anymore.

          I can’t stand all the p#ssy-footing and deflection of responsibility and accountability in psychology that I have heard, and am still hearing, over the past 5 years or so. I think it’s shameful.

          I was a relatively older student when starting my psychology research master’s, am a bit odd, and am very rigid and conscientious concerning things I find important (e.g. science). This may have influenced my reasoning and actions back then. I’ve said and done more “weird” things in my brief life in academia in line with my example.

          As a student, I came across many psychological scientists. I quickly estimated whom to take seriously and listen to, and whom not. Concerning my paper, I just thought of the readers and reasoned that if I were reading the paper, one of the 1st questions I would have would be “but what if he had analyzed it this way?”. That’s why I wanted to add the extra analyses. I even thought, immediately after finding my results, that if I were a professor (and would thus have the time and resources) I would replicate the study with a larger sample just to make sure before publishing. These are just 3 examples of things that I wasn’t taught by any “mentor with hours of experience with in-the-trenches work”, but thought of myself. In the 5 years or so since my graduation, I have come to the realization that I am extremely glad I thought, and acted, the way I did back then.

          I guess I am trying to make 2 points:

          1) I think that scientists (even students) have a choice, a responsibility, and accountability for their actions.

          2) I would hate to see students not following their own reasoning, and/or intuition, just because a “mentor” said so.

        • If your point #2 is absolutely correct, why do students even bother to take courses at all? Just follow their own reasoning and intuition?

          In practice, there is a spectrum: at one end of the scale are students who simply believe and internalize everything any “mentor” tells them, without critical thought. At the other end of the spectrum are two kinds of people: 1) true maverick geniuses who create or change fields of study, and 2) people who think they’re geniuses and mavericks. (I just listened to an interview with Geoffrey Hinton, a machine learning guru, who truly is a maverick and a genius. He’s also quite humble, which is a breath of fresh air.)

          It sounds like perhaps you had experience in psychology as a practitioner and went back to school to codify this with a master’s degree. So your disagreement with your advisor and the reviewer was more of an applied-versus-academic disagreement. The mentor-mentee gap was smaller in your case and more of the gap revolved around prior cultures than for most students.

          But of course, the whole point of being a student is to learn from a series of mentors. That doesn’t mean that every mentor is infallible, and your point that students need to exercise a level of skepticism is important in this day and age of groupthink. At the same time, some of your narrative sounds uncomfortably close to “I’m very smart and experienced. If I disagree with someone, I therefore know they’re wrong and I’m right,” which would simply be the flip side of the academic groupthink coin.

          So I overall agree with you, but I would add that some level of self-awareness and humility is necessary for point #2 to be helpful rather than dangerous.

        • #“If your point #2 is absolutely correct, why do students even bother to take courses at all?”

          Exactly! I reason it all depends on what, why, and how things are being taught.

          I guess about 60% of my “education” was reading, and answering exam questions about, a collection of psychology papers probably consisting of under-powered, p-hacked studies and “theory”. Leaving aside the possible conclusion that in the last 5 years or so it may have become clear that these papers may not contain much useful scientific information, you could perhaps have predicted the possible uselessness of such an approach to education. Even though you could argue that everything in science is “fleeting” in a way, I think it is possible to at least attempt to teach the less fleeting things. To me, this 60% of my education was all about relatively (predictably) fleeting things, and put the emphasis way too much on teaching “what” to think, instead of “how” and “why”. To me, these courses were all a waste of time and energy.

          I guess about 30% of my “education” involved statistics. I never heard any real information about what a p-value is, how it relates to hypotheses and studies, that there are many assumptions when interpreting p-values (?), etc. I am very bad at statistics. I passed all the statistics courses, and still was not confident that I could analyze my own data for my 1st study (which was the thesis for graduation). During the summer break, I bought Andy Field’s “Discovering Statistics” and went through the book multiple times. I learned about assumptions, non-parametric tests, etc., which I don’t remember hearing about in my “official” statistics courses. To me, my “official” statistics courses were mostly a waste of time and energy.

          I don’t exactly know what the other estimated 10% of my education consisted of. I do remember starting a voluntary course in “writing”, and being confused by the advice given during the 1st lecture (think along the lines of Bem’s “Writing the Empirical Journal Article”). I immediately thought: “Huh, shouldn’t this be the other way around?” (in line with what is now called “registered reports”). I dropped that course after 1 or 2 classes. Again, to me, my “official” writing course was mostly a waste of time and energy.

          I never heard any real information about (the importance of) replication. I never heard any real information about what psychological theories are, how to test them, etc. I never heard any real information about publication bias. When encountering all this information after my education, I was shocked: why did I never hear about any of this?!?!?! I reason that what can perhaps be seen as essential topics regarding science (and perhaps the less “fleeting” things) were left out of my education. An education that was “highly regarded” and was supposed to teach me how to do psychological research. To me, this is incomprehensible.

          To summarize my take on my 2-year “research master” education: I learned almost nothing of scientific use. Perhaps I even learned to do the wrong things from a scientific perspective. Perhaps it was all a way to make students produce useless papers so their “mentors” would have an additional paper to put on their CV, and the university could put some money in the bank.

          # “It sounds like perhaps you had experience in psychology as a practitioner and went back to school to codify this with a master’s degree. So your disagreement with your advisor and the reviewer was more of an applied-versus-academic disagreement. The mentor-mentee gap was smaller in your case and more of the gap revolved around prior cultures than for most students.”

          No practitioner. Before going to university I had jobs like mailman, factory worker, gardener, and mover.

          #“So I overall agree with you, but I would add that some level of self-awareness and humility is necessary for point #2 to be helpful rather than dangerous.”

          I agree that self-awareness and humility are useful, and in line with science. I reason that self-awareness and humility are important for everyone involved in science, which includes students and professors. I also reason self-awareness and humility might be related to teaching students more of the “how” and “why”, instead of the “what”.

        • Humility about student experience, indeed! What a huge percentage of psychology papers are studies precisely of undergraduates, yet purport to announce findings generalizable to the adult population! The professors don’t seem to see a problem.

        • “Perhaps it was all a way to make students produce useless papers so their “mentors” would have an additional paper to put on their CV, and the university could put some money in the bank. ”

          Just came across this from Stapel’s book on his fraud: https://errorstatistics.files.wordpress.com/2014/12/fakingscience-20141214.pdf

          “Doctoral dissertations are an important source of funding for Dutch public universities. So the more PhD students who graduate, the better. For every successful dissertation, the government hands over a little more than $100,000. Naturally, this leads to some creative ideas for ways to increase the numbers of doctorates being awarded. Some of the money is shared with the individual departments that produce the greatest number of dissertations, so there’s also a decentralized financial incentive to increase throughput. Individual professors who graduate more candidates than the rest enjoy greater prestige, and are generally closer to the head of the line when it comes to promotions and landing plum committee jobs.” (p. 5)

          Now, it’s all starting to make some sense :)

        • “I can’t help but wonder if Stapel is exaggerating the situation, to shift some of the blame from himself.”

          In his book, he does not, to my mind, do much blaming (which I think was generally the view of others too). The closest thing to blaming is when he states:

          “I’m not the only bad guy, there’s a lot more going on, and I’ve been just a small part of a culture that makes bad things possible. A story with just the one bad guy is always neater.” (p. 171)

          I can totally see how there may be money or other things involved here. I never understood the “pressure to publish” argument, or, perhaps better stated, I wonder if that’s only half the story. Why would a tenured professor care about how many papers he/she publishes? Why would a university care about how many papers their scholars publish? Then you start trying to connect these things to the usual suspects: money, power, politics, etc.

          I also found this quote interesting (the top comment from “Reader Picks”) about Wansink: https://www.nytimes.com/2017/10/23/upshot/the-cookie-crumbles-a-retracted-study-points-to-a-larger-truth.html

          “Last, if you’re wondering why Cornell didn’t recognize that the IRB protocol didn’t correspond to the published results, figure that Cornell collected at least $1,000,000 in overhead from those grants. That money paid the salaries of the IRB staff and lots of other things in Wansink’s Department. Cornell didn’t notice because it didn’t want to notice. If Cornell is anything like my university, it measures research by dollars raised, nothing else.”

          It has often been stated that PhD students and post-docs are some sort of slaves for their professors/labs: being paid minimal wages (if anything), working too many hours, pressured into bad science, etc.

          Perhaps professors themselves can be seen in the same way, the only difference being that they do have a decent salary.

          Perhaps both groups of scholars may one day begin to wonder what they really have been doing all these years…

        • Anonymous said,
          “… Cornell collected at least $1,000,000 in overhead from those grants. That money paid the salaries of the IRB staff …”

          Huh? My understanding is that folks on university IRBs serve in that capacity as part of their “service” (committee work) expectations, and don’t get any kind of reimbursement other than their regular salaries, which generally come from non-grant sources. Maybe that varies from university to university?

        • “Perhaps professors themselves can be seen in the same way, the only difference being that they do have a decent salary.

          Perhaps both groups of scholars may one day begin to wonder what they really have been doing all these years…”

          Climbing, and/or reaching the top of, a mountain can 1) be fulfilling and 2) let you see the beauty of the world around you.

          Regarding both, though, perhaps it matters which mountain you are, or have been, climbing.

  2. I am really starting to lose faith in applied academic research. I was sitting in on a lab meeting where they presented uncorrected p-values in a seed-based analysis of connectivity in MRI data – and I am not a fan of seed-based analyses to begin with; I spent a few months developing a Bayesian low-rank graph regression model, which only I use… and which takes forever. I had to bite my tongue, because I had already been getting on my lab about the methods they’re using to predict phenotypes from connectivity, which really only perform better on some test/train samples (by far not ALL), and my boss was getting pissed… the list goes on. I’m on the verge of leaving academia – we work in a system that perpetuates bullshit.

  3. “The professors want to send this to a high-impact journal (they said Science, Nature, and The Lancet were their first three). There is no research question, and very little underlying theory. They essentially dumped the data on me and told me to email them when “you find something significant.” The worst part is, there is no malicious intent here.”

    The intent is quite clearly to get a high-profile paper out of little hard work and insight. That may not qualify as “malicious”, but it doesn’t look so “good and smart” to me either. Not that it helps the student, but lack of “statistical prowess” seems a rather lame excuse to me. There are enough people outside statistics who can see what’s wrong with this approach.

    • Perhaps this could be a fruitful new avenue for “career scientists” to pursue. Just let a student analyze the data, and if sh#t ever hits the fan, you can state you were not the one who did the analyses.

      Also note the following sentence in the New York Magazine piece on power posing: http://nymag.com/scienceofus/2016/09/read-amy-cuddys-response-to-power-posing-critiques.html

      “I also cannot contest the first author’s recollections of how the data were collected and analyzed, as she led both.”

      • I also recall a case in which fake data were used in the analysis, but the faker was a student and the first (and only) author didn’t want to throw them under the bus. I think this case was covered on this blog, and the first author’s claim that “he” had analyzed the data, as he always wrote, turned out not to be true. Whatever the case is, this – delegating analysis to students – is a way to escape responsibility.

        • Publication also seems to be a key to success.

          You could make deals with some of your academic friends to provide some comments on each other’s manuscripts, and could therefore be “officially” added as co-authors on each other’s papers.

          Or you could simply be added as co-author on papers by your students for no real reason other than you being the “official” mentor (without necessarily contributing anything useful), just because everyone seems to do that.

          You could double, triple, quadruple the number of your publications with ease!

          It would kind of be rigging the system, so I guess nobody has ever done that…

  4. Maybe the student should tell the professors “how about something using machine learning?” and use the lasso or something related, i.e., get them to do principled instead of unprincipled data mining (a rough sketch follows below). The profs apparently don’t know much stats and so will probably want to interpret the results as causal instead of predictive, but at least the results won’t be unprincipled gibberish.

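    A rough sketch of what that could look like with scikit-learn (the file and column names are hypothetical; this is illustrative, not a recipe):

```python
# Lasso as "principled data mining": cross-validated regularization selects
# variables, instead of a hunt for p < .05. Results are predictive, not causal.
import pandas as pd
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("study_data.csv")             # hypothetical dataset
X_raw = df.drop(columns=["outcome"])
X = StandardScaler().fit_transform(X_raw)      # lasso wants scaled inputs
y = df["outcome"]

lasso = LassoCV(cv=5).fit(X, y)                # penalty chosen by 5-fold CV
selected = X_raw.columns[lasso.coef_ != 0]
print(list(selected))                          # variables that survive
```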

    • +1
      Something like this could work here. I think the student could explain standard practice like splitting into training, validation, and test sets and “model selection” procedures more easily than, e.g., direct multilevel modelling. If they want to do data mining, then just try to do it “properly”. They get to use the shiny words “machine learning” too.

      • The problem with that is they will have no idea what you are talking about. There will be no one on your committee, etc., whom you can rely on to vet your work. Also, you will have to spend tons of time explaining the most basic stuff about the tools to even hope to get them to understand the difficulties/issues you are facing.

      • To be honest, if they are asking the student to p-hack and don’t even realize it, they are unlikely to get the concepts behind cross-validation etc., at least to begin with. But the idea of automatic model selection is likely to appeal to them straight away, even if they don’t actually understand the principles behind it. Plus shiny!

      • I agree with Mark Schaffer that getting them to understand cross-validation is a stretch.

        But a training/test split (keeping a holdout until the end) should be reasonable enough. For one thing, the student can probably find examples of testing on a holdout in the relevant literature, so it won’t seem strange.

        And during the investigation, they can run cross-validation (on the training data) for their own sanity.

        Almost regardless of what the student does, they will get requests to look at the data in a different way. This may be phrased as “[we / the client / the foundation / the taxpayer] spent $millions for us to assemble this dataset, and it’s our duty to squeeze all the information out of it we can.”

        So let’s assume this is going to happen with p>.95, and the best defense that’s likely to be easily explainable is to do a holdout (or maybe more than one).
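        Something along these lines, say (a sketch only; the file and column names are hypothetical):

```python
# Holdout discipline: lock away a test set at the start, explore freely on
# the training portion (with cross-validation for sanity), and touch the
# holdout exactly once at the end.
import pandas as pd
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score, train_test_split

df = pd.read_csv("study_data.csv")                 # hypothetical dataset
train, holdout = train_test_split(df, test_size=0.3,
                                  random_state=0)  # the "lockbox"

X_tr, y_tr = train.drop(columns=["outcome"]), train["outcome"]
model = RidgeCV().fit(X_tr, y_tr)                  # explore and iterate here
print(cross_val_score(RidgeCV(), X_tr, y_tr, cv=5).mean())  # private sanity check

# Only once the analysis is frozen:
X_ho, y_ho = holdout.drop(columns=["outcome"]), holdout["outcome"]
print(model.score(X_ho, y_ho))                     # report this, once
```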

        • I was reminded of this paper from a while back:
          http://statmodeling.stat.columbia.edu/2017/08/28/using-statistical-prediction-also-called-machine-learning-potentially-save-lots-resources-criminal-justice/

          The cool thing about it was that they actually kept a real hold-out (“lockbox”) to avoid overfitting. At that time we still hadn’t seen the results on the lockbox, but I just went to check, and the final paper has now been published: https://academic.oup.com/qje/article-abstract/doi/10.1093/qje/qjx032/4095198/Human-Decisions-and-Machine-Predictions

          Sadly, the final version is pages and pages of discussion on their (surely overfit) results on the training data. The results on the lockbox data are relegated to an appendix:

          As noted above, one way we guard against this is by forming a true hold-out set of 203,338 cases that remained in a ‘lock box’ until this final draft of the paper. We obtain very similar results in this ‘lock box’ as in the ‘preliminary’ hold-out set.
          […]
          To account for potential human data-mining, this lock box set was untouched until the revision stage (this draft): in Table A.8 we replicate key findings on this previously untouched sample.

          They say the results were “similar” but I didn’t get access to the appendix, so I’m not sure exactly what happened.

          Anyway, my point is that performance on the true holdout is of primary importance; it makes no sense to treat it as a minor aspect of the analysis. I don’t even know that I care about the results they present at all*… This is the type of thing that will still go wrong when the research community just doesn’t “get it”.

          Of course this project was exceptional in that they actually kept a “lockbox” of data, even though they went on to misuse it. Most will just report the performance of a model overfit to the CV/holdout score, which is basically the ML version of p-hacking.

          *e.g., here are the headline results:

          one policy simulation shows crime reductions up to 24.7% with no change in jailing rates, or jailing rate reductions up to 41.9% with no increase in crime rates.

          What were these values for the true holdout? Why are they missing from the main paper?

      • Unsupervised machine learning is a reasonable approach to explore the data, perhaps to facilitate the development of tentative hypotheses, which could be used to design future experiments that would gather new data. That is what I’d suggest.

  5. In many areas of research, p-hacking etc. is simply the standard behavior. I doubt you will be able to convince them to change their ways, so it’s probably best to just try to get out of the project asap.

    Try to wrap the project up quickly as something preliminary (that you don’t have to publish under your name) and get a different project that is actually feasible, because their (scientific) goals are probably impossible to accomplish with the available information and your training.

    If you try to do things right, you will basically be taking on the workload of dozens/hundreds of people who previously p-hacked instead of doing their jobs. To make it worse, they probably didn’t even train you with the skills you need to do things right (usually the math/logic/programming training is very poor), so you will have to spend a lot of time self-teaching.

    They aren’t bad people, but (as Fisher predicted) there is “a dense fog in the place where their brains ought to be”. To me this somehow made things even worse, since you can’t even really be mad at them. You will eventually just need to accept this and move on with your life/work.

    Before agreeing to work with anyone in the future, the first thing to assess is whether they are going to do the p-hacking/overfitting thing. You should decide whether the money/reward would be worth committing these acts before getting involved with the project.

  6. If you are making intellectual contributions to the study in the form of data analysis decisions, then you should be a co-author. If you are a co-author, you should insist that the ultimate manuscript describe, from beginning to end, the sequence of decisions that led to the final results. A sufficiently thorough description will alert reviewers and readers to the noisiness of your process, thereby making the results of interest only if the discovered signal is sufficiently strong. To really underscore the point, you should insist that the paper state unequivocally that the results should be taken only as hypothesis-generating, and frame any conclusions in the form of observations which should be pursued further in a validation study, as opposed to truths established by the present study.

    • Fine ideas, but not very practical. The other authors can proceed as they like, and if the student (justifiably) makes a fuss, simply leave them off the paper. The student may end up with the satisfaction of having done the right thing, but at the cost of having no publication and also having annoyed their coauthor peers. For most, that’s not an attractive bargain.

  7. FWIW, this reads to me as a student in the social sciences (poli sci, econ, soc, …) who got connected with some people in another department (medicine or public health). In my experience (econ), that list of journals is not where a social science person would first target, and if I spent a lot of time/effort collecting data, I would know what to do with it. Given this reading, it doesn’t sound like he/she is dealing with his advisor, but with outsiders. So I think Andrew’s suggestions make a lot of sense. It sounds like this student can easily walk away because they are not in the same department, i.e., not on his dissertation committee.

  8. I recently got involved with a group of [social science] professors …. None of them have any real statistical prowess, so they came to me to perform their analyses, which I was happy to do.

    This is the most frightening thing I have heard all week.

    In what social sciences is it acceptable to be statistically inept? It isn’t true of all the social sciences, so which subfield are we talking about here?

    • After reading the latest discussion in the Psychmap Facebook group, I wonder if statistical ineptitude is the biggest problem in social science/psychology…

      https://www.facebook.com/groups/psychmap/permalink/505931686450425/

      I always wondered why I wasn’t taught anything about logic, reasoning, argumentation, etc. It seems relevant to lots of things in science: interpreting papers, hypothesizing, forming conclusions, etc.

      In my view, this should be a (much bigger) part of social science/psychology education.

  9. My general strategy in such situations is to ask the provider of the data to give me a detailed list of questions, and my request comes with an “if you have no question, I have no answer” warning. It forces the data owner to think at least a little bit about what (s)he is looking for. Most of the time, the list is short (writing out all the hypotheses of p-hacking being tedious for the data owner too), even if sometimes I get a simple text such as “cross the first 36 variables with the last 72 variables of the data set, for each of the 4 sub-groups of variables X, Y and Z”. The strategy is far from perfect, but it’s a first step toward a discussion in which I often succeed in making the data owner realise that they don’t know what they are looking for. Sometimes, it’s the length of the results log (several hundred pages) that makes the data owner realise the futility of their demand. Unfortunately, it does not always work.
    I recently had a student working on this issue of p-hacking, and on a real data set we succeeded in building literally millions of models… This sort of information is often able to make a data owner realise how vacuous a p-hacking analysis may be.

    • Yes, this!! I’ve gotten the “just look what’s in there” requests too, and in addition to the p-hacking issue I think it’s a) weird – they are supposed to be the subject experts; don’t they know what they’re looking for? – and b) them asking me to do their job for them, which I think is borderline rude – is my time less valuable than theirs? (This happens with people who are at about the same career point, and I’m not even a statistician.)

  10. If I kill someone unintentionally it is manslaughter; intentional, murder. Either way, someone is dead. The student knows there is a serious problem and a lot of the discussion is to ask the student to commit murder so that the professors can get away with manslaughter. The learned people on this blog should not be supporting manslaughter and murder just because it is commonplace in the social sciences.

    At a minimum, the student should split the data set. Find relationships in one part and confirm them or not in the other.
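    For instance, a sketch of the split-and-confirm idea (the file and column names are hypothetical):

```python
# Split-and-confirm: screen many variables on one half, then run a single
# pre-specified test on the other half, yielding one interpretable p-value.
import pandas as pd
from scipy.stats import pearsonr

df = pd.read_csv("study_data.csv").sample(frac=1, random_state=0)  # shuffle
explore, confirm = df.iloc[: len(df) // 2], df.iloc[len(df) // 2 :]

# Exploration half: look at everything, guilt-free.
screen = explore.corr(numeric_only=True)["outcome"].drop("outcome")
best = screen.abs().idxmax()          # the single most promising predictor

# Confirmation half: one test, chosen before seeing this half.
r, p = pearsonr(confirm[best], confirm["outcome"])
print(best, r, p)
```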

  11. I’m kind of surprised that you responded in the way you did. While not an academic anymore, I have been trained that it is very important not to fish and that well-thought-out research questions are basically non-negotiable. In my work as an analyst in healthcare, I would not accept a request such as the letter writer’s. My advice to them would be to push back, have a meeting with the professor(s) and try to eke out some sort of research question or questions that the letter writer can evaluate, or decline the project.
    I understand this might be trickier in an academic setting than in an industry setting.

  12. This post makes me think of the data scientist’s dilemma.
    What are your thoughts about exploratory analysis and researcher degrees of freedom? I find this pretty hard to deal with. On one side, data scientists are often supposed to look into the data to “find” interesting results; on the other side, if you do that and “find” something, then you are not supposed to do hypothesis testing and can’t report your result as “significant”.
    Would working with a small sample size help?

    • The problem really is overinterpretation. I’m all in favour of exploratory data analysis, and it’s good if you find something interesting. But because of (among other issues) researcher degrees of freedom, this may not be meaningful but rather a result of your trying hard to find something or doing lots of stuff, some of which may not survive replication. Or it may be meaningful; we don’t know. So the appropriate way to report this is that it’s an interesting feature of your particular dataset that may deserve a focused study on new data (particularly if there is an insightful explanation that seems worth spending more energy on).
      Correct, you shouldn’t report it as “significant”, but this doesn’t mean that you can’t report it at all. (OK, we can argue about incentives, journal policies, reviewer attitudes and all that, but basically we should be happy when people do good data analysis without announcing scientifically proven significant “discoveries” all the time.)

    • Another:

      As I often say, I think the solution is not to report a single comparison or some subset of comparisons that happen to be “statistically significant,” but to report and analyze all the comparisons of potential interest. Selection is bad enough, but selection on statistical significance is, in many settings, just selection on noise.

    • Thank you Christian Hennig and Andrew for your answers.
      “So the appropriate way to report this is that it’s an interesting feature of your particular dataset that may deserve a focused study on new data”
      Wouldn’t working on a small sample of my data, finding something “interesting”, and then testing its significance on the rest of my data achieve the same thing?
      However, by doing that, aren’t you still using your researcher degrees of freedom?
      Andrew, I agree with you. However, on the practical side (and coming back to the article), when dealing with executives, they usually want to read only the interesting subset part and expect the result to be significant.
      However, as you just said, this is not a proper way to proceed, and I am not really sure how to proceed.
      Any thoughts?

        • Thank you Christian for your answer!
          Would you happen to know any literature/blog articles that treat this topic?
          I believe it is really two topics when I think about it:
          – How to report findings from exploratory data analysis when there are no clear questions, without running into researcher degrees of freedom? (Andrew answered this one, but I am wondering if there is a more detailed source that would also deal with the exec-in-the-room problem.)
          – How to properly do a focused study after “finding” something in an exploratory analysis?

        • Sorry, I write what I think, not sure where in the literature to find that.
          Obviously, after having found something in exploratory analysis, you could specify a research hypothesis, decide on a protocol for analysis (be it frequentist or Bayesian), collect new data (or use data that you had put aside, assuming that you really didn’t involve them at all before), and run the pre-specified analyses on them.

  13. “Wouldn’t working on a small sample of my data, finding something “interesting”, and then testing its significance on the rest of my data achieve the same thing?”
    Not if you test more than one thing on the rest of your data. In principle you can do it in this way, so that the p-value on your “test data” is not affected by researcher degrees of freedom. But this doesn’t seem a good strategy to me, because the one thing you find and like best in your small sample may fall down on the test data, and then you’ve wasted the bigger part of your data. Or otherwise you may start to allow yourself so much flexibility (trying lots of things in this way) that researcher degrees of freedom will still eat you in the end.

    • This (and the message before) is to “another anom” above. One more thing: “working on a small sample” in your way would also mean that all preprocessing decisions (transformations, missing-data handling, etc.) need to be made based on the small sample alone; otherwise this would invalidate p-values on the test data.

  14. Dear PhD student,

    I would like to suggest reading the recently published paper “More than 75 percent decline over 27 years in total flying insect biomass in protected areas” at http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0185809 and in particular the responses of the authors, e.g., at http://journals.plos.org/plosone/article/comment?id=10.1371%2Fannotation%2F50f95392-7b52-4d84-b08c-f94c65745bdd (in 3 languages!), where they explain why using a simple regression will give a wrong estimate.

    I also recommend this paper because it seems to me that there are similarities between this paper and your study project: (1) a large and complex dataset; (2) no clear research question (the paper is thus an explorative analysis; this is, in my opinion, also clearly visible when one reads the paper); (3) the original goal of collecting the dataset (‘The standardized protocol of collection has been originally designed with the idea of integrating quantitative aspects of insects in the status assessment of the protected areas, and to construct a long-term archive in order to preserve (identified and not-identified) specimens of local diversity for future studies.’) is only loosely connected to the main ideas about the causes of the reported decline.

  15. I am the current (2017-2019) chair of the ASA’s Committee on Professional Ethics, and although I’ve discussed this case with the Committee at our last meeting (via conference call), I am commenting as the Committee chair and for myself, and not necessarily for the Committee, and not for the ASA. The ASA Ethical Guidelines for Statistical Practice (https://www.amstat.org/ASA/Your-Career/Ethical-Guidelines-for-Statistical-Practice.aspx) speak to this case in several different ways. The student obviously has a decision to make – a series of them, actually – and “The American Statistical Association’s Ethical Guidelines for Statistical Practice are intended to help statistics practitioners make decisions ethically.” It is stated in the Preamble that these Guidelines are intended for *all statistics practitioners*, and not just those who are members of the ASA or who self-identify as “statisticians”. Not only are they intended for all practitioners, the Guidelines are relevant for all practitioners, e.g., “(g)ood statistical practice is fundamentally based on transparent assumptions, reproducible results, and valid interpretations” – that is true whether or not the practitioner is an expert or a beginner, and whether or not they are an ASA member.

    The Guidelines offer specific support on this student’s decision from different perspectives, the three clearest being these:

    1. Professional Integrity and Accountability
    “The ethical statistician uses methodology and data that are relevant and appropriate, without favoritism or prejudice, and in a manner intended to produce valid, interpretable, and reproducible results.”

    The professors who asked this student to execute these analyses are clearly not interested in “valid, interpretable, and reproducible results”, since they have stated that their intention is to be published in a high-impact journal, rather than to generate valid and reproducible results.

    2. Integrity of Data and Methods
    “The ethical statistician is candid about any known or suspected limitations, defects, or biases in the data that may impact the integrity or reliability of the statistical analysis.”

    This student has clearly identified a limitation in this data: there was no hypothesis (or none communicated to the analyst), and that has very clear implications for the reliability of the statistical analyses. There is literally only one reason why this student would not run every analysis they can think of and share the one that meets the sole criterion given: “statistical significance”. That reason is that it’s a huge time sink with no clear way for this student to know if there will be any meaningful “payoff” (benefit to the student beyond practice with analysis methods). This situation/request itself confers bias that impacts the reliability of the results. However, the power differential may limit this student’s ability to “be candid” about the situation with these faculty.

    3. Responsibilities of Employers, Including Organizations, Individuals, Attorneys, or Other Clients Employing Statistical Practitioners
    “Those employing any person to analyze data are implicitly relying on the profession’s reputation for objectivity. However, this creates an obligation on the part of the employer to understand and respect statisticians’ obligation of objectivity.”

    One way for this student to broach the problem is to introduce the faculty requestors to the ASA Ethical Guidelines – because there is no evidence that they perceive that this student may be interested in following ethical guidelines at all, much less those for statistical practice. Moreover, this Guideline Principle states that “(t)hose employing statisticians are expected to … Recognize that the results of valid statistical studies cannot be guaranteed to conform to the expectations or desires of those commissioning the study or the statistical practitioner(s)”.

    These three Principles are most obviously relevant in this case. However, there are other Ethical Guidelines Principles that can also bear on this situation:

    4. Responsibilities to Research Team Colleagues
    Which points out that “The ethical statistician:
    1. Ensures that all discussion and reporting of statistical design and analysis is consistent with these Guidelines.
    2. Avoids compromising scientific validity for expediency.
    3. Strives to promote transparency in design, execution, and reporting or presenting of all analyses.”

    And
    5. Responsibilities to Science/Public/Funder/Client
    The ethical statistician supports valid inferences, transparency, and good science in general, keeping the interests of the public, funder, client, or customer in mind (as well as professional colleagues, patients, the public, and the scientific community).

    When applying this Ethical Guideline Principle, the student can demonstrate that they recognize that good science is not furthered (and is actually undermined) by data dredging, particularly if the main objective is a high-impact journal publication rather than a meaningful contribution to the ongoing scientific discussion on a specific topic.

    All of these (as well as the other 3) Ethical Guidelines Principles can be brought to bear on the decisions that are possible in this case:

    1) The student can notify the faculty that they cannot comply with the request, justifying this decision with any or all of these Guidelines Principles.
    2) The student can create and execute an analysis plan that conforms to these Guidelines, deliver the results in publication-ready format, and hope (or request) that the faculty agree with whatever authorship or credit arrangement the student suggests.
    3) The student can create an analysis plan that conforms to the Guidelines and present it to the faculty requestors, requesting their agreement to the pre-planned analyses, together with negotiated/agreed-upon credit for this student in whatever publication results.
    a) In so doing (i.e., following decision 3), the student would do well to follow the authorship criteria laid out by the International Committee of Medical Journal Editors (http://www.icmje.org/recommendations/browse/roles-and-responsibilities/defining-the-role-of-authors-and-contributors.html). Although motivated by the issues surrounding authorship experienced by medical journal editors, these four criteria are universal in scholarship.

    Whichever decision this student takes, they can also take the opportunity to familiarize themselves (and their faculty mentors – and peers) with the ASA Ethical Guidelines for Statistical Practice. As noted in the preamble, “throughout these Guidelines, the term “statistician” includes all practitioners of statistics and quantitative sciences, regardless of job title or field of degree, comprising statisticians at all levels of the profession and members of other professions who utilize and report statistical analyses and their implications.” That is, the Guidelines were recently (2016) revised to support the greater (and increasing) engagement in and with statistical analysis that characterizes modern work with data across disciplines. Moreover, all of the parties in this case should be (re)made aware that “in all cases, stakeholders have an obligation to act in good faith, to act in a manner that is consistent with these Guidelines, and to encourage others to do the same. Above all, professionalism in statistical practice presumes the goal of advancing knowledge while avoiding harm; using statistics in pursuit of unethical ends is inherently unethical.”

  16. “Above all, professionalism in statistical practice presumes the goal of advancing knowledge while avoiding harm; using statistics in pursuit of unethical ends is inherently unethical.”

    What about “advancing knowledge” “in pursuit of unethical ends”?

    What does it even mean?

  17. Dear Andrew, you posted this almost one year ago, so I may be too late for the party. But the letter writer might still be a Ph.D. student, like me, and this may help others in the same predicament.
    In unambiguous terms: p-hacking is fraud. Don’t do it.
    As a cautionary tale, search for Brian Wansink and take a look at the blog post that started everything (http://bit.ly/2PVXYiQ). Interestingly enough, it is about a ‘grad student who never said no.’ I am sure the prospects for this grad student are not very rosy now.

    It is essential, however, to know the difference between p-hacking and exploratory analysis. Before creating hypotheses, we observe. When dealing with data, this often includes testing several variables looking for associations. The main difference is that you have to be clear about the exploratory nature of your analyses in your report. You can read more at http://bit.ly/2PYenDo.

    In short, you could take a look at several variables, not with the intent of finding something publishable but looking for potential associations to be further investigated in future studies using other data. You then have to be sure to report that!

    If you do believe that your professors would suppress the reporting of the exploratory nature of the study, you could apply a correction for multiple testing. The easiest and most conservative one, the Bonferroni correction, is to divide the significance threshold of 0.05 by the number of variables you test. So, if you test 10 variables, a significant result would require p below 0.005. This correction makes declaring significant effects much harder, but your results would be much more convincing.
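    For instance, with made-up p-values (statsmodels does the bookkeeping):

```python
# Bonferroni in practice: with m tests, compare each p-value to alpha/m
# (equivalently, multiply each p-value by m). The p-values are invented.
from statsmodels.stats.multitest import multipletests

pvals = [0.004, 0.03, 0.2, 0.8, 0.012, 0.049, 0.6, 0.11, 0.33, 0.07]
alpha = 0.05

reject, p_adj, _, _ = multipletests(pvals, alpha=alpha, method="bonferroni")
print(alpha / len(pvals))            # per-test threshold: 0.005
print(list(zip(pvals, reject)))      # only p = 0.004 survives
```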

    Hope that helps!
