“Most notably, the vast majority of Americans support criminalizing data fraud, and many also believe the offense deserves a sentence of incarceration.”

Justin Pickett sends along this paper he wrote with Sean Roche:

Data fraud and selective reporting both present serious threats to the credibility of science. However, there remains considerable disagreement among scientists about how best to sanction data fraud, and about the ethicality of selective reporting.

OK, let’s move away from asking scientists. Let’s ask the general public:

The public is arguably the largest stakeholder in the reproducibility of science; research is primarily paid for with public funds, and flawed science threatens the public’s welfare. Members of the public are able to make rapid but meaningful judgments about the morality of different behaviors using moral intuitions.

Pickett and Roche did a couple surveys:

We conducted two studies—a survey experiment with a nationwide convenience sample (N = 821), and a follow-up survey with a representative sample of US adults (N = 964)—to explore public judgments about the morality of data fraud and selective reporting in science.

What did they find?

The public overwhelmingly judges both data fraud and selective reporting as morally wrong, and supports a range of serious sanctions for these behaviors. Most notably, the vast majority of Americans support criminalizing data fraud, and many also believe the offense deserves a sentence of incarceration.

We know from other surveys that people generally feel that if there’s something they don’t like, it should be illegal, and that they’re pretty willing to throw wrongdoers into prison. So, in that general sense, this isn’t so surprising. Still, it’s interesting to see it in this particular case.

As Evelyn Beatrice Hall never said, I disapprove of your questionable research practices, but I will defend to the death your right to publish their fruits in PPNAS and have them featured on NPR.

P.S. Just to be clear on this, I’m just reporting on an article that someone sent me. I don’t think people should be sent to prison for data fraud and selective reporting. Not unless they also commit real crimes that are serious.

P.P.S. Best comment comes from Shravan and AJG:

Ask the respondent what you think the consequences should be if
– you commit data fraud
– your coauthor commits data fraud
– your biggest rival commits data fraud
Then average these responses.

21 thoughts on ““Most notably, the vast majority of Americans support criminalizing data fraud, and many also believe the offense deserves a sentence of incarceration.””

  1. OK, I read the study and it is worthwhile research. BUT I fear that Andrew has succumbed to some mood affiliation in reading it. It is nice to see that the public is clearly on the side of data falsification and selective reporting as ethically bad and behavior that should be punished. I also feel that way and it is gratifying to see such strong evidence of those feelings.

    However, the surveys the authors conducted are unreasonably clear about who the victims are. Talking about a researcher investigating the effectiveness of a drug and only reporting the study design that showed effectiveness (and not the two that did not) makes it very clear that such behavior is “bad” and that we, the public, are victimized. And, yes, there are such cases with researcher behavior. But the far more common circumstance is one that is more nebulous. For example, what about a researcher who selectively reports the result that a new cancer treatment may be more effective than existing treatments, and does not report two studies that do not show significant improvement? And, what if the context is that the FDA may or may not approve further testing of the drug, depending on the outcome of these statistical tests? I think the context matters, and many cases are more ambiguous in that the researcher behavior may be seen to serve a “higher” purpose.

    Similarly, researcher misbehavior can be seen in studies about gun control and deaths from gun violence. I suspect the public opinion will be far more tentative if the misbehavior is seen to serve the goal of more gun control (or less, depending on one’s beliefs).

    So, I’m not sure how much of a contribution this particular study is. It shows that there is overwhelmingly consistent public condemnation of the practices of data falsification and selective reporting. But I do not find that surprising. It does not show (at least to me) that there is overwhelming public condemnation of the more common forms of falsification and selective reporting that are typical. Absent realistic context that generally contains moral ambiguity, public attitudes are clear. I suspect that things are not as clear under the circumstances that most researchers operate in.

  2. At a gut level this is understandable. At a gut level, I even agree. But I think it is hard to write laws and regulations about this that will be specific enough to pass a “due process” test and not also contain loopholes one can drive a fleet of trucks through.

    For example, it seems pretty clear that making up data that were never observed is fraud. And so is changing the values in the data set from what was observed. Right? So what does that say about procedures like multiple imputation? What is that if not making up data? And you don’t need a whole lot of experience with real world data sets to know that they often contain inconsistent or impossible data that needs to be cleaned. So if 9 of 10 observations on a person give her date of birth as 1/1/1999, and the tenth one says 1/1/1998, most of us would be inclined to replace that 1/1/1998 with the modal value of 1/1/1999 before proceeding to analysis. But that’s data tampering.

    If I have a data set that tells me that a study participant experienced a drug complication 6 months after he died, and if my best efforts to find out what really happened are fruitless, I will generally exclude that observation from analysis (or change those variables to missing values–which for most analyses is tantamount to excluding it). Is that selective reporting? Meticulous data collection makes such things relatively uncommon, but data sets are often large enough to contain at least a few uncommon anomalies.
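The two cleaning steps described above (replacing a stray date of birth with the modal value, and setting an impossible post-death event to missing) can at least be made auditable by logging every change. Here is a minimal sketch in Python. The function names, the record layout, and the `min_share` threshold are all hypothetical choices for illustration, not an established standard:

```python
from collections import Counter
from datetime import date


def clean_dob(observations, min_share=0.8):
    """Replace minority date-of-birth values with the modal value.

    Returns (cleaned_values, change_log). `min_share` is an assumed
    threshold: if the modal value is not common enough, nothing is changed.
    """
    if not observations:
        return [], []
    mode, n_mode = Counter(observations).most_common(1)[0]
    if n_mode / len(observations) < min_share:
        return list(observations), []  # no clear consensus, leave untouched
    cleaned, log = [], []
    for i, value in enumerate(observations):
        if value != mode:
            # Record exactly what was changed and under which rule.
            log.append({"index": i, "old": value, "new": mode, "rule": "modal_dob"})
            cleaned.append(mode)
        else:
            cleaned.append(value)
    return cleaned, log


def blank_post_death_event(event_date, death_date):
    """Set an event dated after death to missing (None) and log the change.

    Returns (value, log_entry); log_entry is None when nothing was changed.
    """
    if event_date is not None and death_date is not None and event_date > death_date:
        return None, {"old": event_date, "new": None, "rule": "event_after_death"}
    return event_date, None


# Nine observations agree, one disagrees: the odd one is replaced and logged.
dobs = [date(1999, 1, 1)] * 9 + [date(1998, 1, 1)]
cleaned_dobs, dob_log = clean_dob(dobs)

# A complication recorded six months after death is set to missing and logged.
event, event_log = blank_post_death_event(date(2020, 7, 1), date(2020, 1, 1))
```

Publishing the change log along with the analysis does not settle whether a given edit was legitimate, but it does let a reader see what was altered and rerun the analysis without the edits.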

    Evidently one could catalog some standard procedures that would be considered legitimate, at least if they are properly disclosed. But no catalog would ever be complete enough to keep someone from finding a way around it. For example, someone determined to fake data could just intentionally do a really sloppy job of collecting data and then replace the “obviously incorrect observations” with “corrected” values that serve his or her ends. Moreover, any catalog of “safe harbor” procedures, while being porous to those intent on deception, would also stifle statistical innovation.

    Clearly to some degree it is the intent that matters. But that puts the courts in the business of reading minds–something that should be minimized given how dicey a task that is. And, on top of that, I do believe that the bulk of the bad science problem comes not from people with fraudulent intent but from well-meaning people adopting “standard practices” that they don’t understand adequately, and reviewers not knowing much better themselves.

    • I think it’s useful to consider the issue of misrepresentation.

      If you say “I extracted drug treatment data from the Foo hospital database using the query BAR and here it is in csv format….” Then it should be what you say it is. If you later went through and deleted entries, altered entries that you believed had errors, and did all sorts of stuff like that before csv formatting it and publishing it… then at least you’ve committed some kind of fraud in misrepresenting what the data is. It should be possible to at least apply sanctions or recover some damages from you. If you did it with intent to mislead, then there should be a greater punishment. That is standard practice in law. If you carry a concealed weapon it is one kind of crime, and if you carry a concealed weapon with the intent to sneak into your ex-wife’s work and injure her… it is another. I agree that intent becomes a difficult issue to deal with, but it is predictive of the future in some sense. If you carry a concealed weapon with intent to do someone harm, it’s clear that you might do so again, whereas if you accidentally leave a large knife in your backpack after a camping trip and carry it into an airport unknowingly… it is a different matter. Law is not an easy clear-cut thing.

      • If you say “I extracted drug treatment data from the Foo hospital database using the query BAR and here it is in csv format….” Then it should be what you say it is.

        In an ideal world this would be true. First there is the data itself: clinical databases are replete with inconsistent data, out-of-range values, data attributed to the wrong person, etc. Extensive cleaning is always necessary. I would attach no credibility at all to results derived directly from data extracted from any clinical database I have ever seen.

        So, now we have to deal with the issue of disclosing how you cleaned the data. Ideally this would be done by publishing your code (and, confidentiality concerns permitting, the data itself) along with your article. I don’t know of any clinical journals that will do this. If you try to explain in reproducible detail what you did in the publication, they will just cut it during editing, calling it “unnecessary detail” for which they “don’t have space” and about which their readers “don’t care.”

        “If you carry a concealed weapon with intent to do someone harm, it’s clear that you might do so again, whereas if you accidentally leave a large knife in your backpack after a camping trip and carry it into an airport unknowingly… it is a different matter. ”

        I would agree with you. But tell it to the TSA; I have a friend who did jail time for precisely this. That’s just one reason of several why I’m very skeptical of having courts practice telepathy.

        • I’m pretty skeptical of telepathy as well. I’m also skeptical of concealed weapons laws. But I do assume that your friend served a lot less time than he would have if they had also found email evidence that he was plotting to take over the plane with his knife and crash it into a building. So, the distinction between carrying a concealed weapon and carrying a concealed weapon with intent to cause harm sort-of worked. Intent is all throughout the law, and if you can prove beyond a reasonable doubt that a person taking money from the government to do research published completely made-up data with intent to make his grant higher-scoring and hence receive more funding… then I think that should be a different “offense” than publishing incorrect data because you “cleaned it up” and then forgot to attach the cleaning-up script to the data repository upload.

  3. well, the main issue presented here is:
    ‘ Jail time for fraudulent science promulgation ? ‘

    As such, it quickly distills to a ‘legal’ issue… and there’s much established law in this area. Americans can generally say whatever they want (free speech), short of slander/libel, gross obscenity, and promotion of violence. Quackery and sloppy science pretensions are commonplace in society and usually do not require government legal intervention.

    Actual fraudulent dealings with other people usually are covered by Contract Law (Civil Law).
    A losing defendant in civil litigation is never jailed… but only reimburses the winning plaintiff for losses caused by the defendant’s behavior.

    For a scientific researcher to legally defraud somebody there must be some kind of formal or implied contractual relationship between the parties. The financial sponsor of scientific research normally has a contractual right that the research will be done honestly and in good faith. Same with corporate/university/government employer/employee legal relationships, and media outlets that accept research paper submissions.

    If Americans really want jail-time for dishonest behavior in society– they need to build a lot more prisons. The scientific research sector is just small potaters.

    • I agree with your general point, but the distinction isn’t that clear, is it? E.g., insurance fraud is essentially just “dishonest behavior,” isn’t it? Or Social Security fraud.

      Mail fraud is a criminal offense.

      Essentially, you could make a case that all of those are simply dishonest behaviors and contract law or torts ought to suffice?

  4. The current trend in punishment is to reduce long-term incarceration. I should note, however, that most of those who commit data fraud are currently imprisoned for a very long period of time in a penal institution known as academia (many of them are lifers!). This matter deserves urgent reform.

  5. If you commit data fraud on a *government* grant, isn’t it already covered by criminal statutes? Isn’t it *already* criminalized? e.g. 18 U.S. Code § 1031 or 18 U.S. Code § 1001 or 18 U.S. Code § 1002.

    Or does the bit about “support criminalizing data fraud” mean that this ought to be extended to private contracts? Sounds excessive.

    • How about asking what you think the consequences should be if…
      you commit data fraud
      your coauthor commits data fraud
      your biggest rival commits data fraud

      then you should average them together or something, as long as we’re at it.

  6. I have always meant to write this up, but people who study white collar crime are always saying “Americans believe white collar crime is serious.” and they always cite the same famous survey to support that. So I actually went and looked at the survey and it turned out it asked people to rate the seriousness of a lot of different crimes. For the white collar crime the example driving this entire statement was a white collar crime that resulted in the deaths of multiple people. So sure, you could say they believe white collar crime is serious, but you could just as easily say they believe causing the deaths of multiple people is serious. In fact, I would say you could say that a lot more easily.

  7. Everyone else in society from plumbers to accountants to doctors will go to jail if they are bad enough at their jobs. I see no reason scientists should be excepted.

  8. How about averaging the responses if:

    you commit insurance fraud
    your business partner commits insurance fraud
    your biggest competitor commits insurance fraud

    Then average those because it’s no different. Research fraud is no more or less “victimless” in resource acquisition (getting the grant). Where that fraudulent research influences public policy it is arguably worse than insurance fraud.

    Why is there this willingness to give out free passes for it where such willingness is demonstrably absent with insurance fraud?

  9. There is some behavior that I believe amounts to a criminal offense, mostly due to the public funding aspect.

    Anecdote in point: my wife (biology PhD) shared the story with me of how a PI in her department developed a new procedure and published a paper on this. This PI wanted to publish the procedure for the credit, but also wanted to keep one step ahead of the competition. Because of this, they intentionally did not describe a key step so that it would be difficult to replicate in competing labs.

    Personally, I found this behavior to be criminal (my wife told me that the researcher very casually mentioned this to her, so clearly he disagrees with me). The reason for this is that a.) he was paid with taxpayer money to increase the public knowledge, yet intentionally did not and b.) not only did he misuse his own public funds, but he presumably wasted other labs’ funds, in that they would have to waste money trying to relearn the key steps in the procedure he pretended to share with everyone.

    If this was just some guy being a jerk on his own time, that certainly wouldn’t be illegal. But the misuse of public funds is where it becomes a whole lot more serious to me.

    Of course, in practice, this would be an absolute nightmare to prosecute. How would you differentiate between those who intentionally deceive and those who have difficulty explaining complex scientific concepts?

    • >”Because of this, they intentionally did not describe a key step so that it would be difficult to replicate in competing labs.”

      A functioning scientific community would already take care of problems like this with routine replication attempts. The researcher would not be respected and would fail to get further funding.

      These problems were all solved as part of the scientific revolution itself; the whole point was to “take no one’s word”. The only reason we are discussing sending people to jail over this stuff is that the vast majority of what is getting funded under the name science is now pseudoscience.

  10. How about… Investigator gets big grant A anticipating trialing treatment and control on 500 patients over 5 years. Then accepts smaller grants B, C, D, E, and F which divert half the patients committed to A. After 5 years, study A has 250 patients, with insufficient power for most planned outcomes (most of which were based on exuberantly optimistic power analyses).

    To me, this feels an awful lot like fraud.
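To put rough numbers on the loss of power when enrollment drops from 500 to 250, here is a back-of-the-envelope sketch in Python. It assumes a two-arm trial with equal allocation, a two-sided test at the 5% level, a normal approximation, and a standardized effect size of 0.25. That effect size and the function name are hypothetical choices for illustration; the comment above gives neither the outcomes nor the planned effects:

```python
from math import sqrt
from statistics import NormalDist


def approx_power(total_patients, effect_size=0.25, alpha=0.05):
    """Approximate power for comparing two equal-sized arms.

    Uses the normal approximation and ignores the negligible chance of
    rejecting in the wrong direction. `effect_size` is the assumed
    standardized mean difference (hypothetical here).
    """
    n_per_arm = total_patients / 2
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Expected z-statistic under the assumed effect.
    noncentrality = effect_size * sqrt(n_per_arm / 2)
    return NormalDist().cdf(noncentrality - z_crit)


power_planned = approx_power(500)  # enrollment promised in grant A
power_actual = approx_power(250)   # enrollment after half the patients are diverted

print(f"planned: {power_planned:.2f}, actual: {power_actual:.2f}")
```

Under these assumptions the power falls from about 0.8 to about 0.5, and that is before accounting for the optimism of the original power analysis.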
