Clarke’s Law: Any sufficiently crappy research is indistinguishable from fraud

The originals:

Clarke’s first law: When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.

Clarke’s second law: The only way of discovering the limits of the possible is to venture a little way past them into the impossible.

Clarke’s third law: Any sufficiently advanced technology is indistinguishable from magic.

My updates:

1. When a distinguished but elderly scientist states that “You have no choice but to accept that the major conclusions of these studies are true,” don’t believe him.

2. The only way of discovering the limits of the reasonable is to venture a little way past them into the unreasonable.

3. Any sufficiently crappy research is indistinguishable from fraud.

It’s the third law that particularly interests me.

On this blog and elsewhere, we sometimes get into disputes about untrustworthy research: things like power pose or embodied cognition, which didn’t successfully replicate; or things like ovulation and voting or beauty and sex ratio, which never had a chance of being detected with the experiments in question (that kangaroo thing); or things like the Daryl Bem ESP studies, which just about nobody believed in the first place, and then when people looked carefully it turned out the analysis was a ganglion of forking paths. Things like himmicanes and hurricanes, where we’re like, Who knows? Anything could happen! But the evidence presented to us is pretty much empty. Or that air pollution in China study, where everyone’s like, Sure, we believe it, but, again, if you believe it, it can’t be on the strength of the non-evidence in that paper.

What all these papers have in common is that they make serious statistical errors. That’s ok, statistics is hard. As I recently wrote, there’s no reason we should trust a statistical analysis, just because it appears in a peer-reviewed journal. Remember, the problem with peer review is with the “peers”: lots of them don’t know statistics either, and lots of them are motivated to believe certain sorts of claims (such as “embodied cognition” or whatever) and don’t care so much about the quality of the evidence.

And now to Clarke’s updated third law. Some of the work in these sorts of papers is so bad that I’m starting to think the authors are making certain mistakes on purpose. That is, they’re not just ignorantly walking down forking paths, picking up shiny statistically significant comparisons, and running with them. No, they’re actively torturing the data, going through hypothesis after hypothesis until they find the magic “p less than .05,” they’re strategically keeping quiet about alternative analyses that didn’t work, they’re selecting out inconvenient data points on purpose, knowingly categorizing variables to keep good cases on one side of the line and bad ones on the other, screwing around with rounding in order to get p-values from just over .05 to just under . . . all these things. In short, they’re cheating.
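
To make the arithmetic of all that forking concrete, here’s a quick illustrative simulation (hypothetical settings of my own, not taken from any of the papers discussed here): try twenty comparisons on pure noise, keep the best one, and see how often you walk away with something “publishable.”

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def fishing_expedition(n_comparisons=20, n_per_group=30):
    # Run n_comparisons two-sample t-tests on pure noise and keep the smallest p-value.
    p_values = []
    for _ in range(n_comparisons):
        a = rng.normal(size=n_per_group)  # "treatment" group, no true effect
        b = rng.normal(size=n_per_group)  # "control" group, no true effect
        p_values.append(stats.ttest_ind(a, b).pvalue)
    return min(p_values)

n_sims = 2000
hits = sum(fishing_expedition() < 0.05 for _ in range(n_sims))
print(f"Share of noise-only fishing expeditions yielding p < .05: {hits / n_sims:.2f}")
# With 20 independent comparisons this is roughly 1 - 0.95**20, i.e. about 0.64.

No data-faking required; the procedure by itself manufactures “discoveries” most of the time.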

Even when they’re cheating, I have no doubt that they are doing so for what they perceive to be a higher cause.

Are they cheating or are they just really, really incompetent? When Satoshi Kanazawa publishes yet another paper on sex ratios with sample sizes too small to possibly learn anything, even after Weakliem and I published our long article explaining how his analytical strategy could never work, was he cheating—knowingly using a bad method that would allow him to get statistical significance, and thus another publication, from noise? Or was he softly cheating, by purposely not looking into the possibility that his method might be wrong, just looking away so he could continue to use the method? Or was he just incompetent, trying to do the scientific equivalent of repairing a watch using garden shears? Same with Daryl Bem and all the rest. I’m not accusing any of them of fraud! Who knows what was going through their minds when they were doing what they were doing.

Anyway, my point is . . . it doesn’t matter. Clarke’s Law! Any sufficiently crappy research is indistinguishable from fraud.

97 thoughts on “Clarke’s Law: Any sufficiently crappy research is indistinguishable from fraud”

  1. I’ve been involved with several labs practicing varying levels of statistical hygiene, and I have absolutely known researchers who run numerous hypotheses in quick succession until landing on one that hits p<.05, then publish. So allow me to confirm your suspicion. Each of those tested models will be honestly theory-driven, but theory is cheap and only the p<.05 result is reported on, so who cares. I think it's a combination of not fully grasping what repeated testing does to p-value interpretation and having a priori belief in the hypotheses being tested due to personal experience or unconscious bias or whatever… I like your Clarke's Law because it wastes less time worrying about what the thought process is and gets to the point. In the case of the above researcher, I think intentions were good; with others I imagine they're more driven by grant-chasing or tenure or whatever. Either way the goal is the same: rein that in / educate / etc…

  2. And yet for fraud in academia we have reasonably stringent penalties, at least when proven. On the other hand, for crappy research, nothing, no penalty at all.

    Plagiarize or fabricate data and you might lose your job or a few grants. Cherry-pick data or drop regression points, no one cares.

    • This is fraud, though.

      I think that punishing people for screwing up is bad because it discourages people from being open about their research for fear of someone noticing something they messed up on.

      However, p-hacking should just be viewed as deliberate fraud.

  3. I performed my undergraduate “research” in embodied cognition (n=12 btw, my study was titled ‘The Effect of Mundane Task Repetition on Interpersonal Levels of Trust’ – shocker, no one signed up).

    Embodiment at one level seems plausible when the effects are most trivial (i.e. cognitive psychology) but loses theoretical power as we tend towards the more abstract (i.e. social psychology). Embodiment becomes indiscernible from the ‘alternative’ hypothesis of amodal thought in social psych experiments. For example, the study of bitter tasting beverages influencing political attitudes? It may be: 1) noise, 2) one ‘remembrance’ in a symbolic amodal thought chain, or 3) the additive effect of perceptual symbol systems. It’s possible there’s a momentary minor effect of drinking sugar water on almost anything psychological, but I’m not sure I care.

    So I guess I’m not sure that bad stats make all the difference on the social psych level. It’s bad measurement, bad theory. But then is the solution to evaluate the theory on other grounds besides stats? And what are those grounds? In other words, what is your model for deciding what is bad research? The thing that worries me is that most of the studies like Bem’s are prima facie absurd. But it’s not only the prima facie ones – the absolutely theoretically shallow – that we should be concerned with, right?

    So I guess I’d go with fraud just to provide incentive to limit the epistemological weight of the existing bad literature.

  4. As depicted in the film The Big Short, those who realised the financial system was fundamentally flawed and on the verge of collapse wondered whether the system had gotten that way by collusion or by incompetence. I see similarities in the case of the psychological research system – so to speak. As was probably the case with the financial system, the collapse of psychology is being driven by incompetence rather than collusion. It seems like it’s rigged, that this must be by design, but by and large it’s just honest people failing to comprehend the outcomes of their actions. IMO. Sub prime mortgage lenders, subliminal primes – it’s all the same thing.

    • I don’t agree. Could be an interaction, could be a sum, but there is definitely some collusion in the financial system and surely some substantial criminal behaviour. The same probably holds true for social psychology. I bet some of these professors KNOW what they are doing very well…

  5. In psychology it is also often the crappy research that attracts the biggest funding. Take Angela Duckworth as an example. She overstated the size of her observed effects thirty-fold in her original article outlining “grit” because she does not understand odds ratios. Partly as a result of that funding agencies got really excited about her work and sent tons of money her way. Fast forward nine years and a meta-analysis points out the original errors, and the general lack of criterion, construct, and discriminant validity of grit. Her response has largely been that she never intended to mislead anyone. Of course, all that the public (and funding agencies) hear is that grit is supposedly the best predictor of success ever discovered.

    • Oh god, really? I am not in that field but I suspect that this is actually pretty darn common. People misunderstand their statistics, are innumerate, and then publish something dramatic (peer reviewed, OMG must be TRUE!) in a field where people are desperate for results, like cancer, or the drug war, or that gun control study from a month or so back, or heart disease, or the role of some simple education program in 3rd world development, or poverty prevention, or whatever, and then these results that are just flat out broken become the reason for decades of funding and research and if anyone points out that it’s all based on broken research it becomes even more funded for a while, because it’s too big (embarrassing) to fail.

      Meantime, careful researchers who identify that a current method of placing an arterial stent is definitely worse than an easy alternative and the easy alternative might extend the lives of 4% of the population by 10 years on average can barely get any funding because of all the money poured into RCTs on taking selenium and how it’s going to revolutionize the progress of heart disease for all, or something (I’m making up the specifics, but in a plausible, mad-libs kind of way).

      I’m pretty sure there’s a reason you constantly read about studies like “consuming dark chocolate reduces the risk of alzheimers by 11%” and it has to do with that sort of thing bringing money in to a field even though it’s all more or less Haruspicy.

  6. Crappy research can be polished up to look like credible research at least as long as it could have possibly been credible (e.g. sufficient sample size, proper random assignment/selection, etc.)

    Just read the guidelines on how studies should be designed, carried out, and reported on, then put on rose-colored glasses when writing about what you can get away with saying you did. :-(

  7. Let’s be honest — the entire debate about problems with psychology is probably inconsequential, as it’s quite clear by now that psychologists are incapable of actually weighing arguments and prioritizing problems. Until funding gets cut off by the funding agencies dissatisfied with how their money is wasted, nothing will change. I hate the “it’s the incentives not the people” argument, because if anyone should be a qualified and ethical researcher, it’s psychologists, but clearly the vast majority of the community does not comprehend what science is about. Yes, sure, there are problems in other fields, but at least the motivation of “big pharma” is easy to grasp. I get it, psychologists have a mortgage to pay, but seriously — what happened to basic personal responsibility? If people aren’t embarrassed by their own research and are actually capable of defending it in public, the problem must lie in something more rudimentary. If you run a study about color preference among ovulating women, then fail to replicate the results and then introduce weather as a moderator (Jesus, literally the weather!) based on no acceptable argument… Hell, not only that, but actually provide arguments which run contrary to pretty well established theories such as that of concealed ovulation… and then argue for an evolutionary perspective… and if that’s not enough, you actually argue against any concerns raised on account of statistics (and this is not even addressing the elephant in the room: theory!)… This is not denial, self-interest, naivete, this is just bonkers.
    I know it’s unfair to provide specific examples, but this particular case came to mind, as some time ago there was a discussion on one of the psych methods groups about pre-registration/outcome switching/harking, and one of the authors actually advocated these practices and provided their papers as proof of okayness. There’s another discussion now in that group concerning behavioral genetics and intelligence. Suddenly the same crowd is all about how skeptical we should be of the findings, methods, determinism, banality, wrongness [insert other adjectives here] of this kind of research. People are actually offended by findings provided by research done on thousands, often hundreds of thousands of people, and are questioning intelligence’s role as a factor important for life outcomes. Where were they when priming studies were being discussed outside their group?!

    • The problems exist outside psych. Although psych seems bad, other fields may be just as bad. Big swaths of medicine and biology seem to be succumbing. Engineering has succumbed to many people studying inconsequential areas of research at a purely theoretical level with largely evidence-free methods. People build whole careers out of showing that, under certain conditions, simulations show that certain stuff happens, pretty much without ever testing these things against reality. In many cases it’s hard to see how you even could test them against reality (think predictions of how the whole US power grid would behave under some particular regulation for solar power generation, or predictions of what would happen to global warming if we started seeding the atmosphere with powdered mica crystals … etc)

      • Yes, I get this, but if we assume that fraud and bad science in e.g. medicine and biology is worse because it matters… then the next question should be why psychologists are doing something that doesn’t? What I find most frustrating is that instead of discussing papers on means for reducing mental illness or improving outcomes in education, you know, stuff that matters, academics capable of conducting high-quality (perhaps even relatively high-impact!) research are drawn into discussing studies of no value. They do it, because it’s such terrible science that they feel morally obliged to take a stance, even if their attention would be required elsewhere. This is a conundrum that James Coyne seems to have captured in his latest paper. The reproducibility movement is great, but let’s be honest: most of these studies are inconsequential or a waste of time and money. Of course we shouldn’t let it slide, but this is exactly the thing that really gets my goat. To be fair, I honestly think everyone’s attention focuses on psychology because it’s something that everyone is interested in (or at least is capable of holding some opinion about, even if it’s wrong); but let’s not forget that this is also a result of the fact that claims made in those papers are so–let’s say “cute”–that it’s not hard for natural skeptics to notice their wrongness instantly. I’d much rather people with skills and common sense had fights about mammography, SSRIs, etc. This is no longer fringe-science, this is cringe-science. So if you’re saying that people in engineering or physics mess up too–we should be angry about that, sure! But somehow it would be easier to get angry at shoddy science in medicine if there were fewer distractors like priming studies claiming that people exposed to magnets are more attracted to their partner http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0155943

        • “What I find most frustrating is that instead of discussing papers on means for reducing mental illness or improving outcomes in education, you know, stuff that matters, academics capable of conducting high-quality (perhaps even relatively high-impact!) research are drawn into discussing studies of no value”

          Doing real consequential science takes years of work and pays off only sporadically, therefore it doesn’t get funding in the presence of a crapload of people producing a steady stream of “AMAZING Discoveries (p < 0.05)", and then once you have a big culture of that, they find real research with potential serious consequences threatening and silently denigrate it in the grant study sections.

          It's not so much that people choose this route, it's kind of just that they're halfway up a mountain sliding down, and they tend to go down the path of least resistance.

      • Biology has always had issues. When I went to Vanderbilt in 2006/2007, I did some graduate-level course work. We made fun of lots of terrible papers. I assumed at the time that this was something that was well known; my professors certainly all seemed to know and to show skepticism towards papers.

        That said, any actual significant medical result will likely be replicated because someone will try to make a drug or treatment out of it, so in some ways it is less bad (though it is wasteful of resources).

    • I agree with your first premise (that ultimately the funding agencies decide on what kind of research gets done), but I think you get a little confused when you accuse people of being inconsistent (or bonkers!). All of the things you’re unhappy with do happen, but there aren’t very many people who simultaneously do and believe all of the things you’re complaining about.

      In case it helps: I used to be pretty angry about this sort of thing, until I convinced myself to view scientific papers as progress reports on what you’ve been up to – the same sort of report you might have to file if you were a software engineer at a big company. It seems sensible to me that universities and funding bodies want lots of papers (you have to say what you’ve been up to), and understandable why most papers are basically wrong (most things you try will fail). From this viewpoint, I don’t think there’s anything ethically wrong with writing tons of terrible papers. You tried something, it didn’t work out, and you have to explain to the people paying you what happened. Of course it would be nice if you were allowed to have more honest abstracts (“I tried this for 4 months, but it basically doesn’t work”).

      This story of papers-as-progress-report is obviously unhelpful when it comes to people like Kanazawa, who seem to be actively deceptive. But in my experience, this is a pretty small minority of practicing scientists. Most of us are aware that most of our papers are, basically, not very conclusive; most of us are pretty annoyed at having to pretend that they are conclusive.

      • > most of our papers are, basically, not very conclusive; most of us are pretty annoyed at having to pretend that they are conclusive.
        Agree, and the idea that scientific papers should be written as progress reports was argued for in the last chapter of Modern Epidemiology (Kenneth J. Rothman, Sander Greenland, Timothy L. Lash), entitled “Meta-Analysis” – the chapter that should be read first, which was put last and read least ;-)

        But Daniel nailed the real problem with – [most academic researchers] are halfway up a mountain sliding down, and they tend to go down the path of least resistance.

        • Thanks for the reference!

          And yes, agreed that this is the real problem. To me it also seems like a hard one, and one of my main reasons for reading this blog is the hope that somebody might provide helpful hints on how to slide a little less quickly. I certainly appreciate your earlier publications on the topic (and was quite sad to have missed a recent talk of yours).

          In any case, the previous was mostly an attempt to explain to “Angry” that many (I hope most) academics are quite aware of these issues, and to give the perspective of somebody who has written some pretty stupid papers (ones that would have remained half-finished tech reports while I was still in industry) without feeling particularly dishonest.

  8. A corollary of ASCRIIFF would be that the usual disclaimer at the end of a post – “I don’t know A, B, and C, but I am sure A, B, and C are likely good people who believe in what they are doing and are not intentionally committing fraud” – can be replaced with the updated 3rd law.

  9. > Clarke’s first law: When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.

    Freeman Dyson being the exception that proves the rule?

    From “The Starship vs. Spaceship Earth” by Eric Steig and Ray Pierrehumbert:

    “The problem is that Dyson says demonstrably wrong things about global warming, and doesn’t seem to care so long as they support his notion of human destiny… The examples of this are legion. In the essay “Heretical thoughts about science and society” Dyson says that CO2 only acts to make cold places (like the arctic) warmer and doesn’t make hot places hotter, because only cold places are dry enough for CO2 to compete with water vapor opacity. But in jumping to this conclusion, he has neglected to take into account that even in the hot tropics, the air aloft is cold and dry, so CO2 nonetheless exerts a potent warming effect there. Dyson has fallen into the same saturation fallacy that bedeviled Ångström a century earlier. And then there are those carbon-eating trees… He points out that the annual fossil fuel emissions of carbon correspond to a hundredth of an inch of extra biomass per year over half the Earth’s surface, and suggests that it shouldn’t be hard to tweak the biosphere in such a way as to sequester all the fossil fuel carbon we want to in this way. Dyson could well ask himself why we don’t have kilometers-thick layers of organic carbon right now at the surface, resulting from a few billion years of outgassing of volcanic CO2.”

    Indeed.

    Link to Steig and Pierrehumbert’s post = http://www.realclimate.org/index.php/archives/2011/02/the-starship-vs-spaceship-earth/

    See also http://www.realclimate.org/index.php/archives/2008/05/freeman-dysons-selective-vision/

    PS The “saturation fallacy”: http://www.realclimate.org/index.php/archives/2007/06/a-saturated-gassy-argument/langswitch_lang/en/ and http://www.realclimate.org/index.php/archives/2007/06/a-saturated-gassy-argument-part-ii/

    • Freeman Dyson being the exception that proves the rule?

      Well, of course the saying “the exception that proves the rule” means “the exception that tests the rule”—as in Aberdeen Proving Grounds is a location where the US Army tests munitions.

      Given Dyson’s record of intelligence, humanity, and thorough understanding of physics, if I had to make a quick bet I’d wager on Dyson, not Steig and Pierrehumbert. They are probably pretty smart; but, Feynman thought that Dyson was smart.

      Here’s a quote from Wikipedia:

      You’ll have received an application from Mr Freeman Dyson to come to work with you as a graduate student. I hope that you will accept him. Although he is only 23 he is in my view the best mathematician in England.
      Geoffrey Ingram Taylor in a letter of reference to Hans Bethe

      Bob

        • That xkcd comic is a classic.

          I hadn’t seen that Nobel disease list before. Interestingly, one of the people whose work motivated me to pursue the field I did as a grad student makes the list (Smalley)!

        • Tmz:

          Krugman would be on a list of Nobel prize winners who are political activists, sure. But I don’t know of any examples of Krugman endorsing pseudoscience, which is the topic of that list. I haven’t heard of any articles by Krugman promoting crystal healing or cold fusion or the search for Atlantis or whatever.

      • >Given Dyson’s record of intelligence, humanity, and thorough understanding of physics, if I had to make a quick bet I’d wager on Dyson, not Steig and Pierrehumbert.

        Sure, if I knew nothing of the subject about which he was making his statements and were familiar with his achievements in quantum electrodynamics then I’d probably bet on him too – and I’d lose my money.

        Dyson’s statements re climate science are an example of argument from authority. They’re also an example of how to embarrass yourself by making public pronouncements about a technical subject without having done your homework. That’s a charitable view. Less charitably, one might view his appeals to authority as scientific malpractice.

        • “Dyson’s statements re climate science are an example of argument from authority. They’re also an example of how to embarrass yourself by making public pronouncements about a technical subject without having done your homework. That’s a charitable view. Less charitably, one might view his appeals to authority as scientific malpractice.”

          Well, Dyson was publishing in this area in the 1970s and 1980s. Look up his work in that period on carbon sequestration and modeling nuclear winter. IIRC, he claims to have been a pioneer in numerical modeling of atmospheric processes.

          The lesson taught by the many articles from Psychological Science (“the highest ranked empirical journal in psychology” according to their website) discussed in this blog is that much well-regarded research is questionable. In my opinion, much current climate science contains a number of markers indicating that it is similarly questionable—and I find much of the analysis of impacts of climate change even less credible.

  10. >”screwing around with rounding in order to get p-values from just over .05 to just under”

    Assuming you mean rounding the data values (not the p-values), I hadn’t heard of this trick before. Some Monte Carlo simulation indicates to me that this can be an effective method: I saw rounding produce reductions in p of ~0.01, enough to reach significance (see the sketch at the end of this comment). Has this been seen in the wild? The other approach, of rounding the p-values themselves in such a way that makes them seem lower, is recommended by the American Medical Association:

    “P values should be expressed to 2 digits to the right of the decimal point (regardless of whether the P value is significant)…When rounding P from 3 digits to 2 digits would result in P appearing nonsignificant, such as P = .046, expressing the P value to 3 places may be preferred”
    http://www.amamanualofstyle.com/staticfiles/files/quizzes/Stylebook%20Quiz%2012%20on%20P%20Values_Answers.pdf

    This asymmetric rounding may be problematic. I couldn’t find any data from the medical literature, but my experience would lead me to believe the behavior when p=0.05 is similar to that in psychology:
    “Our final sample consisted of 236 instances where “p = .05” was reported and of these p-values 94.3 % was interpreted as being significant.”
    http://link.springer.com/article/10.3758%2Fs13428-015-0664-2
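
    Here is a sketch of the kind of Monte Carlo I mean (a reconstruction with made-up settings – a two-sample t-test with 25 per group, measurements rounded to one decimal place – not the exact simulation referred to above). It counts how often rounding the raw data nudges a borderline p-value from just above .05 to just below it.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n_sims, n = 20000, 25
    borderline = crossed = 0

    for _ in range(n_sims):
        a = rng.normal(0.0, 1.0, n)
        b = rng.normal(0.5, 1.0, n)      # modest true difference between groups
        p_raw = stats.ttest_ind(a, b).pvalue
        if 0.05 < p_raw < 0.06:          # the "so close" cases a motivated analyst revisits
            borderline += 1
            # Re-run the same test after rounding the raw data to one decimal place.
            p_rounded = stats.ttest_ind(a.round(1), b.round(1)).pvalue
            if p_rounded < 0.05:
                crossed += 1

    print(f"Borderline cases: {borderline}; pushed below .05 by rounding: {crossed}")

    Whatever the exact rate, the mechanism is the point: a small, defensible-looking data-handling choice can be enough to move a result across the magic threshold.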

    • Anon:

      I think this came up in the discussion of some of the papers by the power pose authors, that one possibility of how they got some p-values below .05 was via creative rounding of intermediate quantities.

    • Well, I think that the AMA rule would mean that you can’t take p < .050001 and round it to p < .05 either. I really feel that if you are going to spend your time focused on magic numbers like .05 (is that supposed to be .05 or .05000?), you’ve already gone down a bad path.

    • Your semi-regular Terrifying Update on the State of Statistical Knowledge among Researchers in the Medical Sciences:

      http://www.amamanualofstyle.com/staticfiles/files/quizzes/Stylebook%20Quiz%2012%20on%20P%20Values_Answers.pdf

      ***

      Directions: Correctly edit the following sentences regarding P values.

      1. All the tests were 2-sided, with α = .05 and P > .05 considered statistically significant.

      ANSWER: All the tests were 2-sided, with α = .05 and P < .05 considered statistically significant

      ***

      And just for fun, from the copyright footer: "For educational use only."

      • Jrc:

        Wow, good catch! Here are a few others from that document:

        “change P > .05 to P < .05 and call it to the author’s attention”

        “P < .0001 should be rounded to P < .001”

        And, most bizarre of all:

        “JAMA and the Archives Journals do not use a zero to the left of the decimal point because statistically it is not possible to prove or disprove the null hypothesis completely when only a sample of the population is tested”

        I still don’t understand that one!

        • What a surreal document — it’s like reading about religious taboos, or instructions for sacred rituals, but less poetic.

          My guess for the bizarre “no zero to the left of the decimal point:” (rot13’ed in case anyone wants to make their own guesses first…): Fvapr c inyhrf zhfg or orgjrra 0 naq 1, gur ahzore gb gur yrsg bs gur qrpvzny cbvag “zhfg” or n mreb, naq fb vf abg arprffnel gb jevgr. (Bs pbhefr, qrprag crbcyr fubhyq nyjnlf chg n mreb gb gur yrsg bs n qrpvzny cbvag naljnl!)

        • Two Surrealist found-poems, from the AMA Style Guide Errata* (cut whole, without alteration):

          ***
          1
          ***

          Page 349: In the 5 examples at the top of page 349, the hyphens following “non” should all be en dashes.

          Page 349: In the examples of ranges on the middle of the page that are the exception to the rule (“2002–2004” and “31–92 years”), the en dashes should be hyphens.

          Page 349: In the list of “When Not to Use Hyphens” at the bottom of the page, the prefix multi- should be included in the list, right after mid-.

          Pages 352-353: In the examples of en dashes, en dashes, not hyphens, should follow “Winston-Salem,” “post,” “physician-lawyer,” “sclerosis,” “anti,” “tree,” “acid,” and “non.”

          ***
          2
          ***

          Page 389: In the entry for “case-fatality rate, fatality; morbidity, morbidity rate; mortality, mortality rate,” the words “morbidity, morbidity rate” should be deleted.

          Page 394: In the entry for “fatality,” the words “morbidity, morbidity rate” should be deleted.

          Page 395: The entry for “gold standard” should be changed to the following: “See 20.9, Study Design and Statistics, Glossary of Statistical Terms.”

          Page 398: The entire entry for “morbidity” should be deleted.

          ***

          *http://www.amamanualofstyle.com/staticfiles/files/errata/AMA%20Manual_Errata%207-11-14.pdf

        • This reminds me of the possibly apocryphal (or possibly true) story of a copy-editor’s work on a math paper on the subject of Lie algebras, where the notation [a,b] stands for the Lie product (or Lie bracket) of the elements a and b. The editor used the “rule” that nested parentheses use usual curved parentheses for the inner-most terms, then square brackets for the next outer terms, then curly brackets for “parentheses” outside that. So the correct (in context) expression [[a,[b,c]], d] would be changed to the incorrect (in context) {[a, (b,c)], d}.

        • I was told a story about the first person to publish the generation of second harmonics in light (two photons combine to form a higher-energy photon): the article editor saw a smudge on one of the photographs used in a figure and put some white-out over it before sending it off as camera-ready copy. Of course, that smudge was the data.

        • It is also APA standard to not show a zero to the left of the decimal for a p-value, or whenever indicating a number which must be between 0 and 1, and which cannot equal zero or one. Under these constraints, the zero to the left of the decimal is redundant, and not showing the zero indicates that the number is so constrained.

          Also, APA does not allow one to make a stronger claim than p < .001, I think because normal human uncertainty (notably the fact that sampling error is rarely the only possible error) makes stronger claims rather silly.

          The APA / American Psychological Association standards are used in most social sciences. (FYI: I often edit to APA standards for work.)

        • The purpose of the zero to the left of the decimal place is to ensure that one doesn’t miss the decimal point, end of story. It’s redundant ON PURPOSE. There’s a BIG difference between 0.233 and 233 (the number you might think you see if .233 is printed with the decimal point faint, too small, or kerned too close to the preceding letter so that it looks like part of a serif), and that error is so large that it justifies avoiding it by writing 0.233: if you miss the decimal place there, you’ll read 0233, which is not a valid typesetting of an integer, and you will realize your mistake.

          OF COURSE IT’S REDUNDANT, IT’S REDUNDANT ON PURPOSE, REDUNDANCY IS AN ESSENTIAL PART OF ALL ENGINEERING. KEEP THE MASTERS IN ARTS IN ENGLISH PEOPLE OUT OF THIS; THEY’RE GOING TO GET PEOPLE KILLED, LITERALLY.

          Dear Sir or Madame, when reading your table of the breaking stress of cables, it has come to our attention that your entry number three reading 233 Newtons is intended to read 0.233 however, you have typeset this number .233 and in our copy of the table the decimal point is so faint as to look like a fleck of included colored paper pulp. Therefore we believe that your publishing company is liable for the loss of 3 professional film industry stunt doubles insured by our company. You shall receive a summons to appear in court shortly.

          Dear Sir or Madame, when referring to your dosage chart for your antipsychotic drug Goodnessgraciousazine our staff has read the clearly indicated 233 mg dosage and given this dosage to 4 patients appearing in our ER for overdose of the street drug Ecstasy. Unfortunately all 4 patients had their blood pressure drop to zero within 1 minute and we were unable to revive them. It has come to our attention based on the dosing for similar drugs that perhaps you are missing a decimal point in front of this number, making the correct dose 0.233 mg, and we encourage you to fire everyone in your English proofreading department immediately.

        • I prefer the leading 0 as well (and use it despite APA guidelines), but your examples don’t work because the allowable range of the variables in question extends beyond 0 to 1, and so the style guide wouldn’t kill the leading zero.

        • What’s the loss function we’re minimizing here?

          If extra leading zero is not placed in front of numbers whose logical range is 0-1: $0

          If extra leading zero is placed in front of numbers whose logical range is 0-1: $0.00001/page (ink)

          If extra leading zero is placed in front of numbers whose logical range includes regions outside 0-1: $0.00001/page (ink)

          if extra leading zero isn’t placed in front of numbers whose logical range includes regions outside 0-1 and number involves important drug dosing or safe working strength information or the like: $300e6/page (loss of life or property)

          By all means, let’s make a special rule costing $30/hr in copy-editing skills to enforce so that we can save a thousandth of a penny per page on the assumption that the risk of it accidentally being applied to an important problem is less than 1e-12 per page!! For those keeping score, that’s p < 0.001

        • Daniel:

          One could argue, reasonably enough, that if the p-value is embedded in an expression such as “p = .14,” the zero is not needed because it cannot be read as “p = 14,” as that would make no sense. I certainly think it’s the height of style-manual nuttiness for them to insist on not including the initial zero, but I think that in the case of p-values nothing is really lost either way.

        • Dear Tool and Wire Inc,

          The probability of failure at given stress is given in your table 2 column 3 as 0.11 and the decimal point has merged with the base of the 1, at all other locations in your publication you have insisted on removing a leading zero, so by accidentally failing to apply your rule consistently you have misled my engineers into believing that the probability of failure was .011 resulting in a 10 fold underestimation of failure probability. We will be sending a bill for the loss of our Mars Rover due to parachute cord failure shortly.

        • The point is, we have a perfectly easy and consistent rule designed to avoid ambiguity and improve redundancy to avoid accidental effective multiplication by several factors of ten, which is frequently applied, and almost universal across disciplines in which people rely on communicating numbers effectively…. and then we have a style manual from a group of people who think it’s critically important to point out that p > .05 is different from p < .05 and that p = 0.0001 should be reported as p < .001 because it’s not actually possible to have “real” p values that are credibly less than that??

          The existence of special rules with no real purpose is itself harmful. Suppose there are 3 really important things that keep people from getting sick in public swimming pools. We need total chlorine low enough according to some scale, free chlorine between 1 and 3 ppm, and pH between 6 and 8. Everything else is gravy.

          Now, suppose we create a 15 point checklist including checking the existence of certain posted signs, checking the height of the lifeguard chair, checking the existence of 3 auxiliary flotation devices, checking the existence of a posted phone number for non-emergencies, …. and put these three critical chemical measurements randomly throughout the checklist…. Given the inevitably limited attention that will be paid to this, is this a smart idea? or will it distract people from much more important things?

          Taking a cue from the underpants gnomes on priorities:

          1) Correctness
          2) ?????
          3) Style

        • Funny thing is it’s actually *Psychologists* who have led engineers towards the basic idea of simplifying things down to what’s really important so that we don’t overload people with trivialities and lead them to ignore what’s really important.

        • Your example does not work because you are not talking about a p-value or any other number similarly constrained.

          It is useful to have a style guide in any publication, because, whichever rule is followed, there is less chance for error and confusion than if one article is written with one rule, and another article is written with a different rule. In this case, if one is used to seeing a p-value written in one way, one is less, not more, likely to be confused by a typo or a broken piece of type, than if one does not know how to expect the number to appear.

  11. They also take advice from other ‘respected’ academics who claim you can do something with statistics which is actually just bad science. The mentor of a friend claimed (and actually believed) that there was no reason he could not simply impute an entire survey into several years of data. He heard at a conference that you can impute anything. At some point you’ve hit the realm of fabrication.

  12. The interesting thing in this situation here is, what is the researcher going to do next once their mistakes have been pointed out? Will they continue with their creative p-value manipulations and post-hoc machinations? What I see most people doing is sticking to their current approaches. Not only is that indistinguishable from fraud, it then becomes fraud, because now agency and volition are in play, not mere incompetence and/or ignorance.

  13. Although one should note that academics are acting rationally, maximizing their gain. This is what businesses have to do to get ahead, make a profit by whatever means necessary. Being an academic is no different from being a businessman, except one earns a lot less and the amounts in question are much smaller than in the real world. Telling an academic to be honest in their work is like telling a businessman that it’s not all about the bottom line. It most definitely is all, or almost all, about the bottom line.

    • Is it OK for businesses to make a profit by “whatever means necessary”, including dishonesty or breaking laws? I don’t think I subscribe to your views of how to run a business.

      Telling a businessman to focus on the bottom line does not imply asking him to be dishonest.

      • Well, I didn’t say that is how one *should* run a business. Telling a business to focus on the bottom line doesn’t imply asking them to be dishonest, sure, but it sure ends up often enough that way that they are dishonest or unethical or both. I’m sure there must be totally honest and upright businesses (I was thinking big companies) out there. What’s an example of such a company?

  14. It’s easy to point the finger at academics for bending the rules, being uninformed with regards to proper use of statistics, or doing things that lead to bad outcomes for science. And I agree, that individual incentives for promotion, funding, etc., often lead people astray (or at a minimum, make them less willing to fully question whether a finding is truly a “finding” in the real world).

    Yet, I think the institution is the real problem–in this case, I mean the publication process/system. Given the way things are currently structured, it’s simply not possible to publish null results or present research that didn’t work out as originally planned. Instead, scholars are forced to reformulate a theory/hypothesis to fit their results in order to see their work published. And, incremental or sound (but not necessarily groundbreaking) research is not considered novel enough to warrant publication in top-tier journals.

    If we truly believe in science, then we need to structure the rules of the game to ensure that the players are rewarded for positive behaviors and punished for (or at least not encouraged to adopt) negative ones. Like any game, you’ll always have some individuals who search for shortcuts or cheat to win. But, I think we should be more concerned about the bulk of the academy, who are really trying to investigate things that they believe are important and honestly do want to get the story right. Restructure the system to favor sound measurement and research design over statistical significance and counter-intuitive findings. Then we might see behaviors start to change for the better.

  15. “Restructure the system to favor sound measurement and research design over statistical significance and counter-intuitive findings. Then we might see behaviors start to change for the better.”

    When reading several papers and discussions like this one, there seems to be some sort of consensus that the system and/or incentives need to change. Regardless of whether this is the best solution to current problems, at this point in time I would be very interested in what/how we should change the system/incentives, and whether anything has changed over the last 5 years or so.

    From what I understand, some changes seem to be happening: more replication studies seem to be performed and published, some researchers pre-register their studies, some journals require some disclosure concerning methodological details of the studies, and some journals offer new publication formats like Registered Reports.

    These can all be considered improvements, but I think 1) these things still happen too infrequently to have an impact, and 2) I wonder if these solutions really tackle the problem of the system/incentives enough to make a difference. It seems to me that these things do (almost) nothing to prevent bad research from continuing to get published.

    I think there should be (a discussion about) new, universal, higher standards for all journals to consider publication, but it seems that researchers themselves don’t want this for some reason. I wonder why that is and/or what is wrong with this idea. I also think that without implementing higher standards nothing will change, as indicated by the low power of the typical psychology study (see the quick simulation at the end of this comment for what that looks like in numbers), which has not improved over 50 years or so despite being seen as highly problematic for the field.

    To me, the discussion’s focus on incentives rather than on implementing higher standards makes it harder than necessary to improve matters. It’s like kindly asking people to do A, but still allowing them to do B without any real consequences. Why not simply make doing B impossible, so people automatically do A? I don’t get it.
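
    For a concrete sense of what “low power” means here, a hypothetical example with illustrative numbers (n = 20 per group and a true standardized effect of 0.4; these are made-up settings, not figures from any particular survey of the literature):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    n_sims, n, d = 20000, 20, 0.4
    hits = 0
    for _ in range(n_sims):
        a = rng.normal(0.0, 1.0, n)    # control group
        b = rng.normal(d, 1.0, n)      # treatment group with a real effect of d standard deviations
        if stats.ttest_ind(a, b).pvalue < 0.05:
            hits += 1
    print(f"Estimated power: {hits / n_sims:.2f}")   # roughly 0.24

    So even when the effect is real, a study like this detects it only about one time in four, and the “significant” estimates it does produce will tend to overstate the effect.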

    • The incentives are what drives bad research. So, if you want good research you need good incentives.

      Here are some examples of incentives structures which might push research into a “good” direction. Note that it’s not trivial to define “bad research practices” so we’d need to work on that.

      1) Any research which is published and then later shown to fail to replicate due to poor research practices creates financial liability for the PI. So, the PI must personally re-pay the research money plus penalties to the group that discovers the fraud/error/bad-research-practices. This produces financial incentives to search for bad research, as well as financial dis-incentives to carry it out in the first place.

      2) Random Auditing: select research publications at random to have designated government audit groups try to re-analyze or reproduce the research. If they discover bad research practices the PI must repay as in above.

      3) Create a metric for research quality and productivity, something WAY better than citation metric of journals. For example, a metric that incorporates how often a paper is cited in a positive way over the 50 years post-publication (ie. as evidence for something, not cited as an example of something done wrong) together with a metric for how important the topic is in terms of number of people who are affected by this area of research, together with a measure of how important the article is in terms of providing definitive new theories or evidence. Then, run this algorithm across all the papers published by each PI on a grant, and include this measure in a score that affects grant funding. People who consistently publish bad evidence poorly analyzed would get reduced scores. Let the PI actually publish a claimed “importance” or “measure of evidence” of the idea. Don’t penalize them when they don’t make strong claims (this lets you publish speculative ideas without being penalized).

      etc etc. Unless treated as a game-theoretic problem in which personal harm comes to cheaters and more resources go to honest, careful, conscientious researchers, we will continue along the path we have, which is that honesty and conscientiousness are penalized because they reduce the rate at which publications are made and they ultimately reduce the funding level available for the PI, and huge quantities of resources are wasted on decades of bad science.

      • The problem with 1, and to some extent 2, is that excellent research is designed to fail a given percentage of the time, and arguing that but for poor research practices it would not have failed runs into the vexed problem of identifying causes of effects.

        The problem with 3 is that it appears that ‘quality’ (whatever leads to more valid results) is of fairly high dimension and possibly non-additive and nonlinear, and that quality dimensions are highly application-specific and hard to measure from published and even documented information.

        My guess is that non-judgmental, descriptive random audits would do enough of the heavy lifting, if there is any way of legally enabling them and/or mandating others (funding agencies) to do them.

        • I fully agree with you on the practical problems of defining good quality and the issues of getting people to agree to having such types of rules put in place. The examples were more for illustration than actual practical implementation.

          I personally think that excellent research should never fail, but I probably define fail in a different way. If you do excellent research you should come to a situation where the range of possibilities has been narrowed but not more than is justified by the data you were able to collect. In general excellent research will to the best of our abilities narrow the possibilities as much as possible for any given level of funding. Obviously this sounds like an optimization problem, but in practice we need to “satisfice”.

          Still, I think something like this study on acupuncture and allergies that I blogged a week or two ago fails on most counts. In particular, our goal is to find out how much better acupuncture will make us feel, and the result was we (at best!) found out whether or not acupuncture is better than sham acupuncture at producing survey scores.

        • The original “bad research practice” is just asking the wrong question when you could have thought a little harder and asked the right question. But, good luck objectively defining “asking the wrong question”. Still, I think this gets at “Angry”s point above, why are we doing so much worthless meaningless science when we have a pool of serious problems that need attention? Let’s at least focus on the right kinds of questions!

  16. A story about failed mathematics I posted at http://academia.stackexchange.com/questions/30995/what-to-do-when-you-spend-several-months-working-on-an-idea-that-fails-in-a-mast/31082#31082 – no need to click through since this is what I said:

    Years ago Marguerite Lehr, a colleague of mine at Bryn Mawr college, told me of a conversation she’d had years before that with Oscar Zariski, a brilliant algebraic geometer then at Johns Hopkins. She told him about a failed attempt to solve a particular problem. He said “you must publish this.” She asked why, since it had failed. He replied that it was a natural way to attack the problem and people should know that it wouldn’t work.

    • Related but not quite the same: The importance to mathematics of publishing counterexamples. i.e., if someone is trying to prove a conjecture, but comes up with an example where the conjecture is not true, then publishing that counterexample is important, since the counterexample may help in understanding what is going on, leading to better conjectures and often ultimately conjectures that are proved.

      • Yes, I think mathematics is different. Less pressure for funding, harder to publish real junk. And yes, counterexamples are real results. In this case, though, the effort just failed, but (I think) the reason for failure wasn’t a useful counterexample.

        • > mathematics is different
          Agree, but I think most of that is due to hyper-efficient replication: claimed results get sufficiently replicated, if not before publication then very soon after – as long as some reviewers or readers have sufficient background knowledge and adequate time to work through all the derivations and proofs. Still, some mistakes in math do get published and take a while to be noticed.

          Likely this is what could be expected in any empirical science if adequate resources were brought to bear on verifying that claims do replicate.

        • Math also has longstanding practice of making preprints available, which increases the chances of finding errors before publication. It seemed so strange to me to find that other fields didn’t have this tradition.

    • > He said “you must publish this.” …

      It would be great if there were an established mechanism for doing so. My experience is that details of failed attempts, where they exist at all, exist as tribal knowledge, i.e., if you share your tried-but-failed-because-of-XYZ experience with colleagues over lunch, one of them might pipe up to the effect, “Oh, yeah, I did the same thing years ago.” If you’re really lucky they might dig up some obscure technical report which had been sitting in the back of their file cabinet for the past decade and you can compare notes. Not that it does any good at that point, except for maybe providing some consolation that you weren’t the only one who thought your proposed solution had legs. A “Journal of Lessons Learned” would be useful.

        • Maybe there is some civic-minded soul out there who will start an online Journal of Lessons Learned. Or I wonder if that might be a topic for a community in Stack Exchange — e.g., someone could ask, “Has anyone tried this?” and hope to get at least some suggestions of the sort “I think so-and-so might have tried that; try contacting them.” I suppose this would need to be discipline-specific, though, so perhaps not a good idea. Maybe something closer to PubPeer might work, though.

        • The general strategy is to write a “don’t make the mistakes we made in this project by being aware of these things and considering these possible options” piece.

          An example early in my career was http://www.ncbi.nlm.nih.gov/pubmed/3259982
          (Having used active or available clinic patients as a source of subjects for a clinical study, we discovered we could not make sense of the results. The first author presented the paper all around NA for the first couple of years and said that at every presentation someone owned up to making the same mistake or being about to make the same mistake – I bet someone else could do the same today, and at half the places they give the talk someone will own up…)

        • “Don’t make the mistakes we made in this project by being aware of these things and considering these possible options” = “Lessons learned”.

          Most projects I’ve been involved with, successful or not so, have had a lessons learned session at their conclusion. The sessions are generally very helpful but the results generally don’t get published. That’s great that you did.

  17. “instead of discussing papers on means for reducing mental illness or improving outcomes in education, you know, stuff that matters, academics… are drawn into discussing studies of no value.”

    To be sure, if grit/conscientiousness could be taught, then all the attention would be merited, as would attempts to propagandize the value of grit to schoolchildren.

    • On second thought I see how my examples may be taken as stuff that got psychology in trouble in the first place (i.e. mindset, grit, stereotype threat, implicit bias in the priming sense etc.); what I meant to say is that there’s still some stuff to explore in the realm of memory, information retention and recall. I personally think that most of the things we’d like to improve (i.e. thinking skills–be they intelligence, critical thinking or basic stats and maths skills, not to mention language comprehension) are in general set by the time people get to higher levels of education (or are more likely than anything else we can come up with) and unless the students themselves are willing to take some intellectual beating they will benefit very little from participating in the system (outside networking which will potentially prove useful for their future careers). Fine, I have no problem with this. But a reasonable thing to do would be to start discriminating between those who can and those who can’t. Otherwise, we’ve reached a point (or have been here for ages!) where we can improve neither individuals nor the system. And again, I get that perhaps psychology is unfairly pointed out as a rotten apple and is a convenient whipping boy, but I think that to some degree we should admit that having unreasonable ideas in stem biology is qualitatively different from having unreasonable ideas in psychology.

      • You appear to be conflating bad method and faulty inference with the ideas tested by those methods. It is important not to throw the baby out with the bath water.

  18. Focus on real effects, and ditch P<.05 as a significance criterion; what are you, an undergraduate upset that your thesis didn't work out? Calculate the sample size needed for a meaningful result or for acceptance of the null hypothesis. Build replication into your designs. Stop believing in research superstars who become as detached from truth and reality as top politicians. Stop grant-grubbing. Most of all, stand on the shoulders of giants, not recent gimmicks.

  19. I think that your take on the third law is clever, but I don’t like the attribution to Arthur C. Clarke in the title. But then, I am just an old guy with a lifelong love of science fiction; Arthur C. Clarke was a childhood hero, and later an adult one.
