Oooh, I hate all talk of false positive, false negative, false discovery, etc.

A correspondent writes:

I think this short post on p-values, Bayes, and false discovery rates contains some misinterpretations.

My reply: Oooh, I hate all talk of false positive, false negative, false discovery, etc.

I posted this not because I care about someone, somewhere, being “wrong on the internet.” Rather, I just think there’s so much wrong with so much of the scientific discourse on science and statistics and learning from data. It’s beyond Bayes or p-value or anything so specific; I think it gets to deeper issues regarding goals and expectations of certainty.

21 thoughts on “Oooh, I hate all talk of false positive, false negative, false discovery, etc.”

    • I’m sure there are other problems, but the one that struck me was (a) the useful example of p-values as surprising deviations from a starting hypothesis that coin flips are fair, followed by (b) a subsequent example where the same logic is applied but the underlying true effect is assumed to vary. If the true effect varies, the probability of p<0.05 is going to be greater than 0.05, at least for a two-sided test. More generally, it is conflating sampling uncertainty with the prior distribution of the effect size.

      • I don’t understand where the problem is. The basis of his argument is that the effect may be one of these two options:
        1) zero. In that case the probability of getting p<0.05 is 5%.
        2) non-zero and fixed at the value that makes the power of the study 80%. In that case the probability of getting p<0.05 is 80%.

        One could say that assuming the true effect is either zero or some fixed value is too much of a simplification, but I don't think that makes the argument "wrong". It is the simplest example that can be used to illustrate his main point: "that the key to interpreting any "positive" study lies in an assessment of how likely you thought the hypothesis was to be true in the first place."
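
        To put numbers on that two-option setup, here is a minimal sketch in Python (the priors below are my own illustrative choices, not values from the video): with a 5% significance level and 80% power, the chance that a “positive” result reflects a real effect depends heavily on how plausible the hypothesis was to begin with.

        ```python
        # Two-option setup described above: the effect is either exactly zero
        # or exactly the value that gives the study 80% power.
        alpha = 0.05   # P(p < 0.05 | effect is zero)
        power = 0.80   # P(p < 0.05 | effect is non-zero)

        for prior in (0.01, 0.10, 0.50):   # assumed P(effect is non-zero), for illustration
            p_positive = prior * power + (1 - prior) * alpha
            p_real = prior * power / p_positive
            print(f"prior = {prior:.2f} -> P(effect is real | p < 0.05) = {p_real:.2f}")
        ```

        With a 50% prior a positive result is fairly convincing (about 0.94); with a 1% prior it is not (about 0.14).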

  1. I actually liked the video – until it came to the part about Bayes Theorem (and all that false positive/false negative stuff). There I agree with Andrew – the real issue is the assumed need to have a binary decision when there is uncertainty that cannot be eliminated. The introduction to probabilistic thinking, and the relevance of priors, was fine I think. If he had then gone on to discuss the dangers imposed by needing to make a binary decision (which is necessary in many cases), I think it would have been much better. The textbook example of Bayes Theorem was an unproductive diversion – in fact, it suggests that more careful use of “evidence” somehow will lead to the “right” decision. I think it would be better to illustrate that the evidence is imperfect and the need to make a decision invariably carries risks. Various quantifications of these risks are possible, but none will eliminate the risk and rarely will they lead to an entirely clear decision rule.

  2. > “I think it gets to deeper issues regarding goals and expectations of certainty.”

    That “deeper” realm is formally known as Epistemology — critical to any fundamental exploration of “certainty”.

    Epistemology is a branch of philosophy that investigates the origin, nature, methods, and limits of human knowledge.

  3. Any practicing physicians around? So you would do a 43% surgery and use a chemotherapy drug dose of 23% of the standard because your Bayesian posterior is 43% and 23%?

    While I agree with your arguments to a large degree — and I am bullied often enough in clinical research for not providing tables of pick-your-own p-values — you are living in an ivory tower where you can get away with “some likelihood that early exposure to guns increase the chance of ending in Alcatraz”.

    Come on, get positive and tell what to do in cases where you must make a decision. Making fun of false positives is not enough.

    • Dieter:

      I’m not quite sure who you are addressing this comment to—not to me, surely? I guess you’re talking with someone who “would do a 43% surgery and use a chemotherapy drug dose of 23% of the standard because your Bayesian posterior is 43% and 23%”? That’s not something I’ve ever said. Also you have this: “some likelihood that early exposure to guns increase the chance of ending in Alcatraz.” I have no idea where this quote comes from, but just for the benefit of our readers, let me emphasize that I’ve never said or written this, or anything like it.

      Finally, you say, “Come on, get positive and tell what to do in cases where you must make a decision.” We have a chapter on decision analysis in our BDA book. For more details on two of these examples, take a look at this article and this one. Bayesian decision analysis is straightforward and does not require null hypothesis testing, false positives, false negatives, or anything like that.
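
      To give a flavor of what that looks like in practice, here is a minimal sketch of a Bayesian decision analysis (the posterior, the loss function, and all the numbers are invented for illustration; this is not one of the BDA examples): the action is whatever minimizes expected loss under the posterior, with no accept/reject step anywhere.

      ```python
      # Generic illustration: choose the action with the smallest posterior expected loss.
      import numpy as np

      rng = np.random.default_rng(0)

      # Stand-in for posterior draws of a treatment effect (simulated here for illustration).
      effect_draws = rng.normal(loc=0.3, scale=0.5, size=10_000)

      def loss(action, effect):
          """Hypothetical loss: treating costs 0.2 but gains the effect; doing nothing costs 0."""
          return 0.2 - effect if action == "treat" else 0.0

      expected_loss = {a: np.mean([loss(a, e) for e in effect_draws])
                       for a in ("treat", "no treatment")}
      best = min(expected_loss, key=expected_loss.get)
      print(expected_loss, "->", best)
      ```

      The recommended action shifts as the posterior or the losses change, without any reference to whether the effect is “significant.”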

      • In addition to null hypothesis testing, false positives, false negatives, etc., there does seem to be a neglect of the norm Steven Goodman claimed was now widely accepted: “In clinical research, the idea that a small randomized trial could establish therapeutic efficacy was discarded decades ago.”

        Now, reproducibility was discussed, but more as a check on single-study claims than as a norm of not making claims based on a single study except in exceptional circumstances. Why do something that is usually unnecessary when it is likely to be misleading?

        The surgical analogy for the video would be: it removed most of the tumour but left a lot of nasty parts still there.

    • Dieter — I think the point is not that Bayesian > Frequentist, but that any approach to decision-making is half-baked without considering the consequences of the decision. Any fixed threshold (p<0.05, whatever) is ignorant of the benefits and costs of the different options, and so will only be right in the same way that a stopped clock is sometimes right.

      I think that physicians generally are very adept at understanding and weighing the various considerations that arise in a clinical case. Maybe exposure to p-values is itself the problem.

      • “I think that physicians generally are very adept at understanding and weighing the various considerations that arise in a clinical case.”

        That may be true with the qualification, “if they have all the relevant information about the various considerations that arise.” But how often do they have all the relevant information? My impression is that drugs are often approved before enough relevant information is available. (Also bear in mind that once a drug is approved for one use, physicians may use it “off-label” for other conditions for which it is not approved or adequately studied.)

        • In my experience they also don’t take potential side effects seriously (having rarely or never seen them), and so may be inclined to recommend a treatment with a small potential benefit without considering the (small but nonzero) risk.

        • +1
          An example: A friend who had recently been diagnosed with heart failure was having trouble sleeping, so he asked his GP about sleeping pills for occasional use. The GP prescribed one; the package insert said “discontinue use if hallucinations occur.” The friend tried it cautiously; it worked well with no side effects. But on the third try — really disturbing side effects! So I looked the drug up on the web and found a website on harm reduction for recreational drugs that said the drug was popular for “recreational use” and listed side effects like the one my friend had.
          Moral: Check “recreational drug” sites before taking a prescription med!

        • Some groups are text mining social media to pick up information on side effects of drugs. Hopefully they are not overlooking the “recreational drug” sites.

    • Does trying chemo for a month and then deciding on surgery based on future tests count as fractional chemotherapy and surgery? I bet you recommend that sort of thing all the time.

  4. Andrew:
    Quick thought. I often see decision analysis framed in monetary terms or in terms of lives. However, I think there are many other decisions we make that could benefit from the kind of thinking you describe in BDA. For example, stating “there is an effect” is a decision, and so is excluding variables from a model. Yet in each case the costs and benefits are not really weighed: do we simplify the model because there is a cost to collecting these variables in the future, or do we include everything to supply more information for inference?
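
    To make that concrete, here is a hypothetical sketch of “exclude a variable” treated as a decision with explicit costs (every number here is invented for illustration; it is not from BDA or the post above):

    ```python
    # Hypothetical sketch: dropping a predictor as a decision that weighs
    # predictive loss against future data-collection cost. All numbers are made up.
    import numpy as np

    rng = np.random.default_rng(1)
    # Stand-in posterior draws for how much predictive error grows if the variable is dropped.
    extra_error_draws = np.abs(rng.normal(loc=0.05, scale=0.02, size=5_000))

    cost_per_unit_error = 100.0   # assumed cost of one unit of extra prediction error
    cost_of_collecting = 8.0      # assumed cost of measuring the variable in future data

    expected_cost_drop = cost_per_unit_error * extra_error_draws.mean()
    expected_cost_keep = cost_of_collecting

    decision = "drop the variable" if expected_cost_drop < expected_cost_keep else "keep the variable"
    print(f"drop: {expected_cost_drop:.1f}, keep: {expected_cost_keep:.1f} -> {decision}")
    ```

    Whether this particular framing is right is beside the point; the point is that the trade-off gets stated explicitly rather than hidden behind a significance test.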
