15 thoughts on “Looking for the bottom line”

  1. “If this treatment were performed nationally on X people, we estimate it would cost Y dollars and save Z lives”

    …so how do we say that while recognizing the uncertainty in our estimates?

  2. It’s really unfortunate that this thread hasn’t generated more discussion (though the original was quite interesting). We need to communicate analyses effectively, and clinicians/researchers/policy makers are used to binary thinking.

    Misuse of p-values is largely rooted in miscommunication of what a p-value means. How do we accurately and meaningfully communicate the posterior to clinicians/researchers/policy makers?

    • Ok, I’ll bite. I think one thing we can communicate relatively easily is samples from the posterior distribution. It’s easy to say “all of these are equally reasonable possibilities for what might be going on” and this is much easier for people to understand than something like “the log posterior distribution is this function” or anything like that.

      Spaghetti plots of Hurricane Irma’s projected tracks were all over the news, and people understood intuitively that each one was a possibility and that something not included might also happen.

      If this were the kind of output taught in Stats 101 rather than p-values, I think the world would be a much better place, and clinicians, policy makers, etc. would intuitively do something approximating the right thing. For example: “Well, take this possibility here: what would happen to our (tax revenue, budget, water supply, hospital operations, crime rates, whatever) if this were what happened? How about if it were option B?” And so on.

  3. Thanks, Daniel.
    I love the Irma reference. I was fascinated by those plots in the same way that some people were fascinated by election predictions, even though I am not directly affected by that particular storm.

    Getting to paragraph 3: Suppose we do an RCT and get a posterior for the effect of active drug minus placebo on a meaningful scale (e.g. reduction in blood pressure) that happens to be about normal(-3,sd=3).

    A) We can report 84% probability that on average active reduces blood pressure more than placebo. THIS is the kind of thing that many people, especially manufacturers, want to hear, I think, b/c it starts to fit more naturally into binary thinking “84% is way more big-ish than 50%, so it must be the way to go!”.

    B) We can also report that we are 90% certain that the average benefit of active over placebo is -8 to 2 points, with a best guess around -3. In my experience, this isn’t sufficient information for a clinician/patient/policy maker, etc. to make a decision. They tend to expand such statements to see that plausible values include a worse effect of active, which is not much different from just reporting the posterior probability of an effect less than zero.
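The two summaries in (A) and (B) can be checked directly from the stated posterior. This is a minimal sketch using Python's standard-library `statistics.NormalDist`, taking the normal(-3, sd=3) posterior at face value; the variable names are illustrative only.

```python
from statistics import NormalDist

# Posterior for the (active - placebo) difference in blood pressure,
# as described in the comment: roughly normal with mean -3 and sd 3.
posterior = NormalDist(mu=-3.0, sigma=3.0)

# (A) Posterior probability that active lowers blood pressure more than
# placebo, i.e. that the difference is below zero.
prob_benefit = posterior.cdf(0.0)

# (B) Central 90% posterior interval: the 5th and 95th percentiles.
lower = posterior.inv_cdf(0.05)
upper = posterior.inv_cdf(0.95)

print(f"P(effect < 0) = {prob_benefit:.2f}")
print(f"90% interval  = ({lower:.1f}, {upper:.1f})")
```

Both numbers are summaries of the same distribution, which is why neither one alone tells a decision maker what to do.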

    What are the serious alternatives? Utilities? Setting aside the issue of the virtually infinite number of currencies in which utilities can be expressed, we still wind up with a posterior distribution of the utilities, and we are no better off than before!

    I’m willing to concede that clinicians/policy makers/manufacturers etc. should not get improper information b/c it happens to be what their busy/lazy minds demand, but what else do we do?

    • In your example, as in many real-world examples, the problem is that we aren’t taking our analysis to the bottom line: what is it that people care about? Certainly it’s not the numbers on the BP meter in and of themselves. If my BP were 1300 / 780 mmHg and it had no health consequences, I wouldn’t care whether it was that or 130 / 78 mmHg.

      So, an 84% chance that giving this drug reduces the numbers on an automatic BP meter is meaningless. Truly: it could very consistently reduce someone’s BP from 190 / 100 to 188 / 97 and still have essentially zero benefit, because the patient remains at significant risk of stroke, heart attack, etc.

      We may have a lot of currencies, but the QALY at least has both quantity and quality of life included in its calculation. So, let’s start there. What is the expected QALY improvement for a 45-year-old Caucasian male who takes your example drug versus doing nothing? What is the cost in dollars? How much does it cost in dollars to hire this person a personal trainer once a month for a half hour to get them to participate in 20 minutes of cardio exercise 3 times a week? How much benefit does that give?

  4. This is all very sensible to me. Yet your methodology in paragraph 3 defines a probability density on the QALY for the two options. We might conduct an RCT to contrast the options to generate a posterior on the difference in QALY. That just brings us back to our original problem, no? Which option is better on the QALY scale, and how sure are we of that conclusion?

    • Which option is better on the QALY scale is now down to calculating the expected QALY outcome under the various options, and choosing the one that maximizes (minimizes if you’re using a “badness” scale) the expected value of the thing you actually care about. Since QALY at least sort of approximates something we care about, now the Bayesian decision theory result is meaningful.

      • You might ask why use the expected value to make your decision? As far as I can see, the answer is that it’s a quantity that is sensitive both to the outcome under every possibility and to the probability of each outcome. If a function is going to be sensitive to both of those things, and be in the dimensions of the quantity of interest (say years, or dollars, or kilowatt-hours of energy, or whatever), then the function needs to add up quantities in those units (because multiplication would change the units, and any other operation is only defined on dimensionless quantities; there’s no such thing as e^liters, for example). Given these restrictions, the expected value is basically the only quantity that meets the requirements.
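Written out, the expected-value rule described above looks roughly like this. The notation is an assumption introduced here, not taken from the thread: $a$ is an action, $\theta$ the unknowns, $y$ the data, $U(a, \theta)$ the outcome in the units we care about (e.g. QALYs), and $\theta^{(s)}$ are $S$ posterior draws.

```latex
\mathbb{E}[U \mid a, y]
  = \int U(a, \theta)\, p(\theta \mid y)\, d\theta
  \approx \frac{1}{S} \sum_{s=1}^{S} U\!\left(a, \theta^{(s)}\right),
\qquad
a^{*} = \arg\max_{a}\, \mathbb{E}[U \mid a, y]
```

Since $p(\theta \mid y)\, d\theta$ is a dimensionless probability, the integral is a weighted sum of values of $U$ and so carries the same units as $U$, which is the dimensional point made in the comment.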

      • OK, I guess you might even consider using QALY as your outcome measure (instead of BP or whatever), and proceed with the RCT. But you still have a posterior distribution for the contrast between treatment arms, just now on the scale of the difference in QALY. I don’t have a problem with that at all, but other than using a more meaningful outcome, what are we doing differently?

        I’d still say we are X% certain that active drug generates greater improvement in QALY than placebo.

        • You probably can’t use the QALY directly in an RCT because you’d have to run the trial until the death of every person in the trial, and do frequent followups on life quality, to observe the QALYs.

          Let’s say instead that you have a well-calibrated risk model for heart attack and stroke based on BP, age, smoking status, adiposity, physical activity levels, etc. The only thing this RCT does is change the BP.

          Now, given your RCT, and measurements of the individuals such as age, smoking, adiposity, you have a posterior distribution over the change in BP. From this posterior on the change in BP you can calculate a posterior on the change in lifetime QALY. You can also do the same thing for other drugs, and other interventions (exercise, diet changes, etc).

          Now, pick the intervention that maximizes expected QALY gain (or in a more sophisticated analysis perhaps include costs as well).

          Being X% certain that a drug improves QALY vs placebo is not a decision rule; “choose the action that maximizes expected QALY” is.
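The pipeline described in this reply (posterior on BP change, pushed through a risk model to a posterior on QALY, then pick the action with the largest expected gain) can be sketched as follows. Everything numeric here is an assumption for illustration: `qaly_gain_from_bp_change` is a hypothetical stand-in for a calibrated risk model, and the "exercise" posterior is invented. Only the normal(-3, sd=3) drug posterior comes from the thread.

```python
import random

random.seed(1)
N_DRAWS = 20_000

def qaly_gain_from_bp_change(delta_bp):
    """Hypothetical stand-in for a calibrated risk model: maps a change in
    BP (mmHg, negative = reduction) to a change in lifetime QALYs.
    The 0.05 QALY-per-mmHg slope is invented for illustration."""
    return -0.05 * delta_bp

# Assumed posteriors over the BP change for each option: (mean, sd).
interventions = {
    "drug": (-3.0, 3.0),      # the normal(-3, sd=3) posterior from the thread
    "exercise": (-5.0, 8.0),  # invented: larger average effect, less certain
    "do nothing": (0.0, 0.0),
}

results = {}
for name, (mu, sd) in interventions.items():
    # Draw from the posterior on BP change, then push each draw through
    # the risk model to get draws from the posterior on QALY gain.
    bp_draws = [random.gauss(mu, sd) for _ in range(N_DRAWS)]
    qaly_draws = [qaly_gain_from_bp_change(d) for d in bp_draws]
    results[name] = {
        "expected_qaly_gain": sum(qaly_draws) / N_DRAWS,
        "prob_improves": sum(q > 0 for q in qaly_draws) / N_DRAWS,
    }

# Decision rule: choose the action with the largest expected QALY gain.
best = max(results, key=lambda k: results[k]["expected_qaly_gain"])

for name, r in results.items():
    print(f"{name:10s} E[gain]={r['expected_qaly_gain']:.3f} "
          f"P(gain>0)={r['prob_improves']:.2f}")
print("choose:", best)
```

Under these invented numbers the drug has the higher probability of any improvement, while exercise has the higher expected gain, so "X% certain it helps" and "maximize expected QALY" can point to different actions. A fuller analysis might subtract costs inside the outcome function, as the comment suggests.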

  5. This is really helpful. My first reaction is to think about Andrew’s call to embrace variation and accept uncertainty, but b/c we can think of the RCT as providing advice on a decision, then all we need is an optimal choice. Thanks!

    • Well, yes, we want an optimal choice, but we won’t get it unless we build a model that does a good job with the uncertainty. The optimal choice comes from maximizing the expected goodness, and the expected goodness depends on the full shape of the distribution of possible outcomes, together with the goodness that comes from each outcome. So the advice to embrace uncertainty still applies; it just applies at the level of a properly thought-out device that gets you an optimal decision.
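To see why the full shape of the distribution matters, here is a small sketch under invented assumptions: two options whose outcomes have the same mean but different spread, scored with a hypothetical `goodness` function that penalizes harm three times as heavily as it rewards benefit. None of these numbers come from the thread.

```python
import random

random.seed(2)
N_DRAWS = 50_000

def goodness(outcome):
    """Hypothetical asymmetric goodness: a unit of harm (negative outcome)
    counts three times as much as a unit of benefit."""
    return outcome if outcome >= 0 else 3.0 * outcome

# Two invented options with the same mean outcome but different uncertainty.
options = {"narrow": (1.0, 0.5), "wide": (1.0, 3.0)}  # (mean, sd)

summary = {}
for name, (mu, sd) in options.items():
    draws = [random.gauss(mu, sd) for _ in range(N_DRAWS)]
    summary[name] = {
        "mean_outcome": sum(draws) / N_DRAWS,
        "expected_goodness": sum(goodness(d) for d in draws) / N_DRAWS,
    }

for name, s in summary.items():
    print(f"{name:6s} mean outcome={s['mean_outcome']:.2f} "
          f"expected goodness={s['expected_goodness']:.2f}")
```

The two options are nearly indistinguishable on the mean outcome, yet the wide one has a much lower expected goodness because more of its probability sits where the penalty applies. A point estimate alone would miss that.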
