Problems with surrogate markers

Paul Alper points us to this article in Health News Review—I can’t figure out who wrote it—warning of problems with the use of surrogate outcomes for policy evaluation:

“New drug improves bone density by 40%.”

At first glance, this sounds like great news. But there’s a problem: We have no idea if this means the drug also cuts the risk of bone fractures, which is the outcome that we really care about.

So why do researchers measure bone density instead of fractures? For several reasons, it can be difficult to determine whether a treatment will result in a clear benefit for patients, such as preventing death or improving quality of life. It may take decades, for example, to see if a new osteoporosis drug ultimately reduces fractures, so researchers look for what are hopefully reliable indirect markers to measure, such as bone density.

These substitutes, which go by several names–surrogate measures, markers or endpoints–ideally can be assessed quickly and easily and are expected to correlate with a meaningful outcome.

OK, so what’s the problem? The article explains:

Not all surrogate measures have turned out to be good ones. Often a drug that influences a surrogate measure turns out to produce no meaningful result for patients, referred to as a clinical outcome.

In some cases, there is even harm. In the landmark Cardiac Arrhythmia Suppression Trial (CAST), drugs approved for their ability to suppress a surrogate marker — abnormal heartbeats — were found to actually increase rather than decrease the risk of death. . . .

But surrogate markers are common: Between 2010 and 2012, the FDA says it approved 45 percent of new drugs on the basis of a surrogate endpoint. . . . One analysis showed that 67 percent of cancer drug approvals over a five-year period were based on surrogates. From 2003 to 2012 the FDA used surrogates for seven out of nine drugs approved for chronic obstructive pulmonary disease, all 26 approved drugs for diabetes, and all nine drugs approved for glaucoma . . .

The use of weak surrogate-based evidence has flooded the market with expensive duds, many argue. . . .

Here are some examples:

Drugs are frequently approved on the basis of uncertain markers such as “progression free survival,” which is the amount of time between treatment and worsening of symptoms. The drug Avastin won accelerated FDA approval to treat metastatic breast cancer based on its ability to delay tumor growth, but that approval was revoked when multiple randomized trials showed the drug didn’t improve survival and had significant side effects. . . .

Several stories reporting on a drug called evolocumab, known as a PCSK9 inhibitor, said it dramatically lowered LDL cholesterol in a 24-week trial, but didn’t note that LDL is a surrogate marker for heart disease. . . .

A news release claiming that blueberry concentrate improves brain function in older people failed to point out that brain blood flow and other biomarkers were “not a measurable clinical benefit,” according to our review.

This is not to say that it’s a bad idea to measure surrogate outcomes, just that we should keep our eye on the ball and report things accurately. From a statistical perspective, the challenge is to build and estimate models connecting background variables, treatments, intermediate outcomes, and ultimate outcomes.
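This point is easy to make concrete with a small simulation. Here is a minimal sketch (my own toy example, nothing from the article; all names and coefficients are invented): a randomized treatment shifts a surrogate marker, but the clinical outcome depends on an unmeasured background variable, so the treatment’s effect on the surrogate never transfers.

```python
# Toy illustration (hypothetical model, not from the article): a randomized
# treatment that moves a surrogate without moving the clinical outcome.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

treated = rng.integers(0, 2, n)    # randomized treatment assignment
frailty = rng.normal(0, 1, n)      # unmeasured background variable

# The surrogate (think bone density) responds to treatment and to frailty,
# so it correlates with the outcome in observational data ...
surrogate = 0.5 * treated - 0.3 * frailty + rng.normal(0, 1, n)

# ... but the clinical outcome (think fracture) depends on frailty alone;
# treatment never enters this equation.
p_event = 1 / (1 + np.exp(-(-2.0 + 0.8 * frailty)))
event = rng.random(n) < p_event

print("surrogate mean, treated vs control:",
      surrogate[treated == 1].mean(), surrogate[treated == 0].mean())
print("event rate,     treated vs control:",
      event[treated == 1].mean(), event[treated == 0].mean())
```

In this toy world the surrogate improves by about half a standard deviation in the treated arm while the event rates in the two arms are indistinguishable: the CAST-style failure mode in miniature.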

14 thoughts on “Problems with surrogate markers”

  1. From https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2733256/, Gerd Gigerenzer writes about surrogate criteria, in particular the 5-year survival rate (a surrogate) vs. the mortality rate (the true figure of merit):

    “5-year survival rates in screening versus mortality rates. When running for president of the United States of America in 2007, the former New York City mayor, Rudi Giuliani, said in a campaign advertisement: ‘I had prostate cancer, 5, 6 years ago. My chance of surviving prostate cancer – and thank God I was cured of it – in the United States? 82%. My chance of surviving prostate cancer in England? Only 44% under socialized medicine.’ For Giuliani this meant that he was lucky to be living in New York rather than York, since his chances of surviving appeared to be twice as high. This was big news, but also a big mistake. The prostate cancer mortality rate is basically the same in the USA and the United Kingdom. Most importantly, 5-year survival rates and mortality rates are uncorrelated (r = 0.0) across the 20 most common solid tumours. In the context of screening, survival rates are misleading statistics.”
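    The survival/mortality disconnect is easy to reproduce in a toy lead-time-bias simulation (my own illustration with made-up numbers, not Gigerenzer’s data): screening moves diagnosis three years earlier and changes nothing else, yet 5-year survival from diagnosis roughly doubles.

    ```python
    # Toy lead-time-bias sketch (hypothetical parameters): earlier diagnosis
    # inflates 5-year survival even though every death occurs at the same time.
    import numpy as np

    rng = np.random.default_rng(1)
    n = 100_000

    # Hypothetical time from (unscreened) diagnosis to death, in years.
    time_to_death = rng.exponential(scale=4.0, size=n)

    lead_time = 3.0  # screening detects the cancer 3 years earlier; that's all

    print(f"5-year survival, unscreened: {np.mean(time_to_death > 5.0):.0%}")
    print(f"5-year survival, screened:   {np.mean(time_to_death + lead_time > 5.0):.0%}")
    # Mortality is identical in both scenarios; only the statistic moved.
    ```

    With these made-up parameters, survival jumps from roughly 29% to 61% while not a single death is postponed.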

  2. The biggest problem I see is that “FDA approval” is a surrogate for “a drug works as intended”. The entire industry is based on a surrogate incentive. It all depends on how well FDA approval maps to good science.

    This is much different than “this computer/software is better than that computer/software for the same price, so I’ll buy that product” or “invest in that company”.

  3. There are, if memory serves, plenty of examples of drugs that lower cholesterol levels without causing fewer heart attacks. But that was also a critique, apparently unjustified, of statins until further results came in. So the problem with surrogate markers is that both sides can seize on them, to prove or to question efficacy, which raises the question: what’s the point?

    • Enjoyed your post. Perhaps the word “surrogate” is dangerously imprecise. Is it meant to refer to (a) a cause, i.e., S (surrogate) causes B (bad), so that interrupting S prevents the outcome we want to avoid; (b) something merely correlated with the real cause, i.e., C (confounder) causes both S and B, the hope being that there is only one mechanism by which C causes S and B, so that not-S implies not-B; or (c) a case where S is outside the causal chain that leads to B and is instead a mitigating/defensive/adaptive mechanism (the “let’s treat Alzheimer’s by treating beta-amyloid” attempt being a notorious example)? (A toy simulation of case (b) follows below.)

      Or maybe “surrogate” is just a weasel word used to avoid causal language – language which could not be uttered without blushing when so little is known about causation.
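      To make case (b) concrete, here is a minimal sketch (hypothetical names and numbers throughout): a confounder C drives both S and B, so S predicts B in observational data, but an intervention that sets S independently of C does nothing to B.

      ```python
      # Toy model of case (b): C causes both S and B; S itself is causally inert.
      import numpy as np

      rng = np.random.default_rng(2)
      n = 100_000

      C = rng.normal(0, 1, n)               # common cause (confounder)
      B = (C + rng.normal(0, 1, n)) > 1.5   # bad outcome, driven by C only

      # Observational world: S tracks C, so high S "predicts" B.
      S_obs = C + rng.normal(0, 1, n)
      print("P(B | high S), observational:", B[S_obs > 0].mean())
      print("P(B | low S),  observational:", B[S_obs <= 0].mean())

      # Interventional world: a drug sets S independently of C. B is untouched.
      S_do = rng.normal(0, 1, n)
      print("P(B | high S), intervention: ", B[S_do > 0].mean())
      print("P(B | low S),  intervention: ", B[S_do <= 0].mean())
      ```

      Observationally the high-S group has a much higher outcome rate; once S is set by intervention, the gradient vanishes, which is exactly why a drug aimed at S can fail.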

      • My impression is that in the context of statistics, “surrogate” usually means “something that we measure when we can’t really measure (or don’t have the resources, time, or whatever to measure) what we want to measure”

        The big problem with them is that all too often people then forget about the real thing that needs to be measured, and slip into the fairy tale world of just considering the surrogate.

        • Of course, else the game would immediately be up. If researchers said they planned to study the marital status of widowers in order to discover why their wives had had heart attacks, everyone would laugh. We want causation. We get its surrogate doppelganger instead.

  4. Type 2 diabetes is a fabulously successful brand built purely on surrogate markers, especially hemoglobin A1C.

    People with T2D have a far higher mortality rate than similar people without T2D. But in randomized trials targeting improved glycemic control, more intensive control does not lead to reduced mortality.

    Similarly, cardiovascular disease is far more common in patients with T2D, and drug treatment to improve glycemic control is advocated in order to reduce CV risk. But in 2008 the evidence of CV harm was so strong that the FDA ‘suggested’ Cardiovascular Outcome Trials to show that new drugs to improve glycemic control were safe and did not increase CV risk.

    Ask your favorite endocrinologist for the RCT evidence that glycemic control improves outcomes. You’ll be surprised.
    Then if you want to aggravate him or her, ask about the RCT evidence for the canonical A1C target of 7%.

    • “ask about the RCT evidence for the canonical A1C target of 7%.”

      Especially since an A1C over 6.5% warrants a diagnosis of Type 2 diabetes.

  5. Public transportation lowered my gas mileage.

    This is a real example that I have used to make people think about outcomes to measure.

    About 7 years ago I bought a new car; this one had a feature that reported my average gas mileage (miles per gallon). About a year later a commuter rail project in my area was finished and I started taking the train to work (still driving to the train station). I noticed that the reported gas mileage of my car was lower after I started taking the train (by 2 to 4 miles per gallon). I also noticed that instead of filling the tank 1 or 2 times a week, I was now filling up once every 2 or 2.5 weeks. Why did my gas mileage go down? Driving to the train station is pure city driving (stop signs, stop lights, waiting to turn left, etc.), and that is now mostly what I use that car for. Before taking the train I combined the city driving with freeway driving, which generally gets better gas mileage and so pulled the average up.

    So taking the train improved an important outcome (how much I spend on fuel, pollute the air, etc. per time period), but appears to hurt a surrogate measure (miles per gallon).
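    The arithmetic is easy to check with hypothetical round numbers (the real figures will differ):

    ```python
    # Back-of-the-envelope check with made-up but plausible numbers:
    # average MPG can fall while total fuel burned falls even more.
    city_mpg, highway_mpg = 20.0, 30.0

    # Before the train: mixed city + freeway commute (miles per week).
    city_miles, freeway_miles = 50.0, 200.0
    gallons_before = city_miles / city_mpg + freeway_miles / highway_mpg
    mpg_before = (city_miles + freeway_miles) / gallons_before

    # After the train: a short, all-city drive to the station.
    station_miles = 60.0
    gallons_after = station_miles / city_mpg
    mpg_after = station_miles / gallons_after  # pure city driving

    print(f"before: {mpg_before:.1f} mpg, {gallons_before:.1f} gallons/week")
    print(f"after:  {mpg_after:.1f} mpg, {gallons_after:.1f} gallons/week")
    ```

    With these numbers the displayed mileage drops from about 27 mpg to 20 mpg even though weekly fuel use falls by about two-thirds: the surrogate worsens while the real outcome improves.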
