Take that, Bruno Frey! Pharma company busts through Arrow’s theorem, sets new record!

I will tell a story and then ask a question.

The story: “Thousands of Americans are alive today because they were luckily selected to be in the placebo arm of the study”

Paul Alper writes:

As far as I can tell, you have never written about Tambocor (Flecainide) and the so-called CAST study. A locally prominent statistician loaned me the 1995 book by Thomas J. Moore, Deadly Medicine: Why Tens of Thousands of Heart Patients Died in America’s Worst Drug Disaster. Quite an eye-opener on many fronts, but I found some tangential goodies. From page 61:

Scientific articles routinely list so many coauthors that an unwritten code usually determines the order in which the names appear. The doctor who did the most work and probably wrote the article appears as the first-named author.

The unwritten code also provides that the last-named author is the “senior author.”

From page 62:

The authors of the Tambocor study apparently evaded this problem [refusal of journals to accept duplication] by submitting their manuscript simultaneously to three different journals [JAMA, Circulation, and American Heart Journal].

From page 63:

In all, 3M succeeded in publishing the same study six times. [Seems like a violation of Arrow’s theorem. — ed.]

As a medical doctor once pointed out, thousands of Americans are alive today because they were luckily selected to be in the placebo arm of the study.

I was curious, so I looked up Flecainide on Wikipedia and found that it’s an antiarrhythmic agent. Hey, I’ve had arrhythmia! Also, the drug remains in use. The Wikipedia entry didn’t mention any scandal; it just said that the results of the Cardiac Arrhythmia Suppression Trial (CAST) “were so significant that the trial was stopped early and preliminary results were published.” I followed the link, which reports that “the study found that the tested drugs increased mortality instead of lowering it as was expected”:

Total mortality was significantly higher with both encainide and flecainide at a mean follow-up period of ten months. Within about two years after enrollment, encainide and flecainide were discontinued because of increased mortality and sudden cardiac death. CAST II compared moricizine to placebo but was also stopped because of early (within two weeks) cardiac death in the moricizine group, and long-term survival seemed highly unlikely. The excess mortality was attributed to proarrhythmic effects of the agents.

Alper adds more info from here:

From page 200, “Status on Sept. 1, 1988” (with X and Y not named; as it turned out, X was placebo and Y was treatment):

Group             X (placebo)   Y (treatment)
Sudden death            3             19
Total patients        576            571
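
Just to make the arithmetic concrete, here is a minimal sketch in Python of what those counts imply. The rate comparison is straightforward; the Fisher exact test at the end is my own choice of illustration, not necessarily the analysis the CAST investigators used.

```python
# Sudden-death counts from the table above (Moore, p. 200).
from scipy.stats import fisher_exact

deaths = {"placebo": 3, "treatment": 19}
totals = {"placebo": 576, "treatment": 571}

for arm in deaths:
    rate = deaths[arm] / totals[arm]
    print(f"{arm}: {deaths[arm]}/{totals[arm]} = {rate:.2%} sudden-death rate")

# 2x2 table: rows = arms, columns = (sudden death, no sudden death).
table = [[deaths["placebo"], totals["placebo"] - deaths["placebo"]],
         [deaths["treatment"], totals["treatment"] - deaths["treatment"]]]
odds_ratio, p_value = fisher_exact(table)
print(f"odds ratio ~ {odds_ratio:.2f}, p = {p_value:.2g}")
```

That works out to roughly 0.5% sudden death under placebo versus 3.3% under treatment.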

But the drug is still in use; I guess it’s believed to help for some people. An interesting example of a varying treatment effect, indicating problems with the traditional statistical paradigm of estimating a constant or average effect.
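
To see why a varying treatment effect undermines an average-effect summary, here is a toy simulation (entirely hypothetical numbers, not the CAST data): a drug that lowers risk in one subgroup and raises it in another produces a small average effect that describes neither group.

```python
# Toy simulation of a heterogeneous treatment effect (hypothetical numbers).
import numpy as np

rng = np.random.default_rng(0)
n = 100_000  # patients per arm

# Half the population is subgroup A, half subgroup B.
in_a = rng.random(n) < 0.5

# Baseline event risk is 5%; the drug moves it to 3% in A and 8% in B.
risk_control = np.full(n, 0.05)
risk_treated = np.where(in_a, 0.03, 0.08)

events_control = rng.random(n) < risk_control
events_treated = rng.random(n) < risk_treated

print(f"average effect:     {events_treated.mean() - events_control.mean():+.3f}")
print(f"effect, subgroup A: {events_treated[in_a].mean() - events_control[in_a].mean():+.3f}")
print(f"effect, subgroup B: {events_treated[~in_a].mean() - events_control[~in_a].mean():+.3f}")
```

The average effect comes out near +0.005, masking a real benefit (about -0.02) in one subgroup and a real harm (about +0.03) in the other.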

The question: How to think about this?

The above story looks pretty bad. On the other hand, thousands of new drugs get tried out, some of them help, it stands to reason that some of them will hurt and even kill people too. So maybe this sort of negative study is an inevitable consequence of a useful process?

If anyone tried to bury the negative data, sure, that’s just evilicious. But if they legitimately thought the drug might work, and then it turned out to kill people, them’s the breaks, right? Nothing unethical at all, prospectively speaking.

And if you publish your negative results 6 times, that shows a real commitment to correcting the record!

19 thoughts on “Take that, Bruno Frey! Pharma company busts through Arrow’s theorem, sets new record!”

  1. Just to fill in the story a bit… in the late 70s doctors noted that their patients with arrhythmia were also more likely to suffer sudden cardiac death. So they made the inferential leap “hey, if we stop these arrhythmias, we’ll prevent these sudden deaths.” These anti-arrhythmia drugs were developed and tested for reducing arrhythmias, and they worked! But mortality wasn’t an outcome in these trials (it’s often easier to show a benefit for a short-term surrogate outcome, like arrhythmia or, um, cholesterol, than it is to show a benefit for the long-term clinical outcome of interest). So the CAST trial was conducted later to assess the benefits (or not, as the case may be) for mortality. The results came as a bit of a shock, and there doesn’t seem to be anything particularly evil here. It’s unfortunate that patients died as a result, obviously, but to me this seems like science at work. The only real problem seems to have been making that initial inferential leap.

    • >”to me this seems like science at work”

      That depends. Did they just observe a statistically significant p-value and then jump to the conclusion that people should be receiving this drug? Or did they first attempt to verify they were measuring the correct thing, rule out alternative explanations for the observations, and make some kind of precise prediction that was later consistent with new data? Just doing random things and then later admitting you interpreted the results incorrectly doesn’t make it science… there needs to be a real effort made towards understanding what is going on to generate the data.

      • “That depends. Did they just observe a statistically significant p-value and then jump to the conclusion that people should be receiving this drug?”

        No. Absolutely not. That is not how clinical trials and surrogates work.

        Surrogates are approved after a very lengthy discussion in which it is shown that X is a reasonable surrogate for Y *and* it is shown that collecting information about Y would not be feasible. There’s the ideal data set, and then there are the real-world consequences of not having the resources to collect the ideal data set (for example, it’s really hard to test medication for Ebola victims). The argument is that not being able to collect the perfect data set should not necessarily mean automatic rejection, as long as we can have high confidence in the results from a slightly less ideal but actually feasible data set. Again, approving use of a surrogate is not taken lightly.

        In this case, I (personally) believe that they had a very *reasonable* surrogate and made a very reasonable decision. But unfortunately, what happened to be very reasonable was not true (i.e. reducing arrhythmia != reducing deaths). To me, this is by no means a result of blindly following p-values, but rather a well considered risk analysis in which reducing expected loss turned out not to reduce observed loss. Tragic, but not a result of incompetence; it was the best decision given the available resources and information, but it was still wrong.

        • >”No. Absolutely not. That is not how clinical trials and surrogates work.”

          I don’t know either way. But I do know you didn’t link to any papers demonstrating the behavior you claim. This makes me feel suspicious. Can you link to the most convincing one on this topic, so that it can be discussed here?

        • I assume you mean a link to how the approval process for a surrogate endpoint works? You can check the FDA website:

          http://www.fda.gov/ForPatients/Approvals/Fast/ucm405447.htm

          Most of my background on clinical trials is based on two internships at two different major pharmaceutical companies during grad school and observing the meetings and discussions for such topics. However, I have since left that field. And of course, this experience is not really citable, so you’re more than welcome to not readily accept it on my word alone. But the FDA page on surrogates is a good place to start if you are interested.

          In addition, your comment seemed to suggest that the analysis of the data was very willy-nilly, with p-hacking and the usual set of mistakes. This is definitely something clinical trials actually do right. They really are the best example of preregistration: *every* protocol must be written out ahead of time (at one of the companies I worked, there was a large team devoted *just to planning out how the randomization would be done*; see the sketch below). And as such, there’s plenty of discussion of why they fail often!
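
          As an illustration of what “planning out the randomization” can mean in practice, here is a minimal sketch of permuted-block randomization, one common scheme a protocol would fix in advance. This is my own toy example, not any particular company’s actual procedure.

          ```python
          # Permuted-block randomization: within each block of 4 patients,
          # exactly 2 get treatment and 2 get placebo, in random order, so
          # the arms stay balanced throughout enrollment.
          import random

          def permuted_block_assignments(n_patients, block_size=4, seed=42):
              rng = random.Random(seed)  # the seed would be fixed in the protocol
              assignments = []
              while len(assignments) < n_patients:
                  block = ["treatment", "placebo"] * (block_size // 2)
                  rng.shuffle(block)  # shuffle within the block; balance is preserved
                  assignments.extend(block)
              return assignments[:n_patients]

          print(permuted_block_assignments(10))
          ```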

          If you’re curious, I think this paper touches on some of the topics nicely:
          http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2864134/

        • Thank you for responding. I was looking for primary literature that you found convincing regarding Flecainide, so that we can get a better idea of what was actually done (rather than what people say they are doing, like in these review articles). It sounds like you are no more familiar with that research than I am though, so don’t worry about it.

          My skepticism is based on my own prior experience. I have read many, many medical papers. They nearly all contain the exact same “see significant p-value, wildly leap to favorite conclusion” error. I remember during a discussion on this blog a while back I demonstrated this by clicking the first link to a paper shown at the NEJM site. The error was found in the abstract, in the first sentence of the abstract if I remember correctly. That was no fluke; I was trained to make this error myself.

        • I certainly agree that medical doctors use statistics in the most brutal of manners. Two of my favorite anecdotes:

          1.) A friend of mine worked as an intern for a med student on a study he was required to run for his degree (she reported that he could not have cared less about this study). In the paper they wrote up, their finding was that how elderly patients perform on their vision test is correlated with how they report their vision to be, so a lot of time and money can be saved by merely asking elderly patients how their vision was rather than actually testing it.

          To restore some confidence in the peer review system: when I met this MD, literally the first thing he said to my friend was “I sent the paper in and it was almost immediately rejected, I don’t understand it”.

          2.) I was consulting for an MD once and they mentioned “I don’t trust Excel for any of my statistics” to which I thought “great!”. But then they followed up with “…so I do everything by hand”. Nope. Wrong direction on the mistrust of Excel there.

          With that said, getting a drug approved is a very different process from publishing in a medical journal.

        • @ Cliff AB: I’m not convinced that clinical trials really do things right. The final report needs to be checked against preregistered plans. There seems to be evidence that “outcome switching” is still common — see http://compare-trials.org/ for more information.

        • @Martha:

          That’s very interesting, but also extremely confusing. A company must register its protocol in advance with the FDA (although perhaps those cases are outside of the USA?), including a detailed description of all statistical methods, handling of dropout, etc., for approval. On top of that, the data is typically analyzed at interim time points by an independent Data Monitoring Committee (the sponsor is not even allowed to see its own data) for harmful effects and futility. So it seems extremely odd that they would write out a full protocol, have an independent group analyze their data according to this protocol, and then completely change everything at the last minute without the FDA even raising an eyebrow.

          Looking through your link, I’m not sure that the decision of the clinical trial changed, but rather that the (or even just *a*) final published paper reported different outcomes from the originally stated response variables of the trial. So it would seem that if I ran a large clinical trial to see if X affected Y, and it failed to pass (lots of phase III trials do, much to the sponsor’s dismay), but I did see that X appeared to affect Z, and I published this (which is not the same as being FDA approved, of course), then they would have counted this as a misreport (and if the authors presented this as a confirmatory study, rather than an exploratory study, they would be correct in calling this a misreport; it still deserves publication in my view, just properly labeled as very expensive exploratory findings). But in that case, the FDA has not accepted some drug based on a non-preregistered trial.

          But take that explanation with a grain of salt. I tried to follow up in their database to test my hypothesis of what they are saying and quickly realized I have a whole lot of other things to do besides reading a lot of medical papers that I could only barely comprehend.

        • @Cliff AB:

          I don’t know all the details (some other contributors to this blog might), but I have heard that the operations of the FDA (and European Medicines Agency) are not always as straightforward as one might think — e.g., that there are some rules which don’t have a good justification, that some practices are prescribed or proscribed by legislation, and that some things are considered “trade secrets” in ways that one might think interfere with important aspects of transparency. But I don’t have references at hand.

          However, someone interested in pursuing the matter further might be interested in looking at the articles by Doshi et al (2012), Doshi et al (2013), le Noury et al (2015) and Jefferson et al (2014) listed in the references section of http://www.ma.utexas.edu/users/mks/CommonMistakes2016/AppendixDayThree2016.pdf.

          Also of possible interest: A Der Spiegel interview with whistleblower Peter Wilmshurst, at
          http://www.spiegel.de/international/zeitgeist/spiegel-interview-with-whistleblower-doctor-peter-wilmshurst-a-1052159.html

  2. On the more general problem of using surrogate endpoints in clinical trials of the effectiveness of drugs:
    Thomas R. Fleming and John H. Powers: Biomarkers and surrogate endpoints in clinical trials. Statistics in Medicine, 2012, 31, 2973-2984
    http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3551627/

    Therapeutics Initiative: The limitations and potential hazards of using surrogate markers. Feb 2015 (2 pages)
    http://www.ti.ubc.ca/2015/02/03/the-limitations-and-potential-hazards-of-using-surrogate-markers/

    Use of surrogate outcomes in US FDA drug approvals, 2003–2012: a survey. BMJ Open, 2015;5:e007960.
    doi:10.1136/bmjopen-2015-007960

  3. Cripes — reading this nearly gave me a heart attack. Then I realised this was not especially new information, but the lessons nonetheless remain.

    Mark is correct in that there is a deeper story here at several levels. The first, which is a repeat of **Andrew’s mantra about grouping**, is that AFib presentations are by no means all equal. Around 10-13% of presentations are of endurance athletes (like myself), whose cardiac structure probably created the condition, rather than other unknown mechanisms, and for whom there are no other pathological findings. Second, age impacts heavily on incidence — AFib becomes significantly more common as one ages, irrespective of other factors. Third, echo-c results of structural changes greatly change prognosis. So participant triage and grouping were inadequate in this research area. Current technology is also much more able to undertake the necessary differentiation.

    Eventual outcome, including death, also seems to be potentially confounded with both cause and unseen populations. There is a significant proportion of clinical presentations with stroke whose underlying pathology is undetected or untreated AFib, due to coagulation and later release from atrial pooling. So, statistically, it might have been important to compare AFib-related stroke incidence/death among those not being treated for that particular condition against those who had AFib but were being treated, as well as across varied treatment regimens. And again, even in this comparison, the IKr and other meds are not the main treatment for stroke prevention; warfarin, dabigatran, etc. are. So again, the model of single-medication, single-outcome comparisons seems inadequate.

    Reminding us all that while we may have what we think is great clinical knowledge, we should always ask a good statistician to review and cast the net wider than the particular things we view as salient.

  4. For more on “medical reversals” see

    http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3878120/

    “The Clinical Outcomes Utilizing Revascularization and Aggressive Drug Evaluation (COURAGE) trial and CAST study are notable examples of investigations that overturned current practice by demonstrating that these interventions offered no survival benefits.”

    “In fact, cardiologists were so confident these agents improved outcomes that recruitment to CAST was slow, as many felt it was unethical to allow patients a chance of receiving placebo (10). CAST however reached the exact opposite conclusion, showing increased rates of death from the use of these drugs, contradicting nearly a decade of widespread practice, and showing that even the best mechanistic reasoning could be wrong. The results of CAST imply that premature ventricular contractions are either (I) not causally related to death or (II) the off target effects of treating PVCs with these drugs outweigh the benefits. Some estimate that 50,000 Americans [!] died because of this erroneous practice during the years it was in favor (11).”

    • >”even the best mechanistic reasoning could be wrong”

      As in my earlier post above, I claim no expertise on this topic, but I doubt the reasoning mentioned could stand up to much scrutiny. If I happen to be right, this should not be surprising. It is standard:

      “Biologists summarize their results with the help of all-too-well recognizable diagrams, in which a favorite protein is placed in the middle and connected to everything else with two-way arrows. Even if a diagram makes overall sense (Figure 3A), it is usually useless for a quantitative analysis, which limits its predictive or investigative value to a very narrow range. The language used by biologists for verbal communications is not better and is not unlike that used by stock market analysts. Both are vague (e.g., “a balance between pro- and antiapoptotic Bcl-2 proteins appears to control the cell viability, and seems to correlate in the long term with the ability to form tumors”) and avoid clear predictions.”

      http://www.ncbi.nlm.nih.gov/pubmed/12242150

  5. No one seems to have mentioned the current practice with Flecainide: it is not prescribed if you’ve ever had a heart attack. Ever. Not recently, not in the last decade, but ever. Browsing the links, they all seem to point out the mortality result but not why it occurred. They looked at the subgroups and found that those with scarring from heart attacks had an increased mortality rate. Many antiarrhythmic drugs require a hospital stay with continuous monitoring to ensure there are no adverse effects.

    That explains why it is still in use, which addresses one of Andrew’s comments towards the end of the post.
