“This is why FDA doesn’t like Bayes—strong prior and few data points and you can get anything”

[cat picture]

In the context of a statistical application, someone wrote:

Since data is retrospective I had to use informative prior. The fit of urine improved significantly (very good) without really affecting concentration. This is why FDA doesn’t like Bayes—strong prior and few data points and you can get anything. Hopefully in this case I can justify the prior that 5% error in urine measurements is reasonable.

I responded to “This is why FDA doesn’t like Bayes—strong prior and few data points and you can get anything” as follows:

That’s ok for me. The point is that FDA should require that the prior be science-based. For example, consider the normal(0, infinity) priors that are implicitly used by the fat-arms-and-political-attitudes crowd. Those are really strong priors, not at all supported by science!

My correspondent replied:

I think if you read clinical studies for medical devices carefully enough the allowance for informative priors as long as they are scientifically based is there (we need to doublecheck). But prevailing recommendation is using non informative priors. Therefore my guess the conception is that FDA is not supporting strong priors. We can’t deny that FDA has prominent Bayesists. The recent comment about using informative prior by FDA statistician was “how we defend the approach with industry” so there are some other aspects that are not so visible. In my case the strong prior is justified since 5% error in urine measurements is tight but still in the realm of reasonable. Since we had only few data points it was the only way to reconstruct the parameters and show that PBPK models fit clinical data well.

Just to clarify: I’m not knocking the FDA in this post. This should be clear from the text but I could see how the title (which is a quote from my correspondent) could give the wrong impression.
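The "strong prior and few data points" remark can be made concrete with a small sketch. It assumes the simplest textbook setting, a normal mean with known measurement sd and a normal prior; this is not the correspondent's PBPK model, and the three data values are made up for illustration.

```python
def normal_posterior(prior_mean, prior_sd, data, sigma):
    """Posterior (mean, sd) for a normal mean with known data sd `sigma`.

    Standard conjugate update: precisions add, and the posterior mean is
    the precision-weighted average of the prior mean and the sample mean.
    """
    n = len(data)
    ybar = sum(data) / n
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = n / sigma ** 2
    post_precision = prior_precision + data_precision
    post_mean = (prior_precision * prior_mean + data_precision * ybar) / post_precision
    return post_mean, post_precision ** -0.5


# Hypothetical data: three noisy measurements with known sd 1.
data = [1.5, 2.0, 2.5]

strong = normal_posterior(prior_mean=0.0, prior_sd=0.1, data=data, sigma=1.0)
weak = normal_posterior(prior_mean=0.0, prior_sd=10.0, data=data, sigma=1.0)

print("strong prior -> posterior mean %.3f, sd %.3f" % strong)
print("weak prior   -> posterior mean %.3f, sd %.3f" % weak)
```

With only three observations, the tight prior holds the posterior mean near the prior mean, while the wide prior leaves it near the sample mean. That is why the scientific justification of the prior carries so much of the weight in a small-sample analysis.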

32 thoughts on ““This is why FDA doesn’t like Bayes—strong prior and few data points and you can get anything”

  1. > my guess the conception is that FDA is not supporting strong priors
    Mentioning that it's a guess does keep open the possibility you are wrong, but the FDA is also a collection of many folks, including many statisticians, who vary in their opinions/abilities.

    Industry and industry hired/funded consultants have understandable conflicts of interest that should be fully considered in all arguments including the prior and how informative it should be (e.g. informative for treatment effects, non-informative for side effects).

    My advice: pretend you are wrong about their motivations (no matter how hard) but not that wrong about the prior, and make better, more complete arguments in support of it.

    In the blogasphere there does seem to be the assumption that when statisticians are recruited into a regulatory agency they become docile and brain dead. This is dangerous http://www.reuters.com/article/us-science-europe-glyphosate-exclusive-idUSKBN17N1X3

    • My understanding on Glyphosate is that the pure chemical itself has been shown to have very low risk of cancer, they feed it to cancer prone mice and it produces basically no change in cancer rates…

      On the other hand, the *formulation* of the glyphosate based weed-killers include many chemicals other than glyphosate, and these could well be cancer causing, but are not well studied. It’s the surfactants and things that break up the cell walls to let the glyphosate get better absorbed that are the focus of the people who actually know something about this stuff. (I needed to kill out a lot of grass at my house and I did some literature research on this before applying glyphosate, this was the consensus that I came to)

      So, in the sense that this issue is overlooked in regulatory agencies, it is a kind of soft-headedness, don’t rock the boat on the multi-billion dollar company sort of thing. Studies using randomly selected products from agriculture or hardware stores should be done, with the full in-the-bottle formulation, not just the pure glyphosate powder. But, who is taking the initiative to do that? Regulatory capture is a thing.

      • > issue is overlooked in regulatory agencies
        Overlooked is your guess to explain your assessment of not adequately addressed as well as “don’t rock the boat” being a guess at the motivation involved.

        Legal frameworks are a thing too.

        • Yes, you’re right, I’m imposing a regulatory capture prior, in large part because the issue of glyphosate vs glyphosate formulations has been out there for decades and no studies have been done addressing this question by regulatory agencies even though the budget to do a preliminary study is less than the cost of a single bureaucrat’s annual salary.

        • Also note that I think it’s a very different thing to say that the organization acts as if it were captured, and to say that individual people within the organization are captured. It’s entirely possible for rules and regulations of behavior to evolve to allow for capture of the agency while individual people in the organization would really like to do their job but are limited by those regulations/funding/etc.

        • EPA spent $8 billion last year with about 15k personnel (https://www.epa.gov/planandbudget/budget); that's about $520k per person employed by the agency. Running a mouse study using cancer-prone mice to look at carcinogenicity of 10 store-bought glyphosate formulations would certainly have a marginal cost less than $100k. A cage costs something like $2/day and can house something like 6 to 12 mice (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2786928/). The goal is preliminary data, so it'd be sufficient to have, say, 50 cages for a year, which is around $40k, plus let's call it 10% time for one employee, or another $10k. It could be done in-house or by a call for proposals and outsourced to a university lab. The glyphosate formulations issue has been around at least 20 years, and glyphosate is used on just about every food crop there is.

          I don’t know why it hasn’t been done, but it certainly *isn’t* because someone did a cost-benefit analysis and found out that it was not cost effective (let’s call it $100k to do a study that affects our knowledge of the carcinogenicity of the food 320 million Americans eat daily?). It seems likely that doing this study would get you in trouble at a very very political organization, and so anyone who thought to do it got a lot of push-back, and/or lost their job, but I admit this is prior-informed speculation.

          Still, I think if you want to do a good job at a regulatory organization with the areas that you ARE allowed to research, you wind up “getting along” with respect to the captured parts, because the organization *is* captured and survivorship bias says the people in the organization won’t be rocking the captured parts of the boat.
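The back-of-envelope numbers in this comment can be checked in a few lines. Every figure below is taken as quoted by the commenter (budget, headcount, cage cost, staff time) and is not independently verified; the variable names are just for illustration.

```python
# Figures as quoted in the comment above; none are independently verified.
epa_budget = 8e9            # dollars per year
epa_staff = 15_000          # personnel
cage_cost_per_day = 2.0     # dollars per cage per day
cages = 50
days = 365
staff_cost = 10_000         # "10% time for one employee"

budget_per_person = epa_budget / epa_staff
cage_cost = cages * cage_cost_per_day * days
study_cost = cage_cost + staff_cost
mice_low, mice_high = cages * 6, cages * 12

print(f"budget per person: ${budget_per_person:,.0f}")
print(f"cage cost for one year: ${cage_cost:,.0f}")
print(f"rough study cost: ${study_cost:,.0f}")
print(f"mice housed: {mice_low} to {mice_high}")
```

On these inputs the cage cost lands a little under the comment's "around $40k", and the total stays well below the $100k ceiling the commenter mentions.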

        • And, to be fair, this may be getting better… see section starting page 141 here:

          https://www.epa.gov/sites/production/files/2016-09/documents/glyphosate_issue_paper_evaluation_of_carcincogenic_potential.pdf

          where they point out that “formulations are not equally toxic and that the toxicity is not being driven by the amount of glyphosate in the formulations” and they also say “however, there are relatively few research projects that have attempted to directly compare glyphosate and the formulations in the same experimental design. Furthermore, there are even less instances of studies comparing toxicity across formulations.”

          and also there’s this:

          https://echa.europa.eu/documents/10162/22863068/glyphosate_iarc_en.pdf/74a6810b-1704-1cfd-49a7-e3f2103d1180

          “Sufficient evidence of cancer in animals” and “high statistical significance, tumours in the absence of toxicity, causal relationship established”

          So, if the EU environmental agency is coming out with a “glyphosate is not carcinogenic” and their individual scientists are publishing reports saying “glyphosate is a proven carcinogen in mice and has strong evidence in humans” then this supports my concept of “regulatory capture” as the mechanism.

    • Regarding blaggosphere and assumptions about FDA statisticians – dangerous and stupid.

      I’ve found that FDA statisticians are generally a smart and dedicated bunch that care very deeply about statistics and their role at FDA.

      That’s not to say I haven’t had my differences with them or felt that they could be more correct.

    • “In the blogasphere there does seem to be the assumption that when statisticians are recruited into a regulatory agency they become docile and brain dead.”

      I can’t comment on the FDA, but I used to work in bank stress testing (which originated from the Dodd-Frank Act in response to the Financial Crisis) and can confirm that our regulators (who came from the Federal Reserve once a year) were not statisticians, nor had they received adequate instruction from any qualified or competent statistician. The Fed examiners literally are unhappy with banks that DO NOT p-hack (you read that correctly). As in, they won’t accept any models with variables with p > 0.05. This is for a predictive exercise, mind you.

      This is from a paper published by the Fed that’s supposed to serve as an example for how our largest banks (JP Morgan, Wells Fargo, etc.) are supposed to demonstrate they’ll be financially secure in a 2008-like economic scenario:

      1. Details of methodology for specification of CLASS regression models
      This section describes in more detail our approach for choosing the specifications of each income and loss equation used in the CLASS model. We conducted a search across at least six different specifications for each equation. Decisions about which specifications to consider and finally select were guided by the following four principles:

      (i) Statistical and economic significance. We chose macroeconomic variables and bank-level controls that were statistically and economically significantly related to the income component ratio historically.


      https://www.newyorkfed.org/medialibrary/media/research/staff_reports/sr663.pdf

      The even more amazing part is that the stress testing industry as a whole has grown to support and institutionalize this mentality. My company paid one of the Big 4 professional services firms hundreds of thousands USD to tell us that our models weren’t statistically sound because they included insignificant variables (never mind that the “insignificant” variables were GDP and unemployment growth in a model that estimated macro-level credit losses…). They recommended we use stepwise selection on p < 0.05 in the future. These were the "experts" in building models for stress testing. There are entire model validation teams at large banks checking to make sure the model developers only include "significant" variables in the models.

      Even scarier, the entire industry agrees that pretty much all banks are data-poor. The point of the exercise is to simulate a scenario similar to the Financial Crisis, but most banks don't have data systems that keep data from earlier than 2000 or so. In other words, if they're attempting to predict a 2008-like scenario, they have a sample period that includes only one severe stress period (the 2003 recession was much smaller in magnitude). Yet, nobody sees a problem with applying frequentist methods and significance filters to build models for this problem… Bayesian methods are virtually nonexistent in this space by the way, as far as I could tell.

      The way I make sense of how this happens is that banks don't use these models to make any decisions anyway, and the consultants take home their share whether or not the models predict well. All anyone cares about is satisfying the regulators, who are pushing the current system.

      So getting back to my original point, my experience leads me to concur with the blogosphere here. That said, it's perfectly possible that the FDA (or other regulators in general) are much more competent than the Fed's bank stress testing examiners. And from what I've read, that appears to be the case. But I now come in with at least a cautious prior when I hear about the conclusions of regulatory statisticians following that experience.
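A small simulation may help show why a "keep only p < 0.05" rule is a problem in a data-poor setting. It is a sketch under assumed numbers (a true coefficient of 0.5 estimated with standard error 1.0), not a reconstruction of any bank's or the Fed's models; `significance_filter` is a hypothetical helper.

```python
import random
import statistics


def significance_filter(true_effect, se, n_sims, seed=0):
    """Simulate noisy estimates of one coefficient and keep the 'significant' ones.

    Each estimate is drawn from normal(true_effect, se); an estimate is kept
    when its z-score exceeds 1.96 in absolute value (two-sided p < 0.05).
    """
    rng = random.Random(seed)
    estimates = [rng.gauss(true_effect, se) for _ in range(n_sims)]
    kept = [e for e in estimates if abs(e / se) > 1.96]
    return estimates, kept


# Assumed values: a real but modest effect, measured with a lot of noise.
estimates, kept = significance_filter(true_effect=0.5, se=1.0, n_sims=20000)

mean_all = statistics.fmean(estimates)
mean_abs_kept = statistics.fmean([abs(e) for e in kept])

print(f"share passing the filter: {len(kept) / len(estimates):.3f}")
print(f"mean of all estimates: {mean_all:.2f}")
print(f"mean |estimate| among those kept: {mean_abs_kept:.2f}")
```

The unfiltered estimates average out near the true value, but any estimate that survives the filter must exceed 1.96 standard errors in magnitude, so the retained coefficients overstate the effect. A model built only from such survivors inherits that exaggeration.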

      • “They recommended we use stepwise selection on p < 0.05 in the future."

        Aargh!

        More generally: What you (Steven) write is depressing (but I guess, sadly, not surprising).

      • Yikes – almost as bad as in social psychology research or many other academic fields ;-)

        More importantly you have some experience which you can share with us and that has informed your view point.

        Carlos has actually read some of the FDA online guidance to become informed.

        Fortunately (unfortunately) guidances are not rules but more defaults or things that need to be considered so they don’t really provide a full sense of what happens.

        Also, there can be large variances in the abilities of staff and managers in any large organisation.

        • As I say above, there is a big difference between an organization acting as though it were full of incompetent people, and actually being full of incompetent people. I don’t think we should generalize from the average effect (organization acts as if it were incompetent) to individual causes (each individual must also be incompetent).

          When it comes to the finance industry, it’s pretty clear to me (I’ve worked for financial data analysis companies) that this is the industry of “gaming the system”. That’s what the profession does, they find opportunities for arbitrage, where doing X and then doing Y causes money to flow into their coffers. And taken as a whole, for ANY actions X and Y there is some corner of the world where someone is willing to consider the actions, whether that’s murdering JFK or offering “e-waste recycling” that involves dumping toxic heavy metals into a river in China, or convincing unemployed people to get million dollar mortgages and then selling the resulting mortgage backed security to a pension fund, or whatever… there is someone somewhere who will fund pretty much any scheme. Does this mean everyone who works for finance is evil? No. Not even close. But, willingness to do evil WILL increase your opportunities to make money. Look at Martin Shkreli (Pharma Bro). It seems very likely that he will do jail time, he embodies the perfect example of how sociopathy can be rewarded in the finance industry.

          On the other hand, when it comes to lobbying that turns potentially meaningful protections against bad practices into rubber-stamp procedures that are easily gamed, this is a very simple extension of gaming the stock market or gaming the insurance market or gaming everything else. That’s what they do, and it isn’t the kind of obvious evil that might make people with a conscience pause. The logic goes something like “who actually is harmed by us making this regulatory requirement easier to meet?” or even “If the government doesn’t require it, I’m not going to bother checking it”. The next step along this line is “get a law passed saying that the government can’t require it”

          So, my opinion on regulatory agencies is that they very often become a license to do marginal things while being able to have a bright line that keeps you out of jail. This is the essence of the concept of “regulatory capture” and the fault is much more often Congress (who provides the laws that the agency employees have to follow) than the individual employees at the organizations (statisticians at the FDA for example).

  2. I don’t know what the situation is for drugs, but for medical devices the FDA is not opposed in principle to informative priors according to the published guidelines. See for example sections 4.5 “Initial information about the endpoints: prior distributions” and 4.6 “Borrowing strength from other studies: hierarchical models” at https://www.fda.gov/MedicalDevices/ucm071072.htm

  3. I’m sorry, but this analysis sounds very biased from the description shared here. If it’s as bad as it looks (choose a prior because it proves your model fits the data well), whatever results should be considered extremely preliminary, if not worthless.

  4. I haven’t dealt with the FDA much, but in situations like these, I wish the regulatory agency would just assert what its prior is. In a preliminary study, it would be fine for the company to use whatever prior it wants and justify its choices scientifically. But surely as it gets closer to the time when an approval decision has to be made, the FDA could say “run it with a normal(0,5)”. It is weird for the agent making the decision to not be able to specify the rest of the model.

    • Ben:

      It’s fine for the FDA to specify a prior distribution and for that matter a data distribution, for each study. These models would depend on the problem being studied. Normal(0,5) could make sense as a default on some scale but I think that in particular cases it would be too weak and would lead to overestimation of effect sizes.

      • Wouldn’t it be better for the data distribution to be established experimentally rather than specified by the FDA?

        From the link I sent before: “Different choices of prior information or different choices of model can produce different decisions. As a result, in the regulatory setting, the design of a Bayesian clinical trial involves pre-specification of and agreement on both the prior information and the model. Since reaching this agreement is often an iterative process, we recommend you meet with FDA early to obtain agreement upon the basic aspects of the Bayesian trial design.
        A change in the prior information or the model at a later stage of the trial may imperil the scientific validity of the trial results. For this reason, formal agreement meetings may be appropriate when using a Bayesian approach. Specifically, the identification of the prior information may be an appropriate topic of an agreement meeting.”

        • Carlos:

          Yes, I’d like the model to be specified based on prior information for each problem. But I agree with Ben that it’s a good idea to have defaults in any case. Regarding your quote: if formal agreement meetings may be appropriate when using a Bayesian approach, I think they’d be just as appropriate when using a non-Bayesian approach, as there will still be many user-supplied choices. I object to the notion that non-Bayesian methods are somehow automatic or safe.

        • Andrew: my first remark was a joke about the FDA specifying a “data distribution”. But I understand you didn’t mean the “actual” data distribution, but the “theoretical” data distribution in the model. Anyway, the FDA is not there to do the researcher’s job but to be convinced.

          My understanding is that companies very often meet with the FDA to discuss the design of their phase 3 trials. If the FDA doesn’t think your study is acceptable you want to know that as early as possible! Of course the need to check that the FDA is happy with what you are doing is higher when you are doing “non-standard” things (like adaptive designs). And in some cases (Special Protocol Assessment) you can get the FDA to formally acknowledge that they will not raise issues with the design of the trial later.

        • Anon:

          Uniform relative to the likelihood, assuming that the posterior has a finite integral in the limit. And if it doesn’t have that limiting finite integral, neither prior will work. And indeed in rstanarm we use proper priors by default.

        • clearly 10^308 isn’t extremely large ;-)

          Using internal set theory let our function be the curve given by the pdf of normal(0,N), let N be a nonstandard number. This function has infinitesimal value for all standard x. Then the standardization of this function is the function f(x) = 0

          of course f(x) = 0 isn’t a normalizable probability distribution… which is to say there is no standard probability distribution that is uniform on the reals.

          On the other hand, if you and I agreed that the range representable by single-precision floats is sufficient to cover the parameter space, and you generated double floats and threw them out if they were outside the single-float range… you’ll find that the resulting distribution is flat within the restricted range.

          If we can both agree that for every real-world analysis there exists *some* floating point representation with some finite number of bits that is sufficient for this analysis, then there is always a normal(0,N) with N vastly larger than the real-world range which results in a flat distribution over the real-world range.
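The claim that normal(0,N) looks flat over a bounded range once N is much larger than that range can be checked numerically. The sketch below compares the density at the edge of a range with the density at its center; the range half-width of 100 and the scales tried are arbitrary illustrative choices, and `flatness_ratio` is a hypothetical helper name.

```python
import math


def normal_pdf(x, scale):
    """Density of normal(0, scale) at x."""
    return math.exp(-0.5 * (x / scale) ** 2) / (scale * math.sqrt(2.0 * math.pi))


def flatness_ratio(half_width, scale):
    """Density at the edge of [-half_width, half_width] divided by density at 0.

    A value of 1.0 would mean the prior is exactly flat over the range.
    """
    return normal_pdf(half_width, scale) / normal_pdf(0.0, scale)


# Arbitrary "real-world" range of +/- 100, with increasingly wide priors.
for scale in (1e2, 1e4, 1e6):
    print(f"scale={scale:g}: edge/center density ratio = {flatness_ratio(100.0, scale):.10f}")
```

As the scale grows relative to the range, the ratio approaches 1, which is the sense in which a very wide proper normal prior behaves like a uniform prior over the values that matter in practice.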

  5. Is there any reason why recently I don’t see cat pictures here anymore but rather only “[cat picture]” with a link? Or is something wrong on my side?
