## Performing design calculations (type M and type S errors) on a routine basis?

Somebody writes:

I am conducting a survival analysis (median follow up ~10 years) of subjects who enrolled on a prospective, non-randomized clinical trial for newly diagnosed multiple myeloma. The data were originally collected for research purposes and specifically to determine PFS and OS of the investigational regimen versus historic controls. The trial has been closed to new enrollment for many years; however, we are monitoring for disease progression and all cause mortality.

Here is the crux of the issue. Although data were prospectively collected for research purposes, my investigational variable was collected but not reported as a variable. The results of the prospective trial (PFS and OS) have been previously published in Blood. I am updating the original report with the long-term follow up, but am also exploring the potential impact of my new variable on PFS and OS. I have not yet analyzed the data and do not know the potential impact, or magnitude of impact, on PFS or OS. If I am interpreting your paper correctly, I believe that I should treat the power calculation on a post-hoc basis and utilize Type S and Type M analysis.

I know this is brief, if you would offer a comment or a direction I would be deeply grateful. I am sure it is obvious that I don’t study statistics, I focus on the biology of multiple myeloma.

Fair enough. I’m no expert on myeloma. As a matter of fact, I don’t even know what myeloma is! (Yes, I could google it, but that would be cheating.) Based on the above paragraphs, I assume it is a blood-related disease.

Anyway, my response is, yes, I think it would be a good idea to do some design analysis, using your best scientific understanding to hypothesize an effect size and then going from there, to see what “statistical significance” really implies in such a case, given your sample size and error variance. The key is to hypothesize a reasonable effect size—don’t just use the point estimate from a recent study, as this can be contaminated by the statistical significance filter.
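To make the design analysis concrete, here is a minimal sketch of the calculation under a normal approximation. It is an illustration, not the code from the linked article: the function name `design_analysis` and the numbers in the usage lines (a hypothesized log hazard ratio of 0.1 with a standard error of 0.25) are hypothetical placeholders, and it assumes the estimate is unbiased and normally distributed around the true effect.

```python
from statistics import NormalDist


def design_analysis(true_effect, std_error, alpha=0.05):
    """Power, Type S rate, and Type M (exaggeration) ratio, normal approximation.

    true_effect: hypothesized true effect size, assumed > 0
    std_error:   standard error of the estimate
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    ratio = true_effect / std_error

    # Probability of a significant estimate with the correct (positive) sign
    p_hi = 1 - std_normal.cdf(z_crit - ratio)
    # Probability of a significant estimate with the wrong (negative) sign
    p_lo = std_normal.cdf(-z_crit - ratio)

    power = p_hi + p_lo
    type_s = p_lo / power  # P(wrong sign | significant)

    # E[|estimate| * 1{significant}] in closed form for a normal estimate
    mean_abs_and_significant = true_effect * (p_hi - p_lo) + std_error * (
        std_normal.pdf(z_crit - ratio) + std_normal.pdf(z_crit + ratio)
    )
    # Type M: E[|estimate| | significant] divided by the true effect
    type_m = mean_abs_and_significant / (power * true_effect)
    return power, type_s, type_m


# Hypothetical inputs: replace with an effect size from subject-matter
# knowledge and the standard error your data can actually deliver.
power, type_s, type_m = design_analysis(true_effect=0.1, std_error=0.25)
print(f"power={power:.2f}  Type S={type_s:.3f}  Type M={type_m:.1f}")
```

Running this over a range of plausible effect sizes, rather than a single value, shows how sensitive the Type S and Type M rates are to the hypothesized effect.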

1. Multiple Myeloma is a bone/blood cancer.

As for the power analysis, I’m not sure I’d spend that much time on it. Spend your time on coming up with a moderately informative prior on the effect size (for example, look at the range of reported effect sizes for similar variables in other cancers, or whatnot), and then run a Bayesian model using that prior. If your posterior probability mass is well outside your prior, you’ve discovered something, and if it’s within the core of the prior you’ve gotten a better sense of how big the effect is. If the posterior high probability interval includes zero, then it includes zero…
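As a rough sketch of what that looks like in the simplest case, assume the effect is a log hazard ratio, the prior is normal, and the data can be summarized by a point estimate with a standard error. All numbers below are hypothetical placeholders, and a real survival analysis would put the prior inside a full model rather than use this conjugate shortcut.

```python
from math import sqrt
from statistics import NormalDist


def normal_posterior(prior_mean, prior_sd, estimate, std_error):
    """Conjugate normal-normal update: combine a prior with a normal estimate."""
    prior_precision = 1 / prior_sd**2
    data_precision = 1 / std_error**2
    post_var = 1 / (prior_precision + data_precision)
    # Posterior mean is the precision-weighted average of prior mean and estimate
    post_mean = post_var * (prior_precision * prior_mean + data_precision * estimate)
    return post_mean, sqrt(post_var)


# Hypothetical prior (centered at 0, sd 0.3) and hypothetical data summary
post_mean, post_sd = normal_posterior(
    prior_mean=0.0, prior_sd=0.3, estimate=0.5, std_error=0.25
)

# Central 95% posterior interval
posterior = NormalDist(post_mean, post_sd)
lo, hi = posterior.inv_cdf(0.025), posterior.inv_cdf(0.975)
print(f"posterior mean={post_mean:.3f}, sd={post_sd:.3f}, 95% interval=({lo:.3f}, {hi:.3f})")
```

The posterior mean is pulled from the raw estimate toward the prior, and the interval width tells you directly how much the data added.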

In the context of a Bayesian analysis with the goal being to develop a posterior probability density, with data that has already been collected, I can’t see the advantage to power calculations.

Now, if you’re trying to propose that money be allocated to do a study, I’d see the point of trying to justify the money by showing that your data will be sufficient to distinguish an effect of a certain size, but that’s not where you are. And if you’re not going to do NHST then it doesn’t seem like “power” is the right thing to think about.

• Andrew says:

Dan:

As John and I discuss in the linked article, design calculations can be valuable even after a study has been performed. Sure, if you’re going to do a Bayesian analysis with informative prior, you don’t really need the design calculation. But if you want to interpret classical estimates (or noninformative Bayesian inferences), the design calculation is, I believe, a useful method for understanding the estimates in the context of subject-matter knowledge (that is, prior information).

As you can see, I didn’t say anything about “power”—I’m talking about Type S and Type M errors.

• Yes, if you want to understand classical estimates or hypothesis tests, a post-hoc design calculation can help you understand the risks of over-inflation (type M) and so forth, but if you have enough information to do the design calculation, and you have already performed the study, and you have the data (it’s not in someone else’s old tape archives in the basement, etc.) then it seems like you’re better off just going ahead with the Bayesian analysis using an informed prior.

It seems to me the value of post-hoc design type analysis is to focus on whether other people’s results are as convincing as those other people want to portray them, without having to actually have their data and re-do the analysis using an informed model.

• Put another way, the original question seems to come from someone who is aware of the danger of analyzing data that wasn’t designed to give high precision estimates but doesn’t know what to do about it. So, what should this person do: post-hoc decide how good the design was, or analyze the data in a way that extracts whatever information is available in it and includes a kind of automatic sensitivity analysis to tell how precisely the data inform the parameters?

The biggest risk in bad/low-power design is that you’ll do a classical analysis, get a point estimate, make a binary decision or 3 or 4 in series, and be way way off… This risk is mitigated in a Bayesian analysis because it gives you a picture of how much information you have. It tells you the range of things that are warranted by your prior knowledge and the data put together. If you had a poor design, the main risk is that you won’t be able to discern the degree of differences you’d like to, that the range of warranted conclusions includes too many possibilities compared to what you’d like.

Pre-study, the design analysis can tell you that you’d probably do well to improve your design (bigger N, more covariates, etc.) to get the level of precision you’d like before running the study… Post-hoc, though, you’re best off just extracting the info you have and letting the Bayesian machinery inform you if it’s a lot or a little or something in-between.

Of course, Andrew knows this, but the original questioner may not.

If, on the other hand, the original questioner can’t for some reason do the Bayesian analysis, then yes, it makes sense to find out going into some unregularized classical estimation exercise how informative the data are likely to be. It’s a way to sort of sneak some Bayes back in under the radar ;-)

2. Rahul says:

Actually, after you hypothesize an effect size it would be interesting to ask four or five other domain experts (whose judgement you respect) their personal estimates for the effect size & compare the degree of consensus.

In effect size estimates as well as priors, I think people are far too sanguine about the existence of anything close to a consensus value.

My suspicion is that what constitutes “a reasonable effect size” is in practice a very wide & subjective range.

• That’s one way to come up with an informed prior. If lots of different experts give small ranges that nevertheless vary a lot (e.g., 0-2, 10-15, 100-200, -20 to -30, etc.), then you know that there isn’t really that much prior information. A reasonable prior would be something like normal(0,1000) for the above results. I’m always amazed at how uptight people get about choosing the prior. The prior is just a blanket that needs to cover all the reasonable options. If it covers a certain number of unreasonable options, that’s not a problem, but it IS a problem if it doesn’t cover the “true” value (i.e., the true value definitely needs to be in the support, and should hopefully be in a highish probability region). Once you get used to that idea, you can start to choose priors pretty easily.
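A quick way to check the "blanket" idea is to see how far the experts' ranges sit from the center of the prior. The sketch below uses the example ranges from the comment and reads normal(0,1000) as mean 0 and standard deviation 1000, which is an assumption about the intended parameterization; the helper name `prior_coverage` is made up for illustration.

```python
from statistics import NormalDist

# Example expert ranges from the comment above
expert_ranges = [(0, 2), (10, 15), (100, 200), (-30, -20)]

# Assumption: normal(0,1000) means mean 0, standard deviation 1000
prior = NormalDist(mu=0, sigma=1000)


def prior_coverage(prior, ranges):
    """Return the largest |z| among range endpoints and the lowest
    prior density at any endpoint relative to the density at the prior mean."""
    endpoints = [x for r in ranges for x in r]
    max_z = max(abs(x - prior.mean) / prior.stdev for x in endpoints)
    min_density_ratio = min(prior.pdf(x) for x in endpoints) / prior.pdf(prior.mean)
    return max_z, min_density_ratio


max_z, min_density_ratio = prior_coverage(prior, expert_ranges)
print(f"largest |z| = {max_z:.2f}, lowest relative density = {min_density_ratio:.3f}")
```

If every endpoint sits well inside one prior standard deviation and the relative density stays near 1, the prior is nearly flat over everything the experts proposed, so it is not favoring any one of them.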

3. Keith O'Rourke says:

> potential impact of my new variable on PFS and OS

This suggests the purpose is to get at causality in some sense and confounding and bias will be by far the greatest challenge.

I also don’t think Type S and M errors, or Type 1 and 2 errors, will be well defined or sensibly defined here.

There are groups that work on this problem (Donald Berry and/or colleagues) and they likely will be thinking about informative priors on the biases in using historical control data so that they can be dealt with.