Dan Chamberlain writes:

I am working on a Bayesian analysis of some data from a randomized controlled trial comparing two different drugs for treating seizures in children. I have been using your book as a resource and I have a question about hierarchical modeling. If you have the time, I would greatly appreciate any advice you could provide.

There has been only one past study on the relative effectiveness of these two drugs in children. As you say in section 5.7, it is difficult to determine how much pooling should occur with only two groups. This makes intuitive sense to me, because with only two groups, you are unable to estimate the spread of the underlying distribution. Would you still use a hierarchical model in this case? If so, what benefit does a hierarchical model provide? What I’m doing right now is using the earlier study to inform the priors and then using only a single-level model.

My reply:

My quick answer is that I do think a hierarchical model is appropriate here, but, with data on only 2 groups, it will be necessary to put strong prior information on the between-groups sd parameter tau. If you simply put the default uniform prior on that group-level scale parameter, all the posterior mass will be at tau = infinity, and you’ll do no pooling at all. That might make sense in some cases, but it discards information if the groups are known, or strongly believed, to be similar. I can see how this might not be so clear in our book, because we have very little there on informative priors. Anyway, the benefit of the hierarchical model is that it gives you partial pooling and thus more efficient estimates of the parameters in each group. You just need to supply some prior information on how different the groups might be.
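A minimal sketch of why the flat prior fails with J=2, under assumptions that are ours rather than the correspondents’: each study is summarized by an estimated treatment effect y_j (for example a log odds ratio) with a known standard error sigma_j, theta_j ~ N(mu, tau^2), and mu gets a flat prior. The hypothetical function `log_marginal_tau` is log p(y | tau) with mu and the theta_j integrated out. With two groups it decays like 1/tau for large tau, which is not integrable, so under a uniform prior the share of posterior mass below tau = 1 keeps shrinking as the grid is extended. A half-normal prior with scale 0.1 (a made-up statement that the two studies’ true effects should differ by roughly that much or less) gives the same answer wherever the grid stops. All numbers are invented for illustration.

```python
import math


def log_marginal_tau(tau, y, sigma):
    """log p(y | tau) up to a constant, for y_j ~ N(theta_j, sigma_j^2),
    theta_j ~ N(mu, tau^2), flat prior on mu (theta_j and mu integrated out)."""
    w = [1.0 / (s * s + tau * tau) for s in sigma]
    v_mu = 1.0 / sum(w)
    mu_hat = v_mu * sum(wj * yj for wj, yj in zip(w, y))
    out = 0.5 * math.log(v_mu)
    for wj, yj in zip(w, y):
        out += 0.5 * math.log(wj) - 0.5 * wj * (yj - mu_hat) ** 2
    return out


def log_uniform_prior(tau):
    return 0.0  # flat on tau >= 0 (improper)


def log_half_normal_prior(tau, scale=0.1):
    return -0.5 * (tau / scale) ** 2  # hypothetical scale of 0.1


def prob_tau_below(cut, y, sigma, log_prior, upper, step=0.01):
    """P(tau < cut | y) from an evenly spaced grid on [0, upper]. If the
    posterior is improper, the answer depends on the arbitrary cutoff `upper`."""
    grid = [i * step for i in range(int(round(upper / step)) + 1)]
    dens = [math.exp(log_prior(t) + log_marginal_tau(t, y, sigma)) for t in grid]
    return sum(d for t, d in zip(grid, dens) if t < cut) / sum(dens)


y = [0.10, 0.25]      # hypothetical effect estimates: earlier study, new trial
sigma = [0.08, 0.12]  # hypothetical standard errors
for upper in (10, 1000):
    print(upper,
          round(prob_tau_below(1.0, y, sigma, log_uniform_prior, upper), 3),
          round(prob_tau_below(1.0, y, sigma, log_half_normal_prior, upper), 3))
```

Running it should show the uniform-prior probability dropping between the two grid limits while the half-normal one stays put; the exact figures depend on the invented data.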

Now, at this point, you (or maybe not you, but someone else) might want to be a “tough guy” and say you want to be entirely data-based and don’t want to do any partial pooling here. Maybe you’re OK with hierarchical models whose hyperparameters are estimated from data, but putting in strong prior information to enforce partial pooling is a step too far. You (or your hypothetical friend) can feel free to take this position, to decide that with J=2 you’re not gonna bother to do any pooling. But . . . you gotta look at the implications of such a policy. In practice, if you’re not allowing yourself to do hierarchical modeling, you’ll end up being wary of small groups because your simple, completely unpooled estimates will be noisy. Hence you’ll end up aggregating your data in some way—across drugs, or across disparate groups of patients, or over time—in order to have enough of a sample size for the no-pooling approach to work. And then you’ll automatically be doing complete pooling within each of the groups that you’ve formed.

As always, your endgame resources dictate your middlegame strategy. If you don’t have partial pooling in your toolkit, this restricts how you can analyze the data that do come in, and it will lead you to combine groups (and thus to commit to complete pooling) ahead of time. Thus, a seemingly principled stand against any combination of information can lead in practice to complete pooling and no modeling of variation at all.
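The point that aggregation is complete pooling can be checked in the simplest case. A minimal sketch, assuming normally distributed outcomes with a common, known per-patient sd (an assumption made only for this illustration, with invented data): the mean of the combined data equals the precision-weighted average of the two group means, which is the tau = 0 (complete pooling) estimate in the hierarchical model. The function name `complete_pooling` is hypothetical.

```python
import math
import statistics


def complete_pooling(means, ses):
    """tau = 0 estimate: one common mean, each group weighted by 1/se^2."""
    w = [1.0 / (s * s) for s in ses]
    return sum(wj * m for wj, m in zip(w, means)) / sum(w)


# Hypothetical raw outcomes for two small groups, assuming a common, known
# per-patient sd so that each group mean has se = sd_obs / sqrt(n_j).
group_a = [1.2, 0.8, 1.1, 0.9]
group_b = [1.6, 1.4, 1.5]
sd_obs = 0.5

aggregated = statistics.mean(group_a + group_b)  # "just combine the data"
pooled = complete_pooling(
    [statistics.mean(group_a), statistics.mean(group_b)],
    [sd_obs / math.sqrt(len(group_a)), sd_obs / math.sqrt(len(group_b))],
)
print(aggregated, pooled)  # equal up to floating-point rounding
```

The weights 1/se_j^2 = n_j / sd_obs^2 are proportional to the group sizes, which is why the two numbers coincide.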

Also we should be using informative priors for regression coefficients. But that’s another (related) story.

Chamberlain followed up:

This approach makes much more sense, and I will switch to using a hierarchical model. Before, I was in effect setting the amount of pooling through an informative prior: I used a weighted version of the earlier study as the prior for my single-level model. The approach you suggest makes the process more transparent and is more versatile. This leads to a couple of follow-up questions:

1. Now that I have a hierarchical model, should I focus on reporting the hyperparameter that describes the average success of a drug across different trials or the result for my specific trial adjusted by pooling?

2. In “Berry DA. Bayesian clinical trials. Nat Rev Drug Discov. 2006 Jan;5(1):27–36.”, Berry recommends doing sensitivity analysis on the priors. Since my informative prior is going to dictate the amount of pooling, would you find it useful to report the results with different priors?
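The “weighted version of the earlier study” mentioned above can be connected to the hierarchical model explicitly. Assume (our guess, not stated in the question) that the weighting was a power prior, i.e. the earlier study’s likelihood raised to a power a between 0 and 1, and use a normal approximation with known standard errors and a flat prior on mu. Given tau, theta_new minus theta_old ~ N(0, 2 tau^2), so the earlier study alone implies theta_new ~ N(y_old, sigma_old^2 + 2 tau^2). Matching this to the power prior N(y_old, sigma_old^2 / a) gives a = sigma_old^2 / (sigma_old^2 + 2 tau^2). The sketch below, with hypothetical function names and invented numbers, checks the equivalence numerically. It holds conditional on tau only; the full hierarchical model also averages over uncertainty in tau.

```python
def normal_update(prior_mean, prior_var, y, se):
    """Posterior mean of theta for a N(prior_mean, prior_var) prior and y ~ N(theta, se^2)."""
    return (prior_mean / prior_var + y / se**2) / (1.0 / prior_var + 1.0 / se**2)


def power_prior_estimate(y_old, se_old, y_new, se_new, a):
    """Earlier study's normal likelihood raised to the power a (0 < a <= 1)
    acts as a N(y_old, se_old^2 / a) prior for the new trial's effect."""
    return normal_update(y_old, se_old**2 / a, y_new, se_new)


def hierarchical_estimates(y, sigma, tau):
    """E[theta_j | tau, y] in the normal-normal model, flat prior on mu (needs tau > 0)."""
    w = [1.0 / (s**2 + tau**2) for s in sigma]
    mu_hat = sum(wj * yj for wj, yj in zip(w, y)) / sum(w)
    return [(yj / s**2 + mu_hat / tau**2) / (1.0 / s**2 + 1.0 / tau**2)
            for yj, s in zip(y, sigma)]


def equivalent_weight(se_old, tau):
    """Power-prior weight that reproduces the two-group model at this tau."""
    return se_old**2 / (se_old**2 + 2 * tau**2)


y_old, se_old = 0.10, 0.08  # hypothetical earlier study
y_new, se_new = 0.25, 0.12  # hypothetical new trial
tau = 0.1                   # hypothetical between-study sd
a = equivalent_weight(se_old, tau)
print(a,
      power_prior_estimate(y_old, se_old, y_new, se_new, a),
      hierarchical_estimates([y_old, y_new], [se_old, se_new], tau)[1])
```

Read in reverse, any chosen weight a corresponds to an implied between-study sd tau, which is one way to see why the hierarchical formulation is more transparent.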

My reply:

1. If you only have J=2, I don’t think you’ll get that much out of reporting inference for the hyper-variance. Probably best to report inference for the two conditions of interest.

2. Sure, that’s fine, especially given that you can consider “no pooling” (tau=infinity) and “complete pooling” (tau=0) as special-case extremes.
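The sensitivity analysis in point 2 can be organized as a sweep over fixed values of tau, from complete pooling (tau = 0) to no pooling (tau = infinity). A minimal sketch under a hypothetical normal-normal setup (estimated effects y_j with known standard errors, flat prior on mu, invented numbers), printing the conditional posterior mean for each study at each tau. The function name `pooled_estimates` is ours.

```python
import math


def pooled_estimates(y, sigma, tau):
    """E[theta_j | tau, y] for the normal-normal model with a flat prior on mu.
    tau = 0 gives complete pooling; tau = inf gives no pooling."""
    if math.isinf(tau):
        return list(y)
    w = [1.0 / (s * s + tau * tau) for s in sigma]
    mu_hat = sum(wj * yj for wj, yj in zip(w, y)) / sum(w)
    if tau == 0:
        return [mu_hat] * len(y)
    return [(yj / s**2 + mu_hat / tau**2) / (1.0 / s**2 + 1.0 / tau**2)
            for yj, s in zip(y, sigma)]


y = [0.10, 0.25]      # hypothetical estimates: earlier study, new trial
sigma = [0.08, 0.12]  # hypothetical standard errors
for tau in [0.0, 0.05, 0.1, 0.2, 0.5, math.inf]:
    est = pooled_estimates(y, sigma, tau)
    print(f"tau = {tau:>4}: earlier {est[0]:.3f}, new trial {est[1]:.3f}")
```

Reporting the new trial’s estimate across a grid like this, alongside the result under the chosen informative prior on tau, shows readers how much the conclusion depends on the assumed amount of pooling.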

“Hence you’ll end up aggregating your data in some way—across drugs, or across disparate groups of patients, or over time—in order to have enough of a sample size for the no-pooling approach to work.”

To say that the no-pooling approach doesn’t work often amounts to the reviewers or investigators not believing the results because there is “not enough data” in some qualitative sense. I understand that aggregation, and thus complete pooling, is often done in response, but I’d appreciate some good examples to share with my colleagues.

What do you think about fixing the group-level variance, rather than estimating it? Even something like a very informative inverse-gamma prior isn’t going to let you wring a reliable estimate out of two groups, so why not pick a plausible value and fix the variance for “regularization” purposes?

Corson:

With only 2 groups, you need strong prior information on the between-group variance. Setting the between-group variance to a fixed value is an example of a very strong prior, which can indeed make sense in many examples.
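The point that fixing the variance is just a very strong prior can be seen numerically. A minimal sketch under a hypothetical normal-normal setup (estimated effects y_j with known standard errors, flat prior on mu, invented numbers). In place of the inverse-gamma prior mentioned in the comment, it uses, for convenience, a lognormal prior on tau centred at tau0 = 0.1. It averages the conditional estimates over p(tau | y) on a grid in log(tau) and compares them with the estimates from fixing tau = 0.1. As the prior’s log-scale sd shrinks, the averaged estimates approach the fixed-tau ones; with a wide prior they differ noticeably. All function names are ours.

```python
import math


def log_marginal_tau(tau, y, sigma):
    """log p(y | tau) up to a constant; normal-normal model, flat prior on mu."""
    w = [1.0 / (s * s + tau * tau) for s in sigma]
    v_mu = 1.0 / sum(w)
    mu_hat = v_mu * sum(wj * yj for wj, yj in zip(w, y))
    out = 0.5 * math.log(v_mu)
    for wj, yj in zip(w, y):
        out += 0.5 * math.log(wj) - 0.5 * wj * (yj - mu_hat) ** 2
    return out


def conditional_means(tau, y, sigma):
    """E[theta_j | tau, y] for tau > 0: the estimates you get by fixing tau."""
    w = [1.0 / (s * s + tau * tau) for s in sigma]
    mu_hat = sum(wj * yj for wj, yj in zip(w, y)) / sum(w)
    return [(yj / s**2 + mu_hat / tau**2) / (1.0 / s**2 + 1.0 / tau**2)
            for yj, s in zip(y, sigma)]


def averaged_means(y, sigma, tau0, log_sd, n=2001):
    """E[theta_j | y] with log(tau) ~ N(log(tau0), log_sd^2). Integrates on an
    even grid in u = log(tau), where the prior is normal, covering +/- 6 prior sds."""
    center = math.log(tau0)
    step = 12.0 * log_sd / (n - 1)
    num = [0.0] * len(y)
    den = 0.0
    for i in range(n):
        u = center - 6.0 * log_sd + i * step
        tau = math.exp(u)
        wt = math.exp(-0.5 * ((u - center) / log_sd) ** 2
                      + log_marginal_tau(tau, y, sigma))
        den += wt
        for j, m in enumerate(conditional_means(tau, y, sigma)):
            num[j] += wt * m
    return [v / den for v in num]


y = [0.10, 0.25]      # hypothetical estimates: earlier study, new trial
sigma = [0.08, 0.12]  # hypothetical standard errors
print("fixed tau = 0.1:", conditional_means(0.1, y, sigma))
for log_sd in (1.0, 0.1, 0.001):
    print("log_sd =", log_sd, averaged_means(y, sigma, 0.1, log_sd))
```

The fixed-tau analysis is the limit of this family of priors as log_sd goes to zero, so it can be reported as one end of the same sensitivity analysis.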