Why we don’t (usually) worry about multiple comparisons

Here’s the paper (with Jennifer and Masanao), and here’s the abstract:

The problem of multiple comparisons can disappear when viewed from a Bayesian perspective. We propose building multilevel models in the settings where multiple comparisons arise. These address the multiple comparisons problem and also yield more efficient estimates, especially in settings with low group-level variation, which is where multiple comparisons are a particular concern.

Multilevel models perform partial pooling (shifting estimates toward each other), whereas classical procedures typically keep the centers of intervals fixed, adjusting for multiple comparisons by making the intervals wider (or, equivalently, adjusting the p-values corresponding to intervals of fixed width). Multilevel estimates make comparisons appropriately more conservative, in the sense that intervals for comparisons are more likely to include zero; as a result, those comparisons that are made with confidence are more likely to be valid.

Check out Figures 4 and 6; it’s pretty cool stuff. You really see the efficiency gain. We had several more examples but no room in the paper for all of them. Also, here’s the presentation.
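
To make the pooling-versus-widening contrast concrete, here's a minimal simulation sketch; it's a toy example in the spirit of the paper, not code from it. It uses the normal-normal model with known sampling error and a method-of-moments estimate of the group-level variance, and the setup (20 groups, low group-level variation) is hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical setup: J group effects with low group-level variation
# (tau well below sigma), the regime where multiple comparisons bite.
J, tau_true, sigma = 20, 0.5, 1.0
theta = rng.normal(0.0, tau_true, J)   # true group effects
y = rng.normal(theta, sigma)           # one noisy estimate per group

# Classical intervals: centers stay fixed, widths grow under Bonferroni.
z = stats.norm.ppf(0.975)
z_bonf = stats.norm.ppf(1 - 0.025 / J)
classical = np.c_[y - z * sigma, y + z * sigma]
bonferroni = np.c_[y - z_bonf * sigma, y + z_bonf * sigma]

# Partial pooling (empirical Bayes for the normal-normal model):
# centers shift toward the grand mean instead of the intervals widening.
mu_hat = y.mean()
tau2_hat = max(y.var(ddof=1) - sigma**2, 0.0)  # method-of-moments tau^2
shrink = tau2_hat / (tau2_hat + sigma**2)      # 0 = pool fully, 1 = no pooling
theta_hat = mu_hat + shrink * (y - mu_hat)     # partially pooled estimates
post_sd = np.sqrt(shrink) * sigma              # approximate posterior sd
pooled = np.c_[theta_hat - z * post_sd, theta_hat + z * post_sd]

print("mean interval width: classical %.2f, Bonferroni %.2f, pooled %.2f"
      % tuple(np.diff(a).mean() for a in (classical, bonferroni, pooled)))
```

With low group-level variation the estimated shrinkage factor is small, so the multilevel intervals are recentered toward the grand mean and come out much narrower than the Bonferroni-widened ones; that's the efficiency gain the figures show.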

7 thoughts on “Why we don’t (usually) worry about multiple comparisons”

  1. Very nice. When I try to convince my colleagues that Bayesian is the way to go, the interpretability of posterior distributions, and the ability to use them to calculate any posterior quantity or comparison you like, is one of my main arguments. This paper is a handy demonstration.

  2. Can't you ignore the multiple comparisons problem even from a classical perspective, at least in some circumstances?

    The Type 1 error rate, of course, is the probability of falsely rejecting the null. That probability is 0: classical or Bayesian, the null is never exactly true, so we can't falsely reject it.

    So why not get rid of the hypothesis testing paradigm, even if you don't want to shift fully to a Bayesian view? In the work I do, anyway, the key question is almost never "is the null true?" It's never exactly true, and we usually know it isn't even approximately true. We aren't randomly throwing garbage at the computer, after all.

    Usually, the key question is "How large is the effect?" The next question is "How good is our estimate of the effect size?" Neither of these depends on p, and neither is, as far as I can tell, affected by multiple comparisons.

    Now, I know very little Bayesian statistics, so perhaps this is roughly equivalent to adopting an uninformative prior.

    Any thoughts?

  3. Peter,

    See our paper (and our talk) for some examples where multiple comparisons _are_ an issue. The short version is: if you're not doing partial pooling, then you do have to worry about these things. (A toy simulation illustrating this point appears after the comments.)

  4. Any word on when this will be published?

    I continue to be intrigued by it. In an educational research setting, however, where there may already be three levels (e.g., school, person, person-occasion), adding a fourth level is a challenge. I think MPLUS may be the only software that can handle this, and then probably only for continuous outcomes. I'm not sure about that, but software has got to be an issue when there are already several levels of clustering.

  5. Dave:

    The paper will appear in the Journal of Research on Educational Effectiveness, I hope not too far in the future. Regarding software, I think R and Stata can handle four-level models with no problem. (A sketch of such a nested four-level model appears after the comments.)

    Justin:

    I've never heard of that book.
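
A footnote to the reply in comment 3, since it's easy to check by simulation: the toy example below (my construction, not the paper's) repeatedly draws groups with low group-level variation and counts how often pairwise comparisons are made with confidence. Without pooling, roughly 5% of pairs get flagged even though the true differences are tiny; with the same normal-normal shrinkage as in the sketch above, almost none are.

```python
import numpy as np
from itertools import combinations
from scipy import stats

rng = np.random.default_rng(1)
J, tau, sigma = 10, 0.1, 1.0   # low group-level variation: tau << sigma
z = stats.norm.ppf(0.975)
n_sims = 500
claims_raw = claims_pooled = pairs = 0

for _ in range(n_sims):
    y = rng.normal(rng.normal(0.0, tau, J), sigma)  # noisy group estimates
    tau2 = max(y.var(ddof=1) - sigma**2, 0.0)       # method-of-moments tau^2
    s = tau2 / (tau2 + sigma**2)                    # shrinkage factor
    th = y.mean() + s * (y - y.mean())              # partially pooled estimates
    sd = np.sqrt(s) * sigma                         # approximate posterior sd
    for j, k in combinations(range(J), 2):
        pairs += 1
        claims_raw += abs(y[j] - y[k]) > z * np.sqrt(2) * sigma
        claims_pooled += abs(th[j] - th[k]) > z * np.sqrt(2) * sd

print(f"confident pairwise claims, no pooling:      {claims_raw / pairs:.1%}")
print(f"confident pairwise claims, partial pooling: {claims_pooled / pairs:.1%}")
```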
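
And on the software question in comments 4 and 5: a four-level nested model is also straightforward in a general-purpose Bayesian framework. Here is a minimal PyMC sketch with varying intercepts for districts, schools, and persons and with repeated test occasions as the observations; the structure, variable names, and priors are all hypothetical, invented for illustration.

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(0)

# Hypothetical four-level structure: test occasions within persons
# within schools within districts.
n_district, n_school, n_person, n_occ = 4, 12, 60, 3
district_of_school = rng.integers(n_district, size=n_school)
school_of_person = rng.integers(n_school, size=n_person)
person = np.repeat(np.arange(n_person), n_occ)  # one row per occasion

# Fake outcomes simulated from the same nested structure.
a_d = rng.normal(0.0, 0.5, n_district)
a_s = a_d[district_of_school] + rng.normal(0.0, 0.5, n_school)
a_p = a_s[school_of_person] + rng.normal(0.0, 0.5, n_person)
y_obs = rng.normal(a_p[person], 1.0)

with pm.Model():
    mu = pm.Normal("mu", 0.0, 5.0)
    sd_d = pm.HalfNormal("sd_district", 1.0)
    sd_s = pm.HalfNormal("sd_school", 1.0)
    sd_p = pm.HalfNormal("sd_person", 1.0)
    sd_y = pm.HalfNormal("sd_obs", 1.0)
    # Varying intercepts at each level, partially pooled toward zero.
    e_d = pm.Normal("e_district", 0.0, sd_d, shape=n_district)
    e_s = pm.Normal("e_school", 0.0, sd_s, shape=n_school)
    e_p = pm.Normal("e_person", 0.0, sd_p, shape=n_person)
    theta = (mu
             + e_d[district_of_school][school_of_person][person]
             + e_s[school_of_person][person]
             + e_p[person])
    pm.Normal("y", theta, sd_y, observed=y_obs)
    idata = pm.sample(1000, tune=1000, chains=2)
```

In R, I believe the analogous nested varying-intercept model is a one-liner in lme4: y ~ 1 + (1 | district/school/person).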
