Multilevel modeling as nonparametric statistics

Tian asks a question about multilevel modeling:

Suppose you have 50 state-level effect parameters. If you treat them as fixed effects and assume noninformative priors, this should just be equivalent to computing the regular likelihood function, right?

If these 50 parameters are regarded as random effects and there is a hyper-distribution for them, say a normal, then the bell shape of the normal distribution will lead to milder differences between these parameters. Would this fall under the argument of having parsimonious models?

This reminds me of a few things. First, I remember an oral exam at Berkeley. The student, not one of my own advisees, was fitting a multilevel regression with varying intercepts for the 50 states, and one of the examiners said that he wasn’t sure he believed the exchangeability assumption. I pointed out that “exchangeability” refers to invariance to permutations of indexes, and thus the alternative classical analyses (no pooling, complete pooling) are also exchangeable: they are just special cases of the multilevel model in which the group-level variance is infinite or zero, respectively. (Yes, I know that by giving the story from my perspective, I’m being self-serving, but what choice do I have here?)
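To make the special-case point concrete, here is a minimal sketch, assuming a normal model with known standard errors and a group-level mean held fixed. The function name `partial_pool` and all the numbers are made up for illustration; a full analysis would estimate the group-level mean and variance rather than plug them in.

```python
import math

def partial_pool(y, se, mu, tau):
    """Conditional posterior mean of each group effect, given mu and tau.

    y   : raw group estimates (hypothetical numbers)
    se  : standard errors of those estimates (assumed known)
    mu  : group-level mean being pooled toward (assumed fixed here)
    tau : group-level standard deviation
    """
    estimates = []
    for yj, sj in zip(y, se):
        # Weight on the group's own data; 1 when tau is infinite, 0 when tau is 0.
        w = 1.0 if math.isinf(tau) else tau**2 / (tau**2 + sj**2)
        estimates.append(w * yj + (1 - w) * mu)
    return estimates

y = [2.0, -1.0, 0.5]   # illustrative raw estimates for three groups
se = [1.0, 1.0, 2.0]
mu = 0.5

no_pooling = partial_pool(y, se, mu, math.inf)   # returns the raw estimates
complete_pooling = partial_pool(y, se, mu, 0.0)  # returns mu for every group
partial = partial_pool(y, se, mu, 1.0)           # something in between
```

The same formula covers all three analyses; only the value of `tau` changes, which is the sense in which no pooling and complete pooling sit at the two ends of the multilevel model.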

Nonparametric?

Getting back to Tian’s question, this is something I’ve thought about for a while: hierarchical models are, in fact, nonparametric. I don’t actually think the term “nonparametric” is clearly defined. Sometimes it refers to statistical procedures in which no parameters are estimated; other times it refers to settings where the number of parameters is infinite, or potentially infinite. One way to characterize nonparametric models is that the resulting inferences are not limited to any parametric form. In that sense, hierarchical (multilevel) Bayesian estimates are indeed nonparametric. The model that they are pooling toward is parametric, but the actual estimates are nonparametric in the sense that any pattern of estimates is possible, depending on how much pooling is done.
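Here is a rough sketch of that last point, under assumptions of my own choosing: the group-level model is a straight line in a predictor `x`, fit by least squares, and the group-level standard deviation `tau` is treated as known. The helper names and the data are hypothetical, and a real analysis would estimate the line and `tau` jointly.

```python
def fit_line(x, y):
    """Ordinary least-squares intercept and slope (standard library only)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope

def pool_toward_line(x, y, se, tau):
    """Partially pool each raw estimate toward the fitted line a + b * x."""
    a, b = fit_line(x, y)
    w = tau**2 / (tau**2 + se**2)  # weight on each group's own data
    return [w * yi + (1 - w) * (a + b * xi) for xi, yi in zip(x, y)]

def second_differences(v):
    """Zero everywhere if and only if equally spaced points lie on a line."""
    return [v[i - 1] - 2 * v[i] + v[i + 1] for i in range(1, len(v) - 1)]

x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [xi**2 for xi in x]  # made-up raw estimates with a curved pattern
se = 1.0                 # common standard error, assumed known

on_line = pool_toward_line(x, y, se, tau=0.0)  # complete pooling: exactly linear
curved = pool_toward_line(x, y, se, tau=2.0)   # partial pooling: curvature survives
```

With `tau` at zero the estimates fall on the parametric line; with any positive `tau` they keep some of the curvature in the data, even though the only model written down is linear.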

This was made clear to me in the research that led to my 1990 JASA paper (with Gary King): by setting up a hierarchical model, we were not limiting the seats-votes curve to any particular parametric form. That made our model more appealing (at least from my perspective) than its predecessors in the seats-votes literature, where various parametric forms were assumed. I think it’s cool that parametric modeling can be used in the service of nonparametric inferences.