OK, OK, I know this is the stuff you come to the blog for. Not the pretty maps, not the comments about academia. And certainly not my thoughts on John P. Marquand, Weekend at Bernie’s, and other cultural treasures. No, it’s the statistics. So here goes…

A political scientist writes:

Here’s a question that occurred to me that others may also have. I imagine “Mister P” will become a popular technique to circumvent sample size limitations and create state-level data for various public opinion variables. Just wondering: are there any reasons why one wouldn’t want to use such estimates as a state-level outcome variable? In particular, does the dependence between observations caused by borrowing strength in the multilevel model violate the independence assumptions of standard statistical models? Lax and Phillips use “Mister P” state-level estimates as a predictor, but I’m not sure if someone has used them as an outcome or whether it would be appropriate to do so.

First off, I love that the email to me was headed, “mister p question.” And I know Jeff will appreciate that too. We had many discussions about what to call the method.

To get back to the question at hand: yes, I think it should be ok to use estimates from Mister P as predictor or outcome variables in a subsequent analysis. In either case, it could be viewed as an approximation to a full model that incorporates your regression of interest, along with the Mr. P adjustments.

I imagine, though, that there are settings where you could get the wrong answer by using the Mr. P estimates as predictors or as outcomes. One way I could imagine things going wrong is through varying sample sizes. Estimates will get pooled more in the states with fewer respondents, and I could see this causing a problem. For a simple example, imagine a setting with a weak signal, lots of noise, and no state-level predictors. Then you’d “discover” that small states are all near the average, and large states are more variable.
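Here is a minimal simulation sketch of that last scenario. It is not a full Mr. P analysis: there is no poststratification, the responses are treated as normal rather than binary, and the variance components `tau` and `sigma` are assumed known instead of being estimated. All the numbers (25 small states with 20 respondents, 25 large states with 2000) are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 25 "small" states and 25 "large" states.
n = np.array([20] * 25 + [2000] * 25)      # respondents per state
mu, tau, sigma = 0.5, 0.05, 1.0            # weak signal (tau), lots of noise (sigma)

theta = rng.normal(mu, tau, size=n.size)       # true state-level means
ybar = rng.normal(theta, sigma / np.sqrt(n))   # raw state sample means

# Partial pooling with known variance components (a simplifying assumption).
se2 = sigma**2 / n                             # sampling variance of each raw mean
w = tau**2 / (tau**2 + se2)                    # weight on each state's own data
mu_hat = np.sum(ybar / (tau**2 + se2)) / np.sum(1.0 / (tau**2 + se2))
theta_hat = mu_hat + w * (ybar - mu_hat)       # partially pooled estimates

small, large = n == 20, n == 2000
for label, idx in [("small", small), ("large", large)]:
    print(label,
          "sd of true means:", round(theta[idx].std(ddof=1), 3),
          "| sd of raw means:", round(ybar[idx].std(ddof=1), 3),
          "| sd of pooled estimates:", round(theta_hat[idx].std(ddof=1), 3))
```

The true means have the same spread in both groups by construction. The raw means are more variable in the small states (pure sampling noise), while the pooled estimates are less variable there, because the small states get pulled much harder toward the overall mean. A second-stage analysis of the pooled estimates would pick up that difference in spread as if it were a fact about the states.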

Another way a problem could arise, perhaps, is if you have a state-level predictor that is not statistically significant but still induces a correlation. With the partial pooling, you’ll see a stronger relation with the predictor in the Mr. P estimates than in the raw data, and if you pipe this through to a regression analysis, I could imagine you could see statistical significance when it’s not really there.
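A rough sketch of this second concern, under strong simplifying assumptions: every state has the same sample size, the true coefficient on the state-level predictor `x` is exactly zero, and the pooling weight `w` is computed from known `tau` and `sigma` rather than estimated. The helper `slope_and_t` is just ordinary least squares written out by hand for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def slope_and_t(x, y):
    """OLS of y on x with intercept: return slope, its t statistic, and fitted values."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ coef
    resid = y - fitted
    s2 = resid @ resid / (len(y) - 2)          # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)          # covariance of the coefficients
    return coef[1], coef[1] / np.sqrt(cov[1, 1]), fitted

# Hypothetical setup: 50 states, 100 respondents each, no true effect of x.
J, n_per_state = 50, 100
mu, tau, sigma = 0.5, 0.05, 1.0
x = rng.normal(size=J)                          # state-level predictor
theta = rng.normal(mu, tau, size=J)             # true state means, unrelated to x
ybar = rng.normal(theta, sigma / np.sqrt(n_per_state))

# State-level regression on the raw means.
b_raw, t_raw, fitted = slope_and_t(x, ybar)

# Partial pooling toward the regression line (weight assumed known).
w = tau**2 / (tau**2 + sigma**2 / n_per_state)
theta_hat = fitted + w * (ybar - fitted)

# Naive second-stage regression that treats the pooled estimates as data.
b_pool, t_pool, _ = slope_and_t(x, theta_hat)

print("slope (raw, pooled):", b_raw, b_pool)
print("t statistic (raw, pooled):", t_raw, t_pool)
print("correlation with x (raw, pooled):",
      np.corrcoef(x, ybar)[0, 1], np.corrcoef(x, theta_hat)[0, 1])
```

In this stripped-down version the slope is unchanged but the residuals are shrunk by the factor `w`, so the naive t statistic is inflated by exactly `1 / w` (a factor of 5 with these made-up numbers). A real multilevel fit with estimated variance components would not behave this cleanly, but the direction of the problem should be the same.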

I think there’s an article to be written on this.

2 thoughts on “OK, OK, I know this is the stuff you come to the blog for. Not the pretty maps, not the comments about academia. And certainly not my thoughts on John P. Marquand, Weekend at Bernie’s, and other cultural treasures. No, it’s the statistics. So here goes…”

  1. I think the situation is analogous to the critique of Gary King's work on ecological inference by Herron and Shotts. The model assumed for creating the dependent variable is (logically) inconsistent with the regression model, given the random effects assumptions.

    The solution, of course, is to estimate the full model.

  2. Can someone point me to some good material on multilevel models and how they can be used to handle both individual-level and group-level predictors?

    This sounds like a topic which I ought to be interested in, as an analyst building individual level models based on both individual and group level variables.
