Yi-Chun Ou writes:
I am using a multilevel model with three levels. I read that you wrote a book about multilevel models, and I wonder if you can answer the following question.
The data structure is like this:
Level one: customer (8444 customers)
Level two: company (90 companies)
Level three: industry (17 industries)
I use 6 level-three variables (i.e. industry characteristics) to explain the variance of the level-one effect across industries. The question here is whether there is an over-fitting problem since there are only 17 industries. I understand that this must be a problem for non-multilevel models, but is it also a problem for multilevel models?
My reply: Yes, this could be a problem. I’d suggest combining some of your variables into a common score, or using only some of the variables, or using strong priors to control the inferences. Doing this sort of thing systematically is an interesting and important area of statistics research. There’s lots of work on what to do with hundreds of data points and thousands of predictors, but not so much with 17 data points and 6 predictors.
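To make the concern concrete, here is a minimal sketch in Python using simulated data, not the reader's data. It reduces the industry-level part of the multilevel model to a plain regression on 17 points, which is a simplifying assumption, and all names in it (r_squared, prior_mean_slopes, prior_sd, and so on) are hypothetical. It fits 6 pure-noise predictors to a pure-noise outcome, then compares that with a single combined score (here just the average of the 6 predictors) and with slopes shrunk by independent normal priors.

```python
import numpy as np

rng = np.random.default_rng(0)
n_industries, n_predictors = 17, 6  # sizes taken from the question above


def r_squared(X, y):
    """In-sample R^2 of a least-squares fit that includes an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def prior_mean_slopes(X, y, prior_sd, noise_sd=1.0):
    """Posterior mean of the slopes under independent N(0, prior_sd^2) priors,
    a flat prior on the intercept, and a known noise sd (assumed here).
    prior_sd = np.inf gives the ordinary least-squares slopes."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    lam = (noise_sd / prior_sd) ** 2
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)


# Repeatedly fit noise to noise: any apparent fit is overfitting.
r2_all, r2_score = [], []
for _ in range(2000):
    X = rng.normal(size=(n_industries, n_predictors))
    y = rng.normal(size=n_industries)
    r2_all.append(r_squared(X, y))
    # Option 1 from the reply: combine the predictors into one common score.
    r2_score.append(r_squared(X.mean(axis=1, keepdims=True), y))

mean_r2_all = float(np.mean(r2_all))
mean_r2_score = float(np.mean(r2_score))
# For pure Gaussian noise, theory gives an average R^2 of p / (n - 1),
# that is 6/16 with all six predictors and 1/16 with one score.
print("mean R^2, 6 predictors:", round(mean_r2_all, 3))
print("mean R^2, 1 combined score:", round(mean_r2_score, 3))

# Option 3 from the reply: a strong prior pulls the slopes toward zero.
X = rng.normal(size=(n_industries, n_predictors))
y = rng.normal(size=n_industries)
slopes_flat = prior_mean_slopes(X, y, prior_sd=np.inf)
slopes_strong = prior_mean_slopes(X, y, prior_sd=0.2)
print("slope norm, flat prior:", round(float(np.linalg.norm(slopes_flat)), 3))
print("slope norm, strong prior:", round(float(np.linalg.norm(slopes_strong)), 3))
```

The prior scale of 0.2 is an arbitrary choice for illustration. In a real analysis it would come from substantive knowledge about plausible effect sizes, and the full three-level model would be fit with multilevel modeling software rather than by this shortcut.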