A couple months ago we discussed this question from Sean de Hoon:

In many cross-national comparative studies, mixed effects models are being used in which a number of slopes are fixed and the slopes of one or two variables of interest are allowed to vary across countries. The aim is often then to explain the varying slopes by referring to some country-level characteristic.

My question is whether it is possible that the estimation of these random slopes (the interesting ones) is affected by the fact that the slopes of other (uninteresting) variables are fixed (even though they may actually vary over countries) and if so, how it may be affected? This question is inspired by many studies examining men’s wages which, for example, include a control for level of education that does not have a random slope, while I doubt whether education will have the same effect across countries.

Do you think the decision not to include many random slopes is predominantly methodologically informed? And do you think Bayesian analyses can provide a better solution for the kind of situations where many slopes should be allowed to vary?

My response is at the link. But here let me use the above email to point out the difficulty of referring to slopes or “effects” as “random” or “fixed.”

What is meant by “random”? That’s easy: a slope or effect is called “random” if (a) it varies by group, and (b) this variation is estimated using a probability model. Thus, “randomness” is a property both of the data model and of how this variation is estimated.

What is meant by “fixed”? This is not so clear: in the context above, a slope or effect is called “fixed” if it does not vary by group. But, in other contexts (notably in econometrics), “fixed effects” refers to a model where coefficients vary by group but where this variation is not estimated using a probability model.

In multilevel modeling terms, in de Hoon’s email, “fixed effects” are equal to each other and estimated using complete pooling, whereas in econometric terminology, “fixed effects” vary by group and are estimated using no pooling. Completely different models.
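To make the contrast concrete, here is one way to write the models down. The notation is my own assumption, not something from de Hoon's email: y_i is the outcome for person i, x_i is a predictor, and j[i] is the group (country) that person i belongs to.

```latex
\begin{align*}
% "Fixed" in the multilevel sense: one coefficient shared by all groups (complete pooling)
y_i &= \alpha + \beta\, x_i + \epsilon_i \\[4pt]
% "Fixed effects" in the econometric sense: a separate coefficient per group,
% each estimated on its own with no probability model linking them (no pooling)
y_i &= \alpha + \beta_{j[i]}\, x_i + \epsilon_i \\[4pt]
% "Random": the same varying-coefficient data model, plus a probability model
% for the group-level coefficients (partial pooling)
y_i &= \alpha + \beta_{j[i]}\, x_i + \epsilon_i,
\qquad \beta_j \sim \mathrm{N}(\mu_\beta, \sigma_\beta^2)
\end{align*}
```

The second and third lines share the same data model and differ only in how the β_j are estimated, while the first line is a different data model altogether.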

Which is one reason (but not the only reason) I prefer to avoid the terms “random” and “fixed” in describing models. Instead I like to separate the modeling decision (modeling a coefficient as varying or non-varying) and the inference decision (in Bayesian terms, what prior are we using; in multilevel modeling terms, how much pooling are we doing).
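Here is a minimal sketch of how the inference decision plays out once a coefficient is modeled as varying. It uses group means rather than slopes to keep things short. The function name `pooling_estimates`, the simulated data, and the plug-in use of the grand mean are all assumptions for illustration, and `tau` (the between-group standard deviation) is supplied by hand rather than estimated as a real multilevel fit would do.

```python
import numpy as np

def pooling_estimates(y, group, tau):
    """Group-mean estimates under three amounts of pooling.

    tau is an ASSUMED between-group sd (hypothetical input, must be > 0);
    a full multilevel model would estimate it from the data.
    """
    y = np.asarray(y, dtype=float)
    labels, idx = np.unique(group, return_inverse=True)
    n = np.bincount(idx)                          # observations per group

    complete = y.mean()                           # complete pooling: one value for everyone
    no_pool = np.bincount(idx, weights=y) / n     # no pooling: each group on its own

    # Within-group residual variance, pooled across groups.
    resid = y - no_pool[idx]
    sigma2 = (resid ** 2).sum() / (y.size - labels.size)

    # Partial pooling: precision-weighted average of the group mean and the
    # grand mean (the grand mean is used as a simple plug-in here).
    w = (n / sigma2) / (n / sigma2 + 1.0 / tau ** 2)
    partial = w * no_pool + (1.0 - w) * complete
    return complete, no_pool, partial

# Simulated example: three groups with very different sample sizes.
rng = np.random.default_rng(0)
true_means = np.array([-1.0, 0.0, 2.0])
group = np.repeat(np.arange(3), [3, 10, 40])
y = true_means[group] + rng.normal(0.0, 1.0, size=group.size)

complete, no_pool, partial = pooling_estimates(y, group, tau=1.0)
print("complete pooling:", complete)
print("no pooling:      ", no_pool)
print("partial pooling: ", partial)
```

The data model is the same throughout (group-specific means). Only `tau` changes the inference: as `tau` goes to zero the estimates collapse to complete pooling, and as it grows large they approach the no-pooling estimates, with small groups pulled toward the grand mean the most in between.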

Agreed and well said again. I’m not a grandmaster, but if folks spent more time defining models from scratch the meaninglessness of these words would be clear. Just like bodies don’t think in words.

I agree that using “varying and non-varying” is best if that’s what you mean (i.e., when “varying” does not mean “random” in the sense of interest being in the variation). Unfortunately, this distinction is not pointed out often enough, so people who have learned about random factors but not how they differ from varying effects are likely to confuse the two (as I did when I first encountered varying effects).

I am a first-year PhD student in STEM Education and am taking a multivariate statistics course. It is very cookbook-ish, and firmly in the frequentist model. If I wanted to expand into the Bayesian models more, do you have a post that outlines some great resources to get me started & beyond? The deficits in the frequentist model are very obvious to me (I teach AP Stats at the high school level, and am very comfortable with introductory stats and the shortcomings of that topic).

It seems that the shortcomings of univariate stats are just compounded in multivariate stats. At least, that is what I have been concluding as I read your blog for the last year. Perhaps I am wrong or misunderstanding.

Thank you!

I think of “fixed effects” in the econometric sense as “differencing” models that identify on changes within some group. I think of “random effects” in the econometric sense as standard error adjustments for correlated error terms within group. But in the small applied-micro causal inference tradition in which I work, the major question we are asking of a regression model is “what is the variation in the world that identifies the coefficient of interest”.

I was wondering if anyone could suggest an R package in which I can specify Multiple Membership in a multi-level setting. Is the lmer function capable of this?

Dylan:

You’re in luck! You can fit such models easily in Stan and run them from R using rstan. No need to try to hack lmer to do it.