Dean Eckles writes:
I make extensive use of random effects models in my academic and industry research, as they are very often appropriate.
However, with very large data sets, I am not sure what to do. Say I have thousands of levels of a grouping factor, and the number of observations totals in the billions. Despite having lots of observations, I am often either dealing with (a) small effects or (b) trying to fit models with many predictors.
So I would really like to use a random effects model to borrow strength across the levels of the grouping factor, but I am not sure how to practically do this. Are you aware of any approaches to fitting random effects models (including approximations) that work for very large data sets? For example, applying a procedure to each group, and then using the results of this to shrink each fit in some appropriate way.
Just to clarify, here I am only worried about the non-crossed and in fact single-level case. I don’t see any easy route for crossed random effects, which is why we have been content to just get reasonable uncertainty estimates for means, etc., when there are crossed random effects (http://arxiv.org/abs/1106.2125).
(Some extra details: In one case, I am fitting a propensity score model where there are really more than 2e8 somewhat similar treatments. Perhaps ignoring the ones that aren’t common (say, only considering treatments where I have over 1e4 treated units), I want to predict the treatment from a number of features. One approach is to go totally unpooled (your secret weapon), but I think variance will be a problem here since there are so many features. Another approach is to use some other kind of shrinkage, like the lasso or the grouped lasso.)
Your ideas (or those of your blog readers) are appreciated!
I’ve been thinking about this problem for a while. It seems likely to me that some Gibbs-like and EM-like solutions should be possible. (And if there’s an EM solution, there should be a variational Bayes solution too.) Here are the pieces, as I envision them:
– The two-stage analysis (separate model fit to each group, then a group-level regression of the group estimates) leads to Gibbs (or EM), in which the fitted group-level model (including the estimation uncertainty) is used as a prior and fed back into the individual-level analyses. (See the first sketch after this list.)
– Speeding things up by analyzing subsets of the data. The trick is to do the subsampling where you have more data than necessary (e.g., “California”) but not in the sparser groups (“Rhode Island,” etc.). (See the second sketch after this list.)
– Ultimately getting all the data back in the model, once things have stabilized. This has a bit of the feel of particle filtering.
– My guess is that the way to go is to get this working for a particular problem of interest, and then think about how to implement it efficiently in Stan, etc.
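
To make the first bullet concrete, here is a minimal sketch in Python of the two-stage idea in the simplest possible setting, where each group’s stage-1 fit is just a mean with a known standard error and the group-level model is normal. The function names (group_summaries, em_normal_normal) are mine, invented for the illustration; this is a toy, not a recipe for billions of observations:

```python
import numpy as np

def group_summaries(y, groups):
    """Stage 1: a separate fit within each group. Here the per-group
    'model' is just a mean and its standard error; in a real problem it
    would be whatever regression you run within the group."""
    ests, ses = [], []
    for g in np.unique(groups):
        yg = y[groups == g]
        ests.append(yg.mean())
        ses.append(yg.std(ddof=1) / np.sqrt(len(yg)))
    return np.array(ests), np.array(ses)

def em_normal_normal(est, se, n_iter=200, tol=1e-10):
    """Stage 2 plus feedback: EM for est_j | theta_j ~ N(theta_j, se_j^2),
    theta_j ~ N(mu, tau^2). The E-step shrinks each group estimate toward
    the group-level model; the M-step refits that model; iterate."""
    mu, tau2 = est.mean(), est.var()
    for _ in range(n_iter):
        # E-step: posterior mean/variance of each theta_j given (mu, tau2)
        V = 1.0 / (1.0 / se**2 + 1.0 / tau2)
        m = V * (est / se**2 + mu / tau2)
        # M-step: update the group-level mean and variance
        mu_new = m.mean()
        tau2_new = max(np.mean(V + (m - mu_new) ** 2), 1e-12)
        done = abs(mu_new - mu) < tol and abs(tau2_new - tau2) < tol
        mu, tau2 = mu_new, tau2_new
        if done:
            break
    return mu, np.sqrt(tau2), m, np.sqrt(V)

# Toy usage: 1000 groups of very unequal size
rng = np.random.default_rng(1)
theta = rng.normal(0.0, 0.5, size=1000)
sizes = rng.integers(5, 500, size=1000)
groups = np.repeat(np.arange(1000), sizes)
y = rng.normal(theta[groups], 1.0)
est, se = group_summaries(y, groups)
mu, tau, shrunk, shrunk_sd = em_normal_normal(est, se)
```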
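
And for the second bullet, one simple way to cap the work in the data-rich groups, again just an illustration (the default cap is arbitrary, and the stage-1 standard errors computed from the subsample carry the extra uncertainty into the shrinkage):

```python
import numpy as np

def cap_group_sizes(y, groups, cap=10_000, rng=None):
    """Keep every observation in the sparse groups ("Rhode Island") but
    take a random subsample of at most `cap` observations from the large
    ones ("California"), before running the stage-1 fits."""
    rng = np.random.default_rng() if rng is None else rng
    keep = []
    for g in np.unique(groups):
        idx = np.flatnonzero(groups == g)
        if len(idx) > cap:
            idx = rng.choice(idx, size=cap, replace=False)
        keep.append(idx)
    keep = np.concatenate(keep)
    return y[keep], groups[keep]

# e.g., y_sub, groups_sub = cap_group_sizes(y, groups, cap=200)
```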