Nelson Villoria writes:

I find the multilevel approach very useful for a problem I am dealing with, and I was wondering whether you could point me to some references about poolability tests for multilevel models. I am working with time series of cross sectional data and I want to test whether the data supports cross sectional and/or time pooling. In a standard panel data setting I do this with Chow tests and/or CUSUM. Are these ideas directly transferable to the multilevel setting?

My reply: I think you should do partial pooling. Once the question arises, just do it. Other models are just special cases. I don’t see the need for any test.

That said, if you do a group-level model, you need to consider including group-level averages of individual predictors (see here). And if the number of groups is small, there can be real gains from using an informative prior distribution on the hierarchical variance parameters. This is something that Jennifer and I do not discuss in our book, unfortunately.
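To make "just do partial pooling" concrete, here is a minimal sketch of partial pooling in the simplest setting: a normal-normal model with known variance components, where each group mean is shrunk toward the grand mean by a precision-weighted factor. This is an illustration only, not Nelson's panel application or the book's approach; the data, variance values, and group names are all made up, and in practice one would estimate the variance components (e.g. with lme4 or Stan) rather than fix them.

```python
# Illustrative partial pooling under a normal-normal model with
# *assumed* (not estimated) variance components. All numbers are
# made up for demonstration.
import statistics

# made-up group observations
groups = {
    "a": [2.1, 1.9, 2.4],
    "b": [3.0],                      # tiny group -> shrunk strongly
    "c": [1.0, 1.2, 0.8, 1.1, 0.9],  # larger group -> shrunk weakly
}

sigma2 = 0.25   # assumed within-group variance
tau2 = 0.50     # assumed between-group variance

all_obs = [y for ys in groups.values() for y in ys]
grand_mean = statistics.mean(all_obs)

def partial_pool(ys):
    """Posterior mean of a group effect: a precision-weighted
    compromise between the group mean and the grand mean."""
    n = len(ys)
    ybar = statistics.mean(ys)
    w = (n / sigma2) / (n / sigma2 + 1 / tau2)
    return w * ybar + (1 - w) * grand_mean

pooled = {g: partial_pool(ys) for g, ys in groups.items()}
for g in groups:
    print(g, round(pooled[g], 3))
```

Complete pooling and no pooling fall out as the special cases `tau2 = 0` and `tau2 = infinity`, which is one way to read the point that "other models are just special cases."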

Off-topic comment, but I sooo do wish your correspondents actually posted some meaty (gory?) details of their problems. Someone like me is lost in the jargon anyway, but at least if I could see the application & the model I might understand maybe 5% of it. :)

Be careful what you wish for, Rahul. One problem I always face when talking to a statistician is that (a) they don’t have domain knowledge in my field; and (b) one can’t have an elaborate preamble to explain the issue to them. One has to strip the problem down to some abstract level just to communicate it easily. Unfortunately, a lot of the action depends on domain knowledge. So what one ideally needs is a domain-expert statistician…

I don’t know. On the coding & engineering forums I frequent, posting abstractions usually gets you rebuked. I always feel a concrete model never hurts; i.e., code, numbers, equations, error messages, etc. You can always supplement it with your abstract explanation if you feel like it.

PS. What’s wrong with your approach (b)? That sounded reasonable.

My experience has been that statisticians without domain knowledge will either not read the detailed preamble carefully, or won’t understand it. On the Stan mailing list, where one can get technical, I release all data and code so that the analysis is replicable (maybe that is what you were asking for—that makes sense), but no statistician without domain knowledge can tell me (and I don’t blame them) whether I even fit the right model or did all my preprocessing steps correctly (many errors start there). With libraries like lme4 it is easy to fit models that are pure garbage, and it’s now super-easy to translate these models into Stan and fit them there as well! Perhaps it’s the fault of the researcher: When one’s ignorance has no bounds, one doesn’t even know what to ask. I don’t blame the researcher either: why should they quit doing their work and devote years to understanding all this sophisticated stuff? They’ll be out of a job (although Entsophy would open a champagne bottle if that happens :).

Shravan.

Perhaps similar to my experience of trying to get input from experts that _don’t work for you_: you need something very short, with very little burden other than pointing to their knowledge/interests.

Also, the question of any kind of pooling needs to be based on domain knowledge, so that one does not pool apples and oranges, as it’s often put (e.g. RCTs and observational studies, which, being sweet and sour apples, make a disgusting apple sauce).

Indeed. Why should an expert spend time on one’s problem without any reimbursement? Completely reasonable. That’s why I started to study statistics myself; there’s no sense in relying on free help.

As Geoff Hinton once said, “You have two choices: learn statistics or make friends with a statistician. I won’t comment on which is harder.”

@K’ O’Rourke:

My experience is that experts will often help you (gratis, too), *provided* your problem catches their interest; e.g., much of the open-source support ecosystem works on just that. And for that, the one asking must do his homework.

I’ve no interest in spending 10 minutes of my time helping someone save 5. But if investing that time will save someone many hours of effort, I’m more likely to help. Especially if the guy shows some sunk cost.

Part of the history of meta-analysis of randomised clinical trials revolved around whether one should formally test for fixed (complete pooling) versus random effects (partial pooling).

Eventually most folks realised that it was a bad idea.

But it’s likely very tempting to think the data can tell (for certain) which model to use, so this history likely repeats itself in different fields.

What was a bad idea? What’s “it”?

The bad idea is “formally test for fixed (complete pooling) versus random effects (partial pooling)”.

A special case of “Once the question [of random effects] arises, just do it [partial pooling].”
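To make the complete-vs-partial-pooling contrast from the meta-analysis discussion concrete, here is a hedged sketch (not from the thread, and not tied to any particular clinical dataset) comparing a fixed-effect estimate (complete pooling, inverse-variance weights) with a DerSimonian-Laird random-effects estimate (partial pooling, with a method-of-moments between-study variance). The study effects and standard errors are made up for illustration.

```python
# Fixed-effect (complete pooling) vs DerSimonian-Laird random-effects
# (partial pooling) meta-analysis on made-up study results.
effects = [0.30, 0.10, 0.55, 0.20]   # made-up study effect estimates
ses     = [0.10, 0.15, 0.12, 0.20]   # made-up standard errors

def fixed_effect(effects, ses):
    """Inverse-variance weighted average (complete pooling)."""
    w = [1 / s**2 for s in ses]
    return sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)

def dersimonian_laird(effects, ses):
    """Random-effects estimate with DL method-of-moments tau^2."""
    w = [1 / s**2 for s in ses]
    fe = fixed_effect(effects, ses)
    q = sum(wi * (yi - fe)**2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)    # between-study variance, floored at 0
    w_star = [1 / (s**2 + tau2) for s in ses]
    return sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)

fe = fixed_effect(effects, ses)
re = dersimonian_laird(effects, ses)
print(f"complete pooling (fixed effect):  {fe:.3f}")
print(f"partial pooling (random effects): {re:.3f}")
```

When the estimated between-study variance is zero, the two estimates coincide, which is one reason a formal test of "fixed vs random" adds little: the random-effects model already contains the fixed-effect model as a boundary case.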

Quite an interesting point. Can you give me some references on the debate about formally testing for fixed versus random effects, and the conclusion that was eventually reached?