Skip to content

Combining Bayesian inferences from many fitted models

Renato Frey writes:

I’m curious about your opinion on combining multi-model inference techniques with rstanarm:

On the one hand, screening all (theoretically meaningful) model specifications and fully reporting them seems to make a lot of sense to me — in line with the idea of transparent reporting, your idea of the multiverse analysis, or akin to Simonsohn’s Specification Curve Analysis (which seem to be quite closely related to each other anyway). So in the past, I’ve been using tools such as glmulti or regressionBF (from the BayesFactor package) to obtain model-averaged coefficients and more interestingly, the “importances” of the various predictors, which are determined across various model specifications (i.e., in the case of glmulti). However, these tools are only available for the “traditional” OLS and related estimation methods.

On the other hand, I’ve recently started to use rstanarm, which I now clearly prefer to the traditional estimation methods, not least because of the possibility to specify weakly informative priors and the resulting regularization.

As different model specifications would make sense from a theoretical point of view in some of my current projects, I’ve now wondered if it would be reasonable to write a wrapper that automatically implements different model specifications and runs them with rstanarm, which would permit an (automatic) comparison of coefficients across different model specifications (and would possibly also permit to extract a measure of “importance” for each of the predictors). Of course, this would need to be highly parallelizable to make it computationally feasible.

Do you have any thoughts on this, and / or do you plan to implement something related in rstarnarm in the future?

My reply:

Yes, definitely use rstanarm! Or go straight to Stan (that would be rstan if you’re running it from R) and program more general models. If you want to fit several models and average their posterior distributions, I recommend stacking, as described in this recent paper. Also, sure, someone could write a wrapper to automatically fit a large number of models in rstanarm in parallel and then average over them—this would not be hard to do—but I think I’d prefer fitting a single model with all these predictors and interactions, using strong priors to regularize all the coefficients.


  1. Belay says:

    Implementing Bayesian variable selection might be the way forward. But writing a wrapper function following the Burnham and Anderson,2002 approach is also possible. I did this using JAGS and hope will not be difficult to do it also in rstan. The wrapper function should fit all models, extract DIC for instance then compute DIC weight then to get model average estimate just weight each parameter estimate with their corresponding DIC weight.

  2. Aki Vehtari says:

    For predictor relevance and selection see my StanCon talks and notebooks at Main points are
    – in case of predictors no need to do model combination, just include all potentially relevant predictors
    – regularized horseshoe prior is good if you have lot of predictors
    – projection predictive variable selection is better than any approach to choose a smaller set of predictors
    – rstanarm + projpred package make all this easy (projpred at
    – you can use projpred also with rstan
    – all this is computationally feasible with laptop at least up to 10000 predictors

  3. Richard McElreath says:

    I’m sure this is an obvious caveat, but let’s be careful about encouraging people to dump all predictors into a common model. Colliders do exist, for example.

    • Aki Vehtari says:

      Sure. Question about terminology: If you know a variable is collider, would you still call it also predictor?

      • Ben Goodrich says:

        I would say yes. A variable / node can be a collider on one path but not a collider on another path. So, it is possible there is a unblocked path from a variable / node to the outcome, in which case it is a predictor in the usual statistical sense, but at the same time, it can be a collider on another path in the DAG sense if you condition on it.

        On the other hand, the utility function for projpred or stacking is being defined solely in terms of predictive ability, rather than being able to interpret an estimate in causal terms. So, if you decide to go that route, I think you have already set aside your DAG.

      • Corey says:

        Yeah it’s a predictor. The right caveat for Richard’s concern is something like, “Be aware of the difference between predictive inference and causal inference. If no intervention into the system is planned or possible and the aim is simply to predict as accurately as possible then it’s appropriate to use all available predictors but if the investigation aims to tease out causal connections then you need to avoid conditioning on colliders (for Neyman-Rubinites, post-treatment variables).”

      • Jack says:

        Of course it’s a predictor — predictor is what predicts.

Leave a Reply