Pierre Jacob, Lawrence Murray, Chris Holmes, Christian Robert write:

In modern applications, statisticians are faced with integrating heterogeneous data modalities relevant for an inference, prediction, or decision problem. In such circumstances, it is convenient to use a graphical model to represent the statistical dependencies, via a set of connected “modules”, each relating to a specific data modality, and drawing on specific domain expertise in their development. In principle, given data, the conventional statistical update then allows for coherent uncertainty quantification and information propagation through and across the modules. However, misspecification of any module can contaminate the estimate and update of others, often in unpredictable ways. In various settings, particularly when certain modules are trusted more than others, practitioners have preferred to avoid learning with the full model in favor of approaches that restrict the information propagation between modules, for example by restricting propagation to only particular directions along the edges of the graph. In this article, we investigate why these modular approaches might be preferable to the full model in misspecified settings. We propose principled criteria to choose between modular and full-model approaches. The question arises in many applied settings, including large stochastic dynamical systems, meta-analysis, epidemiological models, air pollution models, pharmacokinetics-pharmacodynamics, and causal inference with propensity scores.

The key idea is that full joint inference can lead to problems, and modular inference is a possible short-term solution, to the extent that the full joint model is misspecified. This suggests that it can make sense to try the computations both ways—joint and modular—and examine differences in the resulting inferences.

To put it another way, if we think of the joint model as a baseline, then we can think of a modular fit as a sort of model check.
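Here is a minimal sketch of the "try it both ways" idea. The toy setup is assumed for illustration and is not taken from the paper: two modules are modeled as sharing a mean theta with known standard deviations and a flat prior, and the data values are hypothetical. The joint fit uses both modules; the modular fit lets only the trusted first module inform theta.

```python
from math import sqrt
from statistics import fmean


def normal_posterior(groups):
    """Posterior (mean, sd) for a common mean theta under a flat prior.

    Each group is (observations, known_sd). Every group is modeled as
    N(theta, known_sd^2), so the posterior mean is a precision-weighted
    average of the group means.
    """
    precision = sum(len(y) / sd**2 for y, sd in groups)
    weighted = sum(len(y) * fmean(y) / sd**2 for y, sd in groups)
    return weighted / precision, sqrt(1.0 / precision)


# Hypothetical data: module 1 is trusted; module 2 is modeled as sharing
# theta but is in fact shifted upward (the misspecification).
y1 = [0.3, -0.1, 0.4, 0.0, 0.2, -0.2]
y2 = [2.1, 1.8, 2.4, 2.0, 1.9, 2.2, 2.3, 1.7, 2.0, 2.1, 1.9, 2.2]

joint_mean, joint_sd = normal_posterior([(y1, 1.0), (y2, 1.0)])
modular_mean, modular_sd = normal_posterior([(y1, 1.0)])

# Express the disagreement in units of the modular posterior sd.
gap_in_sds = abs(joint_mean - modular_mean) / modular_sd

print(f"joint:   theta = {joint_mean:.2f} +/- {joint_sd:.2f}")
print(f"modular: theta = {modular_mean:.2f} +/- {modular_sd:.2f}")
print(f"gap between fits, in modular sds: {gap_in_sds:.1f}")
```

The joint fit is more precise but is pulled toward the second module. A gap that is large relative to the modular uncertainty is the kind of discrepancy that would prompt a closer look at the joint model.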

Ultimately I think the best way forward is to expand the joint model so that it fits both datasets and makes substantive sense—see here for an example—but it also seems like a good idea to have tools for revealing model misspecifications that are relevant to particular inferences of interest, hence our interest in the above article.

And intertwined with the previous post?

For instance, with a t.test there is a choice as to whether to fully pool the variances of the two groups or _cut_ to make separate variance estimates. Or in a regression analysis where there is an interaction of slopes with gender, there is again a choice (often overlooked): fully pool the residual variances, or make separate estimates by gender.
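To make the t-test choice concrete, here is a small sketch that computes both statistics by hand: the fully pooled one and the separate-variance (Welch) one. The function name `t_statistics` and the data are hypothetical, and only the test statistics are computed, not the degrees of freedom or p-values.

```python
from math import sqrt
from statistics import fmean, variance


def t_statistics(a, b):
    """Return (pooled_t, welch_t) for the difference in means of a and b."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)  # sample variances (n - 1 denominator)
    diff = fmean(a) - fmean(b)

    # Fully pooled: one common variance estimated from both groups.
    pooled_var = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    pooled_t = diff / sqrt(pooled_var * (1 / na + 1 / nb))

    # Separate estimates: each group keeps its own variance (Welch).
    welch_t = diff / sqrt(va / na + vb / nb)
    return pooled_t, welch_t


# Hypothetical groups with unequal sizes and very different spreads.
small_noisy = [4.0, 9.0, 1.0, 12.0]
large_tight = [5.1, 4.9, 5.0, 5.2, 4.8, 5.0, 5.1, 4.9, 5.0, 5.0]

pooled_t, welch_t = t_statistics(small_noisy, large_tight)
print(f"pooled t = {pooled_t:.2f}, separate-variance t = {welch_t:.2f}")
```

With equal group sizes the two statistics coincide, and only the reference distribution differs. With unequal sizes and unequal spreads, as here, pooling lets the large tight group shrink the standard error attributed to the small noisy one.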

Now in my experience, any attempt to formalize or automate such choices has run into serious problems in applications (since Bancroft’s 1940s? paper).

Actually figure 4 here was an artistic depiction of the regression analysis example.

Argh here https://andrewgelman.com/wp-content/uploads/2011/05/figure13.pdf

Thanks for linking to the paper. I’m working on it during the summer so comments are very welcome.

The main contribution (in my view) is to open the discussion on modular inference, by connecting different parts of the literature (see Section 1.2) and to provide diagnostics, i.e. something computable, to help choose between different approaches. The previous literature contains insightful but more informal discussions, except for some toy examples that have been analytically worked out, e.g. in the Liu, Bayarri & Berger 2009 paper.

Hm, JAGS has a cut function that is used to support modular inference. Stan doesn’t have a cut function, does it?

Actually, implementing the cut distribution is difficult as explained in Martyn Plummer’s “Cuts in Bayesian graphical models” paper. That paper is very pedagogical, explains the intractability of the feedback term that appears in the pdf of the cut distribution, and provides a partial solution, i.e. an approximate sampler. To the best of my knowledge, there is no generic MCMC algorithm to sample from the cut distribution.
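To spell out where the feedback term comes from, here is a sketch for a two-module model. The notation is assumed here and is not necessarily Plummer's: parameter \theta_1 with data Y_1 in the first module, and parameter \theta_2 with data Y_2 in the second module, whose likelihood also depends on \theta_1.

```latex
\begin{align*}
\pi(\theta_1, \theta_2 \mid Y_1, Y_2)
  &\propto p(\theta_1 \mid Y_1)\, p(Y_2 \mid \theta_1)\, p(\theta_2 \mid \theta_1, Y_2)
  && \text{(full posterior)} \\
\pi_{\mathrm{cut}}(\theta_1, \theta_2 \mid Y_1, Y_2)
  &= p(\theta_1 \mid Y_1)\, p(\theta_2 \mid \theta_1, Y_2)
  && \text{(cut distribution)} \\
p(\theta_2 \mid \theta_1, Y_2)
  &= \frac{p(\theta_2 \mid \theta_1)\, p(Y_2 \mid \theta_1, \theta_2)}{p(Y_2 \mid \theta_1)},
  \qquad
  p(Y_2 \mid \theta_1) = \int p(Y_2 \mid \theta_1, \theta_2)\, p(\theta_2 \mid \theta_1)\, d\theta_2 .
\end{align*}
```

The cut drops the factor p(Y_2 | \theta_1) that carries information from the second module back to \theta_1. That same quantity reappears as the normalizing constant of p(\theta_2 | \theta_1, Y_2), so evaluating the cut density as a function of \theta_1 requires an integral that is rarely available in closed form.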

In separate work on unbiased estimation (https://arxiv.org/abs/1708.03625), my co-authors and I propose a two-step unbiased estimation procedure to approximate integrals with respect to the cut without bias, and we use the same illustrative example as in Martyn Plummer’s paper.