Brendan Rocks writes:
I have a request for a blog post.
I’ve been following the debates about ‘cut’ on the Stan lists over the last few years. Lots of very clever people agree that it’s bad news, which is enough to put me off. However, I’ve never fully groked the reasoning. [I think that should be “grooked”—ed.]
I’ve considered using it for sub-models which use priors to ‘tidy-up’ estimates, before passing them to a “different” model (while propagating uncertainty).
The explanations about why it’s bad news make sense in a general conceptual way (inference should flow both ways in Bayes). However, it’s difficult to think instinctively about how this might affect results, or what the pros and cons of avoiding cut are (beyond the conceptual benefit, and the computational cost).
Can you think of a cute example to illustrate how cut make might a ‘gotcha’ difference to a model?
Are there ever situations where you think it could be conceptually defensible?
My reply: From the modeling side, the problem of imperfect aggregation or transportability arises in many application areas, in particular pharmacology. The general setting is that a model is fit to a dataset y, giving inferences about parameter vector φ, and then it is desired to use φ to make inferences in a new situation with new data y′. The often-voiced concern is that the researcher does not want the model for y′ to “contaminate” inference for φ. There is a desire for the information to flow in one direction, from y to φ to predictions involving the new data, but not backward to φ. Such a restriction is encoded in the cut operator in the Bayesian software Bugs. We do not think this sort of “cutting” makes sense, but it arises from a genuine concern, which we prefer to express by modeling the parameter as varying by group. Hence we believe that the introduction of a shift parameter δ, with an informative prior, should be able to do all that is desired by the cut operator. Rather than a one-way flow of information, there is a two-way flow, with δ available to capture the differences between the two groups, so there is no longer a requirement that a single model fit both datasets.
Here’s an example from a recent paper with Sebastian and others.
In answer to Brendan’s specific question: Yes, I’ve seen examples where the model for y does not fit the data y′, and something needs to be done. In my experience, instead of “cutting,” it works better to expand the model until it fits both datasets.