Richard Hahn writes:

In some talk slides you recently posted you have the following bullet point: “Need to go beyond exchangeability to shrink batches of parameters in a reasonable way.” If you think other readers of the blog might find it interesting, I’d love to see you elaborate on this. While the whole talk is, of course, an elaboration, you do not elsewhere explicitly mention exchangeability. Isn’t the point of de Finetti-style theorems that exchangeability is precisely the “reasonable” assumption that leads to parametric models with nice conditional independence properties? Such results entail that we’re at liberty to make sophisticated, highly structured models based on conditional independence with the knowledge that a set of exchangeability judgments on observables lies behind them. Even very flexible, fancy DP-based Bayesian nonparametric models are based on notions of exchangeable random partitions. I’m probably just misreading you, but would be very interested in a clarification about what exactly you mean. If it isn’t, at root, exchangeability, then what else exactly is driving the batch shrinkage, and how is it not ad hoc?

My quick reply: Consider a two-way data structure modeled as y_ij = a_i + b_j + c_ij, with no other information on the rows, the columns, or the individual cells. Then you have no choice but to model the a_i’s and the b_j’s exchangeably. But the c_ij’s can be modeled conditional on the a_i’s and b_j’s; that is, these latent parameters can be treated as group-level predictors. The model is still exchangeable in the i’s and in the j’s, but not in the (i,j)’s. This is sometimes called “partial exchangeability.” More generally, one can consider three-way models, and so on.
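To make the idea concrete, here is a minimal numerical sketch of that two-way structure. It is not from the talk: the simulation, the particular interaction form c_ij = 0.5 a_i b_j + noise, and the moment-based estimates are all hypothetical choices for illustration. The point is only that once crude estimates of the main effects exist, their product can serve as a group-level predictor for the interactions, so the c_ij’s are modeled conditionally rather than as one big exchangeable batch:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a two-way layout: y_ij = a_i + b_j + c_ij + noise.
I, J = 10, 8
a = rng.normal(0.0, 1.0, I)   # row effects, modeled exchangeably
b = rng.normal(0.0, 1.0, J)   # column effects, modeled exchangeably

# Hypothetical interaction structure: c_ij depends on a_i and b_j.
c = 0.5 * np.outer(a, b) + rng.normal(0.0, 0.3, (I, J))
y = a[:, None] + b[None, :] + c + rng.normal(0.0, 0.5, (I, J))

# Crude moment-based estimates of the main effects
# (standard two-way decomposition around the grand mean):
a_hat = y.mean(axis=1) - y.mean()
b_hat = y.mean(axis=0) - y.mean()
resid = y - y.mean() - a_hat[:, None] - b_hat[None, :]

# Use the product a_hat_i * b_hat_j as a group-level predictor for c_ij:
# a one-parameter regression of the residuals on that predictor, so the
# interactions are shrunk toward a fit that uses the latent main effects
# rather than toward a single common mean.
x = np.outer(a_hat, b_hat).ravel()
beta = (x @ resid.ravel()) / (x @ x)
c_hat = beta * np.outer(a_hat, b_hat)
print(beta, c_hat.shape)
```

In a full Bayesian treatment the main effects, the regression coefficient, and the residual interaction variance would all be given priors and fit jointly; the moment estimates here just make the conditional-modeling structure visible in a few lines.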

Sure, that makes sense. I took “go beyond exchangeability” to include partial exchangeability, since that idea has been around for quite some time. No doubt seeing the talk, or being sharper, would have helped. Thanks for answering my question.