Since we’re talking about the scaled inverse Wishart . . . here’s a recent message from Chris Chatham:
I have been reading your book on Bayesian Hierarchical/Multilevel Modeling but have been struggling a bit with deciding whether to model my multivariate normal distribution using the scaled inverse Wishart approach you advocate given the arguments at this blog post [entitled "Why an inverse-Wishart prior may not be such a good idea"].
My reply: We discuss this in our book. We know the inverse-Wishart has problems; that’s why we recommend the scaled inverse-Wishart, which is a more general class of models. Here’s an old blog post on the topic. And also of course there’s the description in our book.
Chris pointed me to the following comment by Simon Barthelmé:
Using the scaled inverse Wishart doesn’t change anything, the standard deviations of the individual coefficients and their covariance are still dependent. My answer would be to use a prior that models the standard deviations and the correlations separately, so that you can express things like “I don’t expect my coefficients to be too large but I expect them to be correlated.”
Barthelmé mentions the Barnard, McCulloch, and Meng paper (which I just love, and which I cite in at least one of my books) in which the scale parameters and the correlations are modeled independently, and writes, “I don’t see why this isn’t the default in most statistical software, honestly.”
The answer to this last question is that computation is really slow with that model.
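To make the separation idea concrete, here is a minimal sketch of a prior in which the standard deviations and the correlation matrix are drawn independently and then combined as diag(s) R diag(s). The function names, the lognormal prior on the standard deviations, and the choice to get R by normalizing an inverse-Wishart draw with K+1 degrees of freedom are all assumptions made for illustration, not the specific setup from the Barnard et al. paper. It also only draws from the prior; it says nothing about the cost of posterior computation.

```python
import numpy as np
from scipy.stats import invwishart


def cov_to_corr(cov):
    """Convert a stack of covariance matrices (n, K, K) to correlation matrices."""
    sd = np.sqrt(np.diagonal(cov, axis1=-2, axis2=-1))
    return cov / (sd[..., :, None] * sd[..., None, :])


def draw_separated_prior(K, n_draws, log_sd_scale=1.0, seed=0):
    """Draw covariance matrices diag(s) R diag(s) with s and R independent.

    Hypothetical helper: the priors on s and R are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    # Correlations: normalize an inverse-Wishart draw to unit diagonal.
    Q = invwishart.rvs(df=K + 1, scale=np.eye(K), size=n_draws, random_state=seed)
    R = cov_to_corr(Q.reshape(n_draws, K, K))
    # Standard deviations: independent lognormal draws, unrelated to R.
    s = np.exp(rng.normal(0.0, log_sd_scale, size=(n_draws, K)))
    # Combine: Sigma[i] = diag(s[i]) @ R[i] @ diag(s[i]).
    Sigma = s[:, :, None] * R * s[:, None, :]
    return s, R, Sigma


s, R, Sigma = draw_separated_prior(K=3, n_draws=1000)
# By construction, the scales of Sigma are exactly s and its correlations are exactly R.
print(np.allclose(np.sqrt(np.diagonal(Sigma, axis1=1, axis2=2)), s))
print(np.allclose(cov_to_corr(Sigma), R))
```

Because s and R are generated separately, a statement like “small coefficients, but correlated” can be encoded by tightening the prior on s without touching the prior on R.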
Also, it’s not really necessary for scale parameters and correlations to be precisely independent. What you want is for these parameters to be uncoupled or to be de facto independent. To put it another way, what matters in a prior is not what the prior looks like; what matters is what the posterior looks like. We’d like to be able to estimate, from hierarchical data, the scale parameters and also the correlations. The redundant parameterization in the scaled inverse Wishart prior (which, just to remind you, is due to O’Malley and Zaslavsky, not me; all I’ve done is to publicize it) allows scale parameters and correlations to both be estimated from data. It fixes the problem with the unscaled inverse-Wishart.
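As a rough illustration of the redundant parameterization, the sketch below writes the covariance matrix as Sigma = diag(xi) Q diag(xi), with Q drawn from an inverse-Wishart and xi a vector of extra scale parameters. The helper names, the lognormal prior on xi, and the K+1 degrees of freedom are assumptions chosen for the example; other priors on xi are possible. The point is only that rescaling by xi leaves the correlations of Q untouched while giving each standard deviation its own additional freedom.

```python
import numpy as np
from scipy.stats import invwishart


def cov_to_corr(cov):
    """Convert a stack of covariance matrices (n, K, K) to correlation matrices."""
    sd = np.sqrt(np.diagonal(cov, axis1=-2, axis2=-1))
    return cov / (sd[..., :, None] * sd[..., None, :])


def draw_scaled_inv_wishart(K, n_draws, log_xi_scale=1.0, seed=0):
    """Draw Sigma = diag(xi) Q diag(xi), with Q ~ inverse-Wishart(K + 1, I).

    Hypothetical helper: the lognormal prior on xi is an illustrative assumption.
    """
    rng = np.random.default_rng(seed)
    Q = invwishart.rvs(df=K + 1, scale=np.eye(K), size=n_draws, random_state=seed)
    Q = Q.reshape(n_draws, K, K)
    # Redundant scale parameters, one per coefficient.
    xi = np.exp(rng.normal(0.0, log_xi_scale, size=(n_draws, K)))
    Sigma = xi[:, :, None] * Q * xi[:, None, :]
    return xi, Q, Sigma


xi, Q, Sigma = draw_scaled_inv_wishart(K=3, n_draws=2000)

# Correlations are inherited unchanged from Q.
print(np.allclose(cov_to_corr(Sigma), cov_to_corr(Q)))

# Each standard deviation is xi_k * sqrt(Q_kk), so it is no longer pinned
# to the inverse-Wishart draw alone.
sd_plain = np.sqrt(np.diagonal(Q, axis1=1, axis2=2))
sd_scaled = np.sqrt(np.diagonal(Sigma, axis1=1, axis2=2))
print(np.allclose(sd_scaled, xi * sd_plain))

# Prior spread of the log standard deviations, unscaled vs. scaled.
print(np.std(np.log(sd_plain)), np.std(np.log(sd_scaled)))
```

In this sketch the prior dependence between scales and correlations that comes from Q is still there, but the multiplicative xi terms mean the data can move the standard deviations without dragging the correlations along, which is the de facto uncoupling described above.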
There’s nothing so wonderful about the Wishart or inverse-Wishart in any given example. These are all just models. What I like about these models is that they are computationally convenient, and the scaled version allows the flexibility we want for a hierarchical model. The Barnard et al. model is fine too (and, as I said, I love their article) but I don’t see any particular reason why these parameters should be independent in the prior. That’s just another choice too. What matters is how things get estimated in the posterior.
Unfortunately, even now I think the inverse-Wishart is considered the standard, and people don’t always know about the scaled inverse-Wishart. Another problem is that people often think of these models in terms of how they work with direct multivariate data, but I’m more interested in the hierarchical modeling context where a set of parameters (for example, regression coefficients) vary by group.