Constructing a scaled-inverse-Wishart prior distribution that is informative on the variance parameters

Just in case you thought this blog was all fluffy political stuff . . . Kaisey Mandel writes:

Last spring in class you mentioned using scaled inverse-Wishart priors for multivariate normal covariance matrices. I read about it in Gelman & Hill. I can see why it is useful, since if

y_i ~ N(mu, Sigma)

Sigma = Diag(xi) Q Diag(xi)

Q ~ Inv-Wish_{K+1}(I)

and P(mu) -> flat

then, because of conjugacy, P(Q | xi, y) can be sampled directly, and the Inv-Wish_{K+1}(I) prior implies a uniform marginal prior on each correlation.

I am trying to model a potentially large covariance matrix, and I like those properties of the scaled inverse-Wishart. However, I also want to place priors on the individual variances sigma^2_k = xi^2_k Q_kk, with the flexibility to choose different priors for different components. I can see how putting individual priors on the xi_k preserves the conjugacy in Q. But that's not the same as a prior on the individual sigma^2_k. I was wondering if you knew of a way to preserve the good properties of the scaled inverse-Wishart (useful for Gibbs sampling!) while expressing priors on the sigma^2_k, which depend on Q.
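As a quick illustration of the construction in the question (not part of the original email), here is a minimal sketch that draws Q from Inv-Wish_{K+1}(I), scales it by Diag(xi), and checks empirically that the implied marginal prior on each correlation is close to uniform. The dimension K and the lognormal prior on xi are illustrative assumptions.

```python
# Minimal sketch of the setup above: Sigma = Diag(xi) Q Diag(xi), Q ~ Inv-Wish_{K+1}(I).
# K and the lognormal prior on xi are placeholders, not from the post.
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(1)
K = 3
n_draws = 20000

corr_12 = np.empty(n_draws)
for s in range(n_draws):
    Q = invwishart.rvs(df=K + 1, scale=np.eye(K), random_state=rng)
    xi = rng.lognormal(mean=0.0, sigma=1.0, size=K)  # assumed prior on the scale factors
    Sigma = np.diag(xi) @ Q @ np.diag(xi)
    # Scaling by Diag(xi) leaves the correlations of Q unchanged, so the
    # implied marginal prior on each correlation comes from Q alone.
    corr_12[s] = Sigma[0, 1] / np.sqrt(Sigma[0, 0] * Sigma[1, 1])

# With df = K + 1 and identity scale, the marginal prior on each correlation
# should be close to uniform on (-1, 1): roughly equal mass in each bin.
print(np.histogram(corr_12, bins=10, range=(-1, 1))[0] / n_draws)
```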

My reply: I think the way to do it is to put priors on the xi_k's. I realize that you want the prior directly on the sigma_k's, but my guess is that putting the info on the xi's will be fine; Q won't be that far from uniform. If you really want, you can do some simulation to set priors on the xi's in a way that will give you the desired priors on the sigma's (see this paper of mine from 1995 for how to do it), but my guess is that this won't even be necessary.
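A minimal sketch of that simulation step, under assumed lognormal priors on the xi_k (the hyperparameter values below are placeholders): draw from the joint prior, inspect the implied marginal prior on each sigma_k = xi_k sqrt(Q_kk), and adjust the xi hyperparameters until the implied quantiles match the priors you actually want on the sigma_k.

```python
# Minimal sketch: calibrate priors on xi_k so the implied priors on sigma_k look right.
# Lognormal priors on xi_k and the hyperparameter values are illustrative assumptions.
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(2)
K = 3
n_draws = 20000

xi_log_mean = np.array([0.0, 0.5, 1.0])  # candidate hyperparameters (illustrative)
xi_log_sd = np.array([0.5, 0.5, 1.0])

sigma = np.empty((n_draws, K))
for s in range(n_draws):
    Q = invwishart.rvs(df=K + 1, scale=np.eye(K), random_state=rng)
    xi = rng.lognormal(mean=xi_log_mean, sigma=xi_log_sd)
    sigma[s] = xi * np.sqrt(np.diag(Q))  # implied standard deviations sigma_k

# Implied 5% / 50% / 95% prior quantiles for each sigma_k; tune xi_log_mean
# and xi_log_sd until these match the intended priors on the sigma_k.
print(np.quantile(sigma, [0.05, 0.5, 0.95], axis=0))
```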

4 thoughts on “Constructing a scaled-inverse-Wishart prior distribution that is informative on the variance parameters”

  1. Just one quick thought. What if we change the representation from Q to C.Q, that is

    Sigma = Diag(xi) C.Q Diag(xi),

    where C.Q is the correlation matrix of Q. Two nice features from the modeling side: (1) the prior on sigma_i is exactly the same as the prior on xi_i; (2) if we need a weak prior on the correlations but a strong prior on the variances, this is manageable, while the Diag(xi) Q Diag(xi) representation always gives a weaker prior on the variances once the diagonal scaling has some randomness (see the sketch after this comment).

    I haven't thought carefully about conjugacy, but I guess it shouldn't cause too much trouble with the right augmentation.
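A minimal sketch of the parameterization suggested in the comment above (K and the lognormal draw for xi are illustrative assumptions): take the correlation matrix of an inverse-Wishart draw and scale it by Diag(xi), so the standard deviations are exactly the xi_i.

```python
# Minimal sketch of the C.Q parameterization from the comment: Sigma = Diag(xi) C.Q Diag(xi),
# where C.Q is the correlation matrix of Q. K and the prior on xi are placeholders.
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(3)
K = 3

Q = invwishart.rvs(df=K + 1, scale=np.eye(K), random_state=rng)
d = 1.0 / np.sqrt(np.diag(Q))
C_Q = np.diag(d) @ Q @ np.diag(d)  # correlation matrix of Q

xi = rng.lognormal(mean=0.0, sigma=1.0, size=K)  # whatever prior you want on the sd's
Sigma = np.diag(xi) @ C_Q @ np.diag(xi)

# With this representation the standard deviations equal xi exactly:
print(np.sqrt(np.diag(Sigma)))
print(xi)
```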
