I’m not one to go around having philosophical arguments about whether the parameters in statistical models are fixed constants or random variables. I tend to do Bayesian rather than frequentist analyses for practical reasons: it’s often much easier to fit complicated models with Bayesian methods than with frequentist ones. That was the case with a model I recently used as part of an analysis for a clinical trial. The details aren’t really important, but basically I was fitting a hierarchical, nonlinear regression model that would be used to impute missing blood measurements for people who dropped out of the trial. Because the analysis was for an FDA submission, it might have been preferable to do a frequentist analysis; however, this was one of those cases where fitting the model was much easier to do Bayesianly. The compromise was to fit a Bayesian model with a vague prior distribution.
Sounded easy enough, until I noticed that making small changes in the parameters of what I thought (read: hoped) was a vague prior distribution resulted in substantial changes in the posterior distribution. When using proper prior distributions (and there are all kinds of good reasons to use them), even if the prior variance is really large, there’s a chance that the prior density is decreasing exponentially in a region of high likelihood, so the parameter estimates end up based more on the prior distribution than on the data. Our attempt to fix this potential problem (it’s not necessarily a problem if you really believe your prior distribution, but sometimes you don’t) is to perform a preliminary analysis to estimate where the mass of the likelihood is. A vague prior distribution is then one that is centered near the likelihood but with much larger spread.
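To make the sensitivity concrete, here’s a toy one-dimensional sketch in Python (nothing to do with the actual trial model; every number here is invented for illustration) of how a proper prior that feels vague can still pull the estimate well away from the data:

```python
import numpy as np

def posterior_mean_sd(prior_mean, prior_sd, data_mean, data_se):
    """Conjugate normal update: precision-weighted combination of a
    normal prior and a normal likelihood (scalar case)."""
    w_prior, w_data = 1.0 / prior_sd**2, 1.0 / data_se**2
    post_var = 1.0 / (w_prior + w_data)
    post_mean = post_var * (w_prior * prior_mean + w_data * data_mean)
    return post_mean, np.sqrt(post_var)

# Likelihood concentrated near 50 (standard error 5), prior centered at 0:
print(posterior_mean_sd(0.0, 10.0, 50.0, 5.0))   # prior sd 10: posterior mean 40.0
print(posterior_mean_sd(0.0, 100.0, 50.0, 5.0))  # prior sd 100: posterior mean ~49.9
```

A prior standard deviation of 10 sounds vague in the abstract, but here it drags the estimate from 50 down to 40; you have to know roughly where the likelihood sits before you can tell whether a prior is actually vague.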
We estimate the location and spread of the likelihood by capitalizing on the fact that the posterior mean and variance are a combination of the prior mean and variance and the “likelihood” mean and variance. Consider the model for multivariate normal data with known covariance matrix, and a multivariate normal prior distribution on the mean vector:
$$y \mid \mu, \Sigma \sim \mathrm{N}(\mu, \Sigma),$$
$$\mu \sim \mathrm{N}(\mu_0, \Delta_0).$$
The posterior distribution of μ (where n is the number of observations and $\bar{y}$ is the sample mean) is:
$$\mu \mid y, \Sigma \sim \mathrm{N}(\mu_n, \Delta_n), \quad \text{where}$$
$$\mu_n = \left(\Delta_0^{-1} + n\Sigma^{-1}\right)^{-1}\left(\Delta_0^{-1}\mu_0 + n\Sigma^{-1}\bar{y}\right),$$
$$\Delta_n^{-1} = \Delta_0^{-1} + n\Sigma^{-1}.$$
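If it helps to see this in code, here’s a minimal numpy sketch of the update; the function name and interface are my own, not anyone’s API:

```python
import numpy as np

def posterior_moments(mu0, Delta0, ybar, Sigma, n):
    """Posterior mean and covariance of mu for n multivariate normal
    observations with sample mean ybar and known covariance Sigma,
    under a N(mu0, Delta0) prior on mu."""
    prior_prec = np.linalg.inv(Delta0)       # Delta_0^{-1}
    data_prec = n * np.linalg.inv(Sigma)     # n Sigma^{-1}
    Delta_n = np.linalg.inv(prior_prec + data_prec)         # posterior covariance
    mu_n = Delta_n @ (prior_prec @ mu0 + data_prec @ ybar)  # posterior mean
    return mu_n, Delta_n
```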
In these formulas, $\bar{y}$ and $\Sigma/n$ represent what I’m calling the likelihood mean and variance of μ. If we were unable to calculate them directly, we could recover them by solving the two posterior-moment equations for $\bar{y}$ and $\Sigma/n$, obtaining
$$\Sigma/n = \left(\Delta_n^{-1} - \Delta_0^{-1}\right)^{-1},$$
$$\bar{y} = (\Sigma/n)\,\Delta_0^{-1}(\mu_n - \mu_0) + \mu_n.$$
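The inversion is just as short in code; the round-trip check at the end reuses the `posterior_moments` sketch from above (again, everything here is illustrative):

```python
def likelihood_moments(mu0, Delta0, mu_n, Delta_n):
    """Recover the likelihood mean (ybar) and variance (Sigma/n) of mu
    from the prior and posterior moments. Assumes the posterior is
    tighter than the prior, so Delta_n^{-1} - Delta_0^{-1} is invertible."""
    Sigma_over_n = np.linalg.inv(np.linalg.inv(Delta_n) - np.linalg.inv(Delta0))
    ybar = Sigma_over_n @ np.linalg.inv(Delta0) @ (mu_n - mu0) + mu_n
    return ybar, Sigma_over_n

# Round trip: moments recovered from the update match the inputs.
mu0, Delta0 = np.zeros(2), 100.0 * np.eye(2)
ybar, Sigma, n = np.array([3.0, -1.0]), np.eye(2), 25
mu_n, Delta_n = posterior_moments(mu0, Delta0, ybar, Sigma, n)
ybar_hat, Sn_hat = likelihood_moments(mu0, Delta0, mu_n, Delta_n)
assert np.allclose(ybar_hat, ybar) and np.allclose(Sn_hat, Sigma / n)
```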
A vague prior distribution for μ could then be something like $\mathrm{N}(\bar{y}, \Sigma)$ or $\mathrm{N}(\bar{y}, 20\,\Sigma/n)$. For more complicated models you can do the same thing. Let $(\mu_0, \Delta_0)$ and $(\mu_n, \Delta_n)$ represent the prior and posterior mean vector and covariance matrix of the model hyperparameters. First fit the model (with a multivariate normal prior distribution on the hyperparameters) for any convenient choice of $(\mu_0, \Delta_0)$, then use the equations above to estimate the location and spread of the likelihood for these parameters. This approximation relies on approximate normality of the hyperparameters, which should hold in large samples; in smaller samples, transformations of the parameters can make the normal approximation more accurate.
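For a model fit by simulation, the posterior moments come from MCMC draws rather than a closed form. A minimal sketch, assuming a hypothetical `fit_draws(mu0, Delta0)` that stands in for whatever sampler you use, runs it under a N(mu0, Delta0) prior on the (possibly transformed) hyperparameters, and returns posterior draws with one row per draw:

```python
def posterior_moments_from_draws(draws):
    """Posterior mean vector and covariance matrix estimated from sampler
    output of shape (num_draws, num_hyperparams)."""
    return draws.mean(axis=0), np.atleast_2d(np.cov(draws, rowvar=False))

# The recipe: fit once under any convenient prior, then invert.
# draws = fit_draws(mu0, Delta0)                      # hypothetical sampler call
# mu_n, Delta_n = posterior_moments_from_draws(draws)
# ybar_hat, Sn_hat = likelihood_moments(mu0, Delta0, mu_n, Delta_n)
```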
It’s also possible to check the accuracy of the likelihood approximation: fit the model again using the estimated likelihood mean and variance as the prior mean and variance. If the likelihood approximation is good, the resulting posterior mean should approximately equal the prior mean, and the posterior variance should be about half the prior variance (when the prior matches the likelihood, the two contribute equal precision, so the posterior precision $\Delta_n^{-1} = \Delta_0^{-1} + n\Sigma^{-1}$ is twice the prior precision). If not, the process can be iterated: fit the model, estimate the likelihood mean and variance, use these as the prior mean and variance, fit the model again, and compare the prior and posterior means and variances. Repeat until the prior and posterior means are approximately equal and the posterior variance is about half the prior variance. From there, a vague prior can be obtained by setting the prior mean to the estimated likelihood mean and the prior variance to the estimated likelihood variance scaled by some large constant.
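Putting the pieces together, the iteration might look like the sketch below, reusing the hypothetical `fit_draws` interface and the helpers defined earlier; the stopping tolerance and the inflation factor of 20 are arbitrary choices of mine, not part of the method:

```python
def vague_prior(fit_draws, mu0, Delta0, scale=20.0, max_iter=10, rtol=0.1):
    """Iterate until the prior matches the estimated likelihood moments
    (posterior mean ~ prior mean, posterior variance ~ half prior variance),
    then return the likelihood mean with an inflated spread."""
    for _ in range(max_iter):
        mu_n, Delta_n = posterior_moments_from_draws(fit_draws(mu0, Delta0))
        sd0 = np.sqrt(np.diag(Delta0))
        mean_ok = np.all(np.abs(mu_n - mu0) <= rtol * sd0)
        half_var = np.all(np.abs(np.diag(Delta_n) / np.diag(Delta0) - 0.5) <= rtol)
        if mean_ok and half_var:
            break  # current prior sits on the likelihood; stop iterating
        mu0, Delta0 = likelihood_moments(mu0, Delta0, mu_n, Delta_n)
    return mu0, scale * Delta0  # vague prior: N(likelihood mean, scale * likelihood variance)
```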
I’ve tried this method in some simulations and it seems to work, in the sense that after iterating the above procedure a few times you do obtain an estimated likelihood mean and variance that, when used as the prior mean and variance, lead to a posterior distribution with the same mean and half the variance. With simple or well-understood models, there are surely better ways than this to come up with a vague prior distribution, but in complex models this method could be a helpful last resort.