Bayesian model selection

A researcher writes,

I have made use of the material in Ch. 6 of your Bayesian Data Analysis book to help select among candidate models for inference in risk analysis. In doing so, I have received some criticism from an anonymous reviewer that I don’t quite understand, and was wondering if you have perhaps run into this criticism. Here’s the setting. I have observable events occurring in time, and I need to choose between a homogeneous Poisson process and a nonhomogeneous Poisson process in which the rate is a function of time (e.g., a loglinear model for the rate, which I’ll call lambda).

I could use DIC to select between a model with constant lambda and one where the log of lambda is a linear function of time. However, I decided to try to come up with an approach that would appeal to my frequentist friends, who are more familiar with a chi-square test against the null hypothesis of constant lambda. So, following your approach in Ch. 6, I had WinBUGS compute two posterior distributions. The first, which I call the observed chi-square, subtracts the posterior mean (mu[i] = lambda[i]*t[i]) from each observed value, squares this difference, and divides by the mean. I then add up all of these values, getting a distribution for the total. I then do the same thing, but with draws from the posterior predictive distribution of X. I call this the replicated chi-square statistic.

If my putative model has good predictive validity, it seems that the observed and replicated distributions should have substantial overlap. I called this overlap (calculated with the step function in WinBUGS) a “Bayesian p-value.” The model with the larger p-value is the better fit, a criterion much like the one my frequentist friends are used to.

Now to the criticism. An anonymous reviewer suggests this approach is weakened by “using the observed data twice.” Well, yes, I do use the observed data to estimate the posterior distribution of mu, and then I use it again to calculate a statistic. However, I don’t see how this is a problem in the sense in which empirical Bayes is problematic to some, namely that it uses the data first to estimate a prior distribution and then again to update that prior. Nor am I interested in “degrees of freedom” in the usual sense associated with MLEs.

I am tempted to just write this off as a confused reviewer, but I am not an expert in this area, so I thought I would see if I am missing something. I appreciate any light you can shed on this problem.
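For concreteness, here is a minimal sketch, in Python rather than WinBUGS, of the check my correspondent describes. The data and the posterior draws of lambda are simulated placeholders, not his model or code; the point is just the observed-vs.-replicated chi-square comparison and the resulting posterior predictive p-value.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: observed counts y[i] over exposure times t[i], plus
# S posterior draws of lambda. Everything here is simulated placeholder
# data so the sketch runs; in practice y, t, and the lambda draws would
# come from the real data and the fitted model.
n, S = 20, 4000
t = np.ones(n)
y = rng.poisson(3.0, size=n)
lam_draws = rng.gamma(y.sum() + 0.5, 1.0 / (t.sum() + 0.5), size=(S, 1))

mu = lam_draws * t  # draws of mu[i] = lambda[i] * t[i], shape (S, n)

# "Observed" chi-square discrepancy: uses the observed counts y.
T_obs = ((y - mu) ** 2 / mu).sum(axis=1)

# "Replicated" chi-square discrepancy: uses posterior predictive draws y_rep.
y_rep = rng.poisson(mu)
T_rep = ((y_rep - mu) ** 2 / mu).sum(axis=1)

# Posterior predictive ("Bayesian") p-value, the quantity a step() node in
# WinBUGS would track: Pr(T_rep >= T_obs | y).
print((T_rep >= T_obs).mean())
```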

My thoughts:

1. My first thought is that the safest choice is the nonhomogeneous process, since it includes the homogeneous process as a special case (the one in which the rate has zero variance over time). This can be framed as a modeling problem in which the variance of the rate is an unknown parameter that must be nonnegative. If you have a particular parametric model (e.g., log(rate(t)) = a + b*x(t) + epsilon(t), where epsilon(t) has mean 0 and sd sigma), then the homogeneous model is the special case b = sigma = 0, with a just the constant log-rate. (A sketch of this sort of model appears after this list.)

2. From this perspective, I’d rather just estimate a, b, and sigma and see to what extent the data are consistent with b = 0 and sigma = 0.

3. I agree with you that “using the data twice” is not usually such a big deal. It’s an n vs. n-k sort of thing.

4. I’m not thrilled with the approach of picking the model with the larger p-value; there are lots of reasons this might not work so well. I’d prefer (a) fitting the larger model and taking a look at the inferences for a, b, and sigma; and maybe (b) fitting the smaller model and computing a chi-squared statistic to see whether that model can be rejected by the data. Rough sketches of both steps are below.
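To make point 1 concrete, here is one way the loglinear-rate model could be written in PyMC. The priors, the simulated data, and the variable names are all illustrative assumptions on my part, not a recommended specification.

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(1)

# Illustrative data: counts y[i] observed over exposure times t[i] at
# (centered) times x[i]; simulated here only so the sketch runs.
n = 20
x = np.linspace(-1.0, 1.0, n)
t = np.ones(n)
y = rng.poisson(np.exp(1.0 + 0.3 * x) * t)

with pm.Model() as nonhomogeneous:
    # log(rate(t)) = a + b*x(t) + epsilon(t); the homogeneous model is the
    # special case b = sigma = 0, with a the constant log-rate.
    a = pm.Normal("a", 0.0, 5.0)
    b = pm.Normal("b", 0.0, 5.0)
    sigma = pm.HalfNormal("sigma", 1.0)
    eps = pm.Normal("eps", 0.0, sigma, shape=n)
    lam = pm.math.exp(a + b * x + eps)        # rate lambda[i]
    pm.Poisson("y_obs", mu=lam * t, observed=y)
    idata = pm.sample()
```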
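And a rough sketch of (a) and (b) from point 4, continuing from the hypothetical fit above (it reuses pm, np, t, y, and idata from that block):

```python
import arviz as az
import numpy as np

# (a) Look at the posterior for a, b, sigma from the larger model. If b and
# sigma are concentrated near zero, the homogeneous model is not contradicted.
print(az.summary(idata, var_names=["a", "b", "sigma"]))

# (b) Fit the smaller (constant-rate) model and apply the chi-square check
# from the earlier sketch, with draws of lam0 * t playing the role of mu.
with pm.Model() as homogeneous:
    lam0 = pm.Gamma("lam0", alpha=0.5, beta=0.5)  # illustrative prior
    pm.Poisson("y_obs", mu=lam0 * t, observed=y)
    idata0 = pm.sample()

lam0_draws = idata0.posterior["lam0"].values.reshape(-1, 1)  # (S, 1)
mu0 = lam0_draws * t
T_obs0 = ((y - mu0) ** 2 / mu0).sum(axis=1)
y_rep0 = np.random.default_rng(2).poisson(mu0)
T_rep0 = ((y_rep0 - mu0) ** 2 / mu0).sum(axis=1)
print((T_rep0 >= T_obs0).mean())  # an extreme p-value casts doubt on the constant-rate model
```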