MCMC model selection question

Robert Feyerharm writes in with a question, then I give my response. You all can play along at home by reading the question first and then guessing what I’ll say. . . .

I have a question regarding model selection via MCMC I’d like to run by you if you don’t mind.

One of the problems I face in my work involves finding best-fitting logistic regression models for public health data sets typically containing 10–20 variables (= 2^10 to 2^20 possible models). I’ve seen several techniques in the literature for selecting variables and estimating beta parameters, for example reversible-jump MCMC (RJMCMC).

However, RJMCMC works by selecting a subset of candidate variables at each step. I’m curious: as an alternative to trans-dimensional jumping, would it be feasible to use MCMC to simultaneously select variables and beta values over all of the variables in the parameter space (not just a subset), using the regression model’s AIC to determine whether to accept or reject the betas at each candidate point?

Using this approach, a variable would be dropped from the model if its beta parameter settles sufficiently close to zero after N iterations (say, −0.05 < βk < 0.05).

There are a few issues with this approach. Since the AIC isn't a probability density, the Metropolis-Hastings algorithm can't be used here, as far as I know. Also, AIC isn't a continuous function (it "jumps" to a lower/higher value when the number of model variables decreases/increases), so a smoothing function is required in the vicinity of βk = 0 to ensure that the MCMC algorithm converges properly.

I've run a few simulations and this "backwards elimination" MCMC seems to work, although it converges to a solution very slowly. Anyway, if you have time I would greatly appreciate any input you may have. Am I rehashing an idea that has already been considered and rejected by MCMC experts?
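One note on the "AIC isn't a density" worry: any positive score can serve as an unnormalized Metropolis target, and exp(−AIC/2) is a natural choice, since it equals the likelihood times a per-parameter penalty factor. Here is a minimal sketch of the scheme described above under that reading; the data, the step size, and the ±0.05 dead zone are all toy assumptions, not anything from the question itself:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (an assumption for illustration): 200 observations,
# 5 candidate predictors, only the first two truly nonzero.
n, p = 200, 5
X = rng.normal(size=(n, p))
true_beta = np.array([1.5, -2.0, 0.0, 0.0, 0.0])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ true_beta))))

def aic(beta, eps=0.05):
    """AIC of a logistic model; k counts coefficients outside the +/- eps dead zone."""
    eta = X @ beta
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))  # stable log(1 + exp(eta))
    k = np.sum(np.abs(beta) > eps)                     # "effective" parameter count
    return 2 * k - 2 * loglik

beta = np.zeros(p)
current = aic(beta)
for _ in range(20000):
    proposal = beta + rng.normal(scale=0.1, size=p)    # random-walk proposal
    candidate = aic(proposal)
    # Accept with probability min(1, exp(-(AIC_prop - AIC_curr) / 2)).
    if np.log(rng.uniform()) < (current - candidate) / 2.0:
        beta, current = proposal, candidate

print(np.round(beta, 2))  # coefficients lingering near zero would be dropped
```

The dead zone in `aic` is what makes the target discontinuous at βk = 0, which is exactly the jump the questioner describes needing to smooth.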

My first comment is that if you have 10 input variables, you have a lot more than 2^10 possible models. Don’t forget that any of your variables can be interacted with any of the others. This leads me to suggest that you move away from the inclusion/exclusion framework toward more of an active modeling approach, where you put the inputs together in a way that makes sense. (Unless you’re working in a blind “machine learning” framework, in which case I suspect you should be exploring a richer model space, but I admit I’m not quite sure how to do that.)
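To put a rough number on the interaction point: with 10 inputs, letting just the two-way interaction terms in or out of the model already blows the count far past 2^10. A quick count (ignoring the usual hierarchy constraint that an interaction requires its main effects):

```python
from math import comb

p = 10
main_effects = p         # 10 candidate main effects
two_way = comb(p, 2)     # 45 candidate pairwise interactions

# Each term independently in or out: 2^55, about 3.6e16 candidate models.
print(2 ** (main_effects + two_way))
```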

Regarding your specific questions, I’m guessing that DIC would do some of what you’re looking for, if you use a (hierarchical) prior distribution that captures your idea that coefficients are likely to be close to zero. I’m not so happy with the dichotomous approach of either setting a coefficient to zero or estimating it without constraints; maybe your approach will work even better in a partial-pooling framework.
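For reference, DIC can be computed directly from posterior draws as D̄ + pD, where D̄ is the posterior mean deviance and pD = D̄ − D(θ̄) is the effective number of parameters. A minimal sketch with stand-in draws (the data and the synthetic "posterior" are assumptions for illustration; in practice the draws would come from your actual sampler):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy logistic-regression data (an assumption for illustration).
n, p = 100, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -1.0, 0.0])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ beta_true))))

def deviance(beta):
    """-2 * log-likelihood of a logistic regression."""
    eta = X @ beta
    return -2.0 * np.sum(y * eta - np.logaddexp(0.0, eta))

# Stand-in "posterior draws": noise around the true beta.
draws = beta_true + rng.normal(scale=0.1, size=(1000, p))

dbar = np.mean([deviance(b) for b in draws])  # posterior mean deviance
dhat = deviance(draws.mean(axis=0))           # deviance at posterior mean
p_d = dbar - dhat                             # effective number of parameters
dic = dbar + p_d

print(round(dic, 1), round(p_d, 1))
```

Under a hierarchical prior that shrinks coefficients toward zero, a near-irrelevant variable contributes little to pD, which gets at the partial-pooling alternative to all-or-nothing variable exclusion.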

2 thoughts on “MCMC model selection question”

  1. For something automatic, doesn't one want a prior that equates to an AIC loss function? Once one has that, there are several approaches that might work (shameless self-promotion). There are sparse methods which impose a strong prior on the variables (e.g., a Jeffreys prior on the variance of each beta), which I think is close to what Robert wants.

  2. Thank you for the helpful comments, gentlemen!

    There are indeed more than 2^10 possible models for 10 variables if interactions are included. My aim was to first select main effect variables and then search for possible interactions among the chosen main effect terms.

    When selecting candidate variables, we consider only demographic, socioeconomic, and other health/behavioral factors that could reasonably be associated with the outcome variable and aren't correlated with other explanatory variables. However, this often still leaves us with 10+ variables.

    I'm not familiar with choosing a prior that equates to an AIC loss function, and would welcome any advice in this area. I used uniform and normal prior distributions for the betas in my simulations, but only out of convenience. I'll definitely look into the other options suggested here. Thanks for the link to your paper, Bob.

    Robert
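A note on the "prior that equates to an AIC loss function" remark in the first comment: comparing models by AIC is the same as comparing exp(−AIC/2) = likelihood × exp(−k), i.e., an implicit prior factor of exp(−1) for each parameter included. A toy illustration (the log-likelihoods and parameter counts below are made up):

```python
import math

# Made-up numbers for a smaller and a larger nested model.
loglik_small, k_small = -120.0, 3
loglik_big, k_big = -118.5, 5

aic_small = 2 * k_small - 2 * loglik_small   # 246.0
aic_big = 2 * k_big - 2 * loglik_big         # 247.0

# exp(-AIC/2) = exp(loglik - k): likelihood times exp(-1) per parameter.
score_small = math.exp(loglik_small - k_small)
score_big = math.exp(loglik_big - k_big)

# The model preferred by AIC is the one with the larger pseudo-posterior score.
print(aic_small < aic_big, score_small > score_big)  # prints: True True
```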
