Jökull Snæbjarnarson writes . . .

Wow! After that name, anything that follows will be a letdown. But we’ll answer his or her question anyway.

So here goes. Jökull Snæbjarnarson writes:

I’m fitting large Bayesian regression models in Stan where I have many parameters. I have fitted a model, and some of the “beta” coefficients’ HDIs (where beta is the beta in the regression model y ~ beta * x) include zero.

Now to the question:
If the betas’ HDIs include zero, should I consider them insignificant and do something as annoying as stepwise regression, where I refit and remove parameters over and over again?

The refitting procedure is extremely expensive and quite hard to accept. All in all, I don’t like making discrete decisions in the fitting procedure, and I was hoping you could give me some advice or point me towards some literature on this matter. Pooling might make parameters more or less insignificant; however, it only helps you get closer to the discrete decision: should the beta coefficients be in the model or not?

My response:

I recommend using informative priors for the betas and then keeping them all in the model. If at some point you want to remove some predictors from your model, this should be for reasons of making your model simpler to understand, or to reduce the data requirements for a future model. Setting aside these sorts of reasons, I see no reason to exclude predictors. And if you do want to simplify your model, I recommend doing so by combining predictors (for example, combining several similar predictors by taking their average) rather than simply excluding things.
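
To make this concrete, here is a minimal Stan sketch of the kind of model being discussed, with an informative prior on every coefficient and nothing excluded. The data names (N, K, x, y) and the particular prior scales are placeholders, and the normal(0, 1) prior on the betas assumes the predictors have been standardized.

    data {
      int<lower=0> N;          // number of observations
      int<lower=0> K;          // number of predictors; all of them stay in the model
      matrix[N, K] x;          // predictors, assumed standardized
      vector[N] y;             // outcome
    }
    parameters {
      real alpha;
      vector[K] beta;
      real<lower=0> sigma;
    }
    model {
      alpha ~ normal(0, 5);
      beta ~ normal(0, 1);     // informative prior on all coefficients; none are dropped
      sigma ~ normal(0, 1);
      y ~ normal(alpha + x * beta, sigma);
    }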

8 thoughts on “Jökull Snæbjarnarson writes . . .”

  1. Andrew
    Surely you don’t mean that all predictors should be included? Many times, there are so many potential predictors that it makes no sense to include them all. What I imagine you mean is that there is some theoretical model you have in mind that involves using some subset of the predictors (along with informative priors for these predictors). Once the initial decision has been made about what to include in the model, I can understand your recommendation to keep those predictors in the model, regardless of statistical significance (or lack thereof). But I don’t understand it if you mean that all predictors should be included. In fact, this is what limits the usefulness of stepwise procedures (in my mind) – if applied in the absence of a theoretical model to begin with.

    Can you please clarify whether you are advising that all potential predictors should be included or how the initial choice of predictors should be made?

    • Dale:

      Including only some predictors is a special case in which priors for certain coefficients are set to delta functions at 0. This can make sense for computational reasons, but from an inferential standpoint I think it’s better to have a strong prior concentrated near 0, but not a pure delta function.
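
      As a sketch of the contrast, here is a small Stan model in which a predictor one might be tempted to drop (x2 below) is instead kept with a strong prior concentrated near zero. The variable names and the prior scale of 0.1 are illustrative assumptions, not a recommendation for any particular dataset.

        data {
          int<lower=0> N;
          vector[N] x1;            // predictor of primary interest
          vector[N] x2;            // predictor one might be tempted to exclude
          vector[N] y;
        }
        parameters {
          real alpha;
          real beta1;
          real beta2;
          real<lower=0> sigma;
        }
        model {
          beta1 ~ normal(0, 1);    // weakly informative prior
          beta2 ~ normal(0, 0.1);  // strong prior concentrated near zero: a soft version
                                   // of excluding x2 (a delta function at 0)
          alpha ~ normal(0, 5);
          sigma ~ normal(0, 1);
          y ~ normal(alpha + beta1 * x1 + beta2 * x2, sigma);
        }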

      • Thanks for all your great work. So, you recommend keeping all variables in a model. But this could be hundreds of variables, with many of them correlating highly with each other, and also correlating highly with the variables whose effects you are really interested in. So we should not worry about multicollinearity, autocorrelation, and overfitting? Just keep all variables? It is important in my models to be able to predict the future effects of some specific variables (which typically correlate highly with each other), and to a lesser degree the effects of the other variables in the model. I can imagine a variable with a coefficient close to zero suddenly having a very high (future expected) peak and then affecting my prediction model, which it should not. That would give me wrong predictions. Do you still not see any problems in keeping “insignificant” variables in a model?

        • Steen:

          If you have hundreds of variables, I don’t recommend putting them all in a big exchangeable pile. I recommend putting them together into scores, factors, whatever you want to call them. I recommend some structure. That makes more sense to me than keeping some variables in and leaving some out.

          You can set some coefficients exactly to zero if you’d like. I do it all the time; it’s the practical choice. But I don’t think it’s the right choice; it’s a shortcut so I can avoid having to think hard about all these interactions etc. Al Smith famously said that the cure for the ills of democracy is more democracy. Similarly, I think the cure for the ills of modeling is more modeling. Computational costs aside, I’d recommend partially pooling coefficients toward zero.

          The real gains, though, are in partially pooling whatever coefficients you do finally keep in your model. If you decide ahead of time to use something like crude least squares, then, yeah, you’ll have major problems with overfitting, correlation, etc., if you keep too many predictors in your model. But if you plan some partial pooling right from the beginning, you should be in better shape.
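
          Here is a rough Stan sketch of that kind of partial pooling, with the coefficients shrunk toward zero by an amount (tau) that is itself estimated from the data. The names and prior scales are placeholders; in practice one might want a non-centered parameterization or more structure, such as separate pooling within groups of related predictors.

            data {
              int<lower=0> N;
              int<lower=0> K;
              matrix[N, K] x;          // predictors, assumed standardized
              vector[N] y;
            }
            parameters {
              real alpha;
              vector[K] beta;
              real<lower=0> tau;       // how strongly the coefficients are pooled toward zero
              real<lower=0> sigma;
            }
            model {
              tau ~ normal(0, 1);      // half-normal, given the lower bound
              beta ~ normal(0, tau);   // partial pooling: coefficients shrink toward zero
              alpha ~ normal(0, 5);
              sigma ~ normal(0, 1);
              y ~ normal(alpha + x * beta, sigma);
            }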

  2. I’ve never really understood why models with fewer predictors are simpler to understand than models with more predictors. If we use a data-dependent procedure to decide what is included in the model, this is, as you point out, the same as estimating those coefficients to be zero. The model is still giving you the estimate of each effect, conditional on all the other predictors–it is not as if you stop conditioning on them once you estimate them to be zero. And if the concern is about making an intelligible table or figure summarizing the results, isn’t it easy enough to just display the effects larger (in some standardized form) than some meaningful lower bound, while noting that the model also conditions on various other things with negligible effects? Models with fewer predictors may be simpler to *use*, to be sure, but I don’t think they are really easier to interpret.
