Including interactions or not

Liz Sanders writes:

I viewed your 2005 presentation “Interactions in multilevel models” and was hoping you or one of your students/colleagues could point me to some readings about the issue of using all possible vs. only particular interaction terms in regression models with continuous covariates (I think “functional form validity” is the term I have encountered in the past).

In particular, I am trying to understand whether I would be mis-specifying a model if I deleted two of its interaction terms (in favor of using only 2-way treatment interaction terms). The general full model, for example, is:

Y = intercept + txt + pre1 + pre2 + txt*pre1 + txt*pre2 + pre1*pre2 + txt*pre1*pre2, where txt is effect coded (1=treatment, -1=control) and pre1 and pre2 are two different pretests that are assumed normally distributed. (The model is actually a multilevel model; the error terms are not listed for brevity.)

The truncated model, on the other hand, would only test 2-way treatment interactions (deleting the last two terms). There are plenty of data, and the results indicate that the three-way interaction term is significant for two of six outcomes I modeled. On the one hand, I worry about model mis-specification if I delete the last two interactions. On the other hand, I worry about spurious ‘significant’ results with all of the terms in the model.

My reply:

The usual advice, which I think is reasonable here, is that your truncated model is ok. But I also think it would be fine to include everything and then plot the estimated coefficients and the fitted model to understand what you’ve got.
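To make the comparison concrete, here is a minimal sketch in Python (NumPy only) that fits both the full and the truncated specification and then looks at the implied treatment effect over a grid of pretest values. Everything in it is an assumption for illustration: the data are simulated, the coefficients in the simulation are made up, the helper names (`design`, `fit`, `treatment_effect`) are hypothetical, and ordinary least squares stands in for the multilevel model described in the question, so the grouping structure is ignored.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Simulated stand-in data (hypothetical; not the data from the question).
txt = rng.choice([-1.0, 1.0], size=n)   # effect coded: 1 = treatment, -1 = control
pre1 = rng.normal(size=n)
pre2 = rng.normal(size=n)
y = (0.5 + 0.3 * txt + 0.8 * pre1 + 0.4 * pre2
     + 0.2 * txt * pre1 + rng.normal(size=n))

def design(txt, pre1, pre2, full=True):
    """Columns in the order listed in the question; full=False drops the last two."""
    cols = [np.ones_like(txt), txt, pre1, pre2, txt * pre1, txt * pre2]
    if full:
        cols += [pre1 * pre2, txt * pre1 * pre2]
    return np.column_stack(cols)

def fit(X, y):
    """Ordinary least squares (ignores any multilevel structure)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

beta_full = fit(design(txt, pre1, pre2, full=True), y)
beta_trunc = fit(design(txt, pre1, pre2, full=False), y)

def treatment_effect(beta, p1, p2):
    """Fitted y at txt=+1 minus fitted y at txt=-1, at the given pretest values."""
    full = len(beta) == 8
    hi = design(np.ones_like(p1), p1, p2, full) @ beta
    lo = design(-np.ones_like(p1), p1, p2, full) @ beta
    return hi - lo

# Evaluate the implied treatment effect on a 5x5 grid of pretest values.
grid = np.linspace(-2, 2, 5)
g1, g2 = np.meshgrid(grid, grid)
effect_full = treatment_effect(beta_full, g1.ravel(), g2.ravel()).reshape(g1.shape)
effect_trunc = treatment_effect(beta_trunc, g1.ravel(), g2.ravel()).reshape(g1.shape)
```

Plotting `effect_full` and `effect_trunc` side by side (for example as heatmaps over the pretest grid) is one way to see whether the extra terms change the substantive story, which is usually more informative than reading the individual interaction coefficients.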

1 thought on “Including interactions or not”

  1. When the inputs are continuous, there is no single "interaction" term to include. It is unclear in your case whether there is reason to think that you should model the response as linear in the product of the two inputs (why not a ratio?). If you have enough data, it often seems much more reasonable to let the coefficient for one input vary with the other, using local regression. Sometimes this is called a varying coefficient model (though this can lead to jargon confusion when multilevel models are also involved).
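A rough sketch of the varying-coefficient idea, under stated assumptions: the data are simulated, the true slope function (`tanh`) is made up, and the helper `local_slope` is hypothetical. It uses a simple kernel-weighted least squares fit (locally constant in `x2`, linear in `x1`) as a stand-in for a full local regression procedure, estimating the slope on `x1` separately at several values of `x2`.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# Hypothetical truth: the slope on x1 is a nonlinear function of x2,
# which a single product term x1 * x2 would not capture.
y = 1.0 + np.tanh(x2) * x1 + rng.normal(scale=0.5, size=n)

def local_slope(x1, x2, y, at, bandwidth=0.3):
    """Slope of y on x1, estimated from points whose x2 is near `at`.

    Uses Gaussian kernel weights in x2 and weighted least squares in x1.
    """
    w = np.exp(-0.5 * ((x2 - at) / bandwidth) ** 2)
    X = np.column_stack([np.ones_like(x1), x1])
    sw = np.sqrt(w)  # scaling rows by sqrt(w) turns OLS into weighted LS
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta[1]

points = np.linspace(-1.5, 1.5, 7)
slopes = np.array([local_slope(x1, x2, y, at) for at in points])
```

Plotting `slopes` against `points` shows how the coefficient on one input changes with the other, without committing in advance to a linear-in-the-product form. The bandwidth is a tuning choice; the value here is arbitrary.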
