Comparing the full model to the partial model

Pat Lawlor writes:

We are writing with a question about model comparison and fitting. We work in a group at Northwestern that does neural data analysis and modeling, and often would like to compare full models (e.g. neurons care about movement and vision) with various partial models (e.g. they care only about movement). We often use cross-validated likelihood to select between models. But there are two ways of setting up this comparison, and the difference is somewhat subtle. (1) We can compare the full model with the regular partial model. (2) We can compare the full model with the full model where we instead randomly shuffle the data in the potentially irrelevant domain. E.g. if we had two regressors, say movement and vision, we would compare a model with movement and vision against a model with movement and a shuffled vision component.

The two approaches seem to have different advantages.
When using (1), there will be a bias towards simpler models, as simpler models will overfit less. However, in some cases we may want to have this bias.
When using (2), both the full model and the effectively partial model have the same number of parameters and should usually receive the same biases from overfitting. We basically ask if a regressor is better than the same regressor shuffled. In some cases this approach would seem preferable, as it does not introduce a bias favoring the simpler model.

Both procedures seem to be meaningful alternatives to AIC/BIC which we do not feel overly comfortable about in all situations.

So our question is whether you think (2) could be used to ask: “should we reject the null hypothesis that the regressor helps?” We are also curious whether the properties of this approach are understood.
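To make the two setups concrete, here is a minimal sketch of both comparisons under cross-validated likelihood. Everything below (the toy data, the Gaussian/OLS model, and the fold construction) is my own illustration, not the group's actual pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumed for illustration): both regressors genuinely matter here.
n = 500
movement = rng.normal(size=n)
vision = rng.normal(size=n)
y = 1.5 * movement + 1.0 * vision + rng.normal(size=n)

def cv_loglik(X, y, k=5):
    """Mean held-out Gaussian log-likelihood of an OLS fit, k-fold CV."""
    folds = np.array_split(rng.permutation(len(y)), k)
    total = 0.0
    for i in range(k):
        te = folds[i]
        tr = np.concatenate([folds[j] for j in range(k) if j != i])
        Xtr = np.column_stack([np.ones(len(tr)), X[tr]])
        Xte = np.column_stack([np.ones(len(te)), X[te]])
        beta, *_ = np.linalg.lstsq(Xtr, y[tr], rcond=None)
        sigma2 = np.mean((y[tr] - Xtr @ beta) ** 2)
        resid = y[te] - Xte @ beta
        total += np.sum(-0.5 * np.log(2 * np.pi * sigma2)
                        - resid ** 2 / (2 * sigma2))
    return total / len(y)

full = cv_loglik(np.column_stack([movement, vision]), y)
# (1) full model vs. the regular partial model
partial = cv_loglik(movement[:, None], y)
# (2) full model vs. the full model with the vision regressor shuffled
shuffled = cv_loglik(np.column_stack([movement, rng.permutation(vision)]), y)
print(full, partial, shuffled)
```

Since vision truly matters in this toy example, the full model beats both alternatives; the interesting case is when the two comparisons disagree.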

My reply: I can see the appeal of your approach 2 in certain situations as it provides a sort of sanity check for what you are doing. In applied statistics, especially in economics, one sometimes sees this sort of analysis under the name “placebo control” or something like that. But from general principles, approach 2 seems kinda weird in that it is gratuitously adding noise to the analysis without a clear benefit.

Approach 1 is more standard, but in some ways I think the real problem is probably with your formulation of the full model. In Bayesian terms, if your full model has a flat prior on its extra parameters, you’re comparing a flat prior with a prior that is a spike at 0. I’d guess you’d be better off with a moderate prior that allows some partial pooling toward zero. In your example, I’d guess that once you consider movement and vision as possible predictors, they both should be in the model, along with their interaction, but if you think there is evidence that the coefficients will likely be small, this should be included in your prior distribution.
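The partial-pooling suggestion has a simple penalized-regression analogue: a normal(0, tau^2) prior on a coefficient makes the MAP estimate a ridge estimate with penalty sigma^2/tau^2, which pulls coefficients toward zero without forcing them to be exactly zero. A toy sketch (the data and all numbers are my own assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-ins for the two regressors; the second effect is small but nonzero.
n = 50
X = rng.normal(size=(n, 2))
y = X @ np.array([1.0, 0.1]) + rng.normal(size=n)

# Flat prior on the coefficients: ordinary least squares.
beta_flat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Normal(0, tau^2) prior on each coefficient: the MAP estimate is ridge
# regression with penalty lambda = sigma^2 / tau^2.
sigma2, tau2 = 1.0, 0.25
lam = sigma2 / tau2
beta_map = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)

print(beta_flat, beta_map)  # the MAP estimate is shrunk toward zero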

Or, as Chris Hedges might say, we often use cross-validated likelihood to select between models. But there are two ways of setting up this comparison and the difference is somewhat subtle.

15 thoughts on “Comparing the full model to the partial model

      • Unlike the usual approach to hypothesis testing, there doesn’t seem to be any consideration of a sampling distribution of the test statistic under the null and a Type I error rate. Some (not I) would say that without those specific things, the relevant operating characteristics of the procedure are unknown.

        • My understanding of how these things work is that you construct an empirical distribution through repeated permutations and then see where in the cumulative distribution the observed data fall. If it’s in the rejection region, then you reject. Otherwise, you fail to reject.

          In that procedure wouldn’t the type 1 error rate always be 5%? I guess maybe you could make some arguments that doing a permutation rather than a bootstrap would lead to some slow convergence, but still, 5%, right?

        • As far as I know, you’re correct. It’s just that there are other things they might be doing, and the description of the procedure doesn’t rule out those other things. In particular, the mention of “cross-validated likelihood” means that they have one out-of-sample predictive likelihood value per hold-out subset under both the full model and the permuted-data model,and they could be doing all manner of wacky things with these two sets of values.

  1. Can this be clarified:
    “neurons care about movement and vision” vs “they only care about movement”

    It seems like detecting movement would be subset of vision. Is this the movement of the animal being recorded?

    • I can’t speak for Lawlor et al., but I did survey courses on neural processing for vision in grad school, so I have some plausible guesses. I expect that the movement in question isn’t movement in the visual field, but rather the movement of the animal doing the seeing. The vestibular system gives feedback about the acceleration (hence velocity) of the head, which feeds into the visual system so that eyes can do gaze-fixing and motion-tracking. I’d guess it also feeds into processing of visual input so that the relative positions of the animal and objects around it can be estimated. So neurons that “just care about movement” probably means neurons involved in proprioception and/or the vestibular system; there won’t be any neurons that just care about vision because info about movement is part of the input to the visual system.

      • Corey,

        Thanks, I suspect that is what they mean also. I wonder what evidence they have for neurons that do not respond to the movement of the animal at all. We do not know from the description which neurons they are referring to, but if in the cortex I would a priori expect neurons responsible for processing visual information to also be receiving input regarding the animal’s motion at the time.

        • On the other hand, movement is informed by visual input as well. So I don’t see why the same argument couldn’t be made in reverse.

        • Motor signals that cause movement are informed by visual input. This is different from signals from afferent neurons in the vestibular system, which are not subject to low-level feedback from the visual system. As far as I know, proprioception is quite separate from vision — that’s how you can sense where your limbs are even with your eyes closed.

        • Corey,

          But if they are comparing two sets of cortical neurons then both are probably receiving input (even if indirectly) from all senses. If they are comparing afferent neurons from the vestibular system with those of the visual system then I would expect neither to respond to both types of input. I have no idea what the evidence is for any of this though.

  2. Re: original request: To some extent this depends on what your aim of model selection is. Actually whatever you could do should not be interpreted as “reject the null hypothesis that the regressor helps”, as you write, because the regressor may help a tiny little bit, so little that you’d need 1000 times as many observations as you have in order to be sure. Usually more complex models can fit the data as well as complex reality better indeed, but the key problem of model selection is: “How much better a fit do you need so that it’s worth the effort?” Depending on what your preferences are, the answer can be different (if you value simplicity more than a small increase in prediction quality, you will often end up with a different, simpler model).

    Your approach (1) basically addresses the question whether or not you have clear evidence that the regressor is needed. Approach (2) would do the same in a rather nonparametric fashion (which seems fine to me) if you’d run this as a proper permutation test including the simulation of a p-value. If you just choose the model that looks better on average, this indeed seems to be nicer to the more complex model (it just has to beat the simpler one, but not at 5% or 1% significance), which will make you choose it more often. Probably good for prediction quality (but maybe only very weakly so), but bad for simplicity, and certainly at the end you can’t claim that any model is “rejected”.

  3. Hi all,
    Thanks for the comments and discussion.

    Regarding the approach: We suspected that this was probably a relatively common procedure, or at least a variant of one. It’s not something we have used so far – just an idea we’ve been kicking around. I know the original post didn’t have a lot of detail, but I was envisioning something along the lines of what Brad Spahn described above. I.e., fit a bunch of the permuted partial models and ask whether the full and permuted partial models’ distributions are significantly different. Any suggestions about refining/formalizing the approach are more than welcome, although it seems like the consensus is that approach 2 doesn’t really add a lot over the more standard approach 1.

    A little more background on our models: Typically we’re interested in asking what single neurons “represent” or “do” or “relate to”. We sometimes use Poisson regression to model a discrete spike train (spikes/time bin) as a function of things you can experimentally measure – limb movement direction, visual features of the environment, etc. So each data point is the number of spikes in a time bin, as well as the values of any experimental quantities also in that time bin (these aren’t independent points, so you have to be careful about choosing cross-validation folds). We then use cross-validation to estimate the model’s predictive power, usually with something like pseudo-R2. We usually bootstrap over test points to get confidence intervals, but ideally we’d do something like re-fit bootstrapped data samples (too computationally demanding for most of our models).

    Regarding the neuroscience comments: the movement/vision example from the original post was a bit unclear, sorry. We do some work with a brain region that is thought to relate to eye movements and/or visual features of the environment. Because we tend to make eye movements to interesting visual features, these two things are quite correlated and it’s hard to tell what the neuronal activity actually relates to. So we use this type of modeling to tease the two apart. Here’s a recent paper from our group on the subject: http://www.ncbi.nlm.nih.gov/pubmed/23863686

    Again, thanks for the comments.

    -Pat

    • Pat,

      I took a look at that paper and have another neuroscience question. Feel free to ignore if you are too busy/whatever.

      On page seven it says:
      “One of the well-established characteristics of many FEF neurons is that they are tuned to the direction of upcoming movements.”

      However, most of the firing shown by figure 3 is occurring during the saccade movement, rather than preceding it.

Leave a Reply

Your email address will not be published. Required fields are marked *