## The blessing of dimensionality

The phrase “curse of dimensionality” has many meanings (with 18,800 references, it loses to “bayesian statistics” in a googlefight, but by less than a factor of 3). In numerical analysis it refers to the difficulty of performing high-dimensional numerical integrals.

But I am bothered when people apply the phrase “curse of dimensionality” to statistical inference.

In statistics, “curse of dimensionality” is often used to refer to the difficulty of fitting a model when many possible predictors are available. But this expression bothers me, because more predictors means more data, and it should not be a “curse” to have more data. Maybe in practice it’s a curse to have more data (just as, in practice, giving people too much good food can make them fat), but “curse” seems a little strong.

With multilevel modeling, there is no curse of dimensionality. When many measurements are taken on each observation, these measurements can themselves be grouped. Having more measurements in a group gives us more data to estimate group-level parameters (such as the standard deviation of the group effects and also coefficients for group-level predictors, if available).
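To make the last point concrete, here is a minimal simulation sketch, not a full multilevel fit. It assumes K coefficients in one group drawn from a normal distribution with standard deviation `tau`, each observed with independent noise of known standard deviation `sigma`, and it estimates `tau` by a simple method of moments. The function names and the settings (`tau = 1`, `sigma = 1`, 500 replications) are illustrative choices, not anything from a particular analysis.

```python
import math
import random
import statistics


def estimate_tau(K, tau=1.0, sigma=1.0, rng=random):
    """Simulate K coefficients in one group and estimate their sd, tau."""
    # True coefficients share a group-level distribution N(0, tau^2).
    betas = [rng.gauss(0.0, tau) for _ in range(K)]
    # Each coefficient is observed with independent noise of known sd sigma.
    estimates = [b + rng.gauss(0.0, sigma) for b in betas]
    # Method of moments: Var(estimate) = tau^2 + sigma^2, truncated at zero.
    tau_sq = statistics.variance(estimates) - sigma ** 2
    return math.sqrt(max(tau_sq, 0.0))


def rmse_of_tau(K, reps=500, tau=1.0, seed=0):
    """Root mean squared error of the tau estimate over repeated simulations."""
    rng = random.Random(seed)
    sq_errors = [(estimate_tau(K, tau=tau, rng=rng) - tau) ** 2 for _ in range(reps)]
    return math.sqrt(sum(sq_errors) / reps)


if __name__ == "__main__":
    # Error in the group-level sd for small, medium, and large groups.
    for K in (5, 20, 200):
        print(K, round(rmse_of_tau(K), 3))
```

Under these assumptions the error in the estimated group-level standard deviation should shrink as K grows, which is the sense in which more dimensions within a group help rather than hurt.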

In all the realistic “curse of dimensionality” problems I’ve seen, the dimensions (the predictors) have a structure. The data don’t sit in an abstract K-dimensional space; they are units with K measurements that have names, orderings, etc.

For example, Marina gave us an example in the seminar the other day where the predictors were the values of a spectrum at 100 different wavelengths. The 100 wavelengths are ordered. Certainly it is better to have 100 than 50, and it would be better to have 50 than 10. (This is not a criticism of Marina’s method; I’m just using it as a handy example.)

For an analogous problem: 20 years ago in Bayesian statistics, there was a lot of struggle to develop noninformative prior distributions for highly multivariate problems. Eventually this line of research dwindled because people realized that when many variables are floating around, they will be modeled hierarchically, so that the burden of noninformativity shifts to the far less numerous hyperparameters. And, in fact, when the number of variables in a group is larger, these hyperparameters are easier to estimate.

I’m not saying the problem is trivial or even easy; there’s a lot of work to be done to spend this blessing wisely.

### One Comment

1. Sam Cook says:

Andrew commented:

To put it another way, here is the statistical setting where dimensionality is indeed a "curse":

Suppose you're in a regression setting with, say, 100 data points and 3 meaningful predictors. All is well. Someone now gives you 197 additional predictors, which are mostly noise, and mixes them in with the original 3 so that you don't know what is what. You're now in a more difficult situation and might well prefer to be back with the 3 original predictors.

I think this is what people have in mind when they talk about the curse of dimensionality in regression problems.

But I think this scenario is inappropriate! Its key logical flaw is the use of the "3 good predictors" as a comparison point. If you have 3 good predictors and 197 others, you can keep the 3 good ones and use the 197 as you deem appropriate, perhaps combining them into one or two summary scores or shrinking them toward zero or whatever. Conversely, if you really have 200 predictors and you have no prior knowledge about which are good and which are useless or close-to-useless, then having only 3 of them will leave you in even worse shape. The "3 good predictors" scenario actually encodes some extremely important information, which is that these particular 3, out of the 200, are good.
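Here is a rough simulation sketch of the three situations just described, under assumptions that are mine and purely illustrative: 100 data points, 200 independent standard-normal predictors, 3 of them with coefficient 2.0, noise with standard deviation 1, and ridge regression with an untuned penalty `lam = 20` standing in for "shrinking them toward zero". All function names are hypothetical, and it uses plain Python so it runs without extra libraries (expect it to take a few seconds).

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            if f != 0.0:
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ridge_fit(X, y, cols, lam):
    """Ridge coefficients for the chosen columns; lam=0 gives least squares."""
    Z = [[row[j] for row in X] for j in cols]  # one list per predictor
    A = [[sum(a * b for a, b in zip(zi, zj)) for zj in Z] for zi in Z]
    for k in range(len(cols)):
        A[k][k] += lam
    rhs = [sum(a * b for a, b in zip(zi, y)) for zi in Z]
    return solve(A, rhs)


def mse(beta, cols, X, y):
    """Mean squared prediction error on the data (X, y)."""
    preds = [sum(b * row[j] for b, j in zip(beta, cols)) for row in X]
    return sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)


def simulate(n, p, good, rng):
    """Independent N(0,1) predictors; only the `good` columns affect y."""
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [sum(2.0 * row[j] for j in good) + rng.gauss(0, 1) for row in X]
    return X, y


def compare(seed=0, n=100, p=200, lam=20.0, n_subsets=20):
    """Out-of-sample error for the three scenarios in the text."""
    rng = random.Random(seed)
    good = rng.sample(range(p), 3)
    X, y = simulate(n, p, good, rng)
    X_new, y_new = simulate(500, p, good, rng)  # fresh data for evaluation

    def score(cols, penalty):
        return mse(ridge_fit(X, y, cols, penalty), cols, X_new, y_new)

    # Averaged over several arbitrary 3-predictor subsets, picked blindly.
    arbitrary = [score(rng.sample(range(p), 3), 0.0) for _ in range(n_subsets)]
    return {
        "known_3": score(good, 0.0),                      # told which 3 are good
        "all_200_shrunk": score(list(range(p)), lam),     # no knowledge, shrinkage
        "arbitrary_3": sum(arbitrary) / n_subsets,        # no knowledge, only 3
    }


if __name__ == "__main__":
    print(compare())
```

In a setup like this I would expect the known 3 to do best, all 200 with shrinkage to come next, and 3 arbitrary predictors to do worst. The gap between the first two is the value of the information that those particular 3 are good; the gap between the last two is what you lose by throwing away the other 197.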