The phrase “curse of dimensionality” has many meanings (with 18800 references, it loses to “bayesian statistics” in a googlefight, but by less than a factor of 3). In numerical analysis it refers to the difficulty of performing high-dimensional numerical integrals.
But I am bothered when people apply the phrase “curse of dimensionality” to statistical inference.
In statistics, “curse of dimensionality” is often used to refer to the difficulty of fitting a model when many possible predictors are available. But this expression bothers me, because more predictors is more data, and it should not be a “curse” to have more data. Maybe in practice it’s a curse to have more data (just as, in practice, giving people too much good food can make them fat), but “curse” seems a little strong.
With multilevel modeling, there is no curse of dimensionality. When many measurements are taken on each observation, these measurements can themselves be grouped. Having more measurements in a group gives us more data to estimate group-level parameters (such as the standard deviation of the group effects and also coefficients for group-level predictors, if available).
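To make that claim concrete, here is a minimal simulation sketch (my own illustration, not from any specific analysis): if K coefficients are modeled as draws from a common group-level distribution with standard deviation tau, the estimate of tau gets more precise as K grows. The function name and all settings are hypothetical.

```python
# Hypothetical sketch: more measurements per group make the
# group-level standard deviation (tau) easier to estimate.
import random
import statistics

def tau_estimation_error(K, tau=1.0, n_reps=2000, seed=0):
    """Average absolute error of the sample-sd estimate of tau,
    over n_reps simulated groups of K coefficients drawn from N(0, tau^2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_reps):
        coefs = [rng.gauss(0.0, tau) for _ in range(K)]  # K "group members"
        tau_hat = statistics.stdev(coefs)                # crude estimate of tau
        total += abs(tau_hat - tau)
    return total / n_reps

err_10 = tau_estimation_error(K=10)
err_100 = tau_estimation_error(K=100)
print(err_10, err_100)  # the error shrinks as the group grows
```

The sample standard deviation is of course a cruder estimator than a full hierarchical fit, but the qualitative point is the same: more members per group means more information about the group-level parameters, not less.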
In all the realistic “curse of dimensionality” problems I’ve seen, the dimensions (the predictors) have a structure. The data don’t sit in an abstract K-dimensional space; they are units with K measurements that have names, orderings, etc.
For example, Marina gave us an example in the seminar the other day where the predictors were the values of a spectrum at 100 different wavelengths. The 100 wavelengths are ordered. Certainly it is better to have 100 than 50, and it would be better to have 50 than 10. (This is not a criticism of Marina’s method, I’m just using it as a handy example.)
For an analogous problem: 20 years ago in Bayesian statistics, there was a lot of struggle to develop noninformative prior distributions for highly multivariate problems. Eventually this line of research dwindled because people realized that when many variables are floating around, they will be modeled hierarchically, so that the burden of noninformativity shifts to the far less numerous hyperparameters. And, in fact, when the number of variables in a group is larger, these hyperparameters are easier to estimate.
I’m not saying the problem is trivial or even easy; there’s a lot of work to be done to spend this blessing wisely.