One of my students forwarded your blog, and I think you’ve got it
wrong on this topic. More data does not always help, and this has been
shown in numerous applications; hence the huge literature on the topic.
Analytically, the reason is simple. Just for an example, assume your loss
function is MSE; then the uniquely best estimator is E(Y | X = x), i.e.,
the conditional mean of Y at each point x. The reason one cannot do this
in practice is that as the size of your parameter space increases, you
never have enough data to span the space. Even if you change the above to
a neighborhood around each x, the volume of this hypercube gets really,
really ugly for any value of the neighborhood parameter. The only way out
of this is to make arbitrary restrictions on functional form, etc., or
derive a feature space (thus “tossing out” data, in a sense).
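
To put rough numbers on the hypercube point, here is a minimal sketch. It assumes the predictors are spread uniformly over the unit hypercube [0, 1]^d, which is an assumption made purely for illustration; the function names `edge_length` and `expected_neighbors` are hypothetical and not from any library. It shows how wide a cubic neighborhood must be to hold a fixed share of the data, and how few points a fixed-width neighborhood holds, as the dimension grows.

```python
def edge_length(fraction, dim):
    """Edge length of a cubic neighborhood inside [0, 1]^dim that is
    expected to contain `fraction` of uniformly distributed points.

    The neighborhood's volume must equal `fraction`, so edge**dim = fraction.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if dim < 1:
        raise ValueError("dim must be a positive integer")
    return fraction ** (1.0 / dim)


def expected_neighbors(n_points, edge, dim):
    """Expected number of points (out of n_points, uniform on [0, 1]^dim)
    that land in a cubic neighborhood with the given edge length."""
    if not 0 <= edge <= 1:
        raise ValueError("edge must be in [0, 1]")
    return n_points * edge ** dim


if __name__ == "__main__":
    # How wide must the neighborhood be to capture 1% of the data?
    for dim in (1, 2, 10, 200):
        print(f"d = {dim:3d}: edge length = {edge_length(0.01, dim):.4f}")

    # How many of a million points fall within a neighborhood of width 0.1?
    for dim in (1, 2, 10):
        print(f"d = {dim:3d}: expected neighbors = "
              f"{expected_neighbors(1_000_000, 0.1, dim):g}")
```

Under this uniform assumption, a neighborhood that captures even 1% of the data must span nearly the full range of every predictor once the dimension is large, so it is no longer "local" in any useful sense. Real predictors are rarely uniform or independent, so treat the numbers as an illustration of the scaling rather than a statement about any particular data set.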
As I said, there’s a huge number of applications where more is not better.
One example is face recognition: increasing granularity or pixel depth
doesn’t help. Instead, one must run counter to your intuition and throw
out most of the data by deriving a feature space. And, face recognition
still doesn’t work all that well, despite decades of research.
There’s a number of other issues — in your comments on 3 “good” i.v.’s
and 197 “bad” ones, you have to take the issue of overfitting much more
seriously than you do.
My reply: Ultimately, it comes down to the model. If the model is appropriate, then Bayesian inference should deal appropriately with the extra information. After all, discarding most of the information is itself a particular model, and one should be able to do better with shrinkage.
That said, the off-the-shelf models we use to analyze data can indeed choke when you throw too many variables at them. Least squares is notorious that way, but even hierarchical Bayes isn’t so great when the large number of parameters has structure. I think that better models for interactions are out there for us to find (see here for some of my struggles; also see the work of Peter Hoff, Mark Handcock, and Adrian Raftery in sociology, or Yingnian Wu in image analysis). But they’re not all there yet. So, in the short term, yes, more dimensions can entail a struggle.
Regarding the problem with 200 predictors: my point is that I never have 200 unstructured predictors. If I have 200 predictors, there will be some substantive context that will allow me to model them.