Following up on Christian’s post on the topic, I’d like to offer a few thoughts of my own.
In BDA, we express the idea that a noninformative prior is a placeholder: you can use the noninformative prior to get the analysis started, then if your posterior distribution is less informative than you would like, or if it does not make sense, you can go back and add prior information.
Same thing for the data model (the “likelihood”), for that matter: it often makes sense to start with something simple and conventional and then go from there.
So, in that sense, noninformative priors are no big deal: they’re just a way to get started. Just don’t take them too seriously.
Traditionally in statistics we’ve worked with the paradigm of a single highly informative dataset with only weak external information. But if the data are sparse and prior information is strong, we have to think differently. And, when you increase the dimensionality of a problem, both these things happen: data per parameter become more sparse, and prior distributions that are innocuous in low dimensions become strong and highly informative (sometimes in a bad way) in high dimensions.
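To make the dimensionality point concrete, here is a small simulation sketch. It is an illustration of my own choosing, not one of the examples in this post: it assumes independent N(0, 10^2) priors on d coefficients and looks at the implied prior on their root-mean-square size. The function name rms_prior_draws, the prior scale of 10, and the number of draws are all assumptions made for the illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev

def rms_prior_draws(d, prior_sd=10.0, n_draws=500, seed=0):
    """Draw from the prior implied on the root-mean-square of d coefficients,
    each given an independent N(0, prior_sd^2) prior (an assumed setup)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        theta = [rng.gauss(0.0, prior_sd) for _ in range(d)]
        draws.append(sqrt(sum(t * t for t in theta) / d))
    return draws

# Relative spread (sd / mean) of the implied prior on the RMS coefficient size.
relative_spread = {}
for d in (1, 10, 100, 1000):
    draws = rms_prior_draws(d)
    relative_spread[d] = stdev(draws) / mean(draws)
    print(f"d={d:5d}  mean RMS={mean(draws):6.2f}  sd/mean={relative_spread[d]:.3f}")
```

With one coefficient the prior on its size is diffuse, but as d grows the same "vague" prior becomes nearly a point mass on the claim that the typical coefficient has magnitude close to 10. That is a strong statement, and not necessarily one you meant to make.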
Here are four examples of the dangers of noninformative priors:
1. From section 3 of my 1996 paper, Bayesian Model-Building by Pure Thought: estimating a convex, increasing function with a flat prior on the function values (subject to the constraints). It’s a disaster. As discussed in the article, the innocuous-seeming prior contains a huge amount of information as you increase the number of points at which the curve is estimated.
2. The classic 8-schools example: that is, any hierarchical model. A noninformative uniform prior on the coefficients is equivalent to a hierarchical N(0,tau^2) model with tau set to a very large value. This is a very strong prior distribution pulling the estimates apart, and the resulting estimates of individual coefficients are implausible.
3. Any setting where the prior information really is strong, so that if you assume a flat prior, you can get silly estimates simply from noise variation. For example, the claim that beautiful parents are more likely to have girls, which is based on data that are much much weaker than the prior information on this topic.
4. Finally, the simplest example yet, and my new favorite: we assign a flat noninformative prior to a continuous parameter theta. We now observe data, y ~ N(theta,1), and the observation is y=1. This is of course completely consistent with being pure noise, but the posterior probability is 84% that theta>0. I don’t believe that 84%. I think (in general) that it is too high.
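The 84% in example 4 is easy to check, and it is instructive to see how it moves once some prior information is added. The sketch below uses the standard conjugate normal-normal update. The N(0, prior_sd^2) priors are assumptions chosen only for illustration (the post does not propose a specific prior), and the function names are hypothetical.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def prob_theta_positive(y, prior_sd=None):
    """Pr(theta > 0 | y) when y ~ N(theta, 1).

    prior_sd=None means a flat prior on theta; otherwise theta ~ N(0, prior_sd^2),
    which is an assumed prior used here only for comparison.
    """
    if prior_sd is None:
        post_mean, post_var = y, 1.0
    else:
        # Conjugate update: the posterior precision is 1/prior_sd^2 + 1.
        shrink = prior_sd**2 / (prior_sd**2 + 1.0)
        post_mean, post_var = shrink * y, shrink
    return normal_cdf(post_mean / sqrt(post_var))

flat = prob_theta_positive(1.0)
unit_prior = prob_theta_positive(1.0, prior_sd=1.0)
tight_prior = prob_theta_positive(1.0, prior_sd=0.5)
print(f"flat prior:     {flat:.3f}")
print(f"N(0, 1) prior:  {unit_prior:.3f}")
print(f"N(0, 0.5^2):    {tight_prior:.3f}")
```

Under the flat prior this reproduces the 84%. With a N(0, 1) prior the probability drops to about 76%, and it moves closer to 50% as the prior tightens around zero, which is the direction you would expect if a single observation of y=1 really is consistent with noise.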
None of these examples are meant to shoot down Bayes. Indeed, if posterior inferences don’t make sense, that’s another way of saying that we have external (prior) information that was not included in the model. (“Doesn’t make sense” implies some source of knowledge about which claims make sense and which don’t.) When things don’t make sense, it’s time to improve the model. Bayes is cool with that.
P.S. Much discussion in comments. The following bit might be helpful, regarding example 4 above:
Of course it depends on the context. Depending on the scaling of the problem, an effect of 100 could make sense. I try to scale things so that effects are of order of magnitude 1. For example, in logistic regression you’re not going to see an effect of 100, similarly in econ you’re not going to see an elasticity of 100 if you’re working on the log-log scale.
I wouldn’t frame this as “second-guessing someone’s prior.” A better way to put it would be that people use conventional models that include much less information than is actually known. Such conventional models include linear regressions etc. as well as uniform prior distributions. If data are strong, you can often do just fine with conventional models. But if data are sparse, it can often make sense to go back and add some real information to your model, in order to better answer your scientific questions.
To put it another way, an analysis based on a conventional model can (sometimes) tell you what’s in the data. But scientific reports typically don’t just report information in data, they also make general claims about the world, and for that it can be a terrible mistake to ignore strong information that is already known.