Well put, Rob Weiss.

This is not to say that one must always use an informative prior; oftentimes it can make sense to throw away some information for reasons of convenience. But it’s good to remember that if you do use a noninformative prior, you’re doing less than you could.

I understand why it should be informative, but why does it have to be proper?

probabilities are nice to work with?

Because you can sometimes get an improper posterior, which is unusable. It’s sometimes difficult to know whether this has happened, so it’s easier to keep things proper!
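A classic illustration of this risk (my example, not from the thread): with a binomial likelihood and the improper Haldane prior Beta(0, 0), observing zero successes yields an improper posterior, since the posterior kernel behaves like 1/p near zero. A quick numerical sketch shows the kernel’s mass growing without bound as the lower integration limit shrinks, while a proper uniform prior gives a finite posterior:

```python
import numpy as np
from scipy.integrate import quad

n, y = 10, 0  # ten trials, zero successes

def haldane_kernel(p):
    # Posterior kernel under the improper Haldane prior Beta(0, 0):
    # p^(y-1) * (1-p)^(n-y-1). With y = 0 this behaves like 1/p near 0.
    return p**(y - 1) * (1 - p)**(n - y - 1)

def uniform_kernel(p):
    # Posterior kernel under the proper uniform prior Beta(1, 1).
    return p**y * (1 - p)**(n - y)

for eps in [1e-2, 1e-4, 1e-6]:
    mass, _ = quad(haldane_kernel, eps, 1)
    print(f"eps={eps:.0e}: Haldane kernel mass = {mass:.1f}")
# The mass keeps growing as eps -> 0: the posterior is improper.

total, _ = quad(uniform_kernel, 0, 1)
print(f"Uniform-prior kernel integrates to {total:.4f}")  # finite (1/11)
```

In a sampler this failure would not announce itself; the chains would just wander, which is the point about it being hard to know when it has happened.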

“A prior” seems too much like “a prejudice” to not be suspect under contemporary culture’s prejudice against prejudice. So Bayesianism faces a steep uphill climb. It’s battling against the spirit of the age: innocence through ignorance.

A prejudice is a prior not justifiable by fact.

+1

A prejudice is a prior not _contestable_ by fact.

Here’s some PR advice: Bayesianism needs a different piece of jargon that doesn’t begin with “pr” so that it doesn’t remind people of “prejudice.”

That might help to gain widespread attention amongst laypeople, but shouldn’t we focus on teaching people who use statistics (for decision making) on a regular basis?

People who want to improve their skills should not be deterred by the wording of terms (yes, I know that’s a high ideal).

An informative prior can make either a positive or a negative contribution to the enterprise. If it reflects an “underlying reality,” the contribution can be positive. If it’s inconsistent with the underlying reality, the contribution can be negative.

David:

Indeed, and this is the case for statistical modeling more generally. Assume an additive model, a linear regression, a Poisson distribution, or independence when these do not hold, and you can get into trouble.

Given that Box (and others, e.g., Maynard Keynes) have been pointing that out for a very long time with little to no impact, I wonder if there is more to it (as Steve put it, culture’s prejudice against prejudice).

Mike Evans’s take on this is that there needs to be a principle of empiricism: all ingredients used in a statistical analysis can be, and are, checked against (brute-force) experience. So subjective priors that are immune to testing should not be allowed. Likewise, if the assumptions about functional form needed in structural inference can’t be checked, that should rule it out as an acceptable approach.

> reflects an “underlying reality,”

That’s the challenge for any representation/model/sign/image/thought….

Keith:

I disagree that aspects of a model that are immune to testing from available information should not be allowed. I think they should be allowed—there is such a thing as prior information—but the user should be aware that these assumptions are not testable.

> immune to testing from available information should not be allowed.

I meant immune to testing from potential information not necessarily information in hand.

(Reality has to have an opportunity to slap us in the head if we are too wrong.)

+1

Why does Andrew say that it is sometimes convenient to throw away information that we possess? Can someone offer an example?

Rahul:

You ask for an example where it is convenient to throw away information that we possess.

There are lots of examples, as in, just about every analysis I’ve ever done. Just for example, in 1990, Gary King and I published an estimate of incumbency advantage. We knew lots of information about individual candidates but we didn’t include any of it in our model, all we included was party, incumbency status, and vote share in previous election. Why? Because including more info would require more modeling and data effort, and we were happy with the precision of the estimates that we had. Years later, Zaiying Huang and I returned to the problem and fit a better model and got more informative estimates. But we still left lots of information on the table. Why? Again, because including more info would require more modeling and data effort, and we were happy with the precision of the estimates that we had.

“A prior” sounds too much like “a prejudice” to catch on these days. You need a term that doesn’t begin with “pr.”

No wonder probability never really captured the public imagination.

+1

Actually, there are other consonant sequences that have negative connotations. Hr is one. Any word starting with hr seems to have a negative connotation. I have a data point to support this; in 1991, I saw a Chinese tiger balm bottle in Osaka that said that it cures Hrngh, which I assume stands for anything that makes you feel bad.

Only one is based on facts.

It can be so, but please remember that you are defining a non-informative prior in the wrong way. It would be better to call them vague priors, as their variance can have an impact on the inferences you are making: the choice of the prior, although vague, can produce strikingly different results, especially in small studies. Please see Lambert et al. 2005: http://www.ncbi.nlm.nih.gov/pubmed/16015676
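The point above can be sketched in a conjugate toy model (mine, not from the Lambert et al. paper): with a tiny sample and a known mean, different “vague” inverse-gamma priors on the variance give noticeably different posterior medians, even though each prior is nominally non-informative.

```python
import numpy as np
from scipy import stats

# Tiny sample with known mean 0; only the variance sigma^2 is unknown.
y = np.array([0.5, -1.2])
ss = float(np.sum(y**2))  # sum of squares

# Conjugate update: with sigma^2 ~ InvGamma(a, b), the posterior is
# InvGamma(a + n/2, b + ss/2).
for a, b in [(0.001, 0.001), (0.1, 0.1), (1.0, 1.0)]:
    post = stats.invgamma(a + len(y) / 2, scale=b + ss / 2)
    print(f"'vague' InvGamma({a}, {b}) prior -> "
          f"posterior median of sigma^2: {post.median():.2f}")
```

With more data the choice would wash out; with n = 2 it visibly does not, which is the small-study sensitivity being described.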

Kind Regards,

Gian Luca Di Tanna

There are many examples where it is not only convenient to throw away information that one possesses, but the proper thing to do, and this depends heavily on the context of the problem one is working on.

1. When performing causal inference in observational studies one might form treatment and control groups that are similar to each other by matching on the propensity score and then estimate the causal effect within each of these subgroups. However, it may be that some resultant subgroups do not yield meaningful estimates because there is too large of an imbalance between treatment and control units, or too few units in aggregate. In such cases it would probably be best to toss these inadequate groups aside.

2. Again in causal inference, there may be a variable that can inform the analyst whether a unit should or should not be discarded in the analysis. For instance in determining the causal effect of a birth canal antiseptic on the mortality of newborns, it would not make sense to include units that are delivered through a cesarean section since presumably whether such newborns live or die is not dependent on the antiseptic. Including such units in an estimate of the causal effect of antiseptic on newborn mortality would only bias the estimate of the causal effect. (This is an example I learned from Don Rubin in a course on design of experiments co-taught with Tirthankar Dasgupta.)

3. In a classification problem, covariate “bias” in the training data could bias the classifier and lead to serious overfitting (regardless of whether one uses sharply peaked priors at 0 for regression coefficients or regularizes estimates with something like LASSO). In such a case it is conceivable that resampling a subset of the original data can lead to a training set that is not “biased” (where I am using “bias” in the imprecise sense above), so again it may make sense to relinquish data, not only for the sake of convenience, but also for the sake of the analysis.
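The common-support idea in point 1 can be sketched in a few lines (toy propensity scores of my own invention, not from any real study): keep only the units whose estimated propensity score falls in the overlap region of the treated and control groups, and discard the rest before estimating the effect.

```python
import numpy as np

# Hypothetical estimated propensity scores for eight units.
e = np.array([0.05, 0.15, 0.30, 0.40, 0.55, 0.60, 0.75, 0.92])
t = np.array([0,    0,    1,    0,    1,    0,    1,    1])  # 1 = treated

# Common-support rule: keep units whose score lies between the larger of
# the two group minima and the smaller of the two group maxima.
lo = max(e[t == 1].min(), e[t == 0].min())
hi = min(e[t == 1].max(), e[t == 0].max())
keep = (e >= lo) & (e <= hi)

print("support:", (lo, hi))            # (0.3, 0.6)
print("kept units:", np.where(keep)[0])  # [2 3 4 5]
# Controls 0-1 (scores far below any treated unit) and treated units 6-7
# (no comparable controls) are discarded before estimating the effect.
```

This is deliberate information-throwing in the service of a better estimate: the discarded units have no credible counterfactual comparison.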

Additionally I agree (almost tautologically so) that if one has prior “information”, one should use such prior “information” in a Bayesian analysis. On the other hand, how one encodes prior “information” into an “informative” prior distribution meaningfully seems to be the core challenge, and furthermore this cannot be done without taking the “information” provided by the likelihood (which includes observed data) into account. How one inputs prior “information” into a Bayesian analysis must take into account the data actually observed, which may run counter to a traditional Bayesian viewpoint; namely that a prior must be defined before looking at the data.

Was not expecting the link for the quote to be a touching tribute to Kathryn Chaloner! I did not know she had passed away last month, sad news.