Tim Hanson sends along this paper (coauthored with Adam Branscum and Wesley Johnson):

Eliciting information from experts for use in constructing prior distributions for logistic regression coefficients can be challenging. The task is especially difficult when the model contains many predictor variables, because the expert is asked to provide summary information about the probability of “success” for many subgroups of the population. Often, however, experts are confident only in their assessment of the population as a whole. This paper is about incorporating such overall, marginal or averaged, information easily into a logistic regression data analysis by using g-priors. We present a version of the g-prior such that the prior distribution on the probability of success can be set to closely match a beta distribution, when averaged over the set of predictors in a logistic regression. A simple data augmentation formulation that can be implemented in standard statistical software packages shows how population-averaged prior information can be used in non-Bayesian contexts.

The g-prior is a class of models defined on transformed coefficients. Strictly speaking, the g-prior is improper because it depends on the data (in a regression model, the g-prior is scaled based on the observed matrix of predictors). But it’s an interesting case because, although it’s improper, it can be informative and have a finite integral.
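To make the data dependence concrete, here is a minimal numpy sketch of the standard Zellner g-prior covariance for regression coefficients, beta ~ N(0, g * sigma^2 * (X'X)^{-1}); the design matrices, g, and sigma^2 below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def g_prior_cov(X, g=1.0, sigma2=1.0):
    """Zellner g-prior covariance: beta ~ N(0, g * sigma2 * (X'X)^{-1}).
    The covariance is scaled by the observed design matrix X,
    which is the sense in which the prior depends on the data."""
    return g * sigma2 * np.linalg.inv(X.T @ X)

# Two different designs give two different priors for the same model.
X1 = rng.normal(size=(50, 3))
X2 = rng.normal(size=(200, 3))  # more rows -> larger X'X -> tighter prior

cov1 = g_prior_cov(X1)
cov2 = g_prior_cov(X2)
print(np.trace(cov1), np.trace(cov2))
```

The point of the sketch is that the same prior specification yields a different distribution depending on which design matrix was observed, which is why the g-prior, strictly speaking, is not a prior in the usual sense.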

Hanson writes:

Our “noninformative” version of the “informative” g-prior did not do as well as the Gelman et al. prior, but the informative version did quite well. A little real prior information goes a long way!

I agree with this message. Prior information can make a big difference. This is a point that we downplayed in BDA, but over the past few years I’ve moved toward respecting prior information.

> Prior information can make a big difference. This is a point that we downplayed in BDA

Downplayed in most writing?

Unfortunate, given that plotting the prior (on the scale of the transformed coefficients) can usually pick up whether the difference is helpful or harmful.

My guess is that all the energy is spent sampling from the posterior, and also sampling from the prior, which requires a different program, seems like too much hardship. (My guess is that Stan cannot provide samples from the prior. WinBUGS can, but only if the priors are concentrated enough.)
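For what it's worth, checking a prior on the probability scale doesn't need the posterior machinery at all: draw coefficients from the prior and push them through the inverse logit. A minimal numpy sketch (the prior scale 2.5 and the covariate vector are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: a logistic regression with 3 standardized predictors
# and an independent normal(0, 2.5) prior on each coefficient.
n_draws, p = 4000, 3
beta = rng.normal(0.0, 2.5, size=(n_draws, p))
x = rng.normal(size=p)  # one (made-up) covariate vector

# Implied prior on the success probability at x:
prob = 1.0 / (1.0 + np.exp(-(beta @ x)))

# Quantiles of the implied distribution are enough to see whether the
# prior on the probability scale is reasonable or piles up near 0 and 1.
print(np.quantile(prob, [0.05, 0.5, 0.95]))
```

A wide prior on the logit scale typically implies a probability distribution concentrated near 0 and 1, which is exactly the kind of thing this forward simulation makes visible.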

I am not a statistician, but I thought that one of the key things about Bayesian statistics was the use of prior knowledge. Isn’t one supposed to use an informed prior (sorry for the bad terminology)?

What am I misunderstanding here?

It comes from the realization that with a joint distribution for parameters (unknowns) and data (knowns), i.e. both prior and data model, there is fantastic _machinery_ to do almost any analysis you can think of _easily_. Formally, for that joint model, the inference is what it should be: above reproach, given that the model (both data model and prior) is not too wrong.

So even when you are very unsure about the unknowns (the prior), there is a temptation to try and specify a placeholder prior that does not matter but allows you to run the machinery and still do well. Unfortunately, neither the “still do well” part nor what those priors should be has ever been worked out to any satisfactory degree, e.g. http://normaldeviate.wordpress.com/2013/07/13/lost-causes-in-statistics-ii-noninformative-priors/

but (many suspect) a large number of Bayesian analyses are naively done that way (using the default priors of the software). I like Peter McCullagh’s characterization of such a prior as actually being a countably infinite set of priors, which suggests a countably infinite set of posteriors to interpret.

If the regressors are not random, does this still count as improper? Or maybe I’m missing something (I have only skimmed & searched the paper).

Dean:

If the prior depends on the data, I would call it improper. I would consider a prior that depends on sample size to be improper also. Improper can be ok, but it means that you’re giving up the interpretation of the joint distribution of parameter and data.

My point is that if the regressors are fixed, then their covariance matrix is also fixed, so there isn’t anything to “depend on” statistically. I’m thinking of, say, an industrial experiment where all the Xs are assigned. More generally, the usual analysis of linear regression ends up treating the Xs as fixed, not random.

Dean:

If I would use prior P1 if the design points took on the value X1, or prior P2 if the design points took on the value X2, then the prior depends on X, even if X is assigned deterministically by the experimenter.

It’s common practice to choose a prior after seeing the data structure (for example, using a normal prior if the data model is normal, or a beta prior if the data model is binomial). Strictly speaking, these are improper priors too, in that the prior depends on the design of the experiment used to gather the data. It’s not so horrible to do this, as there is an element of arbitrariness to any model. In addition, if we know something about the design (for example, the sample size and values of the predictors X), this can tell us that the data will be highly informative in some dimensions and it might not be worth the trouble to set up a careful prior for those aspects of the problem.