This paper (by Aleks, Grazia, Yu-Sung, and me) is really cool. Here’s the abstract:

We propose a new prior distribution for classical (non-hierarchical) logistic regression models, constructed by first scaling all nonbinary variables to have mean 0 and standard deviation 0.5, and then placing independent Student-t prior distributions on the coefficients. As a default choice, we recommend the Cauchy distribution with center 0 and scale 2.5, which in the simplest setting is a longer-tailed version of the distribution attained by assuming one-half additional success and one-half additional failure in a logistic regression. We implement a procedure to fit generalized linear models in R with this prior distribution by incorporating an approximate EM algorithm into the usual iteratively weighted least squares. We illustrate with several examples, including a series of logistic regressions predicting voting preferences, an imputation model for a public health data set, and a hierarchical logistic regression in epidemiology.

We recommend this default prior distribution for routine applied use. It has the advantage of always giving answers, even when there is complete separation in logistic regression (a common problem, even when the sample size is large and the number of predictors is small) and also automatically applying more shrinkage to higher-order interactions. This can be useful in routine data analysis as well as in automated procedures such as chained equations for missing-data imputation.
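To make the setup in the abstract concrete, here is a minimal Python sketch of its two ingredients: rescaling a nonbinary input to mean 0 and standard deviation 0.5, and evaluating an independent Cauchy(0, 2.5) log prior on the coefficients. This is not the bayesglm() code. The function names, the example ages, the example coefficients, and the use of the sample (n - 1) standard deviation are all assumptions made for illustration.

```python
import math

def rescale(values, target_sd=0.5):
    """Shift a nonbinary input to mean 0 and rescale it to the target standard deviation."""
    n = len(values)
    mean = sum(values) / n
    # Sample standard deviation (n - 1 denominator); this choice is an assumption.
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [target_sd * (v - mean) / sd for v in values]

def cauchy_log_density(beta, center=0.0, scale=2.5):
    """Log density of a Cauchy(center, scale) prior evaluated at one coefficient."""
    z = (beta - center) / scale
    return -math.log(math.pi * scale) - math.log1p(z * z)

# Hypothetical ages, used only to illustrate the rescaling.
ages = [23.0, 35.0, 41.0, 58.0, 62.0]
scaled_ages = rescale(ages)

# Independent priors: the joint log prior is the sum over coefficients.
coefficients = [0.3, -1.2, 4.0]  # made-up values
log_prior = sum(cauchy_log_density(b) for b in coefficients)
```

Because the Cauchy log density falls off only logarithmically in the tails, a large coefficient such as 4.0 is penalized much less than it would be under a normal prior with the same scale.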

It solves the separation problem and now I use it in my routine applied work. It’s implemented as bayesglm() in the “arm” package in R.

Here’s a pretty picture from the paper showing the performance of different Student-t prior distributions on cross-validation with a corpus of datasets:

The Cauchy with scale 0.8 does the best, but we go with the Cauchy with scale 2.5 because it is more “conservative,” as statisticians would say. (See here for more discussion of conservatism in statistics.)

Andrew,

"We recommend this default prior distribution for routine applied use. It has the advantage of always giving answers, even when there is complete separation in logistic regression"

Correct me if I'm wrong but, if memory serves, in the case of complete separation the MLE does not exist. Yet I can name several packages that will give an answer in the case of complete separation; of course all of these answers are wrong. Hence, I would view as a vice rather than a virtue the production of a "solution" when in fact the solution does not exist.

Regards,

Bruce

Bruce,

The answer given by bayesglm is not "wrong" except in the sense that all estimates are wrong, because they will not be equal to the true beta, which is unknown and can only be estimated. Maximum likelihood is one particular estimation procedure that has good properties in some settings but does not perform as well for logistic regression as various regularized alternatives.

To put it another way, the goal is not to maximize the likelihood; the goal is to estimate the regression coefficients given noisy data.
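The exchange above can be illustrated with a toy example. The Python sketch below uses four made-up, completely separated data points and a single coefficient with no intercept (both are assumptions, chosen to keep the example small). It shows that the log-likelihood keeps rising as the coefficient grows, so no finite maximum exists, while adding the Cauchy(0, 2.5) log prior gives a finite posterior mode. The mode is found by a plain grid search, not by the approximate EM algorithm that the paper describes.

```python
import math

def log_sigmoid(t):
    """Numerically stable log(1 / (1 + exp(-t)))."""
    if t >= 0:
        return -math.log1p(math.exp(-t))
    return t - math.log1p(math.exp(t))

def log_likelihood(beta, xs, ys):
    """Logistic log-likelihood for a single coefficient and no intercept."""
    total = 0.0
    for x, y in zip(xs, ys):
        eta = beta * x
        total += log_sigmoid(eta) if y == 1 else log_sigmoid(-eta)
    return total

def log_posterior(beta, xs, ys, scale=2.5):
    """Log-likelihood plus the log density of a Cauchy(0, scale) prior."""
    log_prior = -math.log(math.pi * scale) - math.log1p((beta / scale) ** 2)
    return log_likelihood(beta, xs, ys) + log_prior

# Made-up data with complete separation: y = 1 exactly when x > 0.
xs = [-1.0, -0.5, 0.5, 1.0]
ys = [0, 0, 1, 1]

# The log-likelihood increases at every step: there is no finite MLE.
likelihood_trace = [log_likelihood(b, xs, ys) for b in (1.0, 10.0, 50.0)]

# Grid search over beta in [0, 50]; negative values fit these data worse.
grid = [i / 100 for i in range(0, 5001)]
posterior_mode = max(grid, key=lambda b: log_posterior(b, xs, ys))
```

Here likelihood_trace is strictly increasing, whereas posterior_mode lands well inside the grid rather than at its upper edge, which is the sense in which the prior "always gives an answer."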