Ben, Jonah, Imad, and I write:

The usual definition of R-squared (variance of the predicted values divided by the variance of the data) has a problem for Bayesian fits, as the numerator can be larger than the denominator. We propose an alternative definition similar to one that has appeared in the survival analysis literature: the variance of the predicted values divided by the variance of predicted values plus the variance of the errors. This summary is computed automatically for linear and generalized linear regression models fit using rstanarm, our R package for fitting Bayesian applied regression models with Stan. . . .
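The proposed summary is easy to sketch numerically. Here is a minimal illustration in plain Python (not the rstanarm implementation; the toy numbers and function name are invented for this example) of why the ratio var(predicted) / (var(predicted) + var(errors)) is bounded between 0 and 1, unlike var(predicted) / var(data):

```python
from statistics import pvariance

def bayes_r2(y_pred, residuals):
    """Proposed R-squared: var(predicted) / (var(predicted) + var(errors)).

    Both terms in the denominator are non-negative, so the ratio is
    always between 0 and 1 -- unlike var(predicted) / var(data), whose
    numerator can exceed its denominator for a Bayesian fit.
    """
    var_fit = pvariance(y_pred)
    var_res = pvariance(residuals)
    return var_fit / (var_fit + var_res)

# Toy data: observed values, fitted values from one posterior draw,
# and the implied errors.
y = [1.2, 2.1, 2.9, 4.2, 5.1]
y_pred = [1.0, 2.0, 3.0, 4.0, 5.0]
residuals = [yi - pi for yi, pi in zip(y, y_pred)]

print(bayes_r2(y_pred, residuals))
```

In the paper's setting this would be computed once per posterior draw, giving a distribution of R-squared values rather than a single number.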

When I think of R-squared, I generally think of 1 – Variance(residuals) / Variance(y), which is always less than or equal to 1; this is also how Wikipedia describes it. This works for Bayesian models in that the R-squared never exceeds 1, though it can be less than 0.
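A quick numeric check of that point (toy numbers, invented for this example, not from any real fit): under this definition, a model whose residual variance exceeds the variance of the data gets a negative R-squared, while the upper bound of 1 always holds.

```python
from statistics import pvariance

def classical_r2(y, y_pred):
    """R-squared as 1 - var(residuals) / var(y)."""
    residuals = [yi - pi for yi, pi in zip(y, y_pred)]
    return 1 - pvariance(residuals) / pvariance(y)

y = [1.0, 2.0, 3.0, 4.0, 5.0]

good_fit = [1.1, 1.9, 3.0, 4.1, 4.9]
bad_fit = [5.0, 4.0, 3.0, 2.0, 1.0]  # anti-correlated with y

print(classical_r2(y, good_fit))  # close to 1
print(classical_r2(y, bad_fit))   # negative: var(residuals) > var(y)
```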

Ok, not a regression or stats guru by any means (which should be obvious), but I’m confused too. var(predicted)/var(actual) doesn’t seem intuitively interesting to me, except when – under fragile least-squares assumptions – you can argue it equals 1 minus a normalized measure of residual variance. Otherwise, what’s the argument that a high R^2 is a good thing?

Why isn’t 1 – residual-variance/pre-model-variance the more sensible target of improvements?

Bxg:

For reasons discussed in our paper, we like the definition of R-squared as the variance of the predicted values divided by the variance of predicted values plus the variance of the errors, so that the variance decomposition is baked into the definition. To me, this captures the most important aspect of R-squared, which is its basis in a decomposition of variance. You can also look at my 2006 paper with Pardoe.

I feel (and am) honored to have a direct response from you, but I’m too silly to understand it. I’ve (tried to) read this paper now twice, and just don’t see the “we like the definition …” part – is it in Section 4, or elsewhere? How much would your paper change if (1 – ) residual-variance/pre-model-variance were the default? Sorry.

I’ve always felt a little bit squeamish about calling the modelled conditional expected values “predictions.” I mean… they are exactly what they are, expected values of whatever distribution is used as a model. It makes even less sense if the distribution is a Bernoulli distribution: the “predicted” values are impossible to observe. Well, excluding the lower and upper asymptotes, but but butt.

Krolbo:

Technically these are not predictions, they’re expected values of the predictive distributions. But “expected value” is also confusing. We need some better term for that!