R-squared for Bayesian regression models

Posted on December 21, 2017 9:03 AM by Andrew

Ben, Jonah, Imad, and I write:

The usual definition of R-squared (variance of the predicted values divided by the variance of the data) has a problem for Bayesian fits, as the numerator can be larger than the denominator. We propose an alternative definition similar to one that has appeared in the survival analysis literature: the variance of the predicted values divided by the variance of predicted values plus the variance of the errors. This summary is computed automatically for linear and generalized linear regression models fit using rstanarm, our R package for fitting Bayesian applied regression models with Stan. . . .
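As a rough illustration of the two definitions in the abstract above, here is a minimal Python sketch. It is not rstanarm's implementation: the function names, the `(n_draws, n_obs)` array layout, and the made-up slope "draws" are all assumptions for illustration, and the errors are taken to be the residuals `y - mu` for each draw.

```python
import numpy as np


def bayes_r2(mu_draws, y):
    """Return one R-squared value per posterior draw.

    mu_draws: shape (n_draws, n_obs), expected value of each observation
              under each posterior draw (hypothetical input layout).
    y:        shape (n_obs,), the observed outcomes.
    """
    mu_draws = np.asarray(mu_draws, dtype=float)
    y = np.asarray(y, dtype=float)
    var_fit = mu_draws.var(axis=1, ddof=1)        # variance of predicted values
    var_res = (y - mu_draws).var(axis=1, ddof=1)  # variance of the errors
    return var_fit / (var_fit + var_res)


def usual_r2(mu_draws, y):
    """Variance of predicted values divided by variance of the data."""
    mu_draws = np.asarray(mu_draws, dtype=float)
    y = np.asarray(y, dtype=float)
    return mu_draws.var(axis=1, ddof=1) / np.var(y, ddof=1)


# Toy data: y is roughly 2 * x, with alternating +1 / -1 noise.
x = np.arange(10.0)
y = 2.0 * x + np.where(np.arange(10) % 2 == 0, 1.0, -1.0)

# Stand-in for posterior draws of the slope (made up, not from a real fit).
slopes = np.array([1.0, 2.0, 4.0])
mu_draws = slopes[:, None] * x[None, :]

r2_bayes = bayes_r2(mu_draws, y)
r2_usual = usual_r2(mu_draws, y)
print(r2_bayes)  # each value lies between 0 and 1 by construction
print(r2_usual)  # the draw with slope 4 gives a ratio above 1
```

In this toy setup the draw that overshoots the slope makes the usual ratio exceed 1, while the proposed ratio stays in [0, 1] because its denominator always contains its numerator.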

The full paper is here.

EDIT by Aki: Link was updated to the published version.

6 thoughts on “R-squared for Bayesian regression models”

Bryan on December 21, 2017 at 5:04 pm said:

When I think of R-squared, I generally think of 1 – Variance(residuals) / Variance(y), which is never greater than 1; this is also how Wikipedia describes it. This works for Bayesian models in that the R-squared never exceeds 1, though it can be less than 0.
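A small numeric check of this residual-based definition, as a sketch only: the function name `residual_r2` and the toy numbers are made up for illustration. It shows a value close to 1 for predictions that track the data and a negative value for predictions that move against it.

```python
import numpy as np


def residual_r2(y, y_pred):
    """1 - Var(residuals) / Var(y), using sample variances."""
    y = np.asarray(y, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 1.0 - np.var(y - y_pred, ddof=1) / np.var(y, ddof=1)


y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Predictions that track the data closely.
good_pred = np.array([1.1, 1.9, 3.2, 3.8, 5.0])

# Predictions that move in the opposite direction to the data.
bad_pred = -2.0 * y

r2_good = residual_r2(y, good_pred)
r2_bad = residual_r2(y, bad_pred)
print(r2_good)  # close to, but below, 1
print(r2_bad)   # negative: the residuals vary more than y itself
```

For `bad_pred` the residuals are `3 * y`, so their variance is nine times that of `y` and the measure comes out at 1 - 9 = -8.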

bxg on December 21, 2017 at 9:38 pm said:

Ok, not a regression or stats guru by any means (which should be obvious), but I’m confused too. var(predicted)/var(actual) doesn’t seem intuitively interesting to me, except when – after fragile assumptions about LS – you can argue it’s the same as (1-) a normalized measure of residual variance. Otherwise what’s the argument that high r^2 is a good thing?

Why isn’t 1 – residual-variance/pre-model-variance the more sensible target of improvements?
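For reference, the link between the two measures can be written out as a short derivation. This is a sketch under the assumption that the data split as y = ŷ + e, with ŷ the predicted values and e the residuals; the symbols are introduced here and do not come from the post.

```latex
\begin{align*}
\operatorname{Var}(y)
  &= \operatorname{Var}(\hat{y} + e)
   = \operatorname{Var}(\hat{y}) + \operatorname{Var}(e)
     + 2\operatorname{Cov}(\hat{y}, e), \\
\frac{\operatorname{Var}(\hat{y})}{\operatorname{Var}(y)}
  &= 1 - \frac{\operatorname{Var}(e)}{\operatorname{Var}(y)}
       - \frac{2\operatorname{Cov}(\hat{y}, e)}{\operatorname{Var}(y)}.
\end{align*}
```

For a least-squares fit with an intercept, the in-sample covariance between ŷ and e is zero, so the last term vanishes and the two measures coincide. For other fits, including individual posterior draws, that covariance is generally nonzero and the two can differ.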

- Andrew on December 21, 2017 at 10:43 pm said:
  
  Bxg:
  
  For reasons discussed in our paper, we like the definition of R-squared as the variance of the predicted values divided by the variance of predicted values plus the variance of the errors, so that the variance decomposition is baked into the definition. To me, this captures the most important aspect of R-squared, which is that it is based on a decomposition of variance. You can also look at my 2006 paper with Pardoe.
  
  - bxg on December 21, 2017 at 11:17 pm said:
    
    I feel (and am) honored to have a direct response from you, but I’m too silly to understand it. I’ve (tried to) read this paper now twice, and just don’t see the “we like the definition …” part – is it in Section 4, or elsewhere? How much would your paper change if (1 – ) residual-variance/pre-model-variance was the default? Sorry.
    
Krolbo on December 23, 2017 at 9:12 am said:

I’ve always felt a little bit squeamish about calling the modelled conditional expected values predictions. I mean… they are exactly what they are, expected values of whatever distribution is used as a model. It makes even less sense if the distribution is a Bernoulli distribution: the “predicted” values are impossible to observe. Well, excluding the lower and upper asymptotes, but but but.

- Andrew on December 23, 2017 at 9:52 am said:
  
  Krolbo:
  
  Technically these are not predictions, they’re expected values of the predictive distributions. But “expected value” is also confusing. We need some better term for that!
  

Statistical Modeling, Causal Inference, and Social Science
