loo 2.0 is loose

Posted on April 16, 2018 7:51 PM by Jonah Gabry

This post is by Jonah and Aki.

We’re happy to announce the release of v2.0.0 of the loo R package for efficient approximate leave-one-out cross-validation (and more). For anyone unfamiliar with the package, the original motivation for its development is in our paper:

Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27(5), 1413–1432. doi:10.1007/s11222-016-9696-4.

Version 2.0.0 is a major update to the package that we’ve been working on for quite some time (see the release notes for details), and in this post we’ll highlight some of the most important improvements. Soon I (Jonah) will follow up with a post about important new developments in our various other R packages.

New interface, vignettes, and more helper functions to make the package easier to use

Because of certain improvements to the algorithms and diagnostics (summarized below), the interfaces, i.e., the loo() and psis() functions and the objects they return, also needed some improvement. Other related packages in the Stan R ecosystem (e.g., rstanarm, brms, bayesplot, projpred) have also been updated to integrate seamlessly with loo v2.0.0. (Apologies to anyone who happened to install the update during the short window between the loo release and when the compatible rstanarm/brms binaries became available on CRAN.)

Three vignettes now come with the loo package and are also available (and more nicely formatted) online at mc-stan.org/loo/articles:

Using the loo package (version >= 2.0.0)
Bayesian Stacking and Pseudo-BMA weights using the loo package
Writing Stan programs for use with the loo package

A vignette about K-fold cross-validation using new K-fold helper functions will be included in a subsequent update. Since the last release of loo we have also written a paper, Visualization in Bayesian workflow, that includes several visualizations based on computations from loo.

Improvements to the PSIS algorithm, effective sample sizes and MC errors

The approximate leave-one-out cross-validation performed by the loo package depends on Pareto smoothed importance sampling (PSIS). In loo v2.0.0, the PSIS algorithm (the psis() function) implements the algorithm described in the most recent update to our PSIS paper. This includes adapting the Pareto fit to the effective sample size and using a weakly informative prior to reduce the variance when effective sample sizes are small. (I believe we’ll be updating the paper again with some proofs from new coauthors.)
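
The paragraph above can be made concrete with a minimal pure-Python sketch of PSIS for the importance ratios of a single left-out observation. It is written in the spirit of Aki’s stand-alone Python psis.py mentioned in the comments, but it is neither that file nor the loo package’s R code. It rests on our reading of the paper:

- the tail length is min(0.2 S, 3 sqrt(S / r_eff)) for S draws;
- the generalized Pareto fit uses the Zhang and Stephens (2009) grid method;
- the weakly informative prior shrinks k toward 0.5 with the weight of 10 pseudo-observations;
- the tail is replaced by quantiles of the fitted distribution;
- the weights are truncated at the largest raw weight.

The names gpd_fit, gpd_quantile, and psis_smooth are hypothetical, and r_eff is assumed to be estimated elsewhere. The sketch also assumes S is in the hundreds or more, and it skips the degenerate-tail checks a real implementation needs.

```python
import math
import random


def _logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def gpd_fit(x, min_grid_pts=30):
    """Fit a generalized Pareto distribution to exceedances x >= 0 with the
    Zhang and Stephens (2009) profile-likelihood grid, then shrink the shape
    k toward 0.5 (weakly informative prior worth 10 pseudo-observations)."""
    x = sorted(x)
    n = len(x)
    prior = 3.0
    m = min_grid_pts + int(math.sqrt(n))
    xstar = x[int(n / 4 + 0.5) - 1]  # roughly the first quartile of the exceedances
    thetas = [1.0 / x[-1] + (1.0 - math.sqrt(m / (j - 0.5))) / prior / xstar
              for j in range(1, m + 1)]

    def profile_loglik(theta):
        a = -theta
        kk = sum(math.log1p(a * xi) for xi in x) / n
        return n * (math.log(a / kk) - kk - 1.0)

    ll = [profile_loglik(t) for t in thetas]
    lse = _logsumexp(ll)
    # likelihood-weighted average of theta over the grid
    theta_hat = sum(t * math.exp(l - lse) for t, l in zip(thetas, ll))
    k = sum(math.log1p(-theta_hat * xi) for xi in x) / n
    sigma = -k / theta_hat
    k = (n * k + 10 * 0.5) / (n + 10)  # shrink toward 0.5
    return k, sigma


def gpd_quantile(p, k, sigma):
    # inverse CDF of the generalized Pareto distribution with location 0
    return sigma * math.expm1(-k * math.log1p(-p)) / k


def psis_smooth(log_ratios, r_eff=1.0):
    """Return (smoothed log weights, Pareto k) for one left-out observation."""
    s = len(log_ratios)
    mx = max(log_ratios)
    lw = [v - mx for v in log_ratios]  # largest raw log ratio becomes 0
    # tail length adapted to the MCMC relative efficiency r_eff
    m = min(math.ceil(0.2 * s), math.ceil(3 * math.sqrt(s / r_eff)))
    order = sorted(range(s), key=lambda i: lw[i])
    tail_ids = order[s - m:]  # indices of the m largest ratios, ascending
    cutoff = lw[order[s - m - 1]]
    k, sigma = gpd_fit([math.exp(lw[i]) - math.exp(cutoff) for i in tail_ids])
    # replace the tail by expected order statistics of the fitted distribution
    for rank, i in enumerate(tail_ids):
        q = gpd_quantile((rank + 0.5) / m, k, sigma)
        lw[i] = math.log(q + math.exp(cutoff))
    # truncate at the largest raw weight (0 on the stabilized log scale)
    lw = [min(v, 0.0) for v in lw]
    return lw, k


# usage: proposal N(0, 1), target N(0.5, 1), so log ratio = 0.5 * x - 0.125
random.seed(1)
draws = [random.gauss(0.0, 1.0) for _ in range(1000)]
lw, k = psis_smooth([0.5 * x - 0.125 for x in draws])
print(f"Pareto k estimate: {k:.2f}")
```

Running psis_smooth once per observation gives the per-observation k values that a diagnostic table like the one below bins.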

For users of the loo package for PSIS-LOO cross-validation (and not just the PSIS algorithm for importance sampling), an even more important update comes from the latest version of the same PSIS paper. It describes how to compute the effective sample size estimate and Monte Carlo error for the PSIS estimate of elpd_loo (the expected log predictive density for new data). Thus, in addition to the Pareto k diagnostic (an indicator of convergence rate; see the paper) already available in previous loo versions, we now also report an effective sample size that takes into account both the MCMC efficiency and the importance sampling efficiency. Here’s an example of the diagnostic output table from loo v2.0.0 (the particular intervals chosen for binning are explained in the papers and in the package documentation):

Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)     240   91.6%   205
 (0.5, 0.7]   (ok)         7    2.7%   48
   (0.7, 1]   (bad)        8    3.1%   7
   (1, Inf)   (very bad)   7    2.7%   1
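
The columns of this table can be sketched as follows. Following the description above, we assume the PSIS effective sample size for one observation is the importance sampling efficiency 1 / sum(w_s^2) multiplied by the relative MCMC efficiency r_eff. Here w_s are the normalized smoothed weights. The bins match the intervals in the table. psis_n_eff and pareto_k_table are hypothetical helpers, not loo functions. The k and n_eff values in the usage lines are made up so that the counts and percentages match the table; they are not the vignette’s data.

```python
import math


def psis_n_eff(log_weights, r_eff=1.0):
    """Effective sample size for one observation: r_eff divided by the sum
    of squared normalized importance weights."""
    m = max(log_weights)
    w = [math.exp(v - m) for v in log_weights]
    total = sum(w)
    return r_eff / sum((wi / total) ** 2 for wi in w)


# (interval label, quality label, lower bound exclusive, upper bound inclusive)
BINS = [
    ("(-Inf, 0.5]", "(good)", -math.inf, 0.5),
    ("(0.5, 0.7]", "(ok)", 0.5, 0.7),
    ("(0.7, 1]", "(bad)", 0.7, 1.0),
    ("(1, Inf)", "(very bad)", 1.0, math.inf),
]


def pareto_k_table(ks, n_effs):
    """One row per bin: (interval, label, count, percent, min n_eff or None)."""
    rows = []
    for interval, label, lo, hi in BINS:
        idx = [i for i, k in enumerate(ks) if lo < k <= hi]
        pct = round(100.0 * len(idx) / len(ks), 1)
        min_neff = min(n_effs[i] for i in idx) if idx else None
        rows.append((interval, label, len(idx), pct, min_neff))
    return rows


# made-up values chosen to reproduce the counts in the table above
ks = [0.1] * 240 + [0.6] * 7 + [0.8] * 8 + [1.5] * 7
n_effs = [205.0] * 240 + [48.0] * 7 + [7.0] * 8 + [1.0] * 7
for row in pareto_k_table(ks, n_effs):
    print(*row)
```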

We also compute and report the Monte Carlo SE of elpd_loo to give an estimate of its accuracy. If some k > 1, NA will be reported for the Monte Carlo SE; this means the PSIS-LOO approximation is not reliable, as in the example table above. Previous versions of loo only reported the k diagnostic. We hope that showing the relationship between the k diagnostic, the effective sample size, and the MCSE of elpd_loo will make the diagnostics easier to interpret. The example table above is taken from one of the new vignettes, which uses it as part of a comparison of unstable and stable PSIS-LOO behavior.
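
Here is a rough sketch of how elpd_loo, its Monte Carlo SE, and the NA rule fit together (None plays the role of NA here). The point estimate is the usual PSIS-LOO one: for each observation, the log of the self-normalized weighted average of p(y_i | theta_s). For the MCSE we swap in a simple delta-method approximation. It takes the variance of the importance sampling estimate, divides it by an assumed r_eff, propagates it through the log, and sums over observations. The loo package’s own computation may differ in its details. The function name elpd_loo_with_mcse is hypothetical.

```python
import math


def _logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def elpd_loo_with_mcse(log_lik, log_weights, ks, r_effs):
    """log_lik[i][s] = log p(y_i | theta_s); log_weights[i][s] = smoothed
    log importance weights for leaving out observation i; ks[i] = Pareto k;
    r_effs[i] = relative MCMC efficiency (assumed given)."""
    elpd = 0.0
    var_sum = 0.0
    for ll_i, lw_i, r_eff in zip(log_lik, log_weights, r_effs):
        lse_w = _logsumexp(lw_i)
        w = [math.exp(v - lse_w) for v in lw_i]  # self-normalized weights
        # elpd_i = log(sum_s w_s * p(y_i | theta_s))
        elpd_i = _logsumexp([a + b for a, b in zip(ll_i, lw_i)]) - lse_w
        elpd += elpd_i
        epd_i = math.exp(elpd_i)
        # approximate variance of the importance sampling estimate of epd_i
        var_epd = sum(wi ** 2 * (math.exp(l) - epd_i) ** 2
                      for wi, l in zip(w, ll_i)) / r_eff
        var_sum += var_epd / epd_i ** 2  # delta method: Var(log E) ~ Var(E) / E^2
    # reporting rule from the post: no MCSE if any k > 1
    mcse = None if any(k > 1 for k in ks) else math.sqrt(var_sum)
    return elpd, mcse


# usage with two observations, four equally weighted draws each
log_lik = [[math.log(0.2), math.log(0.4), math.log(0.3), math.log(0.3)],
           [-1.0, -1.2, -0.9, -1.1]]
log_w = [[0.0] * 4, [0.0] * 4]
print(elpd_loo_with_mcse(log_lik, log_w, ks=[0.2, 0.4], r_effs=[1.0, 1.0]))
```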

Weights for model averaging: Bayesian stacking, pseudo-BMA and pseudo-BMA+

Another major addition is the loo_model_weights() function, which, thanks to the contributions of Yuling Yao, can be used to compute weights for model averaging or selection. loo_model_weights() provides a user-friendly interface to the new stacking_weights() and pseudobma_weights() functions. These implement the methods from Using stacking to average Bayesian predictive distributions (Yao et al., 2018).

As shown in the paper, Bayesian stacking (the default for loo_model_weights()) provides better model averaging performance than “Akaike style” weights. However, the loo package also includes Pseudo-BMA weights (PSIS-LOO based “Akaike style” weights) and Pseudo-BMA+ weights. Pseudo-BMA+ weights are similar to Pseudo-BMA weights but use a so-called Bayesian bootstrap procedure to better account for the uncertainties. We recommend the Pseudo-BMA+ method over, for example, WAIC weights, although we prefer the stacking method to both. In addition to the Yao et al. paper, the new vignette about computing model weights demonstrates some of the motivation for our preference for stacking when appropriate.
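
To illustrate the three weighting schemes, here is a small pure-Python sketch that starts from pointwise LOO log predictive densities (lpd_point[i][k] for observation i and model k).

- Stacking maximizes sum_i log(sum_k w_k exp(lpd_point[i][k])) over the simplex. We use a simple EM-style fixed-point update, which is not necessarily the optimizer loo uses.
- Pseudo-BMA weights are proportional to exp(elpd_loo_k).
- Pseudo-BMA+ averages such weights over Bayesian bootstrap replicates with Dirichlet(1, ..., 1) weights on the observations, following our reading of Yao et al. (2018).

The Python functions only borrow the names of loo’s R functions; their arguments are assumptions.

```python
import math
import random


def _softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def stacking_weights(lpd_point, iters=1000):
    """Maximize sum_i log(sum_k w_k * exp(lpd_point[i][k])) over the simplex
    with the EM fixed-point update w_k <- w_k * mean_i(p_ik / sum_j w_j p_ij)."""
    n, n_models = len(lpd_point), len(lpd_point[0])
    # rescale each row by its max for stability; the update is unaffected
    dens = [[math.exp(v - max(row)) for v in row] for row in lpd_point]
    w = [1.0 / n_models] * n_models
    for _ in range(iters):
        mix = [sum(wk * d for wk, d in zip(w, row)) for row in dens]
        w = [w[k] * sum(row[k] / mx for row, mx in zip(dens, mix)) / n
             for k in range(n_models)]
    return w


def pseudobma_weights(lpd_point, bb=False, n_boot=1000, seed=0):
    """Pseudo-BMA (bb=False) or Pseudo-BMA+ (bb=True, Bayesian bootstrap)."""
    n, n_models = len(lpd_point), len(lpd_point[0])
    if not bb:
        elpd = [sum(row[k] for row in lpd_point) for k in range(n_models)]
        return _softmax(elpd)
    rng = random.Random(seed)
    avg = [0.0] * n_models
    for _ in range(n_boot):
        # Dirichlet(1, ..., 1) draw via normalized Gamma(1, 1) variates
        g = [rng.gammavariate(1.0, 1.0) for _ in range(n)]
        total = sum(g)
        z = [n * sum(gi / total * row[k] for gi, row in zip(g, lpd_point))
             for k in range(n_models)]
        avg = [a + wk / n_boot for a, wk in zip(avg, _softmax(z))]
    return avg


lpd = [[0.0, -3.0]] * 10 + [[-3.0, 0.0]] * 10  # each model wins on half the points
print(stacking_weights(lpd))
print(pseudobma_weights(lpd, bb=True, n_boot=200))
```

Stacking optimizes the combined predictive distribution, so it can give substantial weight to a model that is worse on average but good where the others are poor. Pseudo-BMA weights depend only on each model’s own elpd_loo and cannot reward that kind of complementarity.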

Give it a try

You can install loo v2.0.0 from CRAN with install.packages("loo"). Additionally, reinstalling an interface that provides loo functionality (e.g., rstanarm, brms) will automatically update your loo installation. The loo website with online documentation is mc-stan.org/loo and you can report a bug or request a feature on GitHub.

11 thoughts on “loo 2.0 is loose”

Andrew on April 16, 2018 at 8:11 pm said:

Awesome!

Pointeroutguy on April 16, 2018 at 8:40 pm said:

Thought this was a delayed April Fools’ joke, but looks like a useful package.

- Nick on April 19, 2018 at 11:44 am said:
  
  It will be documented in the forthcoming “loo paper”, to the amusement of British readers.
  
  - Jonah on April 19, 2018 at 8:38 pm said:
    
    When I do a Google image search for “loo package”, the results are a mix of figures from our papers and toilets:
    
    https://www.amazon.co.uk/Wellness-Hung-Loo-Package-Pack-Free-Wall-Hanging-Badkeramik/dp/B00VVNU3X4
    
Kristian Brock on April 17, 2018 at 1:53 am said:

This sounds fantastic, can’t wait to work through the vignettes. This whole suite of R packages built upon Stan is really having a big impact on the way I work, so thanks!

I saw a post from Aki a while ago on projection predictive model selection (with Piironen, I recall). Is that in this release, or is it planned to come to loo?

- Jonah on April 17, 2018 at 2:15 am said:
  
  Glad to hear that!
  
  And yes, there’s the projpred package, which is being developed at https://github.com/stan-dev/projpred and uses loo for some backend computations. I forgot to mention it, but projpred was also recently updated for compatibility with this new release. I will update the post to mention that.
  
  - Aki Vehtari on April 17, 2018 at 10:27 am said:
    
    projpred is also on CRAN (https://cran.r-project.org/package=projpred), and there are several demos at (https://github.com/avehtari/modelselection_tutorial). Some of these demos will eventually be turned into proper vignettes and case studies.
    
    - Jonah on April 17, 2018 at 7:15 pm said:
      
      If I haven’t done it before then, remind me to set up a web page for projpred when you’re here in NYC soon. It should have one like the ones we have for the other R packages.

Noah Motion on April 17, 2018 at 11:08 am said:

Just out of curiosity, will your (Aki’s) python PSIS-LOO code be updated with the new algorithm(s) and/or diagnostics?

- Aki Vehtari on April 17, 2018 at 2:41 pm said:
  
  The Python PSIS-LOO code (https://github.com/avehtari/PSIS/blob/master/py/psis.py) was updated with the new algorithm 6 months ago. It was much easier to update a simple stand-alone function. Only the PSIS algorithm part was updated, i.e., it doesn’t compute the effective sample size or MCSE. loo 2 took a long time to release because the major changes required coordination with other packages that use it.
  
  - Noah Motion on April 17, 2018 at 6:21 pm said:
    
    Excellent! Thanks! I think I’ve already been using the updated version, then.
    

Statistical Modeling, Causal Inference, and Social Science
