Our (Aki, Andrew and Jonah) paper "Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC" was recently published in Statistics and Computing. In the paper we show:
- why it’s better to use LOO instead of WAIC for model evaluation
- how to compute LOO quickly and reliably using the full posterior sample
- how Pareto smoothed importance sampling (PSIS) reduces the variance of the LOO estimate
- how Pareto shape diagnostics can be used to indicate when PSIS-LOO fails
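To make the idea of computing LOO from the full posterior sample concrete, here is a minimal sketch of plain importance-sampling LOO in Python. It deliberately leaves out the Pareto smoothing step, so it is not PSIS-LOO itself: PSIS would additionally stabilize the largest importance ratios using a fitted generalized Pareto distribution before averaging. The function names and the `log_lik[s][i]` layout (draw `s`, observation `i`) are assumptions made for this illustration.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def is_loo_elpd(log_lik):
    """Raw importance-sampling LOO, without Pareto smoothing.

    log_lik[s][i] holds log p(y_i | theta_s) for posterior draw s and
    observation i. Returns one elpd_loo estimate per observation.
    """
    n_draws = len(log_lik)
    n_obs = len(log_lik[0])
    pointwise = []
    for i in range(n_obs):
        # Raw log importance ratios for leaving out observation i:
        # log r_s = -log p(y_i | theta_s)
        log_ratios = [-log_lik[s][i] for s in range(n_draws)]
        # elpd_i = log( 1 / mean_s( 1 / p(y_i | theta_s) ) )
        pointwise.append(math.log(n_draws) - logsumexp(log_ratios))
    return pointwise
```

Summing the returned values gives the overall LOO estimate. The raw ratios used here can have very high variance, which is the problem the smoothing step is meant to address.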
PSIS-LOO makes automated LOO practical in rstanarm, which provides a flexible way to use pre-compiled Stan regression models. Sampling produces draws from the full posterior, and these same draws are used to compute the PSIS-LOO estimate at negligible additional computational cost. PSIS-LOO can fail, but possible failure is reliably detected by the Pareto shape diagnostics. If there are high estimated Pareto shape values, a summary of them is reported to the user with suggestions on what to do next. In the initial modeling phase the user can ignore the warnings (and still get more reliable results than with WAIC or DIC). When some estimated Pareto shape values are high, rstanarm offers to rerun the inference only for the problematic leave-one-out folds (in the paper we named this approach PSIS-LOO+). If there are many high values, rstanarm offers to run k-fold-CV instead. This way a fast estimate of predictive performance is always provided, and the user can decide how much additional computation time to spend on more accurate results. In the future we will add other utility and cost functions, such as explained variance, MAE and classification accuracy, to make the predictive performance easier to interpret.
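The diagnostic-driven workflow described above can be sketched as a small decision function. This is only an illustration of the logic, not rstanarm's actual code: the function name, the return labels, the shape threshold of 0.7 and the cutoff of 10 refits are all assumptions chosen for the example.

```python
def suggest_next_step(pareto_k, k_threshold=0.7, max_refits=10):
    """Suggest a follow-up action from per-observation Pareto shape estimates.

    k_threshold and max_refits are illustrative assumptions, not values
    taken from this post. Returns (action, flagged_indices).
    """
    # Observations whose importance ratios look too heavy-tailed to trust
    flagged = [i for i, k in enumerate(pareto_k) if k > k_threshold]
    if not flagged:
        # All diagnostics look fine: keep the fast PSIS-LOO estimate
        return ("ok", flagged)
    if len(flagged) <= max_refits:
        # A few problematic folds: refit the model without each flagged
        # observation (the PSIS-LOO+ approach)
        return ("refit_flagged", flagged)
    # Many problematic folds: k-fold-CV is likely the cheaper option
    return ("kfold", flagged)
```

For example, `suggest_next_step([0.1, 0.8, 0.3])` flags only the second observation and suggests refitting for that single fold.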
The above approach can also be used when running Stan through interfaces other than rstanarm, although the user then needs to add a few lines to the usual Stan code. After this, PSIS-LOO and the diagnostics are easily computed using the available packages for R, Python, and Matlab.
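The quantity those extra lines need to produce is the pointwise log-likelihood: one value per posterior draw and per observation. As a rough sketch of what that looks like, the following Python computes it for a hypothetical normal linear regression. The model, the `(alpha, beta, sigma)` draw layout and the function names are assumptions made for illustration. In practice this would be computed inside the Stan program and handed to one of the packages mentioned above.

```python
import math


def normal_lpdf(y, mu, sigma):
    """Log density of Normal(mu, sigma) evaluated at y."""
    z = (y - mu) / sigma
    return -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * z * z


def pointwise_log_lik(draws, x, y):
    """Pointwise log-likelihood for y_i ~ Normal(alpha + beta * x_i, sigma).

    draws is a list of (alpha, beta, sigma) tuples, one per posterior draw
    (a hypothetical layout). Returns a list of rows, one per draw, each
    with one entry per observation.
    """
    return [
        [normal_lpdf(yi, alpha + beta * xi, sigma) for xi, yi in zip(x, y)]
        for (alpha, beta, sigma) in draws
    ]
```

The result has one row per draw and one column per observation, which is the layout that LOO computations typically expect as input.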