Stan class at NYR Conference in July (in person and virtual)

I (Jonah) am excited to be teaching a 2-day Stan workshop preceding the NYR Conference in July. The workshop will be July 11-12 and the conference July 13-14.  The focus of the workshop will be to introduce the basics of applied Bayesian data analysis, the Stan modeling language, and how to interface with Stan from R. Over the course of two full days, participants will learn to write models in the Stan language, run them in R, and use a variety of R packages to work with the results.

There are both in-person and remote spots available, so if you can’t make it to NYC you can still participate. For tickets to the workshop and/or conference head to https://rstats.ai/nyr.

P.S. In my original post I forgot to mention that you can use the discount code STAN20 to get 20% off tickets for the workshop and conference!

 

Job opportunity for a Stan user at Dugway

Scott Hunter writes:

Job opportunity as a statistician at US Army Dugway Proving Ground, in remote Dugway, Utah. Work is rewarding in that we solve problems to improve protecting our warfighters from chemical and biological attacks. Looking for someone comfortable coding in R and Stan (primarily using the rstan and rstanarm packages). Most problems deal with linear models as well as logistic models which may or may not be hierarchical. Design of Experiment experience is a plus as well as writing Shiny applications. Pay will depend on education/experience and ranges from $66,214 to $122,684 (GS11 – GS13 pay scale). Work week is Monday through Thursday with telework opportunities. If there is any interest, please send a resume soonest to [email protected].

I’ve done some Stan trainings for this group in the past and I really enjoyed working with them!

The Tampa Bay Rays baseball team is looking to hire a Stan user

Andrew and I have blogged before about job opportunities in baseball for Stan users (e.g., here and here) and here’s a new one. This time it’s the Tampa Bay Rays who are hiring. The job title is “Analyst, Baseball Research & Development” and here are the responsibilities and qualifications:

Responsibilities:
* Build customized statistical modeling tools for accurate prediction and inference for various baseball applications.
* Provide statistical modeling expertise to other R&D Analysts.
* Optimize code to ensure quick and reliable model sampling/optimization.
* Author both technical and non-technical internal reports on your work.

Qualifications:
* Experience with Stan or other probabilistic programming language
* Experience with R or Python
* Deep understanding of the fundamentals of Bayesian Inference, MCMC, and Autocorrelation/Time Series Modeling.
* Start date is flexible. For example, candidates with an extensive amount of remaining time left in an academic program are encouraged to apply immediately.
* Candidates with non-traditional schooling backgrounds, as well as candidates with Advanced degree (Masters or PhD) in Statistics, Data Science, Machine Learning, or a related field are encouraged to apply

That’s just part of the job ad, so I recommend checking out the full posting, which includes important details like the fact that remote work is a possibility.

Here are a few other details I can share that aren’t included in the job ad:

  • The Rays have already been using Stan for years now so you won’t be the only Stan user there.
  • A few years ago several of us (Stan developers) did some consulting/training work for the Rays and had a great experience. Some of their R&D team members have changed since then, but I still know some of the ones there, and I highly recommend working with them if you’re interested in baseball.
  • The Rays always have one of the lowest payrolls for their roster and yet they are somehow consistently competitive (they even made the World Series last year!). I’m sure there are multiple reasons for this, but I strongly suspect that the strength of the R&D team you’d be joining is one of them.

 

StanConnect 2021: Call for Session Proposals

Back in February it was decided that this year’s StanCon would be a series of virtual mini-symposia with different organizers instead of a single all-day event. Today the Stan Governing Body (SGB) announced that submissions are now open for anyone to propose organizing a session. Here’s the announcement from the SGB on the Stan forums: 

Following up on our previous announcement, the SGB is excited to announce a formal call for proposals for StanConnect 2021.

StanConnect is a virtual miniseries that will consist of several 3-hour meetings/mini-symposia. You can think of each meeting as a kind of organized conference “session.”

  • Anyone can feel free to organize a StanConnect meeting as a “Session Chair”. Simply download the proposal form as a docx, fill it out, and submit it to the SGB via email ([email protected]) by April 26, 2021 (New York time). The meeting must be scheduled for sometime this year after June 1.
  • The talks must involve Stan and be focused around a subject/topic theme. E.g. “Spatial models in Ecology via Stan”.
  • You will see that though we provide a few “templates” for how to structure a StanConnect meeting, we are trying to avoid being overly prescriptive. Rather, we are giving Session Chairs freedom to invite speakers related to their theme and structure the 3-hr meeting as they see fit.
  • If you have any questions, please feel free to post here.

I wasn’t involved in the decision to change the format, but I really like the idea of a virtual miniseries. I thought the full-day StanCon 2020 was great, but one nearly 24-hour global virtual conference feels like enough. And hopefully having a bunch of separately organized events will give more people a chance to get involved with Stan, either as an organizer, speaker, or attendee.

The current state of the Stan ecosystem in R

(This post is by Jonah)

Last week I posted here about the release of version 2.0.0 of the loo R package, but there have been a few other recent releases and updates worth mentioning. At the end of the post I also include some general thoughts on R package development with Stan and the growing number of Stan users who are releasing their own packages interfacing with rstan or one of our other packages.

Interfaces

rstanarm and brms: Version 2.17.4 of rstanarm and version 2.2.0 of brms were both released to provide compatibility with the new features in loo v2.0.0. Two of the new vignettes for the loo package show how to use it with rstanarm models, and we have also just released a draft of a vignette on how to use loo with brms and rstan for many “non-factorizable” models (i.e., observations not conditionally independent). brms is also now officially supported by the Stan Development Team (welcome Paul!) and there is a new category for it on the Stan Forums.

rstan: The next release of the rstan package (v2.18) is not out yet (we need to get Stan 2.18 out first), but it will include a loo() method for stanfit objects in order to save users a bit of work. Unfortunately, we can’t save you the trouble of having to compute the pointwise log-likelihood in your Stan program though! There will also be some new functions that make it a bit easier to extract HMC/NUTS diagnostics (thanks to a contribution from Martin Modrák).

Visualization

bayesplot: A few weeks ago we released version 1.5.0 of the bayesplot package (mc-stan.org/bayesplot), which also integrates nicely with loo 2.0.0. In particular, the diagnostic plots using the leave-one-out cross-validated probability integral transform (LOO-PIT) from our paper Visualization in Bayesian Workflow (preprint on arXiv, code on GitHub) are easier to make with the latest bayesplot release. Also, TJ Mahr continues to improve the bayesplot experience for ggplot2 users by adding (among other things) more functions that return the data used for plotting in a tidy data frame.
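To make the LOO-PIT idea concrete, here is a minimal sketch that assumes a toy normal model (known standard deviation, flat prior on the mean) and simulated data, so that no Stan fit is needed. The model, the data, and the variable names are illustrative assumptions; with a real fit you would use the posterior predictive draws and pointwise log-likelihood from your Stan model instead. The PSIS log weights come from loo's psis() and are passed to bayesplot's ppc_loo_pit_overlay().

```r
# Minimal sketch (assumption): a toy normal model with known sd = 1 and a flat
# prior on the mean, so posterior draws can be simulated without Stan.
library(loo)
library(bayesplot)

set.seed(123)
n <- 50      # number of observations
S <- 1000    # number of posterior draws
y <- rnorm(n, mean = 1, sd = 1)

# Posterior of the mean under this toy model: N(mean(y), 1 / n)
mu <- rnorm(S, mean = mean(y), sd = 1 / sqrt(n))

# S x n matrices of posterior predictive draws and pointwise log-likelihood
yrep <- t(sapply(mu, function(m) rnorm(n, mean = m, sd = 1)))
log_lik <- outer(mu, y, function(m, yy) dnorm(yy, mean = m, sd = 1, log = TRUE))

# PSIS for leave-one-out: the log ratios are the negative log-likelihood.
# The draws are independent, so the relative efficiencies are all 1.
psis_obj <- psis(-log_lik, r_eff = rep(1, n))
lw <- weights(psis_obj)  # smoothed, normalized log weights (S x n)

# Overlay the LOO-PIT density on densities of uniform draws; print(p) to draw it
p <- ppc_loo_pit_overlay(y, yrep, lw = lw)
```

If the predictive distributions are well calibrated, the LOO-PIT density should look like the uniform draws; a U shape suggests predictive distributions that are too narrow, and a frown shape suggests ones that are too wide.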

shinystan: Unfortunately, there hasn’t been a shinystan (mc-stan.org/shinystan) release in a while because I’ve been busy with all of these other packages, papers, and various other Stan-related things. We’ll try to get out a release with a few bug fixes soon. (If you’re annoyed by the lack of new features in shinystan recently let me know and I will try to convince you to help me solve that problem!)

(Update: I forgot to mention that despite the lack of shinystan releases, we’ve been working on better introductory materials. To that end, Chelsea Muth, Zita Oravecz, and I recently published an article User-friendly Bayesian regression modeling: A tutorial with rstanarm and shinystan (view).)

Other tools

loo: We released version 2.0.0, a major update to the loo package (mc-stan.org/loo). See my previous blog post.

projpred: Version 0.8.0 of the projpred package (mc-stan.org/projpred) for projection predictive variable selection for GLMs was also released shortly after the loo update in order to take advantage of the improvements to the Pareto smoothed importance sampling algorithm. projpred can already be used quite easily with rstanarm models and we are working on improving its compatibility with other packages for fitting Stan models.

rstantools: Unrelated to the loo update, we also released version 1.5.0 of the rstantools package (mc-stan.org/rstantools), which provides functions for setting up R packages interfacing with Stan. The major changes in this release are that usethis::create_package() is now called to set up the package (instead of utils::package.skeleton), fewer manual changes to files are required by users after calling rstan_package_skeleton(), and we have a new vignette walking through the process of setting up a package (thanks Stefan Siegert!). Work is being done to keep improving this process, so be on the lookout for more updates soonish.

Stan related R packages from other developers

There are now well over fifty packages on CRAN that depend in some way on one of our R packages mentioned above! You can find most of them by looking at the “Reverse dependencies” section on the CRAN page for rstan, but that doesn’t count the ones that depend on bayesplot, shinystan, loo, etc., but not rstan.
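One way to get a fuller count is to take the union of the reverse dependencies of all of these packages at once. The sketch below uses base R's tools::package_dependencies(); the helper stan_revdeps() is hypothetical, and both the list of package names and the restriction to "strong" dependencies (Depends, Imports, LinkingTo) are assumptions. The listed packages are removed from the result so that, for example, rstanarm is not counted as a dependent of rstan.

```r
# Hypothetical helper: union of CRAN reverse dependencies of several packages.
stan_pkgs <- c("rstan", "rstanarm", "brms", "bayesplot", "shinystan",
               "loo", "projpred", "rstantools")

stan_revdeps <- function(db, pkgs = stan_pkgs) {
  # "strong" = Depends, Imports, and LinkingTo (Suggests is not counted)
  revdeps <- tools::package_dependencies(pkgs, db = db, which = "strong",
                                         reverse = TRUE)
  # Combine the per-package results and drop the listed packages themselves
  setdiff(sort(unique(unlist(revdeps))), pkgs)
}

# Usage against the current CRAN index (requires an internet connection):
# db <- available.packages(repos = "https://cloud.r-project.org")
# length(stan_revdeps(db))
```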

Unfortunately, given the growing number of these packages, we haven’t been able to look at each one of them in detail. For obvious reasons we prioritize giving feedback to developers who reach out to us directly to ask for comments and to those developers who make an effort to follow our recommendations for developers of R packages interfacing with Stan (included with the rstantools package since its initial release in 2016). If you are developing one of these packages and would like feedback, please let us know on the Stan Forums. Our time is limited, but we really do make a serious effort to answer every single question asked on the forums (thank you to the many Stan users who also volunteer their time helping on the forums!).

My primary feelings about this trend of developing Stan-based R packages are ones of excitement and gratification. It’s really such an honor to have so many people developing these packages based on all the work we’ve done! There are also a few things I’ve noticed that I hope will change going forward. I’ll wrap up this post by highlighting two of these issues that I hope developers will take seriously:

(1) Unit testing

(2) Naming user-facing functions

The number of these packages that have no unit tests (or very scant testing) is a bit scary. Unit tests won’t catch every possible bug (we have lots of tests for our packages and people still find bugs all the time), but there is really no excuse for not unit testing a package that you want other people to use. If you care enough to do everything required to create your package and get it on CRAN, and if you care about your users, then I think it’s fair to say that you should care enough to write tests for your package. And packages like testthat make the process much easier than it used to be! Can anyone think of a reasonable excuse for not unit testing a package before releasing it to CRAN and expecting people to use it? (Not a rhetorical question. I really am curious, given that it seems to be relatively common, or at least not uncommon.) I don’t mean to be too negative here. There are also many packages that seem to have strong testing in place! My motivation for bringing up this issue is that it is in the best interest of our users.
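As an illustration of how little code basic tests require, here is a sketch using testthat. The helper prepare_stan_data() is hypothetical, standing in for the kind of function that turns user input into the data list passed to Stan; in a real package the tests would live in files under tests/testthat/.

```r
library(testthat)

# Hypothetical helper from a package interfacing with Stan: validates the
# inputs and builds the data list that would be passed to the Stan program.
prepare_stan_data <- function(y, X) {
  X <- as.matrix(X)
  if (!is.numeric(y)) stop("'y' must be numeric.", call. = FALSE)
  if (length(y) != nrow(X)) {
    stop("'y' and 'X' must have the same number of observations.", call. = FALSE)
  }
  if (anyNA(y) || anyNA(X)) stop("Missing values are not allowed.", call. = FALSE)
  list(N = length(y), K = ncol(X), y = y, X = X)
}

test_that("prepare_stan_data returns correctly sized data", {
  X <- matrix(rnorm(20), nrow = 10, ncol = 2)
  standata <- prepare_stan_data(rnorm(10), X)
  expect_equal(standata$N, 10)
  expect_equal(standata$K, 2)
  expect_identical(dim(standata$X), c(10L, 2L))
})

test_that("prepare_stan_data rejects bad input", {
  X <- matrix(1, nrow = 5, ncol = 1)
  expect_error(prepare_stan_data(1:4, X), "same number of observations")
  expect_error(prepare_stan_data(c(1, NA, 3, 4, 5), X), "Missing values")
  expect_error(prepare_stan_data(letters[1:5], X), "must be numeric")
})
```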

Regarding function naming: this isn’t nearly as big of a deal as unit testing, it’s just something I think developers (including myself) of packages in the Stan R ecosystem can do to make the experience better for our users. rstanarm and brms both import the generic functions included with rstantools in order to be able to define methods with consistent names. For example, whether you fit a model with rstanarm or with brms, you can call log_lik() on the fitted model object to get the pointwise log-likelihood (it’s true that we still have a bit left to do to get the names across rstanarm and brms more standardized, but we’re actively working on it). If you are developing a package that fits models using Stan, we hope you will join us in trying to make it as easy as possible for users to navigate the Stan ecosystem in R.
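For package developers, reusing one of these generics amounts to writing a method for it. The sketch below assumes a hypothetical model class "mymodel" that stores posterior draws of a normal mean (with known standard deviation 1) alongside the data, and defines a method for the log_lik() generic exported by rstantools.

```r
library(rstantools)

# Hypothetical constructor: posterior draws of a normal mean (known sd = 1)
# stored together with the observed data.
mymodel <- function(y, mu_draws) {
  structure(list(y = y, mu = mu_draws), class = "mymodel")
}

# Method for the log_lik() generic from rstantools: an S x N matrix of
# pointwise log-likelihood values (draws in rows, observations in columns).
log_lik.mymodel <- function(object, ...) {
  outer(object$mu, object$y,
        function(m, yy) dnorm(yy, mean = m, sd = 1, log = TRUE))
}

fit <- mymodel(y = c(0.2, -0.5, 1.1), mu_draws = rnorm(100, 0.3, 0.2))
ll <- log_lik(fit)  # 100 draws by 3 observations
```

Inside a package you would declare this in the NAMESPACE file (directly or via roxygen) with importFrom(rstantools, log_lik), export(log_lik), and S3method(log_lik, mymodel), so users can call log_lik() on your objects just as they would on an rstanarm or brms fit.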

loo 2.0 is loose

This post is by Jonah and Aki.

We’re happy to announce the release of v2.0.0 of the loo R package for efficient approximate leave-one-out cross-validation (and more). For anyone unfamiliar with the package, the original motivation for its development is in our paper:

Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing. 27(5), 1413–1432. doi:10.1007/s11222-016-9696-4. (published version, arXiv preprint)

Version 2.0.0 is a major update (release notes) to the package that we’ve been working on for quite some time and in this post we’ll highlight some of the most important improvements. Soon I (Jonah) will follow up with a post about important new developments in our various other R packages.

New interface, vignettes, and more helper functions to make the package easier to use

Because of certain improvements to the algorithms and diagnostics (summarized below), the interfaces, i.e., the loo() and psis() functions and the objects they return, also needed some improvement. (Click on the function names in the previous sentence to see their new documentation pages.) Other related packages in the Stan R ecosystem (e.g., rstanarm, brms, bayesplot, projpred) have also been updated to integrate seamlessly with loo v2.0.0. (Apologies to anyone who happened to install the update during the short window between the loo release and when the compatible rstanarm/brms binaries became available on CRAN.)

Three vignettes now come with the loo package and are also available (and more nicely formatted) online at mc-stan.org/loo/articles:

  • Using the loo package (version >= 2.0.0) (view)
  • Bayesian Stacking and Pseudo-BMA weights using the loo package (view)
  • Writing Stan programs for use with the loo package (view)

A vignette about K-fold cross-validation using new K-fold helper functions will be included in a subsequent update. Since the last release of loo we have also written a paper, Visualization in Bayesian workflow, that includes several visualizations based on computations from loo.

Improvements to the PSIS algorithm, effective sample sizes and MC errors

The approximate leave-one-out cross-validation performed by the loo package depends on Pareto smoothed importance sampling (PSIS). In loo v2.0.0, the PSIS algorithm (psis() function) corresponds to the algorithm in the most recent update to our PSIS paper, including adapting the Pareto fit with respect to the effective sample size and using a weakly informative prior to reduce the variance for small effective sample sizes. (I believe we’ll be updating the paper again with some proofs from new coauthors.)
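The psis() function can also be used for ordinary importance sampling, not just LOO. A minimal sketch, assuming a standard normal proposal and a N(0.5, 1) target chosen purely for illustration:

```r
library(loo)

set.seed(2018)
S <- 4000
x <- rnorm(S, mean = 0, sd = 1)   # draws from the proposal, N(0, 1)

# Log importance ratios: log target density minus log proposal density,
# with the target assumed to be N(0.5, 1) for illustration
log_ratios <- dnorm(x, mean = 0.5, sd = 1, log = TRUE) -
  dnorm(x, mean = 0, sd = 1, log = TRUE)

# Draws are independent, so the relative efficiency r_eff is 1
ps <- psis(log_ratios, r_eff = 1)

w <- as.vector(weights(ps, log = FALSE))  # smoothed, normalized weights
k <- pareto_k_values(ps)                  # Pareto k diagnostic

# Self-normalized PSIS estimate of E[x] under the target (true value 0.5)
est_mean <- sum(w * x)
```

Small values of k indicate that the smoothed weights are well behaved; a value above 0.7 would be a signal not to trust the importance sampling estimate.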

For users of the loo package for PSIS-LOO cross-validation and not just the PSIS algorithm for importance sampling, an even more important update is that the latest version of the same PSIS paper referenced above describes how to compute the effective sample size estimate and Monte Carlo error for the PSIS estimate of elpd_loo (expected log predictive density for new data). Thus, in addition to the Pareto k diagnostic (an indicator of convergence rate; see the paper) already available in previous loo versions, we now also report an effective sample size that takes into account both the MCMC efficiency and the importance sampling efficiency. Here’s an example of what the diagnostic output table from loo v2.0.0 looks like (the particular intervals chosen for binning are explained in the papers and also in the package documentation):

Pareto k diagnostic values:
                         Count  Pct.    Min. n_eff
(-Inf, 0.5]   (good)     240    91.6%   205
 (0.5, 0.7]   (ok)         7     2.7%    48
   (0.7, 1]   (bad)        8     3.1%     7
   (1, Inf)   (very bad)   7     2.7%     1

We also compute and report the Monte Carlo SE of elpd_loo to give an estimate of its accuracy. If some k > 1 (which means the PSIS-LOO approximation is not reliable, as in the example above), NA will be reported for the Monte Carlo SE. We hope that showing the relationship between the k diagnostic, effective sample size, and MCSE of elpd_loo will make the diagnostics easier to interpret than in previous versions of loo, which only reported the k diagnostic. This particular example is taken from one of the new vignettes, which uses it as part of a comparison of unstable and stable PSIS-LOO behavior.
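A sketch of how these diagnostics can be produced, assuming a toy normal model with simulated data and independent draws standing in for four MCMC chains. In practice the log-likelihood matrix would come from your Stan fit (e.g., via loo::extract_log_lik()).

```r
library(loo)

set.seed(42)
n <- 30                        # observations
n_chains <- 4
n_iter <- 500                  # post-warmup draws per chain (assumed)
S <- n_chains * n_iter
y <- rnorm(n, mean = 1, sd = 1)

# Toy model (assumption): normal likelihood with known sd = 1 and flat prior,
# so independent posterior draws of the mean stand in for MCMC output
mu <- rnorm(S, mean = mean(y), sd = 1 / sqrt(n))
log_lik <- outer(mu, y, function(m, yy) dnorm(yy, mean = m, sd = 1, log = TRUE))

# Relative efficiency of the likelihood draws, accounting for the chains
chain_id <- rep(seq_len(n_chains), each = n_iter)
r_eff <- relative_eff(exp(log_lik), chain_id = chain_id)

loo_fit <- loo(log_lik, r_eff = r_eff)
print(loo_fit)            # elpd_loo, p_loo, looic and the Pareto k summary
pareto_k_table(loo_fit)   # counts, percentages and minimum n_eff per k interval
mcse_loo(loo_fit)         # Monte Carlo SE of elpd_loo
```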

Weights for model averaging: Bayesian stacking, pseudo-BMA and pseudo-BMA+

Another major addition is the loo_model_weights() function, which, thanks to the contributions of Yuling Yao, can be used to compute weights for model averaging or selection. loo_model_weights() provides a user-friendly interface to the new stacking_weights() and pseudobma_weights() functions, which are implementations of the methods from Using stacking to average Bayesian predictive distributions (Yao et al., 2018). As shown in the paper, Bayesian stacking (the default for loo_model_weights()) provides better model averaging performance than “Akaike-style” weights. However, the loo package also includes Pseudo-BMA weights (PSIS-LOO-based “Akaike-style” weights) and Pseudo-BMA+ weights, which are similar to Pseudo-BMA weights but use a so-called Bayesian bootstrap procedure to better account for the uncertainties. We recommend the Pseudo-BMA+ method instead of, for example, WAIC weights, although we prefer the stacking method to both. In addition to the Yao et al. paper, the new vignette about computing model weights demonstrates some of the motivation for our preference for stacking when appropriate.
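A minimal sketch of the lower-level functions, assuming two hypothetical normal models that differ only in their fixed standard deviation, fit to simulated data. loo_model_weights() wraps these steps; here the pointwise elpd_loo values are passed directly to stacking_weights() and pseudobma_weights().

```r
library(loo)

set.seed(7)
n <- 100
S <- 2000
y <- rnorm(n, mean = 1, sd = 2)   # simulated data with sd = 2

# Model 1 (hypothetical): y ~ N(mu, 1), i.e. the wrong fixed sd
mu1 <- rnorm(S, mean = mean(y), sd = 1 / sqrt(n))
ll1 <- outer(mu1, y, function(m, yy) dnorm(yy, mean = m, sd = 1, log = TRUE))

# Model 2 (hypothetical): y ~ N(mu, 2), matching the simulation
mu2 <- rnorm(S, mean = mean(y), sd = 2 / sqrt(n))
ll2 <- outer(mu2, y, function(m, yy) dnorm(yy, mean = m, sd = 2, log = TRUE))

# Independent draws, so relative efficiencies are 1
loo1 <- loo(ll1, r_eff = rep(1, n))
loo2 <- loo(ll2, r_eff = rep(1, n))

# n x K matrix of pointwise elpd_loo values, one column per model
lpd_point <- cbind(loo1$pointwise[, "elpd_loo"], loo2$pointwise[, "elpd_loo"])

w_stack <- stacking_weights(lpd_point)                 # Bayesian stacking
w_pbma_plus <- pseudobma_weights(lpd_point, BB = TRUE)  # Pseudo-BMA+
```

Because the data are simulated with standard deviation 2, most of the stacking weight should go to the second model.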

Give it a try

You can install loo v2.0.0 from CRAN with install.packages("loo"). Additionally, reinstalling an interface that provides loo functionality (e.g., rstanarm, brms) will automatically update your loo installation. The loo website with online documentation is mc-stan.org/loo and you can report a bug or request a feature on GitHub.

StanCon Submissions Reminder

The deadline for submissions to StanCon 2018 is approaching fast! Submissions should be sent by September 16, 2017 5:00:00 AM GMT.

StanCon’s version of conference proceedings is a collection of contributed talks based on interactive, self-contained notebooks (e.g., knitr, R Markdown, Jupyter, etc.). For example, you might demonstrate a novel modeling technique or a (possibly simplified) version of a novel application. There is no minimum or maximum length, and anyone using Stan is welcome to submit a contributed talk.

More details are available on the StanCon submissions web page and examples of accepted submissions from last year are available in our stancon_talks repository on GitHub.

Stan in St. Louis this Friday

This Friday afternoon I (Jonah) will be speaking about Stan at Washington University in St. Louis. The talk is open to the public, so anyone in the St. Louis area who is interested in Stan is welcome to attend. Here are the details:

Title: Stan: A Software Ecosystem for Modern Bayesian Inference
Jonah Sol Gabry, Columbia University

Neuroimaging Informatics and Analysis Center (NIAC) Seminar Series
Friday April 28, 2017, 1:30-2:30pm
NIL Large Conference Room
#2311, 2nd Floor, East Imaging Bldg.
4525 Scott Avenue, St. Louis, MO

medicine.wustl.eduNIAC

R packages interfacing with Stan: brms

Over on the Stan users mailing list I (Jonah) recently posted about our new document providing guidelines for developing R packages interfacing with Stan. As I say in the post and guidelines, we (the Stan team) are excited to see the emergence of some very cool packages developed by our users. One of these packages is Paul Bürkner’s brms. Paul is currently working on his PhD in statistics at the University of Münster, having previously studied psychology and mathematics at the universities of Münster and Hagen (Germany). Here is Paul writing about brms:

The R package brms implements a wide variety of Bayesian regression models using extended lme4 formula syntax and Stan for the model fitting. It has been on CRAN for about one and a half years now and has grown to be probably one of the most flexible R packages when it comes to regression models.

A wide range of distributions are supported, allowing users to fit, among others, linear, robust linear, count data, response time, survival, ordinal, and zero-inflated models. You can incorporate multilevel structures, smooth terms, autocorrelation, as well as measurement error in predictor variables, to mention only a few key features. Furthermore, non-linear predictor terms can be specified similarly to how it is done in the nlme package, and on top of that all parameters of the response distribution can be predicted at the same time.
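To see how the extended lme4-style formula syntax maps onto a Stan program, brms can generate the Stan code without compiling or sampling. A sketch using assumed, simulated data, predicting both the mean (with group-level intercepts) and the residual standard deviation sigma:

```r
library(brms)

set.seed(1)
# Simulated data (assumption): 6 groups of 10 observations
dat <- data.frame(
  y = rnorm(60),
  x = rnorm(60),
  g = factor(rep(1:6, each = 10))
)

# Mean: population-level slope on x plus varying intercepts by g (lme4 syntax).
# sigma: the residual standard deviation gets its own linear predictor.
f <- bf(y ~ x + (1 | g), sigma ~ x)

# Generate the Stan program only; nothing is compiled or sampled here
code <- make_stancode(f, data = dat, family = gaussian())
cat(code)
```

Passing the same formula and data to brm() would compile this program and run the sampler.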

After model fitting, you have many post-processing and plotting methods to choose from. For instance, you can investigate and compare model fit using leave-one-out cross-validation and posterior predictive checks or predict responses for new data.

If you are interested and want to learn more about brms, please use the following links:

  • GitHub repository (for source code, bug reports, feature requests)
  • CRAN website (for vignettes with guidance on how to use the package)
  • Wayne Folta’s blog posts (for interesting brms examples)

Also, a paper about brms will be published soon in the Journal of Statistical Software.

My thanks goes to the Stan Development Team for creating Stan, which is probably the most powerful and flexible tool for performing Bayesian inference, and for allowing me to introduce brms here at this blog.

StanCon: now accepting registrations and submissions


As we announced here a few weeks ago, the first Stan conference will be Saturday, January 21, 2017 at Columbia University in New York. We are now accepting both conference registrations and submissions. Full details are available on the StanCon page of the Stan website. If you have any questions please let us know, and we hope to see you in NYC this January!

Here are the links for registration and submissions:

Registration

Anyone using or interested in Stan is welcome to register for the conference. To register for StanCon please visit the StanCon registration page.

Submissions

StanCon’s version of conference proceedings will be a collection of contributed talks based on interactive, self-contained notebooks (e.g., knitr, R Markdown, Jupyter, etc.). Submissions will be peer reviewed by the StanCon organizers and all accepted notebooks will be published in an official StanCon repository. If your submission is accepted we may also ask you to present during one of the StanCon sessions.

For details on submissions please visit the StanCon submissions page.


P.S. Stay tuned for an announcement about several Stan and Bayesian inference courses we will be offering in the days leading up to the conference.

ShinyStan v2.0.0

For those of you not familiar with ShinyStan, it is a graphical user interface for exploring Stan models (and more generally MCMC output from any software). For context, here’s the post on this blog first introducing ShinyStan (formerly shinyStan) from earlier this year.


ShinyStan v2.0.0 released

ShinyStan v2.0.0 is now available on CRAN. This is a major update with a new look and a lot of new features. It also has a new(ish) name: ShinyStan is the app/GUI and shinystan the R package (both had formerly been shinyStan for some reason apparently not important enough for me to remember). Like earlier versions, this version has enhanced functionality for Stan models but is compatible with MCMC output from other software packages too.

You can install the new version from CRAN like any other package:

install.packages("shinystan")

If you prefer a version with a few minor typos fixed, you can install from GitHub using the devtools package:

devtools::install_github("stan-dev/shinystan", build_vignettes = TRUE)

(Note: after installing the new version and checking that it works, we recommend removing the old one by running remove.packages("shinyStan").)

If you install the package and want to try it out without having to first fit a model you can launch the app using the preloaded demo model:

library(shinystan)
launch_shinystan_demo()

Notes

This update contains a lot of changes, including new features, greater UI stability, and an entirely new look. Some release notes can be found on GitHub, and there are also some instructions for getting started on the ShinyStan wiki page. Here are two highlights:

  • The new interactive diagnostic plots for Hamiltonian Monte Carlo. In particular, these are designed for models fit with Stan using NUTS (the No-U-Turn Sampler).


  • The deploy_shinystan function, which lets you easily deploy ShinyStan apps for your models to RStudio’s ShinyApps hosting service. Each of your apps (i.e. each of your models) will have a unique URL. To use this feature please also install the shinyapps package: devtools::install_github("rstudio/shinyapps").

The plan is to release a minor update with bug fixes and other minor tweaks in a month or so. So if you find anything we should fix or change (or if you have any other suggestions) we’d appreciate the feedback.

Introducing shinyStan


As a project for Andrew’s Statistical Communication and Graphics graduate course at Columbia, a few of us (Michael Andreae, Yuanjun Gao, Dongying Song, and I) had the goal of giving RStan’s print and plot functions a makeover. We ended up getting a bit carried away and instead we designed a graphical user interface for interactively exploring virtually any Bayesian model fit using a Markov chain Monte Carlo algorithm.

The result is shinyStan, a package for R and an app powered by Shiny. The full version of shinyStan v1.0.0 can be downloaded as an R package from the Stan Development Team GitHub page here, and we have a demo up online here.  If you’re not an R user, we’re working on a full online version of shinyStan too.


 

For me, there are two primary motivations behind shinyStan:

1) Interactive visual model exploration
  • Immediate, informative, customizable visual and numerical summaries of model parameters and convergence diagnostics for MCMC simulations.
  • Good defaults with many opportunities for customization.
2) Convenient saving and sharing
  • Store the basic components of an entire project (code, posterior samples, graphs, tables, notes) in a single object.
  • Export graphics into R session as ggplot2 objects for further customization and easy integration in reports or post-processing for publication.

There’s also a third thing that has me excited at the moment. That online demo I mentioned above… well, since you’ll be able to upload your own data soon enough and even add your own plots if we haven’t included something you want, imagine an interactive library of your models hosted online. I’m imagining something like this except, you know, finite, useful, and for statistical models instead of books. (Quite possibly with fewer paradoxes too.) So it won’t be anything like Borges’ library, but I couldn’t resist the chance to give him a shout-out.

Finally, for those of you who haven’t converted to Stan quite yet, shinyStan is agnostic when it comes to inputs, which is to say that you don’t need to use Stan to use shinyStan (though we like it when you do). If you’re a JAGS or BUGS user, or if you write your own MCMC algorithms, as long as you have some simulations in an array, matrix, mcmc.list, etc., you can take advantage of shinyStan.
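A minimal sketch of that input-agnosticism using the current shinystan package (the renamed successor to shinyStan). The draws below are simulated noise standing in for output from any MCMC software, arranged as iterations by chains by parameters; the parameter names and model name are made up for illustration.

```r
library(shinystan)

set.seed(3)
n_iter <- 1000
n_chains <- 4

# Stand-in MCMC output (assumption): iterations x chains x parameters
draws <- array(
  rnorm(n_iter * n_chains * 2),
  dim = c(n_iter, n_chains, 2),
  dimnames = list(NULL, NULL, c("alpha", "beta"))
)

# Convert the plain array into a shinystan object
sso <- as.shinystan(draws, model_name = "array_example")

# launch_shinystan() opens the app in a browser, so only call it interactively
if (interactive()) launch_shinystan(sso)
```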
