Stan Weekly Roundup, 7 July 2017

Holiday weekend, schmoliday weekend.

  • Ben Goodrich and Jonah Gabry shipped RStan 2.16.2 (their numbering is a little beyond base Stan, which is at 2.16.0). This reintroduces error reporting that got lost in the 2.15 refactor, so please upgrade if you want to debug your Stan programs!
  • Joe Haupt translated the JAGS examples in the second edition of John Kruschke’s book Doing Bayesian Data Analysis into Stan. Kruschke blogged it and Haupt has a GitHub page with the Stan programs. I still owe him some comments on the code.
  • Andrew Gelman has been working on the second edition of his and Jennifer Hill’s regression book, which is being rewritten as two linked books and translated to Stan. He’s coordinating with Jonah Gabry and Ben Goodrich on the RStanArm replacements for lme4 and lm/glm in R.
  • Sean Talts got in the pull request for enabling C++11/C++14 in Stan. This is huge for us developers as we have a lot of pent-up demand for C++11 features on the back burner.
  • Michael Betancourt, with feedback from the NumFOCUS advisory board for Stan, put together a web page of guidelines for using the Stan trademarks.
  • Gianluca Baio released version 1.0.5 of survHE, a survival analysis package based on RStan (and INLA and ShinyStan). There’s also the GitHub repo that Jacki Buros Novik made available with a library of survival analysis models in Stan. Techniques from these packages will probably make their way into RStanArm eventually (Andrew’s putting a survival analysis example into the new regression book). There’s a minimal sketch of a parametric survival model in Stan after this list.
  • Mitzi Morris finished testing the Besag-York-Mollié model in Stan and it passes the Cook-Gelman-Rubin diagnostics. Given that GeoBUGS gets a different answer, we now think GeoBUGS is wrong, but its tests haven’t finished running yet (GeoBUGS is much slower than Stan in terms of effective sample size per unit time if you want to get to convergence). A stripped-down sketch of the spatial (ICAR) piece of the model appears after this list.
  • Imad Ali has been working with Mitzi on getting the BYM model into RStanArm.
  • Jonah Gabry taught a one-day Stan class in Padua (Italy) while on vacation. That’s how much we like talking about Stan.
  • Ben Goodrich just gave a Stan talk at the useR! conference in Brussels, following close on the heels of his Berlin meetup. You can find a lot of information about upcoming events on our events page.
  • Mitzi Morris and Michael Betancourt will be teaching a one-day Stan course for the Women in Machine Learning meetup event in New York on 22 July 2017 hosted by Viacom. Dan Simpson’s comment on the blog post was priceless.
  • Martin Černý improved the standalone function parser he wrote for Stan (to make it easier to expose Stan functions in R and Python).
  • Aki Vehtari arXived a new version of the horseshoe prior paper, which adds a parameter to control regularization more tightly, especially for logistic regression. It has the added benefit of being more robust and removing divergent transitions in the Hamiltonian simulation. Look for that to land in RStanArm soon. A sketch of the regularized horseshoe appears after this list.
  • Charles Margossian continues to make speed improvements on the Stan models for Torsten and is also working on getting the algebraic equation solver into Stan so we can do fixed points of diff eqs and other fun applications. If you follow the link to the pull request, you can also see my extensive review of the code. It’s not easy to put a big feature like this into Stan, but we provide lots of help.
  • Marco Inacio got in a pull request for definite numerical integration. Needless to say, there are all sorts of subtle numerical issues swirling around integration. Marco is working from John Cook’s basic implementation of numerical integration, and John has been nice enough to offer it under a BSD license so that it’s compatible with Stan. A toy illustration of definite numerical integration in Stan follows this list.
  • Rayleigh Lei is working on vectorizing all the binary functions and has a branch with the testing framework. This is really hairy template programming, but probably a nice break after his first year of grad school at U. Michigan!
  • Allen Riddell and Ari Hartikainen have been working hard on Windows compatibility for PyStan, which is no walk in the park. Windows has been the bane of our existence since we started this project, and if all the world’s applied statisticians switched to Unix (Linux or Mac OS X), we wouldn’t shed a tear.
  • Yajuan Si, Andrew Gelman, Rob Trangucci, and Jonah Gabry have been working on a survey weighting module for RStanArm. Sounds like RStanArm’s quickly becoming the Swiss Army knife (Leatherman?) of Bayesian modeling.
  • Andrew Gelman finished a paper on (issues with) NHST and is wondering about clinical effects that are small by design because they’re being compared to the state of the art treatment as a baseline.
  • My own work on mixed mode tests continues apace. The most recent pull request adds logical operators (and, or, not) to our autodiff library (they’ve long been in the Stan language; this just rounds out the math library operators directly) and removes 4000 lines of old code (replacing it with 1000 new lines, which include doc and three operators in both forward and reverse mode). I’m optimistic that this will eventually be done and we’ll have RHMC and autodiff Laplace approximations.
  • Ben Bales submitted a pull request for appending arrays, which is under review and will be generalized to arbitrary Stan array types.
  • Ben Bales also submitted a pull request for the initial vectorization of RNGs. This should make programs doing posterior predictive inference so much cleaner (see the posterior predictive sketch after this list).
  • I wrote a functional spec for standalone generated quantities. This would let us do posterior predictive inference after fitting the model. As you can see, even simple things like this take a lot of work. That spec is conservative on a task-by-task basis, but given the correlations among tasks, probably not so conservative in total.
  • I also patched potentially undefined bools in Stan; who knew that an uninitialized bool member in a C++ class could wind up holding a value like 127? This followed on from Ben Goodrich filing the issue after a picky compiler used by R flagged the undefined behavior. Not a bug, but the code’s cleaner now.
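
To go with the survival analysis item above, here’s a minimal sketch of a parametric Weibull model with right censoring written directly in Stan. The data layout and variable names are made up for illustration, not taken from survHE or Jacki’s library (those handle covariates, multiple distributions, and much more); observed event times contribute the log density and censored times contribute the log survival function.

    data {
      int<lower=0> N_obs;               // number of observed events
      int<lower=0> N_cens;              // number of right-censored observations
      vector<lower=0>[N_obs] t_obs;     // event times
      vector<lower=0>[N_cens] t_cens;   // censoring times
    }
    parameters {
      real<lower=0> shape;
      real<lower=0> scale;
    }
    model {
      shape ~ gamma(2, 1);    // weakly informative priors, purely illustrative
      scale ~ gamma(2, 1);
      t_obs ~ weibull(shape, scale);                    // events: log density
      target += weibull_lccdf(t_cens | shape, scale);   // censored: log survival
    }

Covariates typically come in through the scale parameter with a log link, which is where the packages start to differ.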
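
The Besag-York-Mollié model Mitzi has been testing combines an intrinsic conditional autoregressive (ICAR) spatial effect with an independent heterogeneity term. Here’s a stripped-down sketch of just the ICAR piece, using the pairwise-difference formulation over an edge list for the adjacency graph; the variable names and priors are illustrative, and the full BYM model adds the non-spatial random effect.

    data {
      int<lower=1> N;                        // number of areas
      int<lower=1> N_edges;                  // number of adjacency pairs
      int<lower=1, upper=N> node1[N_edges];  // node1[k] is adjacent to node2[k]
      int<lower=1, upper=N> node2[N_edges];
      int<lower=0> y[N];                     // counts per area
      vector[N] log_E;                       // log expected counts (offset)
    }
    parameters {
      real beta0;
      vector[N] phi;                         // spatial effects
    }
    model {
      // ICAR prior: penalize squared differences across neighboring areas
      target += -0.5 * dot_self(phi[node1] - phi[node2]);
      // soft sum-to-zero constraint to identify phi
      sum(phi) ~ normal(0, 0.001 * N);
      beta0 ~ normal(0, 5);
      y ~ poisson_log(log_E + beta0 + phi);
    }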
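
Roughly, the trick in Aki’s regularized horseshoe is to keep the horseshoe’s heavy-tailed local shrinkage but pull the largest coefficients back toward a finite slab scale, which is what helps with the divergences in logistic regression. Here’s a sketch of the parameterization for logistic regression; the data names (scale_global, slab_scale, slab_df) and the specific priors are illustrative, so see the paper for the recommended way to set them.

    data {
      int<lower=0> N;
      int<lower=0> K;
      matrix[N, K] x;
      int<lower=0, upper=1> y[N];
      real<lower=0> scale_global;   // global scale, tied to a prior guess at sparsity
      real<lower=0> slab_scale;     // scale of the regularizing slab
      real<lower=0> slab_df;        // degrees of freedom of the slab
    }
    parameters {
      vector[K] z;
      real<lower=0> tau;            // global shrinkage
      vector<lower=0>[K] lambda;    // local shrinkage
      real<lower=0> caux;
    }
    transformed parameters {
      real c;                       // slab width
      vector[K] lambda_tilde;       // regularized local scales
      vector[K] beta;
      c = slab_scale * sqrt(caux);
      lambda_tilde = sqrt((c^2 * square(lambda)) ./ (c^2 + tau^2 * square(lambda)));
      beta = z .* lambda_tilde * tau;
    }
    model {
      z ~ normal(0, 1);
      lambda ~ cauchy(0, 1);
      tau ~ cauchy(0, scale_global);
      caux ~ inv_gamma(0.5 * slab_df, 0.5 * slab_df);
      y ~ bernoulli_logit(x * beta);
    }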
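
To give a flavor of what definite numerical integration looks like from the Stan side, here’s a toy user-defined function that approximates the integral of exp(-x^2) over [a, b] with the composite trapezoid rule. This only illustrates the idea; Marco’s pull request builds on John Cook’s quadrature code, which handles the subtle cases this naive rule won’t.

    functions {
      // composite trapezoid rule for the hard-coded integrand f(x) = exp(-x^2)
      real trapezoid_exp_neg_sq(real a, real b, int n) {
        real h;
        real s;
        h = (b - a) / n;
        s = 0.5 * (exp(-a * a) + exp(-b * b));
        for (i in 1:(n - 1)) {
          real x;
          x = a + i * h;
          s = s + exp(-x * x);
        }
        return s * h;
      }
    }
    transformed data {
      // should print a value close to 0.7468
      print("integral of exp(-x^2) on [0, 1]: ", trapezoid_exp_neg_sq(0, 1, 1000));
    }
    model {
    }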
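
Ben’s RNG vectorization and the standalone generated quantities spec are both aimed at the same workflow: posterior predictive inference. Here’s a minimal sketch for a simple linear regression, with the loop that vectorized RNGs would eventually collapse (the commented-out line is the hoped-for syntax, not something that runs today); the standalone generated quantities feature would let you run just the last block against draws saved from an earlier fit.

    data {
      int<lower=0> N;
      vector[N] x;
      vector[N] y;
    }
    parameters {
      real alpha;
      real beta;
      real<lower=0> sigma;
    }
    model {
      alpha ~ normal(0, 10);
      beta ~ normal(0, 10);
      sigma ~ normal(0, 5);
      y ~ normal(alpha + beta * x, sigma);
    }
    generated quantities {
      vector[N] y_rep;    // replicated data for posterior predictive checks
      for (n in 1:N)
        y_rep[n] = normal_rng(alpha + beta * x[n], sigma);
      // with vectorized RNGs, the loop could become a one-liner:
      //   y_rep = normal_rng(alpha + beta * x, sigma);
    }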

4 thoughts on “Stan Weekly Roundup, 7 July 2017”

  1. I hope I speak for all Stan users who have never contributed to the project but benefit enormously from it: Thank You!

    I will continue my minuscule contribution of saying great things about it and teaching it to my students.

  2. I *really* like these updates! And there’s a whole lot of work going on.

    With the talk of survival models, I wonder if a philosophical discussion regarding *rstanarm* needs to take place. The initial rstanarm package provided an easy way to switch from existing packages by basically prepending “stan_” to the regression function name. This makes it easy to switch to the Bayesian world and to the Bayesian workflow, with across-the-board functions like pp_check. But as rstanarm’s repertoire of routines continues to expand, I wonder if it’s going to lead to a fragmentation somewhat like a return to the disconnected routines of the frequentist world (in practice, of course, not in terms of the underlying Stan).

    I’ll admit up front that I’m a huge *brms* fan and one of the things I like about it is that it reflects the Bayesian/Stan idea of an underlying machine that can handle a myriad of tasks. You have a *brm* function and if you choose to use censoring and the Weibull family, you’ve got an AFT survival model. Add in a random effect and you’ve added a frailty component. Add weights to the cases, and so on. All using the same function (brm) and a slightly expanded formula interface.

    I prefer this unified approach, but I’m not saying that rstanarm should suddenly change and adopt it. First, it wouldn’t make sense to have two teams working on the same approach. Second, it would remove one of the advantages of rstanarm, which is the ease of switching from existing, non-Bayesian functions. But I do wonder whether there is a philosophy or plan for expanding beyond the current 11 stan_* routines. (I’ve tried rewording the question a bunch of times to not sound critical or leading, and to be clearer, but I really don’t have a clear idea in mind, just a fear.)

    • Wayne:

      I agree. For example, why do we have a function called stan_polr() for ordered logistic regression? Just cos someone wrote a function called polr() in R? Weird. There are challenges, though, in integrating functions such as ordered logit into stan_glm(). The big issue is extra parameters. Logistic regression just has a bunch of coefs, but other models have additional parameters, and we have to figure out the best way of labeling them.
