
Stan Weekly Roundup, 7 September 2017

I was out on vacation last week, but now I’m back! While I was gone…

  • Sean Talts released Stan 2.17 (the math library, the core Stan library, and CmdStan 2.17). RStan and PyStan are in the works. Stan 2.17 will be the last pure C++03 release, which opens up pretty much all of C++11 and some of C++14 for future releases.

  • Matthijs Vàkàr has finished the binomial-logit GLM function, which is about three times faster than building it out of primitives; that’ll be merged soon and then we can do the other common GLM link/likelihood combos. Compound functions like this let us do symbolic reductions of the function and derivatives as well as reduce the size of the overall expression graph.

  • Ben Bales finished the first pull request for vectorized RNGs, so they should all be vectorized soon, making it much easier to do posterior predictive checks that match likelihoods.

  • Ben Bales (we have two Dans, two Bens, and four Bobs/Robs/Roberts in the guts of Stan) also merged in a generalized append_array function for Stan and the math library.

  • Rayleigh Lei added a standard-normal distribution with no arguments, which will be much more efficient than using normal(0, 1) because it removes some arithmetic (subtracting zero, dividing by one) and also because it converts a bunch of additions into multiplies; we should be making updates to make some of the arithmetic in vectorized distributions faster. (Rayleigh’s vectorization of binary functions is nearly ready to merge. Grad students should get more summer vacation; it’s good for Stan development!)

  • Mitzi Morris added a data qualifier for function arguments that will allow us to soundly call ODE integration and the algebraic solver, both of which have data-only arguments.

  • Michael Betancourt is organizing a physics workshop on Stan at MIT, coming up in late September, I think; I don’t know how open it is to the public, but the upshot should be a really neat intro to Bayesian stats using physics examples; Sean is helping out on the teaching effort.

  • Michael also added a new wiki page on how you can contribute to Stan without C++ experience.
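
To make the standard-normal item above concrete, here is a minimal Python sketch of the arithmetic involved. It is not Stan's C++ implementation; the function names (normal_lpdf, std_normal_lpdf, std_normal_lpdf_vec) are hypothetical and chosen only for illustration. The sketch shows the log density with and without location and scale arguments, plus a vectorized form that sums the squares first and scales once.

```python
import math

# Constant term shared by every normal log density: log(sqrt(2 * pi)).
LOG_SQRT_TWO_PI = 0.5 * math.log(2.0 * math.pi)


def normal_lpdf(y, mu, sigma):
    """General normal log density: needs a subtract, a divide, and a log."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - LOG_SQRT_TWO_PI


def std_normal_lpdf(y):
    """Standard-normal log density: no subtract, no divide, no log(sigma)."""
    return -0.5 * y * y - LOG_SQRT_TWO_PI


def std_normal_lpdf_vec(ys):
    """Sum of standard-normal log densities over a sequence.

    The squares are accumulated first, so the -0.5 factor and the
    constant term are each applied once rather than once per element.
    """
    sum_sq = sum(y * y for y in ys)
    return -0.5 * sum_sq - len(ys) * LOG_SQRT_TWO_PI


if __name__ == "__main__":
    ys = [-1.2, 0.0, 0.7, 2.5]
    general = sum(normal_lpdf(y, 0.0, 1.0) for y in ys)
    special = std_normal_lpdf_vec(ys)
    print(general, special)  # the two totals should agree up to rounding
```

The two versions return the same value when mu is 0 and sigma is 1; the specialized one just does less work per element, and in an autodiff setting that also means fewer nodes in the expression graph.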

Time out for a whole heap of papers and theses to think about

  • Michael Betancourt announced that the model of malaria for interacting vaccines is finally out:

  • Maria Gorinova sent along a link to her MS thesis in which she develops a new syntax for a Stan-like language that’s eerily like what I was discussing on Discourse:

    I have to read this more closely, as she gets away with even stripping out explicit parameter declarations. Although I’ve come to expect it now, I’m still blown away when two independent lines of research come out so similarly. I guess the point is that they’re not really independent—we both started with Stan and presumably had the goal of turning it into something that looked more like Edward. Maria managed to get closer than I did, so I have to figure out how she did it.

  • Michael also posted an arXiv paper with Andrew Gelman and Dan Simpson on

    There was also a full blog post on that. Lots of material to think about here.

  • Jonah Gabry, along with Daniel Simpson, Aki Vehtari, Michael Betancourt, and Andrew Gelman posted a different arXiv paper,

    Dan also wrote a full blog post on this paper. Dan seems to have sparked the completion of a lot of these papers that have been kicking around in meetings and on blackboards for a while. Go, Dan!

  • Michael Andreae also has a paper with Ben Goodrich and Jonah Gabry and others using Stan,
    • Andreae MH, Nair S, Gabry JS, Goodrich B, Hall C, Shaparin N. A pragmatic trial to improve adherence with scheduled appointments in an inner-city pain clinic by human phone calls in the patient’s preferred language. J Clin Anesth. 2017 Aug 22;42:77-83. doi: 10.1016/j.jclinane.2017.08.014. [Epub ahead of print] PubMed PMID: 28841451.

  • Andrew and Aki also mentioned something about an R-squared paper and horseshoe priors, but I didn’t catch the context or status. Andrew, Michael and I are also putting together a split R-hat paper (which started out small, but under Michael’s direction is expanding foundationally); this one should be out soon.
  • R^2 paper going on

OK, back to the regularly scheduled Stan-related roundup.

  • Sebastian Weber is continuing to lead the MPI effort; we had a big meeting with me, Daniel Lee, Sean Talts, and Charles Margossian. Complete prototypes now exist on a branch and the speedup from a simplified interface remains nearly linear in the number of CPUs (either on a single box or on a fast Infiniband network). This should actually land soon for Linux; we have no idea how we’ll be able to manage Windows or Mac support, but they may follow.

  • Steve Bronder has a pull request nearly ready to go for offloading Cholesky decompositions to the GPU; this is just the tip of the iceberg on GPU support. Now that Sebastian’s figured out (as part of working on MPI) a sane way to ship data once in a way that makes sense from the language, a whole bunch more optimizations will be opening up soon. The beautiful part about this is that it’s an orthogonal speedup to MPI, so we’ll soon be able to distribute likelihoods over a cluster of GPU-enabled machines. Should be nearly a 100-fold speedup for problems that involve a lot of matrix operations in the likelihood and lots of data on a cluster of a dozen machines with GPUs.

  • Sean and Daniel Lee have also been whipping our makefiles into shape for C++11 and making them more standards compliant, as we’re getting many more users trying to use our makefiles, build docker containers, etc.

  • Sean is also whipping our testing platforms into shape; Travis is turning into a bottleneck for releases and just for general testing, so we’re adding more compute power to our local Jenkins setup. We may wind up having to abandon Travis altogether due to its timing limitations and our extensive unit testing and heavy compile-time loads.

  • A big team effort was required to upgrade to the latest Boost (1.64); our last version was 1.62 and a whole lot of rugs were pulled out from under us in 1.64. We spend way too much time doing this kind of thing, but I don’t see how to avoid it. It’s not like we’re using some kind of hacked private interfaces; we’re just using the public APIs. This is also why you always want to minimize dependencies. We only depend on Boost and Eigen (and a whole bunch of platforms and compilers), and it’s already painful to keep up so that we can support multiple versions of the Eigen and Boost libraries.

If anyone wonders how I’m ordering these things, I’m trying to put completed work before work in progress and also highlight new contributors. Beyond that, it’s just what I think most people will be interested in and what order I record things during the meeting!

4 Comments

  1. John Hall says:

    I love these weekly roundups. Keep ’em comin’!

  2. pau says:

    You guys rock my world. It’d be great to have vectorized truncated distributions.

  3. Shravan says:

    Here at Potsdam, we also made some progress in Stan-related work:

    1. Summer school in Statistical Methods for Linguistics and Psychology. Next iteration is 10-14 Sept 2018.

    2. Coming up: one day workshop on Bayesian modeling using Stan, Tuebingen, Germany.

    3. Bruno Nicenboim’s StanCon 2017 talk is now finally published as this awesome paper in the Journal of Memory and Language. Do not miss his cool visualisations (HT Dan Simpson).
