Stan Weekly Roundup, 16 June 2017

We’re going to be providing weekly updates for what’s going on behind the scenes with Stan. Of course, it’s not really behind the scenes, because the relevant discussions are at

  • stan-dev GitHub organization: this is the home of all of our source repos; design discussions are on the Stan Wiki

  • Stan Discourse Groups: this is the home of our user and developer lists (they’re all open); feel free to join the discussion—we try to be friendly and helpful in our responses, and there is a lot of statistical and computational expertise in the wings from our users, who are increasingly joining the discussion. By the way, thanks for that—it takes a huge load off us when users give great answers to other users’ questions. We’re up to about 15 active discussion threads a day (active topics in the last 24 hours include AR(K) models, web site reorganization, ragged arrays, order statistic priors, new R packages built on top of Stan, Docker images for Stan on AWS, and many more!).

OK, let’s get started with the weekly review, though this is a special summer double issue, just like the New Yorker.

Your news here: If you have any Stan news you’d like to share, please let me know at [email protected] (we’ll probably get a more standardized way to do this in the future).

New web site: Michael Betancourt redesigned the Stan web site; hopefully this will be easier to use. We’re no longer trying to track the literature. If you want to see the Stan literature in progress, do a search for “Stan Development Team” or “mc-stan.org” on Google Scholar; we can’t keep up! Do let us know either in an issue on GitHub for the web site or in the user group on Discourse if you have comments or suggestions.

New user and developer lists: We’ve shuttered our Google group and moved to Discourse for both our user and developer lists (they’re now consolidated as categories on one list). It’s easy to sign up with a GitHub or Google ID, and it’s much easier to search and use online.
See Stan Discourse Groups and, for the old discussions, Stan’s shuttered Google group for users and Stan’s shuttered Google group for developers. We’re not removing any of the old content, but we are prohibiting new posts.

GPU support: Rok Cesnovar and Steve Bronder have been getting GPU support working for linear algebra operations. They’re starting with Cholesky decomposition because it’s a bottleneck for Gaussian process (GP) models and because it has the pleasant property of being quadratic in data and cubic in computation.
See math pull request 529
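
For context, here is a rough sketch of the kind of GP model where the Cholesky factorization dominates the cost; it uses only standard Stan functions (cov_exp_quad, cholesky_decompose, multi_normal_cholesky), and the O(N^3) cholesky_decompose call is the operation being targeted for the GPU.

  data {
    int<lower=1> N;
    real x[N];
    vector[N] y;
  }
  parameters {
    real<lower=0> alpha;   // marginal standard deviation
    real<lower=0> rho;     // length scale
    real<lower=0> sigma;   // observation noise
  }
  model {
    matrix[N, N] K;
    matrix[N, N] L;
    K = cov_exp_quad(x, alpha, rho);
    for (n in 1:N)
      K[n, n] = K[n, n] + square(sigma);
    L = cholesky_decompose(K);          // O(N^3): the step moving to the GPU
    alpha ~ normal(0, 1);
    rho ~ normal(0, 1);
    sigma ~ normal(0, 1);
    y ~ multi_normal_cholesky(rep_vector(0, N), L);
  }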

Distributed computing support: Sebastian Weber is leading the charge into distributed computing using the MPI framework (multi-core or multi-machine) by essentially coding up map-reduce for derivatives inside of Stan. Together with GPU support, distributed computing of derivatives will give us a TensorFlow-like flexibility to accelerate computations. Sebastian’s also looking into parallelizing the internals of the Boost and CVODES ordinary differential equation (ODE) solvers using OpenCL.
See math issue 101 and math issue 551.
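
To make the idea concrete, here is an illustrative model (not a final interface) with the structure that map-reduce exploits: the log likelihood is a sum of terms that are conditionally independent across groups, so each group’s term and its gradient could be computed on a separate worker and the results summed.

  data {
    int<lower=1> J;                    // conditionally independent groups
    int<lower=1> N;
    int<lower=1, upper=J> group[N];
    vector[N] y;
  }
  parameters {
    vector[J] theta;
    real<lower=0> sigma;
  }
  model {
    theta ~ normal(0, 1);
    for (n in 1:N)
      y[n] ~ normal(theta[group[n]], sigma);   // terms for different groups
                                               // can be mapped to workers
  }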

Logging framework: Daniel Lee added a logging framework to Stan to allow finer-grained control of the informational, warning, and error messages that the algorithms send to the interfaces.

Operands and partials: Sean Talts finished the refactor of our underlying operands and partials data structure, which makes it much simpler to write custom derivative functions.

See pull request 547

Autodiff testing framework: Bob Carpenter finished the first use case for a generalized autodiff tester that tests all of our higher-order autodiff thoroughly.
See math pull request 562

C++11: We’re all working toward the 2.16 release, which will be our last release before we open the gates to C++11 (and some of C++14). This is going to make our code a whole lot easier to write and maintain, and it will open up awesome possibilities like having closures to define lambdas within the Stan language, as well as letting us consolidate many of our uses of Boost into the C++ standard library.

Append arrays: Ben Bales added signatures for append_array, to work like our append functions for vectors and matrices; a minimal usage sketch follows below.
See pull request 554 and pull request 550
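
A minimal usage sketch (assuming, as with the other append functions, that the element types of the two arrays must match):

  data {
    int<lower=0> M;
    int<lower=0> N;
    real x[M];
    real y[N];
  }
  transformed data {
    real xy[M + N];
    xy = append_array(x, y);   // concatenates x and y into one array
  }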

ODE system size checks: Sebastian Weber pushed a bug fix that cleans up ODE system size checks to avoid seg faults at run time.
See pull request 559

RNG consistency in transformed data: A while ago we relaxed the generated-quantities-only restriction on _rng functions by allowing them in transformed data (so you can fit fake data generated wholly within Stan or represent the posterior uncertainty of some other process, allowing “cut”-like models to be formulated as a two-stage process); Mitzi Morris just cleaned these up so that we use the same RNG seed for all chains and can therefore perform convergence monitoring; multiple replications would then be done by running the whole multi-chain process multiple times.
See Stan pull request 2313
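
Here is a small sketch of the fake-data pattern this enables; with Mitzi’s fix, every chain sees the same simulated y_sim, so across-chain convergence diagnostics remain meaningful.

  data {
    int<lower=1> N;
  }
  transformed data {
    vector[N] y_sim;
    for (n in 1:N)
      y_sim[n] = normal_rng(0, 1);   // simulated once per chain, same seed across chains
  }
  parameters {
    real mu;
    real<lower=0> sigma;
  }
  model {
    mu ~ normal(0, 1);
    sigma ~ normal(0, 1);
    y_sim ~ normal(mu, sigma);       // fit the model to the simulated data
  }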

NSF Grant: CI-SUSTAIN: Stan for the Long Run: We (Bob Carpenter, Andrew Gelman, Michael Betancourt) were just awarded an NSF grant for Stan sustainability. This was a follow-on from the first Computing Research Infrastructure (CRI) grant we got after building the system. Yea! This adds roughly a year of funding for the team at Columbia University. Our goal is to put in place governance processes for sustaining the project as well as shore up all of our unit tests and documentation.

Hiring: We hired two full-time Stan staff at Columbia: Sean Talts joins as a developer and Breck Baldwin as a business manager for the project. Sean had already been working as a contractor for us, hence all the pull requests. (Pro tip: The best way to get a foot in the door for an open-source project is to submit a useful pull request.)

25 thoughts on “Stan Weekly Roundup, 16 June 2017”

  1. Wow. I sooooo want that GPU and distributed computing to be all wrapped up in Stan and Rstan so I can just plug a GPU chip into my computer and run everything a zillion times faster.

    I also am looking forward to the language enhancement that allows priors in the declaration phase like this: vector[J] theta ~ normal(0,1);. Also looking forward to having pedantic mode which I hope will trap all sorts of mistakes.

    • No one said anything about running everything a zillion times faster with a GPU. We could do a Cholesky decomposition of a covariance matrix with several hundred rows and columns a good bit faster on a GPU. After that, we could possibly do some other matrix decompositions and maybe matrix multiplication on a GPU. But the matrices have to be pretty large to obtain a noticeable speedup.

  2. None of this would be that hard — we should have a meeting to prioritize our next steps of development. For example, I assume you’ve prioritized getting GMO working over pedantic mode and adding priors to parameters, because that’s what Sean’s working on. I believe Michael objected to writing this in the language:

    parameters {
      vector[J] theta ~ normal(0,1);
    ...
    

    I think the reason was that it broke the abstraction of the parameters being variable declarations and the model being where the density is defined. Same objection to having Jacobians implemented within the transformed parameters block. I’m fairly agnostic on these myself, so maybe we should talk about it again.

    This feels more like the dev list than your blog, but I’m happy to talk about Stan wherever.

  3. Sort of a side question: what were the reasons for choosing to move to Discourse? I’m asking out of ignorance, not as any kind of comment about Discourse. It’s just that we develop some software too, and we’re always on the lookout for things that might help, and curious about why people choose the things they do.

  4. Oh. My. God. I nearly fell off the chair when I read GPU/MPI support is incoming. Finally I can experiment with my >20 million data point models!

    • MPI might help if the 20 million data points can be split into conditionally independent groups that take a while to evaluate the likelihood. GPU won’t unless you are doing big Cholesky decompositions and even then the RAM binds at some point.

      • Oh, bummer. My models are quite simple – more or less just hierarchical linear models, but the number of data points is making it take forever.

        • David:

          Yes, but MPI should be cleaner as it’s attacking the full posterior distribution, whereas EP is only an approximation and so is not simulation consistent.

        • Same with GPUs if you have a big matrix-vector multiply where the data’s in the matrix and doesn’t change. When we’ll be able to get something like that going is anyone’s guess. It’s hard to get all this stuff building. If we only had to build a Linux system, we could go much faster than trying to support Windows (the real time sink), Mac OS (still not as easy as Linux because Apple doesn’t believe in backward compatibility), and Linux.

        • I often work with monthly data in Stan because it takes a long time to fit models when the model gets complex. If GPU means that fitting daily models takes about the same time as monthly, then it would be a big win.

  5. I was working on a Kaggle contest and was playing with Stan on a prediction problem. The slowness of a 150-dimensional GP and constant seg faults midway through the algorithm caused a lot of issues. So yes, I am excited about these awesome developments!

    • There should not be any segfaults coming from Stan. If you find one and it’s reproducible, please report a bug. Were you running in CmdStan (what we recommend for large-scale problems) or one of the other interfaces?

  6. It is worth mentioning that the average time complexity of Cholesky decomposition can be improved for band-limited matrices; the algorithms are in Golub and Van Loan’s Matrix Computations text and applied in H. Rue ‘Fast Sampling for GMRFs with Applications’.

    • Absolutely. We’re also working on structured spatial and time-series models. There are lots of ways to make things go faster when you know a bit about the structure of a problem. But it’s a constant struggle between tuning for special cases and tuning the general case to go faster.

      • Yea, makes sense — it’s hard to build in every possible optimization a priori so general purpose methods can be useful. Looking forward to what the team comes up with, especially with regards to parallelization.

  7. I’ve only ever been taught frequentist, and I want to learn Bayes. I’ve played with BayesFactor before, but I feel like that is just scratching the surface, and it feels too much like frequentist to me (p = .05 becomes BF = 3, etc.)

    I have a lot of experience in R, both for statistics and programming. I’ve taken grad classes in ANOVA, regression, SEM, meta-analysis, factor analysis, multilevel modeling, and longitudinal data analysis. Problem: I’m a social psychology PhD student (although focusing in quantitative methods and statistics), so I have no background in calculus.

    I’ve read papers on the concepts behind Bayes, so I get that part. Now, I want to be able to use it in future jobs (especially since I’m interested in politics, and I feel like many people are using Bayesian models for polling and elsewhere in industry).

    What would people recommend would be the best way for me to learn Bayesian statistics (not only conceptually and mathematically, but how to do them with Stan), given my background? I’m in a committed relationship with R, so I already know I’d prefer Rstan.

    Thanks!

    • I’d recommend McElreath’s Statistical Rethinking as a great way of getting used to Bayes + Stan. He starts with the assumption that you’ve heard of this thing called probability, and by the end, has you fitting varying-slopes and Gaussian process models. Plus, since he’s a quantitative anthropologist, he understands his audience might be turned off by integrals, and he discusses how to do things graphically.
