BridgeStan: Stan model log densities, gradients, Hessians, and transforms in R, Python, and Julia

We’re happy to announce the official 1.0.0 release of BridgeStan.

What is BridgeStan?

From the documentation:

BridgeStan provides efficient in-memory access through Python, Julia, and R to the methods of a Stan model, including log densities, gradients, Hessians, and constraining and unconstraining transforms.

BridgeStan should be useful for developing algorithms and deploying applications. It connects to R and Python through low-level foreign function interfaces (.C and ctypes, respectively) and is thus very performant and portable. It is also easy to install and keep up to date with Stan releases. BridgeStan adds the model-level functionality from RStan/PyStan that is not implemented in CmdStanR/CmdStanPy.
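To give a feel for the interface, here's a minimal Python sketch. The method names follow the BridgeStan documentation; the model library and data paths are hypothetical, and constructor details may vary by version.

```python
import numpy as np
import bridgestan as bs

# Load a compiled Stan model (shared library) with its JSON data.
# Paths here are hypothetical.
model = bs.StanModel("./bernoulli_model.so", "./bernoulli.data.json")

# A random point on the unconstrained scale.
theta_unc = np.random.default_rng(42).normal(size=model.param_unc_num())

# Log density and gradient, computed in one call.
lp, grad = model.log_density_gradient(theta_unc)

# Log density, gradient, and Hessian.
lp, grad, hess = model.log_density_hessian(theta_unc)

# Constraining and unconstraining transforms.
theta = model.param_constrain(theta_unc)
theta_unc_again = model.param_unconstrain(theta)
```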

Documentation and source code

Detailed forum post

Here’s a post on the Stan forums with much more detail:

Among other things, it describes the project's history and its relation to other Stan-related projects. Edward Roualdes started the project in order to access Stan models through Julia; then Brian Ward and I (mostly Brian!) helped Edward finish it, with some contributions from Nicholas Siccha and Mitzi Morris.

Stan downtown intern posters: scikit-stan & constraining transforms

It’s been a happening summer here at Stan’s downtown branch at the Flatiron Institute. Brian Ward and I advised a couple of great interns. A couple of weeks before the end of their internships, the interns present posters. Here are the ones from Brian’s intern Alexey and my intern Meenal.

Alexey Izmailov: scikit-stan

Alexey built a version of the scikit-learn API backed by Stan’s sampling, optimization, and variational inference. It’s plug-and-play with scikit-learn.
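Here's a hedged sketch of what that workflow looks like. The GLM estimator name follows the scikit_stan package, but treat the signature details as assumptions and check the package docs; the data here is synthetic, not from the poster.

```python
import numpy as np
from scikit_stan import GLM  # estimator name from the scikit_stan package

# Simulated regression data (hypothetical example).
rng = np.random.default_rng(1234)
X = rng.normal(size=(100, 3))
y = X @ np.array([0.5, -1.0, 2.0]) + rng.normal(scale=0.3, size=100)

# The familiar scikit-learn fit/predict pattern, with Stan underneath.
model = GLM(family="gaussian")
model.fit(X, y)
y_pred = model.predict(X)
```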

Meenal Jhajharia: unconstraining transforms

Meenal spent the summer exploring constraining transforms and how to evaluate them, with the goal of refining the performance of Stan’s transforms and adding new constrained data structures. This involved figuring out both what to evaluate and how: for a given target distribution, the convexity of the transformed density, its conditioning when convex, and sampling behavior in the tails, the body, and near the mode. The results are turning out to be more interesting than we suspected, in that different transforms seem to work better under different conditions. We’re also working with Seth Axen (Tübingen) and Stan devs Adam Haber and Sean Pinkney.
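For concreteness, a constraining transform maps an unconstrained real to a constrained parameter, and the change of variables requires a Jacobian adjustment to the log density. Stan’s scalar positivity transform, as described in the Stan reference manual, is a standard example of the kind of map being compared:

```latex
x = \exp(y) \in (0, \infty),
\qquad
\log p_Y(y)
  = \log p_X\!\bigl(\exp(y)\bigr)
    + \log\left|\frac{d}{dy}\exp(y)\right|
  = \log p_X\!\bigl(\exp(y)\bigr) + y.
```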

They don’t make undergrads like they used to

Did I mention they were undergrads? Meenal’s heading back to the University of Delhi to finish her senior year, and Alexey’s heading back to Brown to start his junior year! The other interns at the Center for Computational Mathematics, many of whom were undergraduates, have also done impressive work on everything from using normalizing flows to improve sampler proposals for molecular dynamics, to building scalable 2D surface PDE solvers, to HPC for large N-body problems. In this case, not making undergrads like they used to is a good thing!

Hiring for next summer

If you’re interested in working on statistical computing as an intern next summer, drop me a line at [email protected]. I’ll announce when applications are open here on the blog.

 

Summer internships at Flatiron Institute’s Center for Computational Mathematics

[Edit: Sorry to say this to everyone, but we’ve selected interns for this summer and are no longer taking applications. We’ll be taking applications again at the end of 2022 for positions in summer 2023.]

We’re hiring a crew of summer interns again this year. We’re looking for both undergraduates and graduate students. Here’s the ad.

I’m afraid the pay is low, but to make up for it, we cover travel, room, and most board (3 meals/day, 5 days/week). Also, there’s a large cohort of interns every summer across the five institutes at Flatiron (biology, astrophysics, neuroscience, quantum physics, and math), so there are plenty of peers with whom to socialize. Another plus is that we’re in a great location, on Fifth Avenue just south of the Flatiron Building (in the Flatiron neighborhood, which is a short walk to NYU in Greenwich Village and Google in Chelsea as well as to Times Square and the Hudson River Park).

If you’re interested in working on stats, especially applied Bayesian stats, Bayesian methodology, or Stan, please let me know via email at [email protected] so that I don't miss your application. We have two other Stan devs here, Yuling Yao (postdoc) and Brian Ward (software engineer).

We're also hiring full-time permanent research scientists at both the junior and senior levels, as well as postdocs and software engineers. For more on those positions, see my previous post on jobs at Flatiron. That post has lots of nice photos of the office, which is really great. Or check out Google's album of photos.

Naming conventions for variables, functions, etc.

The golden rule of code layout is that code should be written to be readable. And that means readable by others, including you in the future.

Three principles of naming follow:

1. Names should mean something.

2. Names should be as short as possible.

3. Use your judgement to balance (1) and (2).

The third one’s where all the fun arises. Do we use “i” or “n” for integer loop variables by convention? Yes, we do. Do we choose “inv_logit” or “inverse_logit”? Stan chose “inv_logit”. Do we choose “complex” or “complex_number”? C++ chose “complex”, and it also chose “imag” over “imaginary” for the method that pulls out the imaginary component.

Do we use names like “run_helper_function”, which is both long and provides zero clue as to what it does? We don’t if we want to do unto others as we’d have them do unto us.
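As a quick illustration of the trade-off (hypothetical functions, not from any real codebase):

```python
import math

# Violates principle (1): long *and* tells the reader nothing
# about what it computes.
def run_helper_function(x):
    return 1 / (1 + math.exp(-x))

# Balances (1) and (2): short, and the name says what it is.
# Stan's convention, as noted above, is "inv_logit" rather than
# "inverse_logit".
def inv_logit(x):
    return 1 / (1 + math.exp(-x))
```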

P.S. If the producers of Silicon Valley had asked me, Winnie would’ve dumped Richard after a fight about Hungarian notation, not tabs vs. spaces.

A Primer on Bayesian Multilevel Modeling using PyStan

Chris Fonnesbeck contributed our first PyStan case study (I wrote the abstract), in the form of a very nice Jupyter notebook. Daniel Lee and I had the pleasure of seeing him present it live as part of a course we were teaching at Vanderbilt last week.

A Primer on Bayesian Multilevel Modeling using PyStan

This case study replicates the analysis of home radon levels using the hierarchical models of Lin, Gelman, Price, and Krantz (1999). It illustrates how to generalize linear regression to hierarchical models with group-level predictors, and how to compare predictive inferences and evaluate model fit. Along the way, it shows how to get data into Stan using pandas, how to sample using PyStan, and how to visualize the results using seaborn.
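For a flavor of the kind of model the notebook builds up to, here's a condensed sketch (not Chris's exact code) of a varying-intercept model with a county-level uranium predictor, run through PyStan's 2.x interface. The data dictionary here is synthetic; the notebook shows how to assemble the real one with pandas.

```python
import numpy as np
import pystan  # PyStan 2.x interface

model_code = """
data {
  int<lower=1> N;                   // number of homes
  int<lower=1> J;                   // number of counties
  int<lower=1, upper=J> county[N];  // county of home n
  vector[N] floor_ind;              // 0 = basement, 1 = first floor
  vector[J] log_uranium;            // county-level (group-level) predictor
  vector[N] log_radon;              // outcome
}
parameters {
  vector[J] alpha;                  // varying county intercepts
  real beta;                        // floor effect
  real gamma0;                      // group-level intercept
  real gamma1;                      // group-level uranium slope
  real<lower=0> sigma;
  real<lower=0> sigma_alpha;
}
model {
  alpha ~ normal(gamma0 + gamma1 * log_uranium, sigma_alpha);
  log_radon ~ normal(alpha[county] + beta * floor_ind, sigma);
}
"""

# Synthetic stand-in for the radon data (the real data comes from pandas).
rng = np.random.default_rng(0)
N, J = 50, 5
radon_data = {
    "N": N, "J": J,
    "county": rng.integers(1, J + 1, size=N),
    "floor_ind": rng.integers(0, 2, size=N).astype(float),
    "log_uranium": rng.normal(size=J),
    "log_radon": rng.normal(size=N),
}

sm = pystan.StanModel(model_code=model_code)
fit = sm.sampling(data=radon_data, iter=2000, chains=4)
print(fit)
```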

As an added bonus, if you follow the link to the source repo on GitHub, you’ll find a Gaussian process case study. I haven’t even had time to look at it yet, but if it’s as nice as this radon study, it’ll be well worth checking out.


P.S. If you’re wondering what one of the core PyMC developers was doing writing PyStan examples, it’s because he invited us to teach a course on RStan at Vanderbilt to his biostatistics colleagues, who didn’t want to learn Python. It was extremely generous of him to put promoting good science ahead of promoting his own software! Part of our class was on Bayesian methods and how to code models in Stan, and Chris offered to do some case studies, which is what Andrew usually does when he’s the third instructor. Chris said he tried RStan, but then bailed and went back to Python, where he could use familiar and powerful tools like pandas, NumPy, and seaborn. It’s hard to motivate learning a whole new language and toolchain just to write one example. The benefit to us is that we now have a great PyStan example. Thanks, Chris!