Chris Fonnesbeck contributed our first PyStan case study (I wrote the abstract), in the form of a very nice Jupyter notebook. Daniel Lee and I had the pleasure of seeing him present it live as part of a course we were doing at Vanderbilt last week.

**A Primer on Bayesian Multilevel Modeling using PyStan**

This case study replicates the analysis of home radon levels using hierarchical models of Lin, Gelman, Price, and Kurtz (1999). It illustrates how to generalize linear regressions to hierarchical models with group-level predictors and how to compare predictive inferences and evaluate model fits. Along the way it shows how to get data into Stan using pandas, how to sample using PyStan, and how to visualize the results using Seaborn.

- view HTML (mc-stan.org)

- source repository (GitHub)

As an added bonus, if you follow the link to the source repo on GitHub, you’ll find a Gaussian process case study. I haven’t even had time to look at it yet, but if it’s as nice as this radon study, it’ll be well worth checking out.

P.S. If you’re wondering what one of the core PyMC developers was doing writing PyStan examples, it was because he invited us to teach a course on RStan at Vanderbilt to his biostatistics colleagues who didn’t want to learn Python. It was extremely generous of him to put promoting good science ahead of promoting his own software! Part of our class was on teaching Bayesian methods and how to code models in Stan, and Chris offered to do some case studies, which is what Andrew usually does when he’s the third instructor. Chris said he tried RStan, but then bailed and went back to Python where he could use familiar and powerful Python tools like pandas and numpy and seaborn. It’s hard to motivate learning a whole new language and toolchain just to write one example. The benefit to us is that we now have a great PyStan example. Thanks, Chris!

The github site also links to http://mc-stan.org/workshops/vanderbilt2016/ which has a bunch of great slides.

This is great to see. I do have one suggestion, which is to generate some maps of things like the predicted county geometric mean concentrations, the probability that more than 5% of homes in the county have radon concentration over the EPA’s recommended “action level”, and so on. People like maps! Plus, this would help to show what you can do once you have a bunch of draws from the posterior distribution.

On request: shameless self-promotion of using a map to present meaningful summaries of samples from a posterior distribution while gently abusing the concept of “real-time”: https://www.overleaf.com/read/hcqdktdnxnvd

awesome, thanks for this