Stan Course in Newcastle, United Kingdom!

(this post is by Betancourt)

The growth of Stan has afforded the core team many opportunities to give courses to both industrial and academic audiences at venues across the world.  Regrettably, we’re not always able to keep up with demand for new courses, especially outside of the United States, due to our already busy schedules.  Fortunately, however, some of our colleagues are picking up the slack!

In particular, Jumping Rivers is hosting a two day introductory RStan course at the University of Newcastle in the United Kingdom from Thursday December 7th to Friday December 8th.  The instructor is my good friend Sarah Heaps, who not only is an excellent statistician and avid Stan user but also attended one of the first RStan courses I gave!

If you are on the other side of the Atlantic and interested in learning RStan then I highly recommend attending (and checking out Newcastle’s surprisingly great Chinatown during lunch breaks).

And if you are interested in organizing a Stan course with any members of the core team then don’t hesitate to contact me to see if we might be able to arrange something.

Extended StanCon 2018 Deadline!

(this post is by Betancourt)

We received an ensemble of exciting submissions for StanCon 2018, but some of our colleagues requested a little bit of extra time to put the finishing touches on their submissions.  Being the generous organizers that we are, we have decided to extend the submission deadline for everyone by two weeks.

Contributed submissions will be accepted until September 29, 2017 5:00:00 AM GMT (that’s midnight on the east coast of the States, for those who don’t think in Greenwich Mean Time).  We will do our best to review and send decisions out before the early registration deadline, but the sooner you submit the more likely you are to hear back before then.  For more details on the submission requirements and how to submit see the Submissions page.

Early registration ends on Friday November 10, 2017, after which registration costs increase significantly.  Registration for StanCon 2018 is in two parts: an initial information form followed by payment and accommodation reservation at the Asilomar website.

The StanCon Cometh

(In a stunning deviation from the norm, this post is not by Andrew or Dan, but Betancourt!)

Some important dates for StanCon 2018 are rapidly approaching!

Contributed submissions are due September 16, 2017 5:00:00 AM GMT. That’s less than 6 days away!  We want to make sure we can review submissions early enough to get responses back to submitters in time for early registration.  For more details on the submission requirements and how to submit see the Submissions page.

Speaking of which, early registration ends Friday November 10, 2017 after which the registration cost significantly increases. That’s in just about two months!

Finally, because I still can’t believe that we have such an amazing ensemble of invited speakers, let me remind everyone that attendees will get to see talks from:

  • Susan Holmes (Department of Statistics, Stanford University)
  • Frank Harrell (School of Medicine and Department of Biostatistics, Vanderbilt University)
  • Sophia Rabe-Hesketh (Educational Statistics and Biostatistics, University of California, Berkeley)
  • Sean Taylor and Ben Letham (Facebook Core Data Science)
  • Manuel Rivas (Department of Biomedical Data Science, Stanford University)
  • Talia Weiss (Department of Physics, Massachusetts Institute of Technology)

Make Your Plans for Stans (-s + Con)

This post is by Mike

A friendly reminder that registration is open for StanCon 2018, which will take place over three days, from Wednesday January 10, 2018 to Friday January 12, 2018, at the beautiful Asilomar Conference Grounds in Pacific Grove, California.

Detailed information about registration and accommodation at Asilomar, including fees and instructions, can be found on the event website.  Early registration ends on Friday November 10, 2017 and no registrations will be accepted after Wednesday December 20, 2017.

We have an awesome set of invited speakers this year that is worth the trip alone:

  • Susan Holmes (Department of Statistics, Stanford University)
  • Sean Taylor and Ben Letham (Facebook Core Data Science)
  • Manuel Rivas (Department of Biomedical Data Science, Stanford University)
  • Talia Weiss (Department of Physics, Massachusetts Institute of Technology)
  • Sophia Rabe-Hesketh and Daniel Furr (Educational Statistics and Biostatistics, University of California, Berkeley)

Contributed talks will proceed as last year, with each submission consisting of self-contained knitr or Jupyter notebooks that will be made publicly available after the conference.  Last year’s contributed talks were awesome and we can’t wait to see what users will submit this year.  For details on how to submit see the submission website.  The final deadline for submissions is Saturday September 16, 2017 5:00:00 AM GMT.

This year we are going to try to support as many student scholarships as we can — if you are a student who would love to come but may not have the funding then don’t hesitate to submit a short application!

Finally, we are still actively looking for sponsors!  If you are interested in supporting StanCon 2018, or know someone who might be, then please contact the organizing committee.

Stan/NYC WiMLDS Workshop

On Saturday, July 22nd, Mitzi Morris and I (Michael Betancourt) will be hosting a day-long Stan workshop for the NYC Women in Machine Learning & Data Science Meetup Group.  As with most of our workshops, the emphasis will be on interactive exercises where everyone builds and runs models in Stan.  We’ll start with the foundations of Bayesian inference and work all the way up to fitting latent Gaussian process models.  Everything for this course will be in Python and PyStan.

If you’re in the New York City area and want to attend then you can register at the event page.  We hope that you can make it!

P.S.  Don’t forget that StanCon 2018 will take place January 10-12 next year, and those identifying as members of underrepresented communities can take advantage of discounted registration.

StanCon 2018 is live!

This post is by Mike.

We had so much fun at StanCon 2017 that we decided to do it again!

This year’s conference will take place over three days, from Wednesday January 10, 2018 to Friday January 12, 2018, at the beautiful Asilomar Conference Grounds in Pacific Grove, California.  In addition to talks and open discussion, this year we’ll also have dedicated time for collaborative Stan coding with other attendees and the Stan dev team.

Detailed information about registration and accommodation at Asilomar, including fees and instructions, can be found on the event website.  Early registration ends on Friday November 10, 2017 and no registrations will be accepted after Wednesday December 20, 2017.

This year we are going to try to support as many student scholarships as we can — if you are a student who would love to come but may not have the funding then don’t hesitate to submit a short application!

Contributed talks will proceed as last year, with each submission consisting of self-contained knitr or Jupyter notebooks that will be made publicly available after the conference.  Last year’s contributed talks were awesome and we can’t wait to see what users will submit this year.  For details on how to submit see the submission website.  The final deadline for submissions is Saturday September 16, 2017 5:00:00 AM GMT.

Finally, we are actively looking for sponsors!  If you are interested in supporting StanCon 2018, or know someone who might be, then please contact the organizing committee.

I’ll keep an eye on this post to answer any questions that you might have, and otherwise I hope to see everyone in January!

Working Stiff

After a challenging development process we are happy to announce that Stan finally supports stiff ODE systems, removing one of the key obstacles in fields such as pharmacometrics and ecology.  For the experts, we’ve incorporated CVODE 2.8.2 into Stan and exposed the backward-differentiation formula solver using Newton iterations and a banded Jacobian computed exactly using our autodiff.

Right now the code is available on the develop branches of the Stan library and the CmdStan interface, with more interface support hopefully coming soon.  All you have to do is download develop from GitHub and compile your model — the CVODE library will automatically compile and install.

The new solver is used much like the current integrate_ode function in Stan, except that you must explicitly specify the relative and absolute tolerances and the maximum number of steps:

 integrate_ode_cvode(ode, y0, t0, ts, theta, x_real, x_int,
                     rel_tol, abs_tol, max_num_steps)

Be warned that these arguments can have a strong influence on the overall performance of the integrator, so care must be taken in choosing values that ensure accurate solutions. We’ve found that rel_tol = abs_tol = 1e-10 and max_num_steps = 1e8 have worked well for our tests.
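To build some intuition for why a backward-differentiation formula solver with explicit tolerances matters for stiff systems, here is a small sketch using SciPy’s BDF method rather than Stan’s CVODE backend (the test problem and tolerance values are illustrative, not taken from the Stan tests):

```python
import numpy as np
from scipy.integrate import solve_ivp

# A classic stiff test problem: a transient decaying at rate 1000
# onto the slowly varying solution y(t) ~ cos(t).  An explicit solver
# would be forced into tiny steps; BDF handles it gracefully.
def stiff_rhs(t, y):
    return -1000.0 * (y - np.cos(t))

# Tight tolerances, analogous to rel_tol and abs_tol above, control
# how accurately the slow component is tracked.
sol = solve_ivp(stiff_rhs, (0.0, 1.0), [0.0], method="BDF",
                rtol=1e-10, atol=1e-10)

# After the fast transient the solution tracks cos(t) to within ~1/1000.
print(sol.success)                              # True
print(abs(sol.y[0, -1] - np.cos(1.0)) < 2e-3)   # True
```

The same trade-off applies in Stan: looser tolerances speed up each solve but can feed inaccurate gradients to the sampler.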

If you get a chance to play around with the new solver then do let us know how it performs.  Without your feedback, both positive and negative, we can’t make Stan better!

Stan at JSM2015

In addition to Jiqiang’s talk on Stan at 11:25 AM on Wednesday, I’ll also be giving a talk about Hamiltonian Monte Carlo today at 3:20 PM.  Stanimals in attendance can come find me to score a sweet Stan sticker.


And everyone should check out Andrew’s breakout performance in “A Stan is Born”.

Update: Turns out I missed even more Stan!  There was a great session just this morning that, unfortunately, I was not able to post about earlier due to some logistical issues (namely, my inadvertently leaving my laptop behind after my talk yesterday).  Seth will also be talking about his sweet Gaussian processes on Tuesday at 10:35 AM.

Stan at NIPS 2014

For those in Montreal, a few of the Stan developers will be giving talks at the NIPS workshops this week.  On Saturday at 9 AM I’ll be talking about the theoretical foundations of Hamiltonian Monte Carlo at the Riemannian Geometry workshop (http://www.riemanniangeometry2014.eu), while Dan will be talking about Stan at the Software Engineering workshop (https://sites.google.com/site/software4ml/) Saturday afternoon at 4 PM.  We’ll also have an interactive poster at the Probabilistic Programming workshop on Saturday (http://probabilistic-programming.org/wiki/NIPS*2014_Workshop) — it should be an…attractive presentation.


If you’re up early be sure to check out Matt Hoffman talking first thing on Saturday, at 8:30 AM in the Variational Inference workshop (https://sites.google.com/site/variationalworkshop/).

Dan and I will be around Thursday night and Friday if anyone wants to grab a drink or talk Stan.

Stan Model of the Week: Hierarchical Modeling of Supernovas

The Stan Model of the Week showcases research using Stan to push the limits of applied statistics.  If you have a model that you would like to submit for a future post then send us an email.

Our inaugural post comes from Nathan Sanders, a graduate student finishing up his thesis on astrophysics at Harvard. Nathan writes,

“Core-collapse supernovae, the luminous explosions of massive stars, exhibit an expansive and meaningful diversity of behavior in their brightness evolution over time (their “light curves”). Our group discovers and monitors these events using the Pan-STARRS1 telescope in Hawaii, and we’ve collected a dataset of about 20,000 individual photometric observations of about 80 Type IIP supernovae, the class my work has focused on. While this dataset provides one of the best available tools to infer the explosion properties of these supernovae, due to the nature of extragalactic astronomy (observing from distances ≳ 1 billion light years), these light curves typically have much lower signal-to-noise, poorer sampling, and less complete coverage than we would like.

My goal has been to develop a light curve model, with a physically interpretable parameterization, robust enough to fit the diversity of observed behavior and to extract the most information possible from every light curve in the sample, regardless of data quality or completeness.  Because light curve parameters of individual objects are often not identified by the data, we have adopted a hierarchical model structure.  The intention is to capitalize on partial pooling of information to simultaneously regularize the fits of individual light curves and constrain the population level properties of the light curve sample.  The highly non-linear character of the light curves motivates a full Bayes approach to explore the complex joint structure of the posterior.

Sampling from a ~10^4-dimensional, highly correlated joint posterior seemed intimidating to me, but I’m fortunate to have been empowered by having taken Andrew’s course at Harvard, by befriending expert practitioners in this field like Kaisey Mandel and Michael Betancourt, and by using Stan!  For me, perhaps the most attractive feature of Stan is its elegant probabilistic modeling language.  It has allowed us to rapidly develop and test a variety of functional forms for the light curve model and strategies for optimization and regularization of the hierarchical structure.  This would not be useful, of course, without Stan’s efficient implementation of NUTS, although the particular pathologies of our model’s posterior drove us to spend a great deal of time exploring divergence, tree depth saturation, numerical instability, and other problems encountered by the sampler.

Over the course of the project, I learned to pay increasingly close attention to the stepsize, n_treedepth and n_divergent NUTS parameters, and other diagnostic information provided by Stan in order to help debug sampling issues.  Encountering saturation of the treedepth and/or extremely small stepsizes often motivated simplifications of the hierarchical structure in order to reduce the curvature in the posterior.  Divergences during sampling led us to apply stronger prior information on key parameters (particularly those that are exponentiated in the light curve model) in order to avoid numerical overflow on samples drawn from the tails.  Posterior predictive checks have been a constant companion throughout, providing a natural means to visualize the model’s performance against the data to understand where failure modes have been introduced – be it through under- or over-constraining priors, inadequate flexibility in the light curve model form, or convergence failure between chains.”

By modeling the hierarchical structure of the supernova measurements Nathan was able to significantly improve the utilization of the data.  For more, see http://arxiv.org/abs/1404.3619.

Building and fitting this model proved to be a tremendous learning experience for both Nathan and myself.  We haven’t really seen Stan applied to such deep hierarchical models before, and our first naive implementations proved to be vulnerable to all kinds of pathologies.

A problem early on came in how to model hierarchical dependencies between constrained parameters.  As has become a common theme, the most successful computational strategy is to model the hierarchical dependencies on the unconstrained latent space and transform to the constrained space only when necessary.
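As a toy illustration of that strategy (the population values and group count here are invented for the example, not taken from the supernova model), one can draw the hierarchical parameters for a positive-constrained quantity on the log scale and exponentiate only at the end, so the constraint can never be violated during sampling:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical population-level location and spread for a positive
# group-level scale parameter sigma_k (values chosen for illustration).
mu, tau = 0.0, 0.5

# Model the hierarchy on the unconstrained scale eta_k = log(sigma_k)...
eta = mu + tau * rng.standard_normal(8)

# ...and transform to the constrained scale only when needed.
sigma = np.exp(eta)

print(bool(np.all(sigma > 0)))   # True: positivity holds by construction
```

The same idea is what Stan does internally for constrained parameter declarations; writing the hierarchy on the unconstrained scale keeps the geometry the sampler sees well behaved.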

The biggest issue we came across, however, was the development of a well-behaved hierarchical prior with so many layers.  With multiple layers the parameter variances increase exponentially, and the naive generalization of a one-layer prior induces huge variances on the top-level parameters.  This became especially pathological when those top-level parameters were constrained, since the exponential function is very easy to overflow in floating point.  Ultimately we established the desired variance on the top-level parameters and worked backwards, scaling the deeper priors by the number of groups in the next layer to ensure the desired behavior.
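A rough numpy sketch of the problem and the fix, using a deliberately simplified additive hierarchy with made-up unit scales rather than the actual supernova model: each extra layer inflates the marginal variance of the deepest parameters, and working backwards from a target variance means shrinking each layer's scale accordingly.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Naive three-layer hierarchy: each layer adds unit-variance noise,
# so the marginal variance of the deepest parameters compounds.
top = rng.standard_normal(n)
mid = top + rng.standard_normal(n)
leaf = mid + rng.standard_normal(n)
print(round(float(np.var(leaf))))   # 3, not 1

# Working backwards: fix a target marginal variance of 1 for the
# deepest layer and rescale each layer's contribution to match.
scale = 1.0 / np.sqrt(3.0)
leaf_scaled = scale * (rng.standard_normal(n)
                       + rng.standard_normal(n)
                       + rng.standard_normal(n))
print(abs(float(np.var(leaf_scaled)) - 1.0) < 0.05)   # True
```

In the real model the rescaling depends on the number of groups per layer rather than a single constant, but the principle of fixing the top-level variance first and propagating the scales downward is the same.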

Another great feature of Stan is that the modeling language also serves as a convenient means of sharing models for reproducible science.  Nathan was able to include the full model as an appendix to his paper, which you can find on the arXiv.