Stan Model of the Week: Hierarchical Modeling of Supernovas

The Stan Model of the Week showcases research using Stan to push the limits of applied statistics.  If you have a model that you would like to submit for a future post then send us an email.

Our inaugural post comes from Nathan Sanders, a graduate student finishing up his thesis on astrophysics at Harvard. Nathan writes,

“Core-collapse supernovae, the luminous explosions of massive stars, exhibit an expansive and meaningful diversity of behavior in their brightness evolution over time (their “light curves”). Our group discovers and monitors these events using the Pan-STARRS1 telescope in Hawaii, and we’ve collected a dataset of about 20,000 individual photometric observations of about 80 Type IIP supernovae, the class my work has focused on. While this dataset provides one of the best available tools to infer the explosion properties of these supernovae, due to the nature of extragalactic astronomy (observing from distances
$latex \gtrsim$ 1 billion light years), these light curves typically have much lower signal-to-noise, poorer sampling, and less complete coverage than we would like.

My goal has been to develop a light curve model, with a physically interpretable parameterization, robust enough to fit the diversity of observed behavior and to extract the most information possible from every light curve in the sample, regardless of data quality or completeness.  Because light curve parameters of individual objects are often not identified by the data, we have adopted a hierarchical model structure.  The intention is to capitalize on partial pooling of information to simultaneously regularize the fits of individual light curves and constrain the population level properties of the light curve sample.  The highly non-linear character of the light curves motivates a full Bayes approach to explore the complex joint structure of the posterior.

Sampling from a ~$latex 10^4$ dimensional, highly correlated joint posterior seemed intimidating to me, but I’m fortunate to have been empowered by having taken Andrew’s course at Harvard, by befriending expert practitioners in this field like Kaisey Mandel and Michael Betancourt, and by using Stan!  For me, perhaps the most attractive feature of Stan is its elegant probabilistic modeling language.  It has allowed us to rapidly develop and test a variety of functional forms for the light curve model and strategies for optimization and regularization of the hierarchical structure.  This would not be useful, of course, without Stan’s efficient implementation of NUTS, although the particular pathologies of our model’s posterior drove us to spend a great deal of time exploring divergence, tree depth saturation, numerical instability, and other problems encountered by the sampler.

Over the course of the project, I learned to pay increasingly close attention to the stepsize, n_treedepth and n_divergent NUTS parameters, and other diagnostic information provided by Stan in order to help debug sampling issues.  Encountering saturation of the treedepth and/or extremely small stepsizes often motivated simplifications of the hierarchical structure in order to reduce the curvature in the posterior.  Divergences during sampling led us to apply stronger prior information on key parameters (particularly those that are exponentiated in the light curve model) in order to avoid numerical overflow on samples drawn from the tails.  Posterior predictive checks have been a constant companion throughout, providing a natural means to visualize the model’s performance against the data to understand where failure modes have been introduced – be it through under- or over-constraining priors, inadequate flexibility in the light curve model form, or convergence failure between chains.”

By modeling the hierarchical structure of the supernova measurements Nathan was able to significantly improve the utilization of the data.  For more, see http://arxiv.org/abs/1404.3619.

By modeling the hierarchical structure of the supernova measurements Nathan was able to significantly improve the utilization of the data. For more, see the preprint.

Building and fitting this model proved to be a tremendous learning experience for both Nathan any myself.  We haven’t really seen Stan applied to such deep hierarchical models before, and our first naive implementations proved to be vulnerable to all kinds of pathologies.

A problem early on came in how to model hierarchical dependences
between constrained parameters.  As has become a common theme,
the most successful computational strategy is to model the hierarchical dependencies on the unconstrained latent space and transform to the constrained space only when necessary.

The biggest issue we came across, however, was the development of a well-behaved hierarchal prior with so many layers.  With multiple layers the parameter variances increase exponentially, and the naive generalization of a one-layer prior induces huge variances on the top-level parameters.  This became especially pathological when those top-level parameters are constrained — the exponential function is very easy to overflow in floating point.  Ultimately we established the desired variance on the top-level parameters and worked backwards, scaling the deeper priors by the number of groups in the next layer to ensure the desired behavior.

Another great feature of Stan is that the modeling language also serves as a convenient means of sharing models for reproducible science.  Nathan was able to include the full model as an appendix to his paper, which you can find on the arXiv.

11 thoughts on “Stan Model of the Week: Hierarchical Modeling of Supernovas

  1. ” the modeling language also serves as a convenient means of sharing models for reproducible science”

    Nice, except the Devil likes to hide in the 1,000+ lines of code cleaning up the data.

    • But that’s the fundamental limit of modern science. When data are released publicly it’s not the raw data but the processed data, and that often comes down to necessity as the various calibrations require expert level knowledge of the detectors. It would take a long time to understand the raw data, even for an expert in the field but not the particular experiment.

      It’s not ideal, but it’s progress.

    • What exactly does it mean that ” the modeling language also serves as a convenient means of sharing models for reproducible science”?

      Aren’t all models, irrespective of language, capable of performing that function?

      • Yes, but rarely are the full models included in the publications; this is especially true when any of the code is custom. With Stan the model, the runtime configuration, and the version number can all be included so that the analysis can be rerun exactly, even if only be other members of the team.

        • Doesn’t sound like anything special to me. I release full models with publications all the time. There is no hinderance to releasing version information (e.g., in R with sessionInfo()). I guess what you mean is that people who write custom code (in C or C++ or whatever) for Bayesian analysis don’t release it (they should, though).

        • People are even refusing to release simple information and code about their data transformations and standard linear regression modeling, even if it is just done by some simple Recode and Compute commands in SPSS. I usually don’t even ask any more for it (in the Social Sciences) as Openness is not considered any standard at all.

  2. This is very interesting to me (both this work as well as the use of Stan), esp. given some work in NL Dynamics in biophys where I’m hitting a bit of a wall, where a hierarchical model might be useful. Guess it’s time to clone yet another repo :). Just for the graphic, what did you use to output the model schema? I don’t have a charting tool (Dia, etc) that allows me to singularly render the graphic with math text inside each object (or at least super/subscripting). This was nicely readable, even with all of the parameter definitions.

    Also, I’m curious why, for background flux, there isn’t consideration for photometric effect in addition to those specific to the SN. From the observer’s perspective, wouldn’t that be another 2nd level dependency?

    How many folks are contributing to the Stan project at the moment?

  3. Pingback: Stan Model of the Week: Hierarchical Modeling o...

Leave a Reply

Your email address will not be published. Required fields are marked *