The Stan Model of the Week showcases research using Stan to push the limits of applied statistics. If you have a model that you would like to submit for a future post then send us an email.
Our inaugural post comes from Nathan Sanders, a graduate student finishing up his thesis on astrophysics at Harvard. Nathan writes,
“Core-collapse supernovae, the luminous explosions of massive stars, exhibit an expansive and meaningful diversity of behavior in their brightness evolution over time (their “light curves”). Our group discovers and monitors these events using the Pan-STARRS1 telescope in Hawaii, and we’ve collected a dataset of about 20,000 individual photometric observations of about 80 Type IIP supernovae, the class my work has focused on. While this dataset provides one of the best available tools to infer the explosion properties of these supernovae, due to the nature of extragalactic astronomy (observing from distances
1 billion light years), these light curves typically have much lower signal-to-noise, poorer sampling, and less complete coverage than we would like.
My goal has been to develop a light curve model, with a physically interpretable parameterization, robust enough to fit the diversity of observed behavior and to extract the most information possible from every light curve in the sample, regardless of data quality or completeness. Because light curve parameters of individual objects are often not identified by the data, we have adopted a hierarchical model structure. The intention is to capitalize on partial pooling of information to simultaneously regularize the fits of individual light curves and constrain the population level properties of the light curve sample. The highly non-linear character of the light curves motivates a full Bayes approach to explore the complex joint structure of the posterior.
Sampling from a ~ dimensional, highly correlated joint posterior seemed intimidating to me, but I’m fortunate to have been empowered by having taken Andrew’s course at Harvard, by befriending expert practitioners in this field like Kaisey Mandel and Michael Betancourt, and by using Stan! For me, perhaps the most attractive feature of Stan is its elegant probabilistic modeling language. It has allowed us to rapidly develop and test a variety of functional forms for the light curve model and strategies for optimization and regularization of the hierarchical structure. This would not be useful, of course, without Stan’s efficient implementation of NUTS, although the particular pathologies of our model’s posterior drove us to spend a great deal of time exploring divergence, tree depth saturation, numerical instability, and other problems encountered by the sampler.
Over the course of the project, I learned to pay increasingly close attention to the stepsize, n_treedepth and n_divergent NUTS parameters, and other diagnostic information provided by Stan in order to help debug sampling issues. Encountering saturation of the treedepth and/or extremely small stepsizes often motivated simplifications of the hierarchical structure in order to reduce the curvature in the posterior. Divergences during sampling led us to apply stronger prior information on key parameters (particularly those that are exponentiated in the light curve model) in order to avoid numerical overflow on samples drawn from the tails. Posterior predictive checks have been a constant companion throughout, providing a natural means to visualize the model’s performance against the data to understand where failure modes have been introduced – be it through under- or over-constraining priors, inadequate flexibility in the light curve model form, or convergence failure between chains.”
Building and fitting this model proved to be a tremendous learning experience for both Nathan any myself. We haven’t really seen Stan applied to such deep hierarchical models before, and our first naive implementations proved to be vulnerable to all kinds of pathologies.
A problem early on came in how to model hierarchical dependences
between constrained parameters. As has become a common theme,
the most successful computational strategy is to model the hierarchical dependencies on the unconstrained latent space and transform to the constrained space only when necessary.
The biggest issue we came across, however, was the development of a well-behaved hierarchal prior with so many layers. With multiple layers the parameter variances increase exponentially, and the naive generalization of a one-layer prior induces huge variances on the top-level parameters. This became especially pathological when those top-level parameters are constrained — the exponential function is very easy to overflow in floating point. Ultimately we established the desired variance on the top-level parameters and worked backwards, scaling the deeper priors by the number of groups in the next layer to ensure the desired behavior.
Another great feature of Stan is that the modeling language also serves as a convenient means of sharing models for reproducible science. Nathan was able to include the full model as an appendix to his paper, which you can find on the arXiv.