Stan Weekly Roundup, 30 June 2017

[Image: trademarked version of the Stan logo]

Here are some things that have been going on with Stan since last week's roundup.

  • Stan® and the logo were granted U.S. Trademark Registration No. 5,222,891 and U.S. Serial No. 87,237,369, respectively. Hard to feel special when there were millions of products ahead of you. Trademarked names are case insensitive, and the registration required a black-and-white image, shown here.

  • Peter Ellis, a data analyst working for the New Zealand government, posted a nice case study, State-space modelling of the Australian 2007 federal election. His post is intended to “replicate Simon Jackman’s state space modelling [from his book and pscl package in R] with house effects of the 2007 Australian federal election.”

  • Masaaki Horikoshi provides Stan programs on GitHub for the models in Jacques J.F. Commandeur and Siem Jan Koopman’s book Introduction to State Space Time Series Analysis.

  • Sebastian Weber put out a first draft of the MPI specification for a map function for Stan. Mapping was introduced in Lisp with maplist(); Python uses map(), and R uses sapply(). The map operation is also the first half of the parallel map-reduce pattern, which is how we’re implementing it. The reduce step involves folding the operands, results, and gradients back into the shared autodiff graph.

  • Sophia Rabe-Hesketh, Daniel Furr, and Seung Yeon Lee, of UC Berkeley, put together a page of Resources for Stan in educational modeling; we only have another partial year left on our IES grant with Sophia.
  • Bill Gillespie put together some introductory Stan lectures. Bill’s recently back from teaching Stan at the PAGE conference in Budapest.
  • Mitzi Morris got her pull request merged to add compound arithmetic and assignment operators to the language (she added compound declare-define before that). That means we’ll be able to write foo[i, j] += 1 instead of foo[i, j] = foo[i, j] + 1 going forward; see the snippet just after this list. It works for all types for which the binary operation and assignment are well typed.
  • Sean Talts has the first prototype of Andrew Gelman’s algorithm for max marginal modes (either posterior or likelihood). This’ll give us the same kind of maximum likelihood estimates as Doug Bates’s packages for generalized linear mixed-effects models, lme4 in R and MixedModels.jl in Julia. It not only allows penalties or priors like Vince Dorie and Andrew’s R package blme, but it can be used for arbitrary subsets of parameters in arbitrary Stan models. It shares some computational tricks for stochastic derivatives with Alp Kucukelbir’s autodiff variational inference (ADVI) algorithm.
  • I got the pull request merged for the forward-mode test framework. It’s cutting down drastically on code size and improving test coverage. Thanks to Rob Trangucci for writing the finite diff functionals and to Sean Talts and Daniel Lee for feedback on the first round of testing. This should mean that we’ll have higher-order autodiff exposed soon, which means RHMC and faster autodiffed Hessians.
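To make Mitzi’s compound operators concrete, here’s a minimal sketch (the variable names are purely illustrative):

model {
  real total = 0;
  matrix[3, 3] m = rep_matrix(1, 3, 3);

  total += 1;       // same as total = total + 1
  m[1, 2] += 1;     // works on indexed elements
  m *= 2;           // matrix-scalar multiply-and-assign
  m[1] -= m[2];     // row_vector subtract-and-assign
}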

HMMs in Stan? Absolutely!

I was having a conversation with Andrew yesterday that went like this:

Andrew:
Hey, someone’s giving a talk today on HMMs (that someone was Yang Chen, who was giving a talk based on her JASA paper Analyzing single-molecule protein transportation experiments via hierarchical hidden Markov models).

Maybe we should add some specialized discrete modules to Stan so we can fit HMMs. Word on the street is that Stan can’t fit models with discrete parameters.

Me:
Uh, we can already fit HMMs in Stan. There’s a section in the manual that explains how (section 9.6, Hidden Markov Models).

We’ve had some great traffic on the mailing list lately related to movement HMMs, which were all the rage at ISEC 2016 for modeling animal movement (e.g., whale diving behavior in the presence or absence of sonar).

Andrew:
But how do you marginalize out the discrete states?

Me:
The forward algorithm (don’t even need the backward part for calculating HMM likelihoods). It’s a dynamic programming algorithm that calculates HMM likelihoods in $latex \mathcal{O}(N \, K^2)$, where $latex N$ is the number of time points and $latex K$ the number of HMM states.

Andrew:
So you built that in C++ somehow?

Me:
Nope, just straight Stan code. It’s hard to generalize to C++ given that the state emission models vary by application and the transitions can involve predictors. As with many of these classes of models, Stan’s flexible enough to implement, say, all of the models described in Roland Langrock et al.’s moveHMM package for R. The doc for that package is awesome; it’s the best intro to the models I’ve seen, and I like the concreteness of resolving details in code. Langrock’s papers are nice, too, so I suspect his co-authored book on HMMs for time series is worth reading as well. That’s another one that’d be nice to translate to Stan.
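To give a flavor of what “straight Stan code” means here, below is a minimal sketch of a K-state HMM with normal emissions that marginalizes the hidden states via the forward algorithm. It follows the approach in the manual’s HMM section; the emission model and priors are just placeholders.

data {
  int<lower=1> K;          // number of hidden states
  int<lower=1> N;          // number of observations
  real y[N];               // observed series
}
parameters {
  simplex[K] theta[K];     // theta[k] = transition probabilities from state k
  ordered[K] mu;           // emission means, ordered for identifiability
  real<lower=0> sigma;     // emission scale
}
model {
  real gamma[N, K];        // gamma[n, k] = log p(y[1:n], state n = k)
  mu ~ normal(0, 10);
  sigma ~ normal(0, 5);
  for (k in 1:K)
    gamma[1, k] = normal_lpdf(y[1] | mu[k], sigma);  // uniform initial state (constant dropped)
  for (n in 2:N) {
    for (k in 1:K) {
      real acc[K];
      for (j in 1:K)
        acc[j] = gamma[n - 1, j] + log(theta[j, k])
                 + normal_lpdf(y[n] | mu[k], sigma);
      gamma[n, k] = log_sum_exp(acc);
    }
  }
  target += log_sum_exp(gamma[N]);  // marginalize the final state
}

The inner double loop is where the $latex \mathcal{O}(N \, K^2)$ cost comes from: each of the $latex N$ time steps updates $latex K$ states, each of which sums over the $latex K$ predecessor states.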

More HMMs in Stan coming soon (I hope)

Case study on movement HMMs coming soon, I hope. And I’ll pull the HMM stuff out into a state-space chapter. Right now, it’s spread between the CJS ecology models (the states are which animals are alive) in the latent discrete parameters chapter and HMMs in the time-series chapter. I’m also going to try to get some of the more advanced sufficient statistics, N-best, and chunking algorithms I wrote for LingPipe going in Stan, along with some semi-supervised learning methods I used to work on in NLP. I’m very curious to compare the results in Stan to those derived by plugging in (penalized) maximum likelihood parameter estimates.

Getting working movement HMM code

Ben Lambert and I managed to debug the model through a back-and-forth on the mailing list, which also contains code for the movement HMMs (these use a cool wrapped Cauchy distribution that Ben implemented as a Stan function). The last word on that thread has the final working models with covariates.
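Ben’s actual implementation is in the thread; for flavor, here’s a minimal sketch of a wrapped Cauchy log density as a user-defined Stan function, using the concentration parameterization (the version on the list may be parameterized differently):

functions {
  // wrapped Cauchy log density for angle theta in (-pi, pi],
  // location mu, and concentration rho in [0, 1); density is
  //   (1 - rho^2) / (2 * pi * (1 + rho^2 - 2 * rho * cos(theta - mu)))
  real wrapped_cauchy_lpdf(real theta, real mu, real rho) {
    return log1m(square(rho)) - log(2 * pi())
           - log(1 + square(rho) - 2 * rho * cos(theta - mu));
  }
}

With that in scope, a turning angle can be modeled as angle ~ wrapped_cauchy(mu, rho);.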

P.S.

We should be able to fit Yang Chen’s hierarchical HMMs in Stan. She assumed a univariate normal emission density that was modeled hierarchically across experiments, with a shared transition matrix involving only a few states. We wouldn’t normally perform her model selection step; see Andrew’s previous post on overfitting cluster analysis, which arose out of a discussion I had with Ben Bolker at ISEC.
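For concreteness, here’s a sketch of how that structure might look as a variant of the HMM program above: a shared transition matrix with per-experiment emission means drawn from a population distribution. All of the names and priors here are mine, and the details of Chen’s actual model differ.

data {
  int<lower=1> K;                 // number of states
  int<lower=1> J;                 // number of experiments
  int<lower=1> N;                 // observations per experiment
  real y[J, N];                   // observed series, one per experiment
}
parameters {
  simplex[K] theta[K];            // transition matrix shared across experiments
  ordered[K] mu[J];               // per-experiment emission means
  vector[K] mu_pop;               // population-level means
  vector<lower=0>[K] tau;         // population-level scales
  real<lower=0> sigma;            // emission scale
}
model {
  mu_pop ~ normal(0, 10);
  tau ~ normal(0, 5);
  sigma ~ normal(0, 5);
  for (j in 1:J) {
    real gamma[N, K];
    mu[j] ~ normal(mu_pop, tau);  // hierarchical prior on emission means
    for (k in 1:K)
      gamma[1, k] = normal_lpdf(y[j, 1] | mu[j, k], sigma);
    for (n in 2:N)
      for (k in 1:K) {
        real acc[K];
        for (i in 1:K)
          acc[i] = gamma[n - 1, i] + log(theta[i, k])
                   + normal_lpdf(y[j, n] | mu[j, k], sigma);
        gamma[n, k] = log_sum_exp(acc);
      }
    target += log_sum_exp(gamma[N]);  // forward algorithm, as above
  }
}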