Stantastic!

Richard McElreath writes:

I’ve been translating a few ongoing data analysis projects into Stan code, mostly with success. The most important for me right now has been a hierarchical zero-inflated gamma problem. This is a “hurdle” model, in which a Bernoulli GLM produces zeros/nonzeros, and then a gamma GLM produces the nonzero values, using varying effects correlated with those in the Bernoulli process.
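[For readers unfamiliar with hurdle models: the observation-level log likelihood has two branches, one for the zeros and one for the positive values. A minimal sketch in Python rather than Stan, with an illustrative shape/rate parameterization, not McElreath's actual model:

```python
import math

def hurdle_gamma_logpdf(y, theta, shape, rate):
    """Log density of one observation under a hurdle-gamma model.

    theta       : probability of a zero return (Bernoulli part)
    shape, rate : gamma parameters for the positive returns
    """
    if y == 0.0:
        return math.log(theta)
    # nonzero: log(1 - theta) plus the gamma log density
    return (math.log1p(-theta)
            + shape * math.log(rate)
            - math.lgamma(shape)
            + (shape - 1.0) * math.log(y)
            - rate * y)
```

The key point is that each observation contributes to exactly one branch, which is why Bob's suggestion below of splitting the data into zeros and positive reals maps so naturally onto the model. — ed.]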

The data are 20 years of human foraging returns from a subsistence hunting population in Paraguay (the Ache), comprising about 15k hunts in total (Hill & Kintigh. 2009. Current Anthropology 50:369-377). Observed values are kilograms of meat returned to camp. The more complex models contain a 147-by-9 matrix of varying effects (147 unique hunters), as well as imputation of missing values.

Originally, I had written the sampler myself in raw R code. It was very slow, but I knew what it was doing at least. Just before Stan version 1.0 was released, I had managed to get JAGS to do it all quite reliably. But JAGS was taking a long time to converge and then producing highly autocorrelated output. Stan has been amazing, in comparison. I could hardly believe the traceplots. Stan produces the same inferences as my JAGS code does, but with 8 hour runs (no thinning needed) instead of 30 hour runs (with massive thinning).

In the future, I should be getting similar data for about a dozen other foraging populations, so will want to scale this up to a meta-analytic level, with partial pooling across societies. So the improved efficiency from Stan will be a huge help going forward as well.

On the horizon, I have a harder project I’d like to port into Stan, involving cumulative multi-normal likelihoods. I wrote my own sampler, using likelihoods from pmvnorm in the mvtnorm package, but it mixes very slowly, once all the varying effects are included. Is there a clever way to get the same likelihoods in Stan yet? If not, once you have a guide prepared for how to compile in new distributions, I can probably use that to hack mvtnorm’s pmvnorm into Stan.
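[What pmvnorm computes is an orthant probability P(X ≤ u) for a multivariate normal; its actual algorithm (Genz's) is far smarter than brute force, but a crude Monte Carlo sketch makes the quantity concrete. All names here are illustrative:

```python
import random

def mvn_cdf_mc(upper, mean, chol, n=200_000, seed=1):
    """Crude Monte Carlo estimate of P(X <= upper) for X ~ N(mean, L L'),
    where chol is the lower-triangular Cholesky factor L."""
    rng = random.Random(seed)
    d = len(mean)
    hits = 0
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        inside = True
        for i in range(d):
            x = mean[i] + sum(chol[i][j] * z[j] for j in range(i + 1))
            if x > upper[i]:
                inside = False
                break
        if inside:
            hits += 1
    return hits / n
```

For an independent standard bivariate normal with upper limits at zero, this returns roughly 0.25, as it should. — ed.]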

I’m pretty sure that with some vectorization and other steps, he can get his model to run in much less than 8 hours in Stan. But I’m happy to see that even an inefficient implementation is working well.

And Lucas Leeman writes:

I just wanted to say thank you for Stan! Thank you and your collaborators very much!

I had this problem with a very slow-mixing chain, and I have finally managed to get Stan to do what I want. On the mock example I am playing with, Stan drastically outperforms the software I was using before.

In announcing this progress, I am not trying in any way to disparage Bugs or Jags. The success of these earlier programs is what inspired us to develop Stan. A few years ago, I had the attitude that I could fit a model in Bugs, and if that didn’t work I could program it myself. Now there’s Stan. Fitting a model in Stan is essentially the same as programming it myself, except that the program has already been optimized and debugged, thus combining the convenience of Bugs with the efficiency of compiled code.

Also, again we thank the Department of Energy, Institute for Education Sciences, and National Science Foundation for partial support of this project.

5 thoughts on “Stantastic!”

  1. Indeed, I have gotten the Stan model to run even faster, since I wrote that. Bob’s suggestion to split the data into zeros and positive reals, and then sample them in different loops, was the best structural improvement. One of the things I like about working in Stan is getting away from hacks like the “ones trick” and just thinking in terms of log probability.

    Along the way, I ended up writing an R package (“glmer2stan”) to quickly define simple GLMM Stan models. It takes lmer/glmer style formulas and builds corresponding Stan code and data/init lists. For those just starting with Stan, it might be a quick way to get familiar with the differences relative to JAGS/BUGS. I plan to use it in teaching and consulting. People are welcome to grab it off my development repository, if they are feeling adventurous. Use this code within R to install it:

    options(repos=c(getOption("repos"), glmer2stan="http://xcelab.net/R"))
    install.packages("glmer2stan", type="source")

    It has man pages, so do a ?glmer2stan before wading in.

  2. Since, according to the development mailing list, the Stan team is currently prioritizing features for the next version, I’d like to plead for the ability to sample from discrete distributions. A lot of interesting work in the realm of model choice / model averaging / model pruning needs this.

    Also, the availability of some default model fit / predictive-ability index would be a nice addition. Since DIC is currently under some criticism (see for example Martyn Plummer’s 2008 paper, among others), it seems that some alternative should be proposed. Watanabe’s WAIC seems an interesting candidate, and he recently (Aug ’12) proposed a way to compute it from MCMC output (which I do not yet fully understand, having had little time to study it).

    Another possible (difficult) goal would be, of course, the marginal likelihood…

    • Discrete sampling is at the top of our own list.

      Do you have some specific models you’d like to fit? For things like mixture modeling, it’s possible to just sum out the discrete terms.
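[Summing out the discrete terms means computing log p(y) = log Σ_k π_k p(y | k), evaluated stably via log-sum-exp. An illustrative Python sketch for a K-component normal mixture (names are mine, not from Stan):

```python
import math

def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def mixture_loglik(y, weights, mus, sigma):
    """Log likelihood of y under a normal mixture, with the
    discrete component indicator summed out."""
    def normal_logpdf(y, mu, s):
        return -0.5 * math.log(2 * math.pi * s * s) - (y - mu) ** 2 / (2 * s * s)
    return logsumexp([math.log(w) + normal_logpdf(y, mu, sigma)
                      for w, mu in zip(weights, mus)])
```

Because the indicator never appears as a sampled variable, HMC can handle the marginalized model directly. — ed.]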

      It would be very easy, but horribly inefficient, to have Stan do something like slice sampling against the whole log probability function. Because they explicitly represent a directed graphical model, BUGS and JAGS are able to find the Markov blanket of a discrete variable node, which is what you need to efficiently compute an unnormalized conditional probability function for the variable.
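[To make the “slice sampling against the whole log probability” idea concrete, here is a textbook univariate slice-sampler step with stepping-out (Neal 2003), in Python; running this against the full joint log probability for each variable, rather than against a Markov-blanket conditional, is exactly the inefficient scheme described above:

```python
import math
import random

def slice_sample_step(x, logp, w, rng):
    """One slice-sampling update of scalar x under unnormalized
    log density logp, with initial bracket width w (Neal 2003)."""
    logy = logp(x) + math.log(rng.random())   # auxiliary slice level
    left = x - w * rng.random()
    right = left + w
    # step out until both ends are below the slice
    while logp(left) > logy:
        left -= w
    while logp(right) > logy:
        right += w
    # sample uniformly from the bracket, shrinking on rejection
    while True:
        x_new = left + rng.random() * (right - left)
        if logp(x_new) > logy:
            return x_new
        if x_new < x:
            left = x_new
        else:
            right = x_new
```

The cost per update is a handful of logp evaluations, which is cheap against a Markov-blanket conditional but expensive against the whole model. — ed.]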

      We don’t have that option in Stan, because of its more imperative nature. Stan’s imperative side makes the modeling language more flexible (e.g., allows local variables, branching, direct manipulation of log probability and transforms), but makes it harder to do compile-time inference on the model’s graphical structure. (And also means the statements are order-sensitive, unlike in BUGS and JAGS.)

      So what we’re planning to do is let users express the conditional probabilities directly. This will be very flexible and let users do optimizations, but the huge downside is that the model statement will require consistency across the Gibbs/Slice and HMC components. We can check this consistency automatically at random parameter vectors, but we can’t guarantee consistency.

      DIC is tricky for two reasons. One, it requires the log likelihood to be split out from the prior. Two, it requires the log likelihood of the posterior mean of the parameters. The first issue can be handled by splitting the model into a prior and likelihood component — BUGS and JAGS can do this automatically given the static graphical model and runtime data/parameter inputs. The second would require some kind of cross-chain analysis where we collect the posterior samples, calculate their mean, then throw them back into the model. We have all the apparatus at hand to do that.
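[For reference, with deviance D(θ) = −2 log p(y | θ), the effective number of parameters is pD = D̄ − D(θ̄) and DIC = D(θ̄) + 2 pD. A sketch assuming the two log-likelihood summaries just described are already in hand (function and argument names are mine):

```python
def dic(loglik_draws, loglik_at_mean):
    """DIC from posterior draws, via deviance D = -2 * log likelihood.

    loglik_draws   : log likelihood evaluated at each posterior draw
    loglik_at_mean : log likelihood at the posterior mean of the parameters
    """
    d_bar = -2.0 * sum(loglik_draws) / len(loglik_draws)  # mean deviance
    d_hat = -2.0 * loglik_at_mean                         # deviance at posterior mean
    p_d = d_bar - d_hat          # effective number of parameters
    return d_hat + 2.0 * p_d     # equivalently 2 * d_bar - d_hat
```

The second argument is the part that requires the cross-chain analysis mentioned above. — ed.]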

      WAIC’s another matter, as it requires the log likelihoods of each data point. That’s even trickier to get out.
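[Given that draws-by-datapoints matrix of pointwise log likelihoods, one common variant of WAIC uses the posterior variance of each point's log likelihood as the penalty. A sketch (my own naming, following Watanabe's formulation):

```python
import math

def waic(pointwise_loglik):
    """WAIC from an S-draws-by-n-datapoints matrix of log likelihoods.

    lppd   : sum over points of log mean (over draws) likelihood
    p_waic : sum over points of the variance (over draws) of the log likelihood
    """
    S = len(pointwise_loglik)
    n = len(pointwise_loglik[0])
    lppd, p_waic = 0.0, 0.0
    for i in range(n):
        col = [pointwise_loglik[s][i] for s in range(S)]
        m = max(col)
        lppd += m + math.log(sum(math.exp(c - m) for c in col) / S)
        mean = sum(col) / S
        p_waic += sum((c - mean) ** 2 for c in col) / (S - 1)
    return -2.0 * (lppd - p_waic)
```

The difficulty Bob points to is upstream of this function: getting the per-datapoint log likelihoods out of the model in the first place. — ed.]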

      • Potential uses for discrete sampling: missing data in discrete covariates (quite frequent in “real world” use…), umbrella sampling, and other forms of model selection. The first use is my main stumbling point: in my domain, missing data are the de facto standard… Other possible uses: discrete random variables, e.g. Boolean values, discrete states, and counts (lots of applications in ecology and medicine), and posterior predictive checking of such variables. I agree that the latter can be done by post-processing, but a “lazy user’s” solution encourages such checks.

        Your comments on the possible implementation of this discrete sampling and of DIC make me think that a “missing link” could be a BUGS (or BUGS-like) to Stan precompiler, able to put the structural information given by the BUGS graph into a form usable by Stan, and possibly to post-process the Stan results. A job for rstan (or a complementary package built on rstan)?
