Stan (“Sampling Through Adaptive Neighborhoods”) is our new C++ program (written mostly by Bob Carpenter) that draws samples from Bayesian models. Stan can take different sorts of inputs: you can write the model in a Bugs-like syntax and it goes from there, or you can write the log-posterior directly as a C++ function.
Most of the computation is done using Hamiltonian Monte Carlo. HMC requires some tuning, so Matt Hoffman up and wrote a new algorithm, Nuts (the “No-U-Turn Sampler”) which optimizes HMC adaptively. In many settings, Nuts is actually more computationally efficient than the optimal static HMC!
In response to Xian’s comments, Matt writes:
I wonder about the computing time (and the “unacceptably large amount of memory”, p.12) required by the doubling procedure: 2^j is growing fast with j! (If my intuition is right, the computing time should increase rather quickly with the dimension. And I do not get the argument within the paper that the costly part is the gradient computation: it seems to me the gradient must be computed for all of the 2^j points.)
2j does grow quickly with j, but so does the length of the trajectory, and it’s impossible to run a Hamiltonian trajectory for a seriously long time without making a U turn and stopping the doubling process. (Just like it’s impossible to throw a ball out of an infinitely deep pit with a finite amount of energy.) So j never gets very big. As far as memory goes, the “naive” implementation (algorithm 2) has to store all O(2j) states it visits, but the more sophisticated implementation (algorithm 3) only needs to store O(j) states. Finally, the gradient computations dominate precisely because we must compute 2j of them—NUTS introduces O(2jj) non-gradient overhead, which is usually trivial compared to O(2j) gradient computations.
In summary, if we assume that NUTS generates trajectories that are about the optimal length of the corresponding HMC trajectory (which is hard to prove, but empirically seems not too far off), then NUTS adds a (usually negligible) computational overhead of order O(2jj) and an (acceptable for non-huge problems) memory overhead of order O(j) compared with HMC. These costs scale linearly with dimension, like HMC’s costs.
Trajectory lengths will generally increase faster than linearly with dimension, i.e., j will grow faster than log_2(number_of_dimensions). But the optimal trajectory length for HMC does as well, and in high dimensions HMC is pretty much the best we’ve got. (Unless you can exploit problem structure in some really really clever ways [or for some specific models with lots of independence structure--ed.].)
A final grumble: that the code is “only” available in the proprietary language Matlab!
Apologies to non-Matlab users! There’s also a c++ implementation in Stan, but it’s not (yet) well documented. I guess I [Matt] should also write python and R implementations, although my R expertise doesn’t extend very far beyond what’s needed to use ggplot2. [Matt doesn't need to translate into Python and R--anyone who wants to do that should be able to do so, easily enough. And in any case, Stan will have the fast C++ version. Matt was just using Matlab as a convenient platform for developing Nuts.--ed.]
See also Bob’s comments which he posted directly on Xian’s blog.
Stan’s nearly ready for release, and we’re also working on the paper. You’ll be able to run it from R just like you can run Bugs or Jags. Compared to Bugs and Jags, Stan should be faster (especially for big models) and should be able to fit a wider range of models (for example, varying-intercept, varying-slope multilevel models with multiple coefficients per group). It’s all open-source and others should feel free to steal the good parts for their own use. The work is partly funded by a U.S. Department of Energy grant.