Samplers for Big Science: emcee and BAT

Over the past few months, we’ve talked about modeling with particle physicists (Allen Caldwell), astrophysicists (David Hogg, who regularly comments here), and climate and energy usage modelers (Phil Price, who regularly posts here).

Big Science Black Boxes

We’ve gotten pretty much the same story from all of them: their models involve “big science” components that are hugely complex and come as external implementations from labs like CERN or LBL. Some concrete examples from energy modeling are the TOUGH2 thermal simulator, the EnergyPlus building energy usage simulator, and global climate model (GCM) implementations.

These models are not only black boxes, they take several seconds or more to produce the equivalent of a single likelihood evaluation. So we can’t use something like Stan, because nobody has the person-years required to reimplement something like TOUGH2 in Stan (and Stan doesn’t have the debugging or modularity tools to support large-scale development anyway). Even just templating out the existing code so that we could run automatic differentiation on it would be a huge challenge.

Sampling and Optimization

Not surprisingly, these researchers tend to focus on methods that can be implemented using only black-box log probability evaluators.

Phil told us that he and his colleagues at LBL use optimizers based on ensemble techniques like the Nelder-Mead method. They then try to estimate uncertainty using finite-difference estimates of the Hessian at the solution they find.
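Here’s a minimal sketch of that kind of workflow in Python (my own illustration, not LBL’s actual code): optimize a black-box objective with Nelder-Mead via scipy, then approximate the uncertainty from the inverse of a finite-difference Hessian at the optimum. The quadratic toy objective just stands in for an expensive simulator.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(theta):
    # stand-in for an expensive black-box simulator call;
    # just a correlated Gaussian so the example actually runs
    prec = np.array([[2.0, 0.5], [0.5, 1.0]])
    return 0.5 * theta @ prec @ theta

# derivative-free optimization: Nelder-Mead needs no gradients
fit = minimize(neg_log_likelihood, x0=np.ones(2), method="Nelder-Mead")

def finite_diff_hessian(f, x, h=1e-4):
    """Central finite-difference Hessian of f at x."""
    d = len(x)
    H = np.empty((d, d))
    for i in range(d):
        for j in range(d):
            e_i, e_j = np.eye(d)[i] * h, np.eye(d)[j] * h
            H[i, j] = (f(x + e_i + e_j) - f(x + e_i - e_j)
                       - f(x - e_i + e_j) + f(x - e_i - e_j)) / (4 * h * h)
    return H

H = finite_diff_hessian(neg_log_likelihood, fit.x)
cov = np.linalg.inv(H)   # Laplace-style approximate covariance at the mode
print(fit.x, np.sqrt(np.diag(cov)))
```

The inverse Hessian of the negative log likelihood at the mode is just the Laplace approximation to the posterior covariance, so it’s only as good as the local quadratic approximation.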

Bayesian Analysis Toolkit (BAT)

Allen Caldwell and his group (Daniel Kollar and Kevin Kröninger are listed as the core developers) are behind the Bayesian Analysis Toolkit (BAT), which is based on the Metropolis algorithm. BAT requires users to implement the model as a C++ class, but that class can call whatever external libraries it wants.
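BAT itself is C++ and I’m not reproducing its actual class interface here; the sketch below just illustrates, in Python, the basic pattern it wraps: hand a black-box log density to a random-walk Metropolis sampler (all names are mine, not BAT’s).

```python
import numpy as np

def metropolis(log_prob, theta0, n_iter=10_000, scale=0.5, seed=0):
    """Plain random-walk Metropolis over a black-box log density."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    lp = log_prob(theta)
    chain = np.empty((n_iter, theta.size))
    for t in range(n_iter):
        proposal = theta + scale * rng.standard_normal(theta.size)
        lp_prop = log_prob(proposal)
        # accept with probability min(1, p(proposal) / p(current))
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = proposal, lp_prop
        chain[t] = theta
    return chain

# toy black-box target: standard normal in 3 dimensions
chain = metropolis(lambda th: -0.5 * np.sum(th ** 2), np.zeros(3))
print(chain.mean(axis=0), chain.std(axis=0))
```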

emcee, the MCMC Hammer

David Hogg and his group (Daniel Foreman-Mackey seems to be doing the heavy code lifting) are behind emcee, aka “the MCMC Hammer,” which is based on Goodman and Weare’s affine-invariant ensemble sampler (itself motivated by the Nelder-Mead method for optimization). emcee requires users to implement a log probability function in Python; that function can then call out to C, C++, or Fortran on the back end.
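The calling pattern is roughly as follows, using the current emcee interface (method names may differ a bit across versions); the log density here is a throwaway stand-in for whatever expensive black-box code you actually have.

```python
import numpy as np
import emcee

def log_prob(theta):
    # stand-in for the expensive black-box model; any Python callable works,
    # including thin wrappers around C, C++, or Fortran code
    return -0.5 * np.sum(theta ** 2)

ndim, nwalkers = 5, 32
p0 = np.random.default_rng(0).standard_normal((nwalkers, ndim))  # initial ensemble

sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob)
sampler.run_mcmc(p0, 2000)
samples = sampler.get_chain(discard=500, flat=True)  # drop burn-in, pool the walkers
print(samples.mean(axis=0))
```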

We plan to add the Goodman and Weare ensemble method to Stan. We’re still working on how to integrate it into our sampling framework. Ensembles are tricky.
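For the curious, the core of the Goodman and Weare scheme is the “stretch move”: each walker proposes a point on the line through its own position and the position of another, randomly chosen walker, with the stretch factor z drawn so the update is affine invariant. Here’s a serial, illustrative-only sketch (emcee’s real implementation splits the ensemble in half so the moves can be parallelized).

```python
import numpy as np

def stretch_move_step(log_prob, walkers, a=2.0, rng=None):
    """One sweep of Goodman & Weare stretch moves over an ensemble of walkers."""
    rng = rng or np.random.default_rng()
    n, d = walkers.shape
    lp = np.array([log_prob(w) for w in walkers])  # recomputed each sweep for simplicity
    for k in range(n):
        j = rng.choice([i for i in range(n) if i != k])   # a different walker
        z = ((a - 1.0) * rng.uniform() + 1.0) ** 2 / a    # z ~ g(z) ∝ 1/sqrt(z) on [1/a, a]
        proposal = walkers[j] + z * (walkers[k] - walkers[j])
        lp_prop = log_prob(proposal)
        # acceptance probability includes a z^(d-1) factor from the stretch move
        if np.log(rng.uniform()) < (d - 1) * np.log(z) + lp_prop - lp[k]:
            walkers[k], lp[k] = proposal, lp_prop
    return walkers

# toy run on a standard normal target in 4 dimensions
walkers = np.random.default_rng(1).standard_normal((32, 4))
for _ in range(500):
    stretch_move_step(lambda th: -0.5 * np.sum(th ** 2), walkers)
print(walkers.mean(axis=0), walkers.std(axis=0))
```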

Search Engine Optimization. Not!

Perhaps we should’ve heeded Hadley Wickham’s advice, which he followed in naming “ggplot2”: pick a name with zero existing hits on Google. Instead, we have three hard-to-search-for names, “stan,” “emcee,” and “bat.” Of course, “bugs” takes the cake in this competition.