Hamiltonian Monte Carlo (HMC), as used by Stan, is only defined for continuous parameters. We’d love to be able to do discrete sampling. So I was excited when I saw this:
Yichuan Zhang, Charles Sutton, Amos J Storkey, and Zoubin Ghahramani. 2012. Continuous Relaxations for Discrete Hamiltonian Monte Carlo. NIPS 25.
Abstract: Continuous relaxations play an important role in discrete optimization, but have not seen much use in approximate probabilistic inference. Here we show that a general form of the Gaussian Integral Trick makes it possible to transform a wide class of discrete variable undirected models into fully continuous systems. The continuous representation allows the use of gradient-based Hamiltonian Monte Carlo for inference, results in new ways of estimating normalization constants (partition functions), and in general opens up a number of new avenues for inference in difficult discrete systems. We demonstrate some of these continuous relaxation inference algorithms on a number of illustrative problems.
The paper applies the “Gaussian integral trick” to “relax” a discrete Markov random field (MRF) distribution to a continuous one by adding auxiliary parameters (their formula 11). From there, the discrete parameters are distributed as an easy-to-compute Bernoulli conditional on the auxiliary parameters (their formula 12).
They provide a simulated example for both standard and “frustrated” Boltzmann machines (a kind of MRF) and also for a natural language example involving a skip-chain conditional random field (CRF).
Stan already supports HMC and calculates all the derivatives automatically, so that part is already done. It would be nice to see how well NUTS works on these problems compared to non-adaptive HMC.
We don’t have time right now, but it would be super awesome if someone could implement this model in Stan and write it up in such a form we could blog about it and include it in the Stan manual (with attribution, of course!). From my quick read, it looks like formula (11) can be used directly to parameterize and define the model and their formula (12) could be implemented in the generated quantities block to generate discrete outputs. I’m sure Zhang and crew would be happy to supply more details of the CRF, but I’d be happy with the simple simulated Boltzmann machine example and even happier with another real example that’s not as complex as a skip-chain CRF.