Hamiltonian Monte Carlo (HMC), as used by Stan, is only defined for continuous parameters. We’d love to be able to do discrete sampling. So I was excited when I saw this:

Yichuan Zhang, Charles Sutton, Amos J Storkey, and Zoubin Ghahramani. 2012. Continuous Relaxations for Discrete Hamiltonian Monte Carlo.

In Advances in Neural Information Processing Systems (NIPS) 25.

Abstract: Continuous relaxations play an important role in discrete optimization, but have not seen much use in approximate probabilistic inference. Here we show that a general form of the Gaussian Integral Trick makes it possible to transform a wide class of discrete variable undirected models into fully continuous systems. The continuous representation allows the use of gradient-based Hamiltonian Monte Carlo for inference, results in new ways of estimating normalization constants (partition functions), and in general opens up a number of new avenues for inference in difficult discrete systems. We demonstrate some of these continuous relaxation inference algorithms on a number of illustrative problems.

The paper applies the “Gaussian integral trick” to “relax” a discrete Markov random field (MRF) distribution to a continuous one by adding auxiliary parameters (their formula (11)). From there, conditional on the auxiliary parameters, the discrete variables follow independent, easy-to-compute Bernoulli distributions (their formula (12)).
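To make the trick concrete, here is a small numerical sketch on a made-up two-node Boltzmann machine p(s) ∝ exp(a's + ½ s'Ws) with s ∈ {0,1}² (my own toy example, not from the paper). Adding a diagonal D so that W + D is positive definite, my reconstruction of the form of their formulas (11) and (12) is: the relaxed density is p(x) ∝ exp(−½ x'(W+D)⁻¹x) ∏ᵢ (1 + exp(xᵢ + aᵢ − dᵢ/2)), and P(sᵢ = 1 | x) = logit⁻¹(xᵢ + aᵢ − dᵢ/2). The sketch checks by grid integration that the relaxed marginals reproduce the exact enumerated ones:

```python
import numpy as np

# Toy two-node Boltzmann machine (my own example, not from the paper):
# p(s) proportional to exp(a's + (1/2) s'W s), with s in {0,1}^2.
w = 0.5
W = np.array([[0.0, w], [w, 0.0]])
a = np.array([0.2, -0.3])
d = np.array([1.0, 1.0])      # diagonal shift; W + D has eigenvalues 1.5, 0.5 > 0
A = W + np.diag(d)

# Exact marginals P(s_i = 1) by enumerating all four discrete states.
S = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
logp = S @ a + 0.5 * np.einsum('ij,jk,ik->i', S, W, S)
p = np.exp(logp - logp.max())
p /= p.sum()
exact = S.T @ p

# Continuous relaxation: after summing out s,
#   p(x) prop. exp(-(1/2) x'(W+D)^{-1} x) * prod_i (1 + exp(x_i + a_i - d_i/2)).
Ainv = np.linalg.inv(A)
g = np.linspace(-12.0, 12.0, 401)
X = np.stack(np.meshgrid(g, g, indexing='ij'), axis=-1)   # (401, 401, 2) grid
quad = -0.5 * np.einsum('...i,ij,...j->...', X, Ainv, X)
eta = X + (a - d / 2)                                     # x_i + a_i - d_i/2
logdens = quad + np.logaddexp(0.0, eta).sum(axis=-1)
dens = np.exp(logdens - logdens.max())
dens /= dens.sum()

# Formula-(12)-style readout: P(s_i = 1) = E_x[inverse-logit(x_i + a_i - d_i/2)].
sigmoid = 1.0 / (1.0 + np.exp(-eta))
relaxed = (sigmoid * dens[..., None]).sum(axis=(0, 1))

print(exact)    # exact marginals by enumeration
print(relaxed)  # marginals recovered through the continuous relaxation
```

The point of the check is that marginalizing the Bernoulli conditional over the relaxed density gives back the original discrete marginals exactly, so the relaxation loses nothing.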

They provide a simulated example for both standard and “frustrated” Boltzmann machines (a kind of MRF) and also for a natural language example involving a skip-chain conditional random field (CRF).

Stan already supports HMC and calculates all the derivatives automatically, so that part is already done. It would be nice to see how well NUTS works on these problems compared to non-adaptive HMC.

We don’t have time right now, but it would be super awesome if someone could implement this model in Stan and write it up in a form we could blog about and include in the Stan manual (with attribution, of course!). From my quick read, it looks like formula (11) can be used directly to parameterize and define the model, and their formula (12) could be implemented in the generated quantities block to generate discrete outputs. I’m sure Zhang and crew would be happy to supply more details of the CRF, but I’d be happy with the simple simulated Boltzmann machine example, and even happier with another real example that’s not as complex as a skip-chain CRF.
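As a sketch of that workflow in Python rather than Stan (again on a made-up two-node example, with a simple random-walk Metropolis chain standing in for Stan’s HMC/NUTS), one can sample the auxiliary x from the relaxed density and then draw the discrete s as Bernoulli in a “generated quantities” style step:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-node Boltzmann machine (my own example, not from the paper).
w = 0.5
W = np.array([[0.0, w], [w, 0.0]])
a = np.array([0.2, -0.3])
d = np.array([1.0, 1.0])          # makes W + D positive definite
A = W + np.diag(d)
Ainv = np.linalg.inv(A)
shift = a - d / 2

def log_relaxed(x):
    """Log density (up to a constant) of the continuous relaxation."""
    return -0.5 * x @ Ainv @ x + np.logaddexp(0.0, x + shift).sum()

# Random-walk Metropolis on x; Stan would use HMC/NUTS here instead.
n_iter, step = 50_000, 1.5
x = np.zeros(2)
lp = log_relaxed(x)
draws = np.empty((n_iter, 2))
for t in range(n_iter):
    prop = x + step * rng.standard_normal(2)
    lp_prop = log_relaxed(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        x, lp = prop, lp_prop
    draws[t] = x

# "Generated quantities" step: s_i | x ~ Bernoulli(inverse-logit(x_i + a_i - d_i/2)).
probs = 1.0 / (1.0 + np.exp(-(draws[10_000:] + shift)))   # drop burn-in
s_draws = (rng.uniform(size=probs.shape) < probs).astype(int)
est = s_draws.mean(axis=0)

print(est)   # Monte Carlo estimate of P(s_i = 1); exact values are about (0.609, 0.501)
```

The same structure should carry over to a Stan program: the relaxed log density goes in the model block over a continuous parameter vector, and the Bernoulli draws go in generated quantities.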

This paper also discusses the use of Gaussian auxiliary variables to make it easier to sample from binary MRFs like Boltzmann machines. I haven’t looked closely enough to tell whether it’s exactly the same approach as the one you linked to (it sounds like it might be much simpler), but it’s probably quite similar.

If someone is interested in implementing this sort of thing, the paper I linked to might be a good starting place.

From Zhang et al:

“The only previous work of which we are aware that uses the Gaussian integral trick for inference in graphical models is Martens and Sutskever. They use the trick to transform an arbitrary MRF into an equivalent restricted Boltzmann machine (RBM), on which they then do block Gibbs sampling. They show that this transformation is useful when each block Gibbs step can be performed in parallel. However, unlike the current work, they do not sum out the discrete variables, so they do not perform a full continuous relaxation.”