Tomas Iesmantas had asked me for advice on a regression problem with 50 parameters, and I’d recommended Hamiltonian Monte Carlo.
A few weeks later he reported back:
After trying several modifications (HMC for all parameters at once, HMC just for the first-level parameters, and the Riemann manifold Hamiltonian Monte Carlo method), I finally got it running with HMC for the first-level parameters and direct sampling for the others, since their conditional distributions turned out to have closed forms.
However, even in this case it was quite tricky, since I had to employ a mass matrix, and not just a diagonal one: at the beginning of the algorithm I generated it randomly (ensuring it was positive definite). Such random generation of the mass matrix is quite a blind step, but it proved to be quite helpful.
Riemann manifold HMC is quite temperamental, or to be more specific, the metric of the manifold is very sensitive. My model's log-likelihood contained exponentials, so the elements of the metric tensor were very large, and when inverting this matrix the algorithm often produced singular matrices.
P.S. I even tried adaptive HMC (Martin Burda’s “Bayesian Adaptive Hamiltonian Monte Carlo with an Application to High-Dimensional BEKK GARCH Models”), but it did not work. The adaptation seemed strange, since there was no vanishing adaptation, just half the sum of the previous and new metrics.
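For what it’s worth, the random positive-definite mass matrix Tomas describes can be sketched in a few lines of NumPy. This is my illustration, not his code; the `jitter` parameter and the `A @ A.T` construction are assumptions about one standard way to do it.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_mass_matrix(d, jitter=1e-3):
    # A @ A.T is positive semi-definite for any square A; adding
    # jitter * I pushes the eigenvalues strictly above zero, so the
    # matrix is positive definite and momenta can always be sampled.
    A = rng.normal(size=(d, d))
    return A @ A.T + jitter * np.eye(d)

M = random_mass_matrix(5)
np.linalg.cholesky(M)  # succeeds only if M is positive definite
```

The Cholesky factorization doubles as the positive-definiteness check and is also what one would use to draw momenta from N(0, M).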
I passed this on to Bob and Matt3. Bob asked:
How did HMC for all the parameters work compared to using HMC for the low-level ones and direct sampling for the others? Radford Neal discusses this approach in his MCMC handbook chapter on HMC, but we were hoping (backed up by some back-of-the-envelope calculations) that we could just do all the parameters at once.
To which Tomas replied:
When applying HMC to the full parameter set, I never got good mixing. Sometimes the first-level parameters showed a “wish to mix,” but the second-level parameters almost didn’t move at all, and usually all parameters were stuck at some values and didn’t move at all. If I reduced the leapfrog integration step further, I just obtained very correlated chains, and no matter how I tried to vary the integration step, the number of steps, or the mass matrix, the second-level parameters showed (almost) no will to mix.
As for HMC on just the first-level parameters, everything worked better than with HMC for all parameters at once.
It seems that the second-level parameters induced some very odd topology. And for Riemann manifold HMC, the Fisher information metric wasn’t the right one in my case.
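To make the knobs Tomas was turning concrete (step size, number of steps, and mass matrix), here is a generic leapfrog integrator sketch in Python. The gradient function and the toy Gaussian target are stand-ins of mine, not his model.

```python
import numpy as np

def leapfrog(grad_logp, theta, p, eps, L, Minv):
    # Standard leapfrog trajectory: a half step for the momentum,
    # then alternating full position/momentum steps, then a final
    # half momentum step. Minv is the inverse mass matrix.
    p = p + 0.5 * eps * grad_logp(theta)
    for _ in range(L - 1):
        theta = theta + eps * (Minv @ p)
        p = p + eps * grad_logp(theta)
    theta = theta + eps * (Minv @ p)
    p = p + 0.5 * eps * grad_logp(theta)
    return theta, p

# On a standard normal target (gradient of the log density is -theta)
# the integrator nearly conserves the Hamiltonian for small eps:
grad = lambda t: -t
th0, p0 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
th, p = leapfrog(grad, th0, p0, eps=0.01, L=100, Minv=np.eye(2))
```

Shrinking `eps` reduces the integration error (so fewer rejections) but, for a fixed number of steps `L`, shortens the trajectory, which is one route to the highly correlated chains Tomas reports.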
And Matt3 wrote:
Interesting. So there are no negative elements in the mass matrix? I think this should help if there are more positive correlations in your posterior than negative ones, but hurt if there are many negative correlations. I wonder if there’s something about your model that makes positive correlations more common than negative ones.
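As a side note on how correlation signs enter here: for a bivariate Gaussian, a positive correlation makes the off-diagonal element of the covariance positive and that of the precision (its inverse) negative, and depending on convention either matrix is the natural candidate for tuning the mass matrix. A quick check, my toy example rather than anything from the thread:

```python
import numpy as np

# Bivariate Gaussian with positive correlation 0.8
cov = np.array([[1.0, 0.8],
                [0.8, 1.0]])
prec = np.linalg.inv(cov)
# cov's off-diagonal is positive; prec's off-diagonal is negative
```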