## Jumping off the edge of the world

Tomas Iesmantas writes:

I’m facing a problem where parameter space is bounded, e.g. all parameters have to be positive. If in MCMC as proposal distribution I use normal distribution, then at some iterations I get negative proposals. So my question is: should I use recalculation of acceptance probability every time I reject the proposal (something like in delayed rejection method), or I have to use another proposal (like lognormal, truncated normal, etc.)?

The simplest solution is to just calculate p(theta)=0 for theta outside the legal region, thus reject those jumps. This will work fine (just remember that when you reject, you have to stay at the last value for one more iteration), but if you’re doing these rejections all the time, you might want to reparameterize your space, for example using logs for positive parameters, logits for constrained parameters, and softmax for parameters that are constrained to sum to 1.

1. Jerzy says:

Great advice; thank you for sharing this!
I hadn’t heard of the term “softmax” before, so it’s great to have another term to google when looking for help on such problems.

Christian Robert posted a response too, with example code:
http://xianblog.wordpress.com/2011/07/05/bounded-target-support/

2. Xi'an says:

I wonder what part of my answer to Tomàs was unclear… I even included an R code for illustration!

• Andrew says:

X,

Why do you think your answer was unclear?

3. Sometimes the best thing to do is to transform the troublesome variable. For example, if x cannot be negative, then y=log(x) can have any sign. Often this improves the behavior of the sampler as well, since it is often the case that “bumping up” against the zero floor is accompanied by bad behavior in other ways. For example, if you had a Jeffreys prior 1/x on a scale variable, then in the log transform the prior on y is uniform and the singularity goes away.

4. I read over Andrew’s comment too fast as he also recommended transforms.

Another interesting way out is to reflect the variable at the boundary, e.g., if propose negative x, use |x| instead. This works if the Metropolis-Hastings factor is invariant to the sign of x (in this example).

5. Ted Dunning says:

I would like to put in a vote of recommendation for the idea of transformations as well. The overall long term distribution near an edge may be correct with a hard boundary, but in practice what you get is much slower mixing near the edges than in the center since the acceptance probability can go to near zero if you sample winds up in a corner. Asymptotically, this is fine since the solution sits in one place long enough to make the probability density come out right, but practically this can kill convergence. WIthout the hard edges, you are likely to get lots of little jumps when you wind up in a similar (transformed) point and get a better solution.

The way that I first ran into this was testing a sampler on a simple Dirichlet distribution with alpha = 1. When I plotted the samples, it looked very much like there was not at all a uniform distribution, but after much gnashing of teeth, I realized that the points near the edges really represented dozens or hundreds of samples. Granted, not all practitioners will be as dense as I was that day, but the transformations would have saved me several hours of grief.

6. Iain Murray says:

A reminder for those who don’t regularly reparamaterize probability densities: Densities needs to be corrected with a Jacobian term when changing variables. For example, if originally sampling a positive variable with density:

P(x) \propto f(x)

then to sample z=log(x), your code needs to use the density:

P(z) \propto |dx/dz| f(exp(z)) = exp(z) f(exp(z))

Another option is slice sampling. The bracketing procedures can shrink away from the constraint to an acceptable point. If the constrained region changes in width across the support of the density, then an adaptive step-size can be useful.