Tempering should always be done in the spirit of *searching* for important modes of the distribution. If we assume that we know where they are, then there is no point to tempering. Now, tempering is actually a *bad* way of searching for important modes, it just happens to be easy to program. As always, my [Gustavo's] prescription is to FIRST find the important modes (as a pre-processing step); THEN sample from each mode independently; and FINALLY weight the samples appropriately, based on the estimated probability mass of each mode, though things might get messy if you end
up jumping between modes.
1. Parallel tempering has always seemed like a great idea, but I have to admit that the only time I tried it (with Matt2 on the tree-ring example), it didn’t work for us.
2. You say you’d rather sample from the modes and then average over them. But that won’t work if if you have a zillion modes. Also, if you know where the modes are, the quickest way to estimate their relative masses might well be an MCMC algorithm that jumps through them.
3. Finally, pre-processing to find modes is fine, but if pre-processing is so important, it probably needs its own serious algorithm too. I think some work has been done here but I’m not up on the latest.