Michael found the bug!
It took a lot of effort, during which he produced ten pages of dense LaTeX to help Daniel and me understand the algorithm well enough to help debug (we’re trying to write a bunch of these algorithmic details up for a more general audience, so stay tuned).
So what was the issue?
In Michael’s own words:
There were actually two bugs. The first is that the right subtree needs its own rho in order to compute the correct termination criterion. The second is that in order to compute the termination criterion you need the points on the left and right of each subtree (the orientation of left and right relative to forwards and backwards depends on in which direction you’re trying to extend the trajectory). That means you have to do one leapfrog step and take that point as left, then do the rest of the leapfrog steps and take the final point as right. But right now I’m taking the initial point as left, which is off by one. A small difference (especially as the step size is decreased!) but enough to bias the samples.
I redacted the saltier language (sorry if that destroyed the flavor of the message, Michael [pun intended; this whole bug hunt has left me a bit punchy]).
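To make the off-by-one concrete, here’s a minimal sketch of building one subtree while tracking what the termination check needs. It’s illustrative Python, not Stan’s actual C++: the function names, the leapfrog details, and the exact form of the no-U-turn check are my assumptions. The point it illustrates is the fix Michael describes: the subtree’s “left” boundary is the state after the first leapfrog step, not the initial state, and each subtree accumulates its own rho.

```python
import numpy as np

def leapfrog(q, p, eps, grad_U):
    """One leapfrog step for Hamiltonian dynamics with potential U."""
    p = p - 0.5 * eps * grad_U(q)
    q = q + eps * p
    p = p - 0.5 * eps * grad_U(q)
    return q, p

def build_subtree_forward(q, p, eps, depth, grad_U):
    """Hypothetical sketch: build one subtree of 2**depth leapfrog steps,
    tracking what the termination criterion needs. Not Stan's actual code."""
    # Take the first leapfrog step BEFORE recording the boundary: that
    # state, not the initial one, is the subtree's "left" point.
    # (Using the initial point here was the off-by-one bug.)
    q, p = leapfrog(q, p, eps, grad_U)
    q_left, p_left = q, p
    rho = p.copy()  # each subtree gets its OWN rho, freshly initialized
    for _ in range(2 ** depth - 1):
        q, p = leapfrog(q, p, eps, grad_U)
        rho = rho + p  # accumulate momentum over this subtree only
    q_right, p_right = q, p  # the final state is the "right" point

    # No-U-turn style termination check using this subtree's own rho and
    # its boundary momenta (the exact criterion form is an assumption here).
    keep_going = np.dot(rho, p_left) > 0 and np.dot(rho, p_right) > 0
    return (q_left, p_left), (q_right, p_right), rho, keep_going
```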
That is a small difference—amazing it has that much effect on sampling. These things are obviously balanced on a knife edge.
Michael then replied:
Well the effect is pretty small and is significant only when you need extreme precision, so it’s not entirely surprising [that our tests didn’t catch it] in hindsight. The source of the problem also explains why the bias went down as the step size was decreased. It also gives a lot of confidence in the general validity of previous results.
I’m just glad all that math was correct!
Whew. Me, too. Especially since the new approach seems both more efficient and more robust.
What do you mean by “new approach”?
Michael replaced the original NUTS algorithm’s slice sampler with a discrete sampler, which trickles through a bunch of the algorithmic steps, such as the decision of whether to jump to the latest subtree being built. We (by which I mean Michael) have also been making incremental changes to the adaptation. These started early on, when we broke adaptation down into a step size and a regularized mass matrix estimate and then allowed dense mass matrices.
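To give a feel for what the discrete sampler changes, here’s a hedged sketch of the subtree-merge step. Again, this is illustrative Python rather than the actual implementation: the names are made up, and I’m showing the simplest proportional-weight version of the rule, where a subtree’s log weight is the log-sum-exp of −H over its states. Instead of filtering states through a slice variable, the sampler jumps to the new subtree’s proposal with probability proportional to that subtree’s total weight.

```python
import numpy as np

def merge_proposal(rng, z_old, logw_old, z_new, logw_new):
    """Hypothetical sketch: combine the running proposal with the proposal
    from a freshly built subtree, multinomial-style, instead of NUTS's
    original slice-sampler acceptance."""
    logw_total = np.logaddexp(logw_old, logw_new)  # total trajectory weight
    p_jump = np.exp(logw_new - logw_total)         # P(jump to new subtree)
    z = z_new if rng.random() < p_jump else z_old
    return z, logw_total

# Usage: each log weight is log-sum-exp of -H(z) over that subtree's states.
rng = np.random.default_rng(0)
z, logw = merge_proposal(rng, z_old=0.1, logw_old=-1.2, z_new=0.4, logw_new=-0.8)
```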
When will Stan be fixed?
It’ll take a few days for us to organize the new code and then a few more days to push it through the interfaces. Definitely in time for StanCon (100+ registrants and counting, with plenty of submitted case studies).