## Michael found the bug in Stan’s new sampler

Gotcha!

Michael found the bug!

That was a lot of effort, during which time he produced ten pages of dense LaTeX to help Daniel and me understand the algorithm enough to help debug (we’re trying to write a bunch of these algorithmic details up for a more general audience, so stay tuned).

So what was the issue?

In Michael’s own words:

There were actually two bugs. The first is that the right subtree needs its own rho in order to compute the correct termination criterion. The second is that in order to compute the termination criterion you need the points on the left and right of each subtree (the orientation of left and right relative to forwards and backwards depends on which direction you’re trying to extend the trajectory). That means you have to do one leapfrog step and take that point as left, then do the rest of the leapfrog steps and take the final point as right. But right now I’m taking the initial point as left, which is off by one step. A small difference (especially as the step size is decreased!) but enough to bias the samples.
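To make the two bugs concrete, here is a minimal sketch of extending a trajectory by one subtree and checking a generalized no-U-turn criterion. It is not Stan's code and it simplifies heavily: it assumes a unit mass matrix, builds the subtree with a flat loop instead of recursive doubling, and only checks the subtree's two endpoints. The function names (`leapfrog`, `extend_subtree`, `keep_going`) are hypothetical. The subtree keeps its own running momentum sum `rho` (the first bug), and its first endpoint is the state after the first leapfrog step rather than the starting point (the second, off-by-one bug).

```python
import numpy as np


def leapfrog(q, p, grad_log_density, eps):
    """One leapfrog step for H(q, p) = -log pi(q) + p.p / 2 (unit mass matrix assumed)."""
    p = p + 0.5 * eps * grad_log_density(q)
    q = q + eps * p
    p = p + 0.5 * eps * grad_log_density(q)
    return q, p


def extend_subtree(q0, p0, grad_log_density, eps, n_steps):
    """Build a new subtree of n_steps states starting from (q0, p0).

    (q0, p0) is the edge of the EXISTING trajectory, so it is not part of the
    new subtree. The subtree's first endpoint is the state after the first
    leapfrog step; using p0 instead would be the off-by-one error. A negative
    eps extends backwards, which swaps which endpoint counts as "left".
    """
    q, p = leapfrog(q0, p0, grad_log_density, eps)
    p_first = p.copy()
    rho = p.copy()  # the subtree's OWN running sum of momenta
    for _ in range(n_steps - 1):
        q, p = leapfrog(q, p, grad_log_density, eps)
        rho = rho + p
    p_last = p.copy()
    return q, p, p_first, p_last, rho


def keep_going(p_first, p_last, rho):
    """Generalized no-U-turn check (unit metric): continue only while both
    endpoint momenta still point along the subtree's summed momentum rho."""
    return bool(np.dot(p_first, rho) > 0 and np.dot(p_last, rho) > 0)
```

For a standard normal target (gradient `-q`) started at q = 1, p = 0 with eps = 0.1, the first endpoint's momentum is about -0.1 while the starting momentum is 0. That gap scales roughly with eps, which is consistent with Michael's observation below that the bias went down as the step size was decreased.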

I redacted the saltier language (sorry if that destroyed the flavor of the message, Michael [pun intended; this whole bug hunt has left me a bit punchy]).

I responded:

That is a small difference—amazing it has that much effect on sampling. These things are obviously balanced on a knife edge.

Michael then replied:

Well the effect is pretty small and is significant only when you need extreme precision, so it’s not entirely surprising [that our tests didn’t catch it] in hindsight. The source of the problem also explains why the bias went down as the step size was decreased. It also gives a lot of confidence in the general validity of previous results.

I’m just glad all that math was correct!

Whew. Me, too. Especially since the new approach seems both more efficient and more robust.

What do you mean by “new approach”?

Michael replaced the original NUTS algorithm’s slice sampler with a discrete sampler, which trickles through a bunch of the algorithmic steps, such as whether to jump to the latest subtree being built. We (by which I mean Michael) have also been making incremental changes to the adaptation. These started early on when we split adaptation into step-size adaptation and a regularized mass matrix estimate, and then allowed dense mass matrices.
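The writeup isn't public yet, so the following is only a sketch of the general idea behind a discrete sampler, not Stan's implementation. In a multinomial scheme, each state on the trajectory gets weight proportional to exp(-H), and when a newly built subtree is merged in, the sampler decides whether to jump to it with a probability based on the subtrees' total weights. The sketch shows two standard ways to make that choice, uniform and biased progressive sampling; which one Stan applies at which level of the tree isn't stated in this post, so treat that split and the hypothetical names `log_add` and `merge_subtrees` as assumptions.

```python
import math
import random


def log_add(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))


def merge_subtrees(log_w_old, log_w_new, rng, biased=False):
    """Decide whether the proposal jumps to a sample from the new subtree.

    log_w_old / log_w_new: log of the summed weights exp(-H) of the states in
    the existing trajectory and in the freshly built subtree.
    Uniform progressive sampling: P(jump) = w_new / (w_old + w_new).
    Biased progressive sampling:  P(jump) = min(1, w_new / w_old).
    Returns (jump, log weight of the merged tree).
    """
    log_w_total = log_add(log_w_old, log_w_new)
    if biased:
        log_p_jump = min(0.0, log_w_new - log_w_old)
    else:
        log_p_jump = log_w_new - log_w_total
    jump = rng.random() < math.exp(log_p_jump)
    return jump, log_w_total
```

Working in log space keeps the weights from underflowing when H is large. A divergent subtree can be given log weight `-inf`, in which case the sampler never jumps to it.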

When will Stan be fixed?

It’ll take a few days for us to organize the new code and then a few more days to push it through the interfaces. Definitely in time for StanCon (100+ registrants and counting, with plenty of submitted case studies).

### 26 Comments

1. Kyle C says:

“to help Daniel and I” … God it has spread to our professoriate.

• Martha (Smith) says:

Yeah, I shook my head on that one, too, but put it down to my being a geezer.

• Paul Alper says:

Wow! I thought Andrew’s blog is supposed to be devoid of pretentious social climbing such as “between you and I.” At least later on this did NOT appear:

“It’ll take a few days for we to organize the new code”

• What, me worry about nominative vs. accusative case? (Obligatory old geezer reference to Mad magazine.)

For those following along, the rule in English is that when you have a coordinate structure like “Daniel and I”, then you drop the “Daniel and” and see which pronoun you’d use. Definitely not “I”—one would never say “Michael helped I understand …”. My bad!

More seriously, let me play amateur psycholinguist. These control verbs can play tricks on real-time writing. The object of the verb “help” acts like the subject of the verb phrase headed by “understand”. You can contrast the control verbs with the raising verbs, which take expletive subjects, e.g., “It seems I can speak English”. A native speaker would never say “It seems me can speak English”.

And pardon my British punctuation of quotes. As a UK-trained linguist, I just can’t abide by American typesetting standards that put sentential punctuation within quotations.

As always, there’s a relevant xkcd on the old fart thing in language: https://xkcd.com/1414/

P.S. I fixed the post.

• Joe says:

I know this is not Language Log, but there is an ambiguity with the word “rule” here that I think should be clarified. On the one hand, we can say that there are rules that are propagated by copy editors and old farts. In that sense, I agree with you about the rule for case assignment of pronouns in coordinate structures. But if you use the word “rule” to mean structural regularities governing the formation of phrases, clauses, etc., I wouldn’t say that you can determine the rule for the case assignment of pronouns in a coordinate construction by changing the grammatical structure to a non-coordinate construction (who is to say that English doesn’t have separate rules for coordinate and non-coordinate constructions?). You can find violations of the “rule” throughout the history of English, and I think the regularity is such that they aren’t just processing errors or hypercorrection (that is, a habit beaten into students who were repeatedly corrected whenever they said “Me and John went to the movies”). Here’s a well-known example from Shakespeare’s Merchant of Venice:

All debts are cleared between you and I.

Anyways, this is an aside to an aside (and it’s one that I’m sure you already know), so I hope I’ll be excused for letting the non-linguists know about the issue.

• Kyle C says:

Sure, we could all agree not to consider it a mistake, but in 2016 there is no such agreement, so it’s a mistake — like wearing flip-flops with office attire: people do it, and one can’t prove objectively that it’s wrong, because there is no timeless or external standard, but it’s gauche and jarring and should be discouraged among the young.

• Shravan says:

Since Bob brought up amateur psycholinguist, here are some cool examples that are fun to consider (Bob is not allowed to answer because he’s a trained linguist):

1. Read the following sentence only once, at your usual reading speed:

The key to the cabinets are on the table.

Does this sentence sound fine to you?

2. Read the two sentences below, and decide which one sounds more grammatical:

A. The senator who the lobbyist that the policeman arrested yesterday confessed to the crime.

B. The senator who the lobbyist that the policeman arrested yesterday bribed confessed to the crime.

For bonus points, say who confessed in A.

3. Read this sentence as many times as you like:

No head injury is too trivial to be ignored.

Does this sentence make sense? What does it mean? Take as long as you like.

Have fun! Bruno Nicenboim (and maybe I as well) will talk about issues 1 and 2 at StanCon. See some of you there!

2. Brad Stiritz says:

Andrew, it’s too bad, IMHO, that you decided to sanitize the language, particularly if it wasn’t expressly requested. I’ve always felt it humanizes people and conveys a deeper sense of reality and emotional verisimilitude to quote people verbatim. If God is in every leaf of every tree, then why not also in every word of every sentence?

• Don’t blame Andrew. He didn’t write the post. As Phil would say, “That post was by Bob.” (There, one for American punctuation, since I wasn’t making a linguistic point.)

• Brad Stiritz says:

Andrew, I’m so sorry to have falsely cast aspersions. I was regretting posting in haste even before I saw Bob’s reply. Lesson learned.

• I swear. A lot. And since I wrote all of the MCMC code in Stan you could argue that vulgarity is in every leaf of every balanced binary tree output by NUTS.

I’ll see myself out now.

3. Richard McElreath says:

Awesome work. I have a couple of Stan projects ready to send to press. So I gather I’ll be able to hold them until mid-January and then finalize the samples with the bug fix in?

Sorry I can’t be at StanCon.

• The fix is in, passing tests, and just waiting to be approved by another one of the devs and then merged. When you can use it depends on what interface you use. Unfortunately, because we just released RStan 2.13.1 to CRAN we probably won’t be able to get an official RStan update until the end of the month. But if you can use CmdStan or the develop version of any of the interfaces installed from source then you’ll be able to finalize very soon.

• Ben Bolker says:

I would guess that the CRAN maintainers would be willing to waive the usual policy:

> Submitting updates should be done responsibly and with respect for the volunteers’ time. Once a package is established (which may take several rounds), “no more than every 1–2 months” seems appropriate.

if you explain the circumstances. (This is clearly written as a guideline rather than an iron rule.) Especially if you religiously follow all the CRAN rules, make sure all tests pass on all platforms, etc.

4. Hernan Bruno says:

You guys are awesome.

5. Simina says:

I did enjoy reading this! Bring down the human language!

6. David Westergaard says:

As a sidenote, does this bug affect the step size of every iteration? That is, does the sampler take an unnecessarily large number of steps every iteration?

• This bug affects how Hamiltonian Monte Carlo trajectories are built up, but it’s not clear whether the length of the trajectories will be biased towards shorter or longer trajectories. They’re just not quite the right trajectories.

7. Mick Cooney says:

I am looking forward to reading that PDF on the algorithms. It sounds fascinating.

• Not sure when we’d release it or in what form: the rigorous math has limited appeal to the majority of the Stan community, so our priority is typically more pedagogical material.

• Craig Mohn says:

If your assessment about the appeal of rigorous math to the community is based primarily on your priors, please refine them with more data. We might be more math-ey than you think. Or perhaps only 2 or 3 of us even try to mentally process an equation, while the rest of us glaze over and mentally note that you probably proved something that lines up with the verbiage around it. You obviously have a more objective perception of the user community than we do. However, if you are using this forum to subtly elicit our opinions, I will not-so-subtly tell you that I would benefit from and enjoy reading more about the math and the algorithms, even if it is completely unpolished and far from error-free.

• There is plenty of math already available to read! https://arxiv.org/find/stat/1/au:+Betancourt_M/0/1/0/all/0/1
The writeup mentioned in the post is a slightly more statistical but very careful treatment of the new HMC samplers in Stan discussed in https://arxiv.org/abs/1601.00225.

But, yes, while there are a few people who enjoy the math there are many, many more practitioners using Stan who would benefit from better introductory documentation on modeling, diagnostics, and the like. Given that my highest priority is facilitating better science, that introductory documentation is currently the most effective path.

8. Is the current NUTS algorithm available in pseudo-code somewhere? I think this would be very helpful for reference implementations.