OK, so this sort of thing happens sometimes. I was working on a new idea (still working on it; if it ultimately works out—or if it doesn’t—I’ll let you know) and as part of it I was fitting little models in Stan, in a loop. I thought it would make sense to start with linear regression with normal priors and known data variance, because then the exact solution is Gaussian and I can also work with the problem analytically. So I programmed up the algorithm and, no surprise, it didn’t work. I went through my R code, put in print statements here and there, and cleared out bug after bug until at least it stopped crashing. But the algorithm still wasn’t doing what it was supposed to do.

So I decided to do something simpler, and just check that the Stan linear regression gave the same answer as the analytic posterior distribution: I ran Stan for tons of iterations, then computed the sample mean and variance of the simulations. It was an example with two coefficients—I’d originally chosen to work in two dimensions so as to be able to easily graph the progression of the inferences, as we did with the EP example in BDA3—so it was easy enough to compare the mean vector and the 2×2 covariance matrix to the theoretical values.

The means looked fine but the covariance matrix from the Stan simulations was off. The correlations were wrong—not by a lot, but by a nonzero amount, I think the value from the formula was 0.30 and from the Stan simulations was 0.32 or something like that—and the variances in Stan were slightly too low.

So then I thought of all sorts of possibilities. First was that maybe I have the formula wrong. Which I did, actually, but fixing the formula didn’t solve the problem. I also tried direct simulation, and that gave the right answer too. Maybe Stan was having some problem with the regression? So I just fed Stan the posterior distribution directly. Still fail. I simplified further, forget regression entirely, just give independent normal priors:

parameters {

real b1;

real b2;

}

model {

b1 ~ normal (0, 1);

b2 ~ normal (0, 1);

}

You can’t get much more stripped down than that.

Still had the problem:

> library ("rstan")

> gaussian2 <- stan (file="gaussian2.stan", iter=20000, chains=4)

> sims <- extract(gaussian2)$b1

> print (mean (sims))

[1] -0.004989011

> print (var (sims))

[1] 0.9606021

That last bit should be 1.00, not 0.96.

I then tried to go even simpler, to one dimension:

parameters {

real b1;

}

model {

b1 ~ normal (0, 1);

}

This time it gave the right answer:

> gaussian1 <- stan (file="gaussian1.stan", iter=20000, chains=4)

> sims <- extract(gaussian1)$b1

> print (mean (sims))

[1] 0.006627426

> print (var (sims))

[1] 0.9961059

OK, finally I’m at the line-crossing in the image shown at the top of this blog post. I have two adjacent examples, one where Stan works and one where it doesn’t. I send it to the Stan team and there is indeed a bug, something to do with a step in the recent implementation which apparently had already been fixed for the forthcoming next version.

**P.S.** Just to be clear: the above is not meant to represent exemplary practice. In retrospect, and even prospectively, I should’ve started simpler. It seems to be a general rule of programming, and of research, that no matter how simple we start, we should start even simpler, to get that solid rock of computational certainty on which we can stand while building our complex structures.

P.P.S. Yes, I do indent my code. The indents just don’t show up when I use the “code” tag in html.

This reminds me — perhaps obliquely — of my friend Dan’s axiom, which he uses to justify writing unit tests for everything: “There is no bit of code so small that I can’t mess [ed. note: the verb here is somewhat stronger than "mess"] it up.”

You probably want to use the pre tag instead of the code tag, so the indentation and everything comes out correctly. It also looks nicer with your blog stylesheet.

Though I’m sure you know this, in addition to printing out intermediate results, you can also use ‘debug’ to step though a function, and ‘browser’ to inspect the state of a computation at a break point. Those are my staples…

On the other hand, sometimes starting complex ends up working, in which case you can make magically fast progress.

This is a Stan bug, yeah? Unless you just got very unlucky when you tried the example with two independent normals. The distribution you get from b1 ~ normal(0,1) should be the same whether or not you also sample b2 ~ normal(0,1).

Phil:

Yes, it was a bug that’s been fixed already for the current version of Stan.

I once heard a story (lets call it an urban legend) about someone at a Stata conference complaining about a bug in some estimator, only for the panelist to reply along the lines of “I’ve only heard of that happening with the pirated Chinese knock-off versions.” Perhaps Nick Cox will drop in and validate/mythbust – I prefer validate, but it is generally known here that I prefer comedy to truth.

Running stolen Stan Andrew? Since I’m not a Bayesian, I can’t really apply any priors here, so that whole “Stan is free” thing doesn’t enter into my inference strategy, and I’m left with N=1, and a point estimate of probability 1 that you are using stolen software.

I usually start my simulations trying to be complicated, realize I haven’t worked out all the details of the complicated DGP yet, and have to simplify just to get to something I understand. Then I try to get that to run. Then I try to improve the DGP and think of all the different parts I can move around that are pertinent. Then I try to to add flexibility (say, by changing parameter values to macros, and writing a “macro define” program where I can set all my specs or sets of specs). Its not a particularly scientific approach to code writing, but I quickly learn what I don’t know yet (unknown unknowns that become known unknowns and hopefully eventually known knowns. I hate you, Rummy).

The Stata story is amusing, but it doesn’t ring a bell and, more importantly, it doesn’t ring true either. Most estimation code is not part of the Stata executable. I will say no more because if you are interested in this that should be sufficient explanation.

Thanks Nick. It struck me as probably untrue as well (hence the “urban legend” part). And I should’ve known that, because the other day I was digging through xtreg_fe.ado to look for “nonest” (THAT Stata legend turned out to be true!), and xtreg_fe.ado is clearly not part of the executable (if I know what you mean by executable – I’m not super computer literate).

The executable is here the main Stata program, which is compiled code. You couldn’t possibly read it. The main problem with pirating Stata would be (sorry, I have to run).

I know you’re not a very strict Bayesian…but still, it’s striking that this doesn’t seem (to me) to be a very Bayesian approach to debugging your code. The emphasis here appears to be on a sequence of tests, of various parts of the code, that have a good chance of identifying errors. This helps you learn, obviously, but the proper choice of reliable tests seems more important than assigning numbers to your current beliefs.

Hjk:

I don’t know what you mean by “assigning numbers to your current beliefs,” but I completely agree that my debugging could be done more systematically. Rather than saying that my debugging should be more Bayesian in my debugging, I think it would make more sense to say that it should be more statistical.

Sorry, I didn’t mean that you should be more Bayesian – I actually meant that your approach seemed to be a better illustration of realistic problem solving/science than Bayesian reasoning is – ie if one was to further systematize the general approach you took, then maybe emphasising choosing a selection of good tests would be better/more natural than trying to fit it into a Bayesian framework of updating beliefs.

*numerically updating beliefs

The purpose of Bayesian inference is to represent what constitutes reasonable beliefs about a model based quantities that are unknown.

Debugging a computer program is about figuring out how your model for a precisely knowable but complex system differs from its actual reality. In other words, Debugging is much more connected to model checking than to inference from imprecise data within a model context.

I guess my point was that it appears that typical model checking-like activities are outside the strict Bayesian framework – which I believe is consistent with Andrew’s view (?) – and that model checking/debugging through a collection of tests is actually quite a good way to problem solve/do science. Not really an issue for pluralists, but seems to undermine the stronger claims by the more dogmatic Bayesians.

I absolutely agree with your wise advise : “It seems to be a general rule of programming, and of research, that no matter how simple we start, we should start even simpler, to get that solid rock of computational certainty on which we can stand while building our complex structures.”

I would generalize this to “wherever you started was the wrong place”, if the code turns out to be buggy.