Ummmm, running Stan, testing out a new method we have that applies EP-like ideas to perform inference with aggregate data—it’s really cool, I’ll post more on it once we’ve tried everything out and have a paper that’s in better shape—anyway, I’m starting with a normal example, a varying-intercept, varying-slope model where the intercepts have population mean 50 and sd 10, and the slopes have population mean -2 and sd 0.5 (for simplicity I’ve set up the model with intercepts and slopes independent), and the data variance is 5. Fit the model in Stan (along with other stuff, the real action here’s in the generated quantities block but that’s a story for another day), here’s what we get:
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
mu_a 49.19 0.01 0.52 48.14 48.85 49.20 49.53 50.20 2000 1
mu_a -2.03 0.00 0.11 -2.23 -2.10 -2.03 -1.96 -1.82 1060 1
sigma_a 2.64 0.02 0.50 1.70 2.31 2.62 2.96 3.73 927 1
sigma_a 0.67 0.00 0.08 0.52 0.61 0.66 0.72 0.85 890 1
sigma_y 4.97 0.00 0.15 4.69 4.86 4.96 5.06 5.27 2000 1
We’re gonna clean up this output—all these quantities are ridiculous, also I’m starting to think we shouldn’t be foregrounding the mean and sd as these can be unstable; median and IQR would be better, maybe—but that’s another story too.
Here’s the point. I looked at the above output and noticed that the sigma_a parameters are off: the sd of the intercept is too low (it’s around 2 and it should be 10) and the sd of the slopes is too high (it’s around 0.6 and it should be 0.5). The correct values aren’t even in the 95% intervals.
OK, it could just be this one bad simulation, so I re-ran the code a few times. Same results. Not exactly, but the parameter for the intercepts was consistently underestimated and the parameter for the slopes was consistently overestimated.
What up? OK, I do have a flat prior on all these hypers, so this must be what’s going on: there’s something about the data where intercepts and slopes trade off, and somehow the flat prior allows inferences to go deep into some zone of parameter space where this is possible.
Interesting, maybe ultimately not too surprising. We do know that flat priors cause problems, and here we are again.
What to do? I’d like something weakly informative, this prior shouldn’t boss the inferences around but it should keep them away from bad places.
Hmmm . . . I like that analogy: the weakly informative prior (or, more generally, model) as a permissive but safe parent who lets the kids run around in the neighborhood but sets up a large potential-energy barrier to keep them away from the freeway.
Anyway, to return to our story . . . I needed to figure out what was going on. So I decided to start with a strong prior focused on the true parameter values. I just hard-coded it into the Stan program, setting normal priors for mu_a and mu_a. But then I realized, no, that’s not right, the problem is with sigma_a and sigma_a. Maybe put in lognormals?
And then it hit me: in my R simulation, I’d used sd rather than variance. Here’s the offending code:
a <- mvrnorm(J, mu_a, diag(sigma_a))
That should've been diag(sigma_a^2). Damn! Going from univariate to multivariate normal, the notation changed.
On the plus side, there was nothing wrong with my Stan code. Here's what happens after I fixed the testing code in R:
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
mu_a 48.17 0.11 1.62 45.08 47.07 48.12 49.23 51.38 211 1.02
mu_a -2.03 0.00 0.10 -2.22 -2.09 -2.02 -1.97 -1.82 1017 1.00
sigma_a 10.98 0.05 1.18 8.95 10.17 10.87 11.68 13.55 496 1.01
sigma_a 0.57 0.00 0.09 0.42 0.51 0.56 0.63 0.75 826 1.00
sigma_y 5.06 0.00 0.15 4.78 4.95 5.05 5.16 5.35 2000 1.00
Fake-data checking. That's what it's all about.
And that's why I get so angry at bottom-feeders like Richard Tol, David Brooks, Mark Hauser, Karl Weick, and the like. Every damn day I'm out here working, making mistakes, and tracking them down. I'm not complaining; I like my job. I like it a lot. But it really is work, it's hard work some time. So to encounter people who just don't seem to care, who just don't give a poop whether the things they say are right or wrong, ooohhhhh, that just burns me up.
There's nothing I hate more than those head-in-the-clouds bastards who feel deep in their bones that they're right. Whether it's an economist fudging his numbers, or a newspaper columnist lying about the price of a meal at Red Lobster, or a primatologist who won't share his videotapes, or a b-school professor who twists his stories to suit his audience---I just can't stand it, and what I really can't stand is that it doesn't even seem to matter to them when people point out their errors. Especially horrible when they're scientists or journalists, people who are paid to home in on the truth and have the public trust to do that.
A standard slam against profs like me is that we live in an ivory tower, and indeed my day-to-day life is far removed from the sort of Mametian reality, that give-and-take of fleshy wants and needs, that we associate with "real life." But, y'know, a true scholar cares about the details. Take care of the pennies and all that.