I (Bob, not Andrew) doubt anyone sets out to do algebra for the fun of it, implement an inefficient algorithm, or write a paper where it’s not clear what the model is. But…
Why not write it in BUGS or Stan?
Over on the Stan users group, Robert Grant wrote
Hello everybody, I’ve just been looking at this new paper [ed. paywalled link, didn't read] which proposes yet another slightly novel model for longitudinal + time-to-event data, and wondering why they bothered with extensive first-principles exposition of the likelihoods and then writing a one-off sampler in R. (Of course, the answer probably is a combination of the authors’ mathematical expertise and the journal’s fondness for algebraic exposition.) Why not just write it in a graphical model style like Stan? They report getting 2000 iterations every 24 hours, and I find it hard to believe this is a sensible way to do it. Is there any reason why this sort of model can’t be done in Stan?
It’s not iterations, but time to convergence and then time per effective sample after that that matters.
Stan’s not just a graphical modeling language — you can write down any expression that’s equal to the log posterior up to an additive constant.
My general experience is that it’s usually easy to fit models that have continuous parameters in Stan. We’ve fit all kinds of models that people have written complicated software for. Most of those models could’ve been fit in BUGS or JAGS, too, which would be easier than writing custom code.
… I feel that journal readers are poorly served by pages of algebra and no straightforward BUGS/Stan code. With good checking, it should be ok to present a new model in this way. After all, I don’t have to write down the likelihood and Newton-Raphson every time I publish
a logistic regression.
I am always struck by this same issue. Here’s what I think is going on:
1. What goes in a paper is up to the author. If the author struggled with a step or found it a bit tricky to think about themselves, then the struggle goes into the paper. Even if it might be obvious to someone with more experience in a field. I was just reading a paper with a very detailed exposition of EM for a latent logistic regression problem with conditional probability derivations, etc. (JMLR paper Learning from Crowds by Raykar et al., which is an awesome paper, even if it suffers from this flaw and number 3.)
2. What goes in a paper is up to editors. If the editors don’t understand something, they’ll ask for details, even if they should be obvious to the entire field. This is agreeing with Robert’s point, I think. Editors like to see the author sweat, because of some kind of no-pain, no-gain esthetic that seems to permeate academic journal publishing. It’s so hard to boil something down, then when you do, you get dinged for it.
I find lots of exposition of basic notions in applied papers in natural language processing, where almost every paper that uses a notion like entropy (including one I just wrote) redefines it from scratch. For that same paper, the editors wanted to know about algorithms for fitting a latent logistic regression. For some journals, you have to insert p-values even if they don’t even make sense from a frequentist perspective.
3. Authors tend to narratives over crisp expositions. They want to tell you their story of how they got to the conclusions. (Even me, hence the preface of the Stan manual.) One of the things that frustrates me to no end in papers, particularly in statistics, is that the author tells you a narrative of how they meandered to the model they finally use; in computer science, they meander in the same way to models or algorithm variants. What I want to see is a crisp statement of the model with enough details that I can replicate it. Providing working code is the easiest way to do that. It’s why I liked Jennifer and Andrew’s regression book so much — they gave you (mostly) runnable code.
The way forward
I think what applied papers need to do is lay out their model crisply. BUGS is a nice language for this if your model is short and easily expressed as a directed acyclic graph. I think the variable declarations in Stan make it even clearer what’s going on, especially as models go beyond a few lines.
The separation of data and parameters in Stan makes it evident how the model is intended to be used for inference — that information is not part of the model specification in BUGS. But this is also a drawback to Stan’s language in that the distinction between data and parameters is not part of a Bayesian model per se, just a statement of how you’re going to use it. One upshot of this is that missing data models in Stan are awkward, to say the least. (Hopefully the addition of functions should clean this up from a Stan user’s perspective, but it’s still not going to provide an easy-to-follow model spec like many models in BUGS. A possibility we’ve been considering is adding a graphical model specification language that could be combined with data to compile a Stan model — the issue is that we need the graphical model plus the identity of variables provided as data in order to compile and then run a Stan model.)
Goldilocks and the Three Academics
For some reason, mathematicians seem to be immune to problems (1) and (3). In fact, I feel they often go too far, which is why this is such a Goldilocks issue. What’s an inscrutable step or bit of math to one person (contact manifolds anyone?) is like counting on your fingers to a more advanced mathematician (or statistician). You can’t write an intro to differentiable manifolds (or hierarchical regression) into every one of your papers. So you have to write books and software that provide an infrastructure for writing simple papers, but then you get dinged because it’s “too easy” or alternatively because people don’t read the book and it’s “too hard.”
P.S. I finally met Phil of “this post is by Phil” fame! He said that no matter how he qualifies his posts, many commenters assume they’re written by Andrew!