Archive of posts filed under the Statistical computing category.

Lessons learned in Hell

This post is by Phil. It is not by Andrew. I’m halfway through my third year as a consultant, after 25 years at a government research lab, and I just had a miserable five weeks finishing a project. The end product was fine — actually really good — but the process was horrible and I […]

Bob likes the big audience

In response to a colleague who was a bit scared of posting some work up on the internet for all to see, Bob Carpenter writes: I like the big audience for two reasons related to computer science principles. The first benefit is the same reason it’s scary. The big audience is likely to find flaws. […]

Eid ma clack shaw zupoven del ba.

When I say “I love you”, you look accordingly skeptical – Frida Hyvönen A few years back, Bill Callahan wrote a song about the night he dreamt the perfect song. In a fever, he woke and wrote it down before going back to sleep. The next morning, as he struggled to read his handwriting, he saw […]

Andrew vs. the Multi-Armed Bandit

Andrew and I were talking about coding up some sequential designs for A/B testing in Stan the other week. I volunteered to do the legwork and implement some examples. The literature is very accessible these days—it can be found under the subject heading “multi-armed bandits.” There’s even a Wikipedia page on multi-armed bandits that lays […]
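
For readers new to the topic, here is a minimal sketch of one common bandit policy, Thompson sampling, for arms with yes/no outcomes. It is plain Python, not the Stan implementation the post describes. The function name, the uniform Beta(1, 1) priors, and the true success rates are all assumptions made up for illustration.

```python
import random


def thompson_bernoulli(true_rates, n_rounds, seed=0):
    """Simulate Thompson sampling on Bernoulli arms; return pulls per arm.

    true_rates are hypothetical success probabilities, unknown to the policy.
    """
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    for _ in range(n_rounds):
        # Draw one plausible success rate per arm from its Beta posterior
        # (Beta(1, 1) prior updated with the counts observed so far).
        draws = [rng.betavariate(1 + s, 1 + f)
                 for s, f in zip(successes, failures)]
        # Play the arm whose sampled rate is highest.
        arm = max(range(len(draws)), key=draws.__getitem__)
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [s + f for s, f in zip(successes, failures)]


pulls = thompson_bernoulli([0.3, 0.7], n_rounds=2000)
print(pulls)  # number of times each arm was played
```

Because the policy plays each arm in proportion to its posterior probability of being best, most of the 2000 rounds should go to the second arm once the evidence accumulates.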

Postdoc opening on subgroup analysis and risk-benefit analysis at Merck pharmaceuticals research lab

Richard Baumgartner writes: We are looking for a strong postdoctoral fellow for a very interesting cutting edge project. The project requires expertise in statistical modeling and machine learning. Here is the official job ad. We are looking for candidates that are strong both analytically and computationally (excellent coding skills). In the project, we are interested […]

How smartly.io productized Bayesian revenue estimation with Stan

Markus Ojala writes: Bayesian modeling is becoming mainstream in many application areas. Applying it needs still a lot of knowledge about distributions and modeling techniques but the recent development in probabilistic programming languages have made it much more tractable. Stan is a promising language that suits single analysis cases well. With the improvements in approximation […]

We were measuring the speed of Stan incorrectly—it’s faster than we thought in some cases due to antithetical sampling

Aki points out that in cases of antithetical sampling, our effective sample size calculations were unduly truncated above at the number of iterations. It turns out the effective sample size can be greater than the number of iterations if the draws are anticorrelated. And all we really care about for speed is effective sample size […]
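
To see how anticorrelation pushes effective sample size above the number of iterations, consider a textbook case rather than Stan's own estimator: a stationary AR(1) chain whose lag-t autocorrelation is phi^t. Plugging that into the usual formula ESS = N / (1 + 2 * sum of autocorrelations) gives ESS = N * (1 - phi) / (1 + phi). The function name and the values of phi below are illustrative assumptions.

```python
def ar1_ess(n_draws, phi):
    """Asymptotic effective sample size of n_draws from a stationary AR(1)
    chain with lag-1 autocorrelation phi (illustrative, not Stan's estimator).

    Uses ESS = N / (1 + 2 * sum_{t>=1} phi**t) = N * (1 - phi) / (1 + phi).
    """
    if not -1.0 < phi < 1.0:
        raise ValueError("phi must lie strictly between -1 and 1")
    return n_draws * (1.0 - phi) / (1.0 + phi)


for phi in (0.5, 0.0, -0.5):
    print(f"phi = {phi:+.1f}: ESS = {ar1_ess(1000, phi):.0f}")
```

With positive phi the ESS falls below the 1000 draws, with phi = 0 it equals 1000, and with phi = -0.5 it comes out at three times the number of draws, which is the situation a cap at the number of iterations would hide.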

Static sensitivity analysis: Computing robustness of Bayesian inferences to the choice of hyperparameters

Ryan Giordano wrote: Last year at StanCon we talked about how you can differentiate under the integral to automatically calculate quantitative hyperparameter robustness for Bayesian posteriors. Since then, I’ve packaged the idea up into an R library that plays nice with Stan. You can install it from this github repo. I’m sure you’ll be pretty […]

Three new domain-specific (embedded) languages with a Stan backend

One is an accident. Two is a coincidence. Three is a pattern. Perhaps it’s no coincidence that there are three new interfaces that use Stan’s C++ implementation of adaptive Hamiltonian Monte Carlo (currently an updated version of the no-U-turn sampler). ScalaStan embeds a Stan-like language in Scala. It’s a Scala package largely (if not entirely […]

“Each computer run would last 1,000-2,000 hours, and, because we didn’t really trust a program that ran so long, we ran it twice, and it verified that the results matched. I’m not sure I ever was present when a run finished.”

Bill Harris writes: Skimming Michael Betancourt’s history of MCMC [discussed yesterday in this space] made me think: my first computer job was as a nighttime computer operator on the old Rice (R1) Computer, where I was one of several students who ran Monte Carlo programs written by (the very good) chemistry prof Dr. Zevi Salsburg […]

How does probabilistic computation differ in physics and statistics?

[image of Schrodinger’s cat, of course] Stan collaborator Michael Betancourt wrote an article, “The Convergence of Markov chain Monte Carlo Methods: From the Metropolis method to Hamiltonian Monte Carlo,” discussing how various ideas of computational probability moved from physics to statistics. Three things I wanted to add to Betancourt’s story: 1. My paper with Rubin […]

R-squared for Bayesian regression models

Ben, Jonah, Imad, and I write: The usual definition of R-squared (variance of the predicted values divided by the variance of the data) has a problem for Bayesian fits, as the numerator can be larger than the denominator. We propose an alternative definition similar to one that has appeared in the survival analysis literature: the […]
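
A small numerical sketch of the problem described above, using made-up numbers: if the predicted values vary more than the data do (as can happen for a single posterior draw, or under a strong prior), the ratio of variances exceeds 1. The function name and the data are hypothetical, and the sketch shows only the classical ratio, not the proposed alternative.

```python
from statistics import pvariance


def classical_r2(y, y_pred):
    """Variance of the predicted values divided by the variance of the data."""
    return pvariance(y_pred) / pvariance(y)


# Hypothetical data and hypothetical predictions with too steep a slope.
y = [1.0, 2.0, 3.0]
y_pred = [0.0, 2.0, 4.0]

r2 = classical_r2(y, y_pred)
print(r2)  # a ratio above 1, which cannot be read as "proportion of variance explained"
```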

Burn-in for MCMC, why we prefer the term warm-up

Here’s what we say on p.282 of BDA3: In the simulation literature (including earlier editions of this book), the warm-up period is called burn-in, a term we now avoid because we feel it draws a misleading analogy to industrial processes in which products are stressed in order to reveal defects. We prefer the term ‘warm-up’ […]

Workflow, baby, workflow

Bob Carpenter writes: Here’s what we do and what we recommend everyone else do: 1. code the model as straightforwardly as possible 2. generate fake data 3. make sure the program properly codes the model 4. run the program on real data 5. *If* the model is too slow, optimize *one step at a time* […]
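
Steps 2 and 3 of the list above can be sketched in a few lines. This is a stand-in, not a Stan program: the model is a simple linear regression, the "fit" is closed-form least squares, and the true parameter values, sample size, and function names are all assumptions chosen for illustration.

```python
import random


def simulate(n, intercept, slope, sigma, seed=0):
    """Generate fake data from y = intercept + slope * x + Normal(0, sigma) noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [intercept + slope * x + rng.gauss(0.0, sigma) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Closed-form least-squares estimates of (intercept, slope)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Known "true" values: the fit on fake data should land close to them.
TRUE_INTERCEPT, TRUE_SLOPE = 1.5, -0.8
xs, ys = simulate(2000, TRUE_INTERCEPT, TRUE_SLOPE, sigma=1.0)
intercept_hat, slope_hat = fit_line(xs, ys)
print(f"intercept: true {TRUE_INTERCEPT}, estimated {intercept_hat:.3f}")
print(f"slope:     true {TRUE_SLOPE}, estimated {slope_hat:.3f}")
```

If the estimates were far from the values used to generate the data, that would point to a bug in the program rather than a surprise in the real data, which is the reason to run this check before step 4.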

Interactive visualizations of sampling and GP regression

You really don’t want to miss Chi Feng’s absolutely wonderful interactive demos. (1) Markov chain Monte Carlo sampling I believe this is exactly what Andrew was asking for a few Stan meetings ago: Chi Feng’s Interactive MCMC Sampling Visualizer This tool lets you explore a range of sampling algorithms including random-walk Metropolis, Hamiltonian Monte Carlo, […]

Bin Yu and Karl Kumbier: “Artificial Intelligence and Statistics”

Yu and Kumbier write: Artificial intelligence (AI) is intrinsically data-driven. It calls for the application of statistical concepts through human-machine collaboration during generation of data, development of algorithms, and evaluation of results. This paper discusses how such human-machine collaboration can be approached through the statistical concepts of population, question of interest, representativeness of training […]

How not to compare the speed of Stan to something else

Someone’s wrong on the internet And I have to do something about it. Following on from Dan’s post on Barry Gibb statistical model evaluation, here’s an example inspired by a paper I found on Google Scholar searching for Stan citations. The paper (which there is no point in citing) concluded that JAGS was faster than […]

Computational and statistical issues with uniform interval priors

There are two anti-patterns* for prior specification in Stan programs that can be sourced directly to idioms developed for BUGS. One is the diffuse gamma priors that Andrew’s already written about at length. The second is interval-based priors. Which brings us to today’s post. Interval priors An interval prior is something like this in Stan […]

Using output from a fitted machine learning algorithm as a predictor in a statistical model

Fred Gruber writes: I attended your talk at Harvard where, regarding the question on how to deal with complex models (trees, neural networks, etc) you mentioned the idea of taking the output of these models and fitting a multilevel regression model. Is there a paper you could refer me to where I can read about […]

Stan is a probabilistic programming language

See here: Stan: A Probabilistic Programming Language. Journal of Statistical Software. (Bob Carpenter, Andrew Gelman, Matthew D. Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, Allen Riddell) And here: Stan is Turing Complete. So what? (Bob Carpenter) And, the pre-Stan version: Fully Bayesian computing. (Jouni Kerman and Andrew Gelman) Apparently […]