Archive of posts filed under the Statistical computing category.

“Each computer run would last 1,000–2,000 hours, and, because we didn’t really trust a program that ran so long, we ran it twice and verified that the results matched. I’m not sure I ever was present when a run finished.”

Bill Harris writes: Skimming Michael Betancourt’s history of MCMC [discussed yesterday in this space] made me think: my first computer job was as a nighttime computer operator on the old Rice (R1) Computer, where I was one of several students who ran Monte Carlo programs written by (the very good) chemistry prof Dr. Zevi Salsburg […]

How does probabilistic computation differ in physics and statistics?

[image of Schrödinger’s cat, of course] Stan collaborator Michael Betancourt wrote an article, “The Convergence of Markov chain Monte Carlo Methods: From the Metropolis method to Hamiltonian Monte Carlo,” discussing how various ideas of computational probability moved from physics to statistics. Three things I wanted to add to Betancourt’s story: 1. My paper with Rubin […]

R-squared for Bayesian regression models

Ben, Jonah, Imad, and I write: The usual definition of R-squared (variance of the predicted values divided by the variance of the data) has a problem for Bayesian fits, as the numerator can be larger than the denominator. We propose an alternative definition similar to one that has appeared in the survival analysis literature: the […]
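A sketch of the contrast, writing Var(ŷ) for the variance of the predicted values; the second formula is one way to express the variance-decomposition alternative the excerpt describes (my rendering, not the paper’s exact notation):

    \[
    R^2_{\text{classical}} = \frac{\operatorname{Var}(\hat{y})}{\operatorname{Var}(y)},
    \qquad
    R^2_{\text{proposed}} = \frac{\operatorname{Var}(\hat{y})}{\operatorname{Var}(\hat{y}) + \operatorname{Var}(y - \hat{y})}.
    \]

Because the proposed denominator is the fitted variance plus the residual variance, the ratio is bounded above by 1 by construction, and in a Bayesian fit it can be evaluated separately for each posterior draw, giving a distribution of R-squared values rather than a single number.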

Burn-in for MCMC, why we prefer the term warm-up

Here’s what we say on p.282 of BDA3: In the simulation literature (including earlier editions of this book), the warm-up period is called burn-in, a term we now avoid because we feel it draws a misleading analogy to industrial processes in which products are stressed in order to reveal defects. We prefer the term ‘warm-up’ […]

Workflow, baby, workflow

Bob Carpenter writes: Here’s what we do and what we recommend everyone else do: 1. code the model as straightforwardly as possible; 2. generate fake data; 3. make sure the program properly codes the model; 4. run the program on real data; 5. *if* the model is too slow, optimize *one step at a time* […]
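As a concrete illustration of steps 1–3, here is a minimal sketch in Stan, assuming a simple linear regression; the file names and the “true” parameter values are invented for the example, not taken from Bob’s post:

    // sim.stan: simulate fake data from known parameter values (step 2)
    // (no parameters block, so run with algorithm = fixed_param)
    data {
      int<lower=0> N;
      vector[N] x;
    }
    generated quantities {
      vector[N] y;
      for (n in 1:N)
        y[n] = normal_rng(0.5 + 2 * x[n], 1);  // alpha = 0.5, beta = 2, sigma = 1
    }

    // fit.stan: the model, coded as straightforwardly as possible (step 1)
    data {
      int<lower=0> N;
      vector[N] x;
      vector[N] y;
    }
    parameters {
      real alpha;
      real beta;
      real<lower=0> sigma;
    }
    model {
      y ~ normal(alpha + beta * x, sigma);  // flat priors by default; fine for this sketch
    }

Fitting fit.stan to the output of sim.stan and checking that the posterior concentrates around alpha = 0.5, beta = 2, and sigma = 1 is the step-3 check that the program properly codes the model, done before any real data are involved.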

Interactive visualizations of sampling and GP regression

You really don’t want to miss Chi Feng’s absolutely wonderful interactive demos. (1) Markov chain Monte Carlo sampling: I believe this is exactly what Andrew was asking for a few Stan meetings ago: Chi Feng’s Interactive MCMC Sampling Visualizer. This tool lets you explore a range of sampling algorithms, including random-walk Metropolis, Hamiltonian Monte Carlo, […]

Bin Yu and Karl Kumbier: “Artificial Intelligence and Statistics”

Yu and Kumbier write: Artificial intelligence (AI) is intrinsically data-driven. It calls for the application of statistical concepts through human-machine collaboration during generation of data, development of algorithms, and evaluation of results. This paper discusses how such human-machine collaboration can be approached through the statistical concepts of population, question of interest, representativeness of training […]

How not to compare the speed of Stan to something else

Someone’s wrong on the internet, and I have to do something about it. Following on from Dan’s Barry Gibb post on statistical model evaluation, here’s an example inspired by a paper I found on Google Scholar searching for Stan citations. The paper (which there is no point in citing) concluded that JAGS was faster than […]

Computational and statistical issues with uniform interval priors

There are two anti-patterns* for prior specification in Stan programs that can be traced directly to idioms developed for BUGS. One is the diffuse gamma prior, which Andrew has already written about at length. The second is the interval-based prior, which brings us to today’s post. An interval prior is something like this in Stan […]
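The excerpt cuts off before the code, but an interval prior in Stan is typically a parameter declared with hard lower and upper bounds and no explicit prior statement, which implies a uniform prior over the interval. A minimal sketch (the bound of 100 and the half-normal alternative are my placeholders, not code from the post):

    data {
      int<lower=0> N;
      vector[N] y;
    }
    parameters {
      real mu;
      // interval prior: hard bounds with no explicit prior statement
      // imply an implicit uniform(0, 100) prior on sigma
      real<lower=0, upper=100> sigma;
    }
    model {
      y ~ normal(mu, sigma);
      // a common softer alternative: declare real<lower=0> sigma; and write
      //   sigma ~ normal(0, 5);  // half-normal, since sigma >= 0
    }

The computational trouble shows up when posterior mass piles up near a bound: on Stan’s unconstrained scale the boundary sits at infinity, so the sampler struggles, and the resulting inferences are sensitive to the essentially arbitrary endpoints.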

Using output from a fitted machine learning algorithm as a predictor in a statistical model

Fred Gruber writes: I attended your talk at Harvard where, regarding the question of how to deal with complex models (trees, neural networks, etc.), you mentioned the idea of taking the output of these models and fitting a multilevel regression model. Is there a paper you could refer me to where I can read about […]

Stan is a probabilistic programming language

See here: Stan: A Probabilistic Programming Language. Journal of Statistical Software. (Bob Carpenter, Andrew Gelman, Matthew D. Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, Allen Riddell) And here: Stan is Turing Complete. So what? (Bob Carpenter) And the pre-Stan version: Fully Bayesian computing. (Jouni Kerman and Andrew Gelman) Apparently […]

Computing marginal likelihoods in Stan, from Quentin Gronau and E. J. Wagenmakers

Gronau and Wagenmakers write: The bridgesampling package facilitates the computation of the marginal likelihood for a wide range of statistical models. For models implemented in Stan (such that the constants are retained), executing the code bridge_sampler(stanfit) automatically produces an estimate of the marginal likelihood. Full story is at the link.
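The parenthetical “such that the constants are retained” is doing real work: Stan’s sampling-statement notation drops normalizing constants, so a model intended for bridge sampling should use the target += form instead. A minimal sketch of the distinction for a toy normal model (my illustration, not code from the package documentation):

    data {
      int<lower=0> N;
      vector[N] y;
    }
    parameters {
      real mu;
      real<lower=0> sigma;
    }
    model {
      // y ~ normal(mu, sigma);  // drops constants: fine for MCMC,
      //                         // not for marginal-likelihood estimation
      target += normal_lpdf(mu | 0, 1);        // target += retains constants
      target += lognormal_lpdf(sigma | 0, 1);
      target += normal_lpdf(y | mu, sigma);
    }

With the model written this way, bridge_sampler(stanfit) can be called on the fitted object as the excerpt describes.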

In the open-source software world, bug reports are welcome. In the science publication world, bug reports are resisted, opposed, buried.

Mark Tuttle writes: If/when the spirit moves you, you should contrast the success of the open software movement with the challenge of published research. In the former case, discovery of bugs, or of better ways of doing things, is almost always WELCOMED. In some cases, submitters of bug reports, patches, suggestions, etc. get “merit badges” […]

The network of models and Bayesian workflow

This is important: it’s something I’ve been thinking about for decades, it just came up in an email I wrote, and it’s refreshingly unrelated to recent topics of blog discussion, so I decided to just post it right now, out of sequence (the next slot in the queue is in May 2018). Right now, standard […]

Barry Gibb came fourth in a Barry Gibb look-alike contest

Every day a little death, in the parlour, in the bed. In the lips and in the eyes. In the curtains, in the silver, in the buttons, in the bread, in the murmurs, in the gestures, in the pauses, in the sighs. – Sondheim. The most horrible sound in the world is that of a […]

Workshop on Interpretable Machine Learning

Andrew Gordon Wilson sends along this conference announcement: NIPS 2017 Symposium on Interpretable Machine Learning, Long Beach, California, USA, December 7, 2017. Call for Papers: We invite researchers to submit their recent work on interpretable machine learning from a wide range of approaches, including (1) methods that are designed to be more interpretable from the start, […]

Splines in Stan; Spatial Models in Stan!

Two case studies: Splines in Stan, by Milad Kharratzadeh. Spatial Models in Stan: Intrinsic Auto-Regressive Models for Areal Data, by Mitzi Morris. This is great. Thanks, Mitzi! Thanks, Milad!

Tenure-Track or Tenured Prof. in Machine Learning at Aalto University, Finland

This job advertisement for a position at Aalto University, Finland, is by Aki: We are looking for a professor to either further strengthen our strong research fields, with keywords including statistical machine learning, probabilistic modelling, Bayesian inference, kernel methods, and computational statistics, or complement them with deep learning. Collaboration with other fields is welcome, with local opportunities […]

“5 minutes? Really?”

Bob writes: Daniel says this issue https://github.com/stan-dev/stan/issues/795#issuecomment-26390557117 is an easy 5-minute fix. In my ongoing role as wet blanket, let’s be realistic. It’s sort of like saying it’s an hour from here to Detroit because that’s how long the plane’s in the air. Nothing is a 5-minute fix (door to door) for Stan and […]

“From ‘What If?’ To ‘What Next?’: Causal Inference and Machine Learning for Intelligent Decision Making”

Panos Toulis writes in to announce this conference: NIPS 2017 Workshop on Causal Inference and Machine Learning (WhatIF2017), “From ‘What If?’ To ‘What Next?’: Causal Inference and Machine Learning for Intelligent Decision Making,” December 8, 2017, Long Beach, USA. Submission deadline for abstracts and papers: October 31, 2017. Acceptance decisions: November 7, 2017 […]