Archive of posts filed under the Statistical computing category.

Wolfram Markdown, also called Computational Essay

I was reading Stephen Wolfram’s blog and came across this post: People are used to producing prose—and sometimes pictures—to express themselves. But in the modern age of computation, something new has become possible that I’d like to call the computational essay. I [Wolfram] have been working on building the technology to support computational essays for […]

Comments on Limitations of Bayesian Leave-One-Out Cross-Validation for Model Selection

There is a recent pre-print Limitations of Bayesian Leave-One-Out Cross-Validation for Model Selection by Quentin Gronau and Eric-Jan Wagenmakers. Wagenmakers asked for comments and so here are my comments. Short version: They report a known limitation of LOO when it’s used in a non-recommended way for model selection. They report that their experiments show that […]

Could you say that again less clearly, please? A general-purpose data garbler for applications requiring confidentiality

Ariel Rokem pointed me to this Python program by Bill Howe, Julia Stoyanovich, Haoyue Ping, Bernease Herman, and Matt Gee that will take your data matrix and produce a new data matrix that has the same size, shape, and general statistical properties but with none of the same actual numbers. The use case is when […]

Zero-excluding priors are probably a bad idea for hierarchical variance parameters

(This is Dan, but in quick mode) I was on the subway when I saw Andrew’s last post and it doesn’t strike me as a particularly great idea. So let’s take a look at the suggestion for 8 schools using a centred parameterization. This is not as comprehensive as doing a proper simulation study, but […]

How about zero-excluding priors for hierarchical variance parameters to improve computation for full Bayesian inference?

So. For a while now we’ve moved away from the uniform (or, worse, inverse-gamma!) prior distributions for hierarchical variance parameters. We’ve done half-Cauchy, folded t, and other options; now we’re favoring unit half-normal. We also have boundary-avoiding priors for point estimates, so that in 8-schools-type problems, the posterior mode won’t be zero. Something like the gamma(2) […]

The current state of the Stan ecosystem in R

(This post is by Jonah) Last week I posted here about the release of version 2.0.0 of the loo R package, but there have been a few other recent releases and updates worth mentioning. At the end of the post I also include some general thoughts on R package development with Stan and the growing number of […]

Postdoc opportunity at AstraZeneca in Cambridge, England, in Bayesian Machine Learning using Stan!

Here it is: Predicting drug toxicity with Bayesian machine learning models We’re currently looking for talented scientists to join our innovative academic-style Postdoc. From our centre in Cambridge, UK you’ll be in a global pharmaceutical environment, contributing to live projects right from the start. You’ll take part in a comprehensive training programme, including a focus […]

You better check yo self before you wreck yo self

We (Sean Talts, Michael Betancourt, Me, Aki, and Andrew) just uploaded a paper (code available here) that outlines a framework for verifying that an algorithm for computing a posterior distribution has been implemented correctly. It is easy to use, straightforward to implement, and ready to be implemented as part of a Bayesian workflow. This type of […]

loo 2.0 is loose

This post is by Jonah and Aki. We’re happy to announce the release of v2.0.0 of the loo R package for efficient approximate leave-one-out cross-validation (and more). For anyone unfamiliar with the package, the original motivation for its development is in our paper: Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation […]

Generable: They’re building software for pharma, with Stan inside.

Daniel Lee writes: We’ve just launched our new website. Generable is where precision medicine meets statistical machine learning. We are building a state-of-the-art platform to make individual, patient-level predictions for safety and efficacy of treatments. We’re able to do this by building Bayesian models with Stan. We currently have pilots with AstraZeneca, Sanofi, and University […]

Fitting a hierarchical model without losing control

Tim Disher writes: I have been asked to run some regularized regressions on a small N high p situation, which for the primary outcome has led to more realistic coefficient estimates and better performance on cv (yay!). Rstanarm made this process very easy for me so I am grateful for it. I have now been […]

Learn by experimenting!

A student wrote in one of his homework assignments: Sidenote: I know some people say you’re not supposed to use loops in R, but I’ve never been totally sure why this is (a speed thing?). My first computer language was Java, so my inclination is to think in loops before using some of the other […]

Bayesian inference for A/B testing: Lauren Kennedy and I speak at the NYC Women in Machine Learning and Data Science meetup tomorrow (Tues 27 Mar) 7pm

Here it is: Bayesian inference for A/B testing Andrew Gelman, Department of Statistics and Department of Political Science, Columbia University Lauren Kennedy, Columbia Population Research Center, Columbia University Suppose we want to use empirical data to compare two or more decisions or treatment options. Classical statistical methods based on statistical significance and p-values break down […]

Lessons learned in Hell

This post is by Phil. It is not by Andrew. I’m halfway through my third year as a consultant, after 25 years at a government research lab, and I just had a miserable five weeks finishing a project. The end product was fine — actually really good — but the process was horrible and I […]

Bob likes the big audience

In response to a colleague who was a bit scared of posting some work up on the internet for all to see, Bob Carpenter writes: I like the big audience for two reasons related to computer science principles. The first benefit is the same reason it’s scary. The big audience is likely to find flaws. […]

Eid ma clack shaw zupoven del ba.

When I say “I love you”, you look accordingly skeptical – Frida Hyvönen A few years back, Bill Callahan wrote a song about the night he dreamt the perfect song. In a fever, he woke and wrote it down before going back to sleep. The next morning, as he struggled to read his handwriting, he saw […]

Andrew vs. the Multi-Armed Bandit

Andrew and I were talking about coding up some sequential designs for A/B testing in Stan the other week. I volunteered to do the legwork and implement some examples. The literature is very accessible these days—it can be found under the subject heading “multi-armed bandits.” There’s even a Wikipedia page on multi-armed bandits that lays […]
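For readers who have not met the term, here is a minimal sketch of one standard bandit algorithm, Thompson sampling for Bernoulli rewards with uniform Beta(1, 1) priors. The excerpt does not say which sequential designs the post implements, so the choice of algorithm, the function name `thompson_sampling`, and the conversion rates below are all assumptions made for illustration. It is written in plain Python rather than Stan.

```python
import random


def thompson_sampling(true_rates, n_rounds, seed=0):
    """Run Thompson sampling on a Bernoulli bandit (illustrative sketch).

    true_rates: hypothetical success probability of each arm, unknown
                to the algorithm and used only to simulate outcomes.
    Returns per-arm success and failure counts.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    successes = [0] * n_arms
    failures = [0] * n_arms
    for _ in range(n_rounds):
        # Draw one plausible rate per arm from its Beta posterior
        # (Beta(1, 1) prior updated with the counts so far).
        sampled = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Play the arm whose sampled rate is largest.
        arm = max(range(n_arms), key=sampled.__getitem__)
        # Simulate the outcome and update that arm's counts.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


successes, failures = thompson_sampling([0.1, 0.5], n_rounds=2000, seed=1)
pulls = [s + f for s, f in zip(successes, failures)]
print("pulls per arm:", pulls)
```

Because each arm is played in proportion to the posterior probability that it is the best one, the design shifts traffic toward the better arm as evidence accumulates instead of splitting it evenly for the whole experiment.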

Postdoc opening on subgroup analysis and risk-benefit analysis at Merck pharmaceuticals research lab

Richard Baumgartner writes: We are looking for a strong postdoctoral fellow for a very interesting cutting edge project. The project requires expertise in statistical modeling and machine learning. Here is the official job ad. We are looking for candidates that are strong both analytically and computationally (excellent coding skills). In the project, we are interested […]

How smartly.io productized Bayesian revenue estimation with Stan

Markus Ojala writes: Bayesian modeling is becoming mainstream in many application areas. Applying it still requires a lot of knowledge about distributions and modeling techniques, but recent developments in probabilistic programming languages have made it much more tractable. Stan is a promising language that suits single analysis cases well. With the improvements in approximation […]

We were measuring the speed of Stan incorrectly—it’s faster than we thought in some cases due to antithetical sampling

Aki points out that in cases of antithetical sampling, our effective sample size calculations were unduly truncated above at the number of iterations. It turns out the effective sample size can be greater than the number of iterations if the draws are anticorrelated. And all we really care about for speed is effective sample size […]
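The claim that effective sample size can exceed the number of iterations can be checked with a small simulation. The sketch below is not Stan's estimator; it is a simplified single-chain version that assumes an AR(1) process as a stand-in for MCMC draws and truncates the autocorrelation sum with Geyer's initial positive sequence rule. The function names and the coefficient values of -0.5 and 0.5 are illustrative choices.

```python
import random


def ar1_chain(n, phi, seed=0):
    """Simulate n draws from a stationary AR(1) process with unit variance.

    phi < 0 gives anticorrelated draws, phi > 0 positively correlated ones.
    """
    rng = random.Random(seed)
    scale = (1.0 - phi * phi) ** 0.5
    x = rng.gauss(0.0, 1.0)
    draws = []
    for _ in range(n):
        x = phi * x + scale * rng.gauss(0.0, 1.0)
        draws.append(x)
    return draws


def effective_sample_size(draws):
    """Single-chain ESS estimate, n / tau, where tau = 1 + 2 * sum of rho_t.

    The sum is accumulated over pairs (rho_2k + rho_2k+1) and stopped at
    the first non-positive pair (Geyer's initial positive sequence).
    Nothing here caps the result at n.
    """
    n = len(draws)
    mean = sum(draws) / n
    centered = [d - mean for d in draws]
    var = sum(c * c for c in centered) / n

    def rho(lag):
        acc = sum(centered[i] * centered[i + lag] for i in range(n - lag))
        return acc / (n * var)

    tau = -1.0  # the first pair includes rho_0 = 1, so start at -1
    lag = 0
    while lag + 1 < n:
        pair = rho(lag) + rho(lag + 1)
        if pair <= 0.0:
            break
        tau += 2.0 * pair
        lag += 2
    return n / tau


n = 20000
ess_anti = effective_sample_size(ar1_chain(n, phi=-0.5, seed=1))
ess_pos = effective_sample_size(ar1_chain(n, phi=0.5, seed=1))
print("ESS / n, anticorrelated:", ess_anti / n)
print("ESS / n, positively correlated:", ess_pos / n)
```

For an AR(1) process the theoretical ratio ESS / n is (1 - phi) / (1 + phi), which is 3 for phi = -0.5 and 1/3 for phi = 0.5, so the first estimate should land well above 1 and the second well below it. An estimator that clipped the result at n would hide that difference.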