Archive of posts filed under the Bayesian Statistics category.

Prior distributions and the Australia principle

There’s an idea in philosophy called the Australia principle (I don’t know the origin of this theory, but here’s an example that turned up in a Google search) that posits that Australia doesn’t exist; instead, they just build the parts that are needed when you visit: a little mock-up of the airport, a cityscape with a model […]

Regularized Prediction and Poststratification (the generalization of Mister P)

This came up in comments recently so I thought I’d clarify the point. Mister P is MRP, multilevel regression and poststratification. The idea goes like this: 1. You want to adjust for differences between sample and population. Let y be your outcome of interest and X be your demographic and geographic variables you’d like to […]
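The excerpt stops before the details, so here is only a minimal sketch of the two steps, assuming a single categorical cell variable. In place of a fitted multilevel (or otherwise regularized) regression it uses a simple normal-normal shrinkage estimate with fixed, hypothetical scales `prior_sd` and `noise_sd`; the function names are illustrative and not from any package.

```python
import numpy as np

def regularized_cell_means(y, cell, n_cells, prior_sd=0.5, noise_sd=1.0):
    """Step 1 (stand-in for the regression): shrink each cell mean toward the
    grand mean. prior_sd and noise_sd are assumed fixed, not estimated."""
    grand = y.mean()
    est = np.full(n_cells, grand)          # empty cells fall back to the grand mean
    for j in range(n_cells):
        yj = y[cell == j]
        if yj.size == 0:
            continue
        prec_data = yj.size / noise_sd**2  # precision of the cell's sample mean
        prec_prior = 1.0 / prior_sd**2     # precision of the prior around the grand mean
        est[j] = (prec_data * yj.mean() + prec_prior * grand) / (prec_data + prec_prior)
    return est

def poststratify(cell_estimates, pop_counts):
    """Step 2: population estimate = sum_j N_j * theta_j / sum_j N_j."""
    w = np.asarray(pop_counts, dtype=float)
    return float(np.dot(w, cell_estimates) / w.sum())

# Usage: the sample over-represents cell 0, but the population is split 50/50.
rng = np.random.default_rng(0)
cell = np.array([0] * 80 + [1] * 20)
y = np.where(cell == 0, 1.0, 3.0) + rng.normal(0.0, 1.0, size=cell.size)
theta = regularized_cell_means(y, cell, n_cells=2)
print("raw sample mean:", y.mean())
print("poststratified: ", poststratify(theta, pop_counts=[500, 500]))
```

The point of the second step is that the population weights N_j, not the sample's own composition, decide how much each cell counts.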

Boston Stan meetup 12 June!

Shane Bussmann writes to announce the next Boston/Camberville Stan users meetup, Tuesday, June 12, 2018, 6:00 PM to 9:00 PM, at Insight Data Science Office, 280 Summer St., Boston: To kick things off for our first meetup in 2018, I [Bussmann] will give a talk on rating teams in recreational ultimate frisbee leagues. In this […]

How to reduce Type M errors in exploratory research?

Miao Yu writes: Recently, I found this piece [a news article by Janet Pelley, Sulfur dioxide pollution tied to degraded sperm quality, published in Chemical & Engineering News] and the original paper [Inverse Association between Ambient Sulfur Dioxide Exposure and Semen Quality in Wuhan, China, by Yuewei Liu, published in Environmental Science & Technology]. Air […]
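A Type M (magnitude) error is the exaggeration of an effect estimate that was selected because it reached statistical significance. Here is a minimal simulation sketch, assuming a normally distributed, unbiased estimate with known standard error; `true_effect` and `se` are hypothetical inputs a researcher would need to supply from outside the data, and nothing here is taken from the sulfur dioxide study.

```python
import numpy as np
from statistics import NormalDist

def type_s_and_m(true_effect, se, alpha=0.05, n_sims=200_000, seed=0):
    """Simulate replications of an unbiased normal estimate and summarize those
    that reach two-sided significance at level alpha."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rng = np.random.default_rng(seed)
    est = rng.normal(true_effect, se, size=n_sims)
    significant = np.abs(est) > z_crit * se
    power = significant.mean()
    # Type S: share of significant estimates with the wrong sign
    type_s = np.mean(np.sign(est[significant]) != np.sign(true_effect))
    # Type M: average exaggeration ratio among significant estimates
    exaggeration = np.mean(np.abs(est[significant])) / abs(true_effect)
    return power, type_s, exaggeration

# A small true effect measured noisily: power is low, and the estimates that
# survive the significance filter overstate the effect by roughly twofold or more.
print(type_s_and_m(true_effect=1.0, se=1.0))
```

One way to reduce Type M error, on this view, is to stop treating the significance filter as the selection step and to bring in outside information about plausible effect sizes.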

Aki’s favorite scientific books (so far)

A month ago I (Aki) started a series of tweets about “scientific books which have had big influence on me…”. They are partially in time order, but I can’t remember the exact order. I may have forgotten some, and some stretched the original idea, but I can recommend all of them. I have collected all […]

Zero-excluding priors are probably a bad idea for hierarchical variance parameters

(This is Dan, but in quick mode) I was on the subway when I saw Andrew’s last post and it doesn’t strike me as a particularly great idea. So let’s take a look at the suggestion for 8 schools using a centred parameterization. This is not as comprehensive as doing a proper simulation study, but […]
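For readers who have not met the distinction: in the centred parameterization of the 8-schools model, the school effects get the prior theta_j ~ normal(mu, tau) directly, while the non-centred version draws eta_j ~ normal(0, 1) and sets theta_j = mu + tau * eta_j. A minimal sketch (plain NumPy, hypothetical function names) checks that the two describe the same distribution, differing only by the Jacobian term -J log tau, and shows why small tau is hard for the centred form: its curvature in each theta_j is -1/tau^2.

```python
import numpy as np

def normal_logpdf(x, loc, scale):
    return -0.5 * np.log(2 * np.pi) - np.log(scale) - 0.5 * ((x - loc) / scale) ** 2

def centred_logp(theta, mu, tau):
    """log p(theta | mu, tau) with theta_j ~ normal(mu, tau)."""
    return np.sum(normal_logpdf(theta, mu, tau))

def noncentred_logp(eta):
    """log p(eta) with eta_j ~ normal(0, 1); theta_j = mu + tau * eta_j."""
    return np.sum(normal_logpdf(eta, 0.0, 1.0))

def centred_curvature(tau):
    """Second derivative of centred_logp with respect to each theta_j."""
    return -1.0 / tau**2

rng = np.random.default_rng(0)
eta = rng.normal(size=8)      # J = 8 schools
mu = 5.0
for tau in [10.0, 1.0, 0.01]:
    theta = mu + tau * eta
    # Change of variables: the densities agree once the -J*log(tau) Jacobian is added.
    lhs = centred_logp(theta, mu, tau)
    rhs = noncentred_logp(eta) - eta.size * np.log(tau)
    print(tau, np.isclose(lhs, rhs), centred_curvature(tau))
```

The curvature running from about -0.01 to -10,000 over this range of tau is the "funnel" that makes centred hierarchical models hard to sample as tau approaches zero.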

How about zero-excluding priors for hierarchical variance parameters to improve computation for full Bayesian inference?

So. For a while now we’ve moved away from the uniform (or, worse, inverse-gamma!) prior distributions for hierarchical variance parameters. We’ve done half-Cauchy, folded t, and other options; now we’re favoring unit half-normal. We also have boundary-avoiding priors for point estimates, so that in 8-schools-type problems, the posterior mode won’t be zero. Something like the gamma(2) […]
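To see what a boundary-avoiding prior does to a point estimate, here is a sketch for the 8-schools data. It uses the standard closed-form marginal posterior of the group-level scale tau, with mu integrated out under a flat prior (derived, for example, in Bayesian Data Analysis), and compares the mode of tau under a flat prior with the mode under a gamma(2) prior. The excerpt does not give the gamma's rate, so `rate=0.1` is a hypothetical choice; any rate gives a density of zero at tau = 0, which is what pushes the mode off the boundary.

```python
import numpy as np

# Eight-schools data: estimated treatment effects and their standard errors
y = np.array([28., 8., -3., 7., -1., 1., 18., 12.])
sigma = np.array([15., 10., 16., 11., 9., 11., 10., 18.])

def log_marginal_tau(tau):
    """log p(y | tau) up to a constant, with mu integrated out under a flat prior."""
    v = sigma**2 + tau**2
    prec = 1.0 / v
    mu_hat = np.sum(prec * y) / np.sum(prec)
    v_mu = 1.0 / np.sum(prec)
    return 0.5 * np.log(v_mu) - 0.5 * np.sum(np.log(v)) - 0.5 * np.sum((y - mu_hat) ** 2 / v)

def log_gamma2_prior(tau, rate=0.1):
    """gamma(shape=2, rate) log density up to a constant: log(tau) - rate * tau."""
    return np.log(tau) - rate * tau if tau > 0 else -np.inf

grid = np.linspace(0.0, 30.0, 3001)
flat = np.array([log_marginal_tau(t) for t in grid])
boundary = np.array([log_marginal_tau(t) + log_gamma2_prior(t) for t in grid])
mode_flat = grid[np.argmax(flat)]
mode_gamma = grid[np.argmax(boundary)]
print("mode of tau, flat prior:    ", mode_flat)
print("mode of tau, gamma(2) prior:", mode_gamma)
```

This only concerns the posterior mode; it says nothing about whether such a prior helps or hurts full Bayesian computation.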

“We continuously increased the number of animals until statistical significance was reached to support our conclusions” . . . I think this is not so bad, actually!

Jordan Anaya pointed me to this post, in which Casper Albers shared this snippet from a recently published article in Nature Communications: The subsequent twitter discussion is all about “false discovery rate” and statistical significance, which I think completely misses the point. The problems. Before I get to why I think the quoted […]

A model for scientific research programmes that include both “exploratory phenomenon-driven research” and “theory-testing science”

John Christie points us to an article by Klaus Fiedler, What Constitutes Strong Psychological Science? The (Neglected) Role of Diagnosticity and A Priori Theorizing, which begins: A Bayesian perspective on Ioannidis’s (2005) memorable statement that “Most Published Research Findings Are False” suggests a seemingly inescapable trade-off: It appears as if research hypotheses are based either […]
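The Bayesian reading of Ioannidis’s argument is a direct application of Bayes’ rule: the probability that a statistically significant finding reflects a true hypothesis depends on the prior probability of the hypothesis as well as on power and the significance level. A minimal sketch; the numbers in the example are illustrative and not from Fiedler’s article.

```python
def prob_true_given_significant(prior, power, alpha=0.05):
    """P(hypothesis true | significant result) by Bayes' rule:
    power * prior / (power * prior + alpha * (1 - prior))."""
    true_pos = power * prior
    false_pos = alpha * (1.0 - prior)
    return true_pos / (true_pos + false_pos)

# Same power and alpha; only the prior plausibility of the hypothesis changes.
for prior in [0.5, 0.1, 0.01]:
    print(prior, round(prob_true_given_significant(prior, power=0.8), 3))
```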

The current state of the Stan ecosystem in R

(This post is by Jonah) Last week I posted here about the release of version 2.0.0 of the loo R package, but there have been a few other recent releases and updates worth mentioning. At the end of the post I also include some general thoughts on R package development with Stan and the growing number of […]

Individual and aggregate causal effects: Social media and depression among teenagers

This one starts out as a simple story of correction of a statistical analysis and turns into an interesting discussion of causal inference for multilevel models. Michael Daly writes: I saw your piece on ‘Have Smartphones Destroyed a Generation?’ and wanted to flag some of the associations underlying key claims in this debate (which is […]

Postdoc opportunity at AstraZeneca in Cambridge, England, in Bayesian Machine Learning using Stan!

Here it is: “Predicting drug toxicity with Bayesian machine learning models.” We’re currently looking for talented scientists to join our innovative academic-style postdoc. From our centre in Cambridge, UK, you’ll be in a global pharmaceutical environment, contributing to live projects right from the start. You’ll take part in a comprehensive training programme, including a focus […]

Psychometrics corner: They want to fit a multilevel model instead of running 37 separate correlation analyses

Anouschka Foltz writes: One of my students has some data, and there is an issue with multiple comparisons. While trying to find out how to best deal with the issue, I came across your article with Martin Lindquist, “Correlations and Multiple Comparisons in Functional Imaging: A Statistical Perspective.” And while my student’s work does not […]
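As a rough sketch of the partial-pooling idea (an empirical-Bayes shortcut, not a full Bayesian multilevel model), one can put the 37 correlations on the Fisher z scale, estimate how much they genuinely vary with a simple method-of-moments (DerSimonian-Laird) estimate of the between-correlation variance, and shrink each one toward the common mean. This assumes each correlation comes from an independent sample of size n_j > 3; the function name and the simulated inputs are hypothetical.

```python
import numpy as np

def partially_pool_correlations(r, n):
    """Empirical-Bayes shrinkage of correlations r_j estimated from samples of size n_j."""
    r = np.asarray(r, dtype=float)
    n = np.asarray(n, dtype=float)
    z = np.arctanh(r)              # Fisher z-transform
    v = 1.0 / (n - 3.0)            # approximate sampling variance of each z_j
    w = 1.0 / v
    z_fixed = np.sum(w * z) / np.sum(w)
    # DerSimonian-Laird moment estimate of the between-correlation variance tau^2
    q = np.sum(w * (z - z_fixed) ** 2)
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (z.size - 1)) / c)
    w_re = 1.0 / (v + tau2)
    mu = np.sum(w_re * z) / np.sum(w_re)   # estimated common mean on the z scale
    b = v / (v + tau2)                      # weight on the common mean (1 = full pooling)
    z_pooled = b * mu + (1.0 - b) * z
    return np.tanh(z_pooled), float(np.tanh(mu)), tau2

# Hypothetical example: 37 noisy correlations, each from a sample of 40.
rng = np.random.default_rng(0)
true_rho = rng.normal(0.2, 0.05, size=37)
n = np.full(37, 40)
r_obs = np.tanh(rng.normal(np.arctanh(true_rho), 1 / np.sqrt(n - 3)))
pooled, common, tau2 = partially_pool_correlations(r_obs, n)
print(common, tau2)
```

Instead of 37 separate tests, each correlation is reported as a shrunken estimate, with the amount of shrinkage set by how much the correlations vary beyond what sampling noise explains.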

You better check yo self before you wreck yo self

We (Sean Talts, Michael Betancourt, me, Aki, and Andrew) just uploaded a paper (code available here) that outlines a framework for verifying that an algorithm for computing a posterior distribution has been implemented correctly. It is easy to use, straightforward to implement, and ready to be incorporated into a Bayesian workflow. This type of […]
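The check in the paper is simulation-based calibration: draw a parameter from the prior, simulate data from it, run the algorithm being tested, and record the rank of the true parameter among the posterior draws; if the algorithm is correct, the ranks are uniformly distributed. Here is a minimal sketch for a conjugate normal model, where exact posterior draws are available, alongside a deliberately broken "algorithm" whose posterior is too narrow. The model, the function name, and `posterior_sd_scale` are illustrative assumptions, and the paper’s diagnostics (such as rank histograms) are richer than the single summary printed here.

```python
import numpy as np

def sbc_ranks(n_reps=2000, n_obs=10, n_draws=99, posterior_sd_scale=1.0, seed=1):
    """Model: theta ~ normal(0, 1), y_i ~ normal(theta, 1), i = 1..n_obs.
    The exact posterior is normal(sum(y) / (n_obs + 1), sqrt(1 / (n_obs + 1)));
    posterior_sd_scale != 1 simulates a buggy algorithm."""
    rng = np.random.default_rng(seed)
    ranks = np.empty(n_reps, dtype=int)
    for s in range(n_reps):
        theta = rng.normal(0.0, 1.0)                  # draw from the prior
        y = rng.normal(theta, 1.0, size=n_obs)        # simulate data
        post_mean = y.sum() / (n_obs + 1)
        post_sd = np.sqrt(1.0 / (n_obs + 1)) * posterior_sd_scale
        draws = rng.normal(post_mean, post_sd, size=n_draws)  # "fit" the model
        ranks[s] = np.sum(draws < theta)              # rank in 0..n_draws
    return ranks

for scale in [1.0, 0.5]:
    ranks = sbc_ranks(posterior_sd_scale=scale)
    extreme = np.mean((ranks == 0) | (ranks == 99))
    # Uniform ranks put about 2% of replications in the two extreme bins.
    print(f"scale={scale}: mean rank {ranks.mean():.1f}, share in extreme bins {extreme:.3f}")
```

An overconfident posterior leaves the true value outside the bulk of the draws too often, so its ranks pile up at the extremes.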

Using partial pooling when preparing data for machine learning applications

Geoffrey Simmons writes: I reached out to John Mount/Nina Zumel over at Win Vector with a suggestion for their vtreat package, which automates many common challenges in preparing data for machine learning applications. The default behavior for impact coding high-cardinality variables had been a naive Bayes approach, which I found to be problematic due to its multimodal output (assigning […]
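Impact coding replaces each level of a high-cardinality categorical variable with an estimate of that level’s effect on the outcome. Here is a minimal sketch of a partially pooled version, assuming a numeric outcome: each level’s mean is shrunk toward the grand mean by n_j / (n_j + k), with k = sigma^2 / tau^2 taken from simple moment estimates, so rarely seen levels are pulled in hardest. The function is hypothetical and is not vtreat’s actual implementation.

```python
import numpy as np

def pooled_impact_codes(levels, y):
    """Partially pooled impact codes: shrunken level mean minus grand mean."""
    levels = np.asarray(levels)
    y = np.asarray(y, dtype=float)
    grand = y.mean()
    uniq, idx = np.unique(levels, return_inverse=True)
    n = np.bincount(idx).astype(float)             # observations per level
    means = np.bincount(idx, weights=y) / n        # raw level means
    # Within-level variance, pooled across levels
    sigma2 = np.sum((y - means[idx]) ** 2) / max(y.size - uniq.size, 1)
    # Between-level variance: spread of the raw means minus their expected noise
    tau2 = max(np.var(means) - np.mean(sigma2 / n), 1e-12)
    k = sigma2 / tau2
    shrunk = grand + (n / (n + k)) * (means - grand)
    return dict(zip(uniq.tolist(), (shrunk - grand).tolist()))

# A level seen once ('b') is pulled toward zero harder than the common levels.
rng = np.random.default_rng(0)
levels = ["a"] * 50 + ["b"] + ["c"] * 50
y = np.concatenate([rng.normal(1.0, 1.0, 50), [5.0], rng.normal(-1.0, 1.0, 50)])
print(pooled_impact_codes(levels, y))
```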

loo 2.0 is loose

This post is by Jonah and Aki. We’re happy to announce the release of v2.0.0 of the loo R package for efficient approximate leave-one-out cross-validation (and more). For anyone unfamiliar with the package, the original motivation for its development is in our paper: Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation […]
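The package’s efficient approximation is Pareto-smoothed importance sampling (PSIS-LOO), which reuses draws from the full-data posterior instead of refitting the model n times. As a rough illustration of the underlying idea only, here is plain importance-sampling LOO without the Pareto smoothing step (the step that makes the package’s version stable), checked against exact LOO for a normal model with known unit variance and a flat prior on the mean, where both are available in closed form. The model and function names are assumptions for illustration; this is not the package’s code.

```python
import numpy as np

def normal_logpdf(x, loc, scale):
    return -0.5 * np.log(2 * np.pi) - np.log(scale) - 0.5 * ((x - loc) / scale) ** 2

def is_loo_elpd(loglik):
    """Plain importance-sampling LOO (no Pareto smoothing).
    loglik has shape (S, n): log p(y_i | theta^(s)) for S posterior draws.
    Returns log p(y_i | y_-i) ~= -log(mean_s exp(-loglik[s, i])) for each i."""
    neg = -loglik
    m = neg.max(axis=0)                               # log-sum-exp for stability
    log_mean_inv_lik = m + np.log(np.mean(np.exp(neg - m), axis=0))
    return -log_mean_inv_lik

rng = np.random.default_rng(0)
y = rng.normal(0.0, 1.0, size=50)
n = y.size

# Posterior for mu under a flat prior with known sigma = 1: normal(mean(y), 1/sqrt(n))
mu_draws = rng.normal(y.mean(), 1.0 / np.sqrt(n), size=4000)
loglik = normal_logpdf(y[None, :], mu_draws[:, None], 1.0)   # shape (4000, 50)
elpd_is = is_loo_elpd(loglik).sum()

# Exact LOO: y_i | y_-i ~ normal(mean(y_-i), sqrt(1 + 1/(n - 1)))
loo_means = (y.sum() - y) / (n - 1)
elpd_exact = normal_logpdf(y, loo_means, np.sqrt(1.0 + 1.0 / (n - 1))).sum()
print(elpd_is, elpd_exact)
```

For well-behaved models like this one the raw importance weights are fine; the smoothing matters when a few draws dominate the weights.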

Generable: They’re building software for pharma, with Stan inside.

Daniel Lee writes: We’ve just launched our new website. Generable is where precision medicine meets statistical machine learning. We are building a state-of-the-art platform to make individual, patient-level predictions for safety and efficacy of treatments. We’re able to do this by building Bayesian models with Stan. We currently have pilots with AstraZeneca, Sanofi, and University […]

The Millennium Villages Project: a retrospective, observational, endline evaluation

Shira Mitchell et al. write (preprint version here if that link doesn’t work): The Millennium Villages Project (MVP) was a 10 year, multisector, rural development project, initiated in 2005, operating across ten sites in ten sub-Saharan African countries to achieve the Millennium Development Goals (MDGs). . . . In this endline evaluation of the MVP, […]

Fitting a hierarchical model without losing control

Tim Disher writes: I have been asked to run some regularized regressions on a small-N, high-p situation, which for the primary outcome has led to more realistic coefficient estimates and better cross-validation performance (yay!). rstanarm made this process very easy for me, so I am grateful for it. I have now been […]
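For a linear regression with Gaussian noise of known scale sigma and independent normal(0, s) priors on the coefficients, the posterior mode is the ridge estimate with penalty lambda = sigma^2 / s^2, which stays well defined even when p > N. A minimal sketch of that closed form, assuming fixed, hypothetical values for `sigma` and `prior_scale`; it is not what rstanarm does internally (rstanarm samples the full posterior with Stan, and its priors need not match this one).

```python
import numpy as np

def posterior_mode_normal_prior(X, y, sigma=1.0, prior_scale=0.5):
    """MAP estimate of beta for y ~ normal(X beta, sigma), beta_k ~ normal(0, prior_scale).
    Equivalent to ridge regression with lambda = sigma^2 / prior_scale^2."""
    lam = sigma**2 / prior_scale**2
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Small N, large p: 20 observations, 100 predictors, only 3 of which matter.
rng = np.random.default_rng(0)
N, p = 20, 100
X = rng.normal(size=(N, p))
beta_true = np.zeros(p)
beta_true[:3] = [1.0, -1.0, 0.5]
y = X @ beta_true + rng.normal(0.0, 1.0, size=N)

beta_map = posterior_mode_normal_prior(X, y)
beta_minnorm = np.linalg.pinv(X) @ y   # minimum-norm exact fit, no regularization
print(np.linalg.norm(beta_map), np.linalg.norm(beta_minnorm))
```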

“The Internal and External Validity of the Regression Discontinuity Design: A Meta-Analysis of 15 Within-Study-Comparisons”

Jag Bhalla points to this post by Alex Tabarrok pointing to this paper, “The Internal and External Validity of the Regression Discontinuity Design: A Meta-Analysis of 15 Within-Study-Comparisons,” by Duncan Chaplin, Thomas Cook, Jelena Zurovac, Jared Coopersmith, Mariel Finucane, Lauren Vollmer, and Rebecca Morris, which reports that regression discontinuity (RD) estimation performed well in these […]
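In a regression discontinuity design, treatment switches on at a cutoff in a running variable, and the effect is estimated as the jump in the outcome’s regression at that cutoff. Here is a minimal sketch of one common estimator, local linear regression on each side of the cutoff within a bandwidth, using simulated data with a known jump; the uniform kernel and `bandwidth=0.5` are arbitrary illustrative choices, and nothing here reproduces the meta-analysis.

```python
import numpy as np

def rd_local_linear(x, y, cutoff=0.0, bandwidth=0.5):
    """Difference between right- and left-side linear fits, evaluated at the cutoff
    (uniform kernel: every point within the bandwidth gets equal weight)."""
    def fit_at_cutoff(mask):
        xc = x[mask] - cutoff
        design = np.column_stack([np.ones_like(xc), xc])
        coef, *_ = np.linalg.lstsq(design, y[mask], rcond=None)
        return coef[0]                       # intercept = fitted value at the cutoff
    left = (x < cutoff) & (x >= cutoff - bandwidth)
    right = (x >= cutoff) & (x <= cutoff + bandwidth)
    return fit_at_cutoff(right) - fit_at_cutoff(left)

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=2000)
treated = x >= 0.0
y = 1.0 + 0.8 * x + 2.0 * treated + rng.normal(0.0, 0.5, size=x.size)   # true jump = 2
print(rd_local_linear(x, y))
```

A within-study comparison asks whether an estimate like this, which only uses data near the cutoff, matches what a randomized experiment on the same population finds.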