There’s an idea in philosophy called the Australia principle—I don’t know the original of this theory but here’s an example that turned up in a google search—that posits that Australia doesn’t exist; instead, they just build the parts that are needed when you visit: a little mock-up of the airport, a cityscape with a model […]

**Bayesian Statistics** category.

## Regularized Prediction and Poststratification (the generalization of Mister P)

This came up in comments recently so I thought I’d clarify the point. Mister P is MRP, multilevel regression and poststratification. The idea goes like this: 1. You want to adjust for differences between sample and population. Let y be your outcome of interest and X be your demographic and geographic variables you’d like to […]

## Boston Stan meetup 12 June!

Shane Bussmann writes to announce the next Boston/Camberville Stan users meetup, Tuesday, June 12, 2018, 6:00 PM to 9:00 PM, at Insight Data Science Office, 280 Summer St., Boston: To kick things off for our first meetup in 2018, I [Bussmann] will give a talk on rating teams in recreational ultimate frisbee leagues. In this […]

## How to reduce Type M errors in exploratory research?

Miao Yu writes: Recently, I found this piece [a news article by Janet Pelley, Sulfur dioxide pollution tied to degraded sperm quality, published in Chemical & Engineering News] and the original paper [Inverse Association between Ambient Sulfur Dioxide Exposure and Semen Quality in Wuhan, China, by Yuewei Liu, published in Environmental Science & Technology]. Air […]

## Aki’s favorite scientific books (so far)

A month ago I (Aki) started a series of tweets about “scientific books which have had big influence on me…”. They are partially in time order, but I can’t remember the exact order. I may have forgotten some, and some stretched the original idea, but I can recommend all of them. I have collected all […]

## Zero-excluding priors are probably a bad idea for hierarchical variance parameters

(This is Dan, but in quick mode) I was on the subway when I saw Andrew’s last post and it doesn’t strike me as a particularly great idea. So let’s take a look at the suggestion for 8 schools using a centred parameterization. This is not as comprehensive as doing a proper simulation study, but […]

## How about zero-excluding priors for hierarchical variance parameters to improve computation for full Bayesian inference?

So. For a while now we’ve moved away from the uniform (or, worse, inverse-gamma!) prior distributions for hierarchical variance parameters. We’ve done half-Cauchy, folded t, and other options; now we’re favoring unit half-normal. We also have boundary-avoiding priors for point estimates, so that in 8-schools-type problems, the posterior mode won’t be zero. Something like the gamma(2) […]

## “We continuously increased the number of animals until statistical significance was reached to support our conclusions” . . . I think this is not so bad, actually!

Jordan Anaya pointed me to this post, in which Casper Albers shared this snippet from a recently published article in Nature Communications: The subsequent Twitter discussion is all about “false discovery rate” and statistical significance, which I think completely misses the point. The problems Before I get to why I think the quoted […]

## A model for scientific research programmes that include both “exploratory phenomenon-driven research” and “theory-testing science”

John Christie points us to an article by Klaus Fiedler, What Constitutes Strong Psychological Science? The (Neglected) Role of Diagnosticity and A Priori Theorizing, which begins: A Bayesian perspective on Ioannidis’s (2005) memorable statement that “Most Published Research Findings Are False” suggests a seemingly inescapable trade-off: It appears as if research hypotheses are based either […]

## The current state of the Stan ecosystem in R

(This post is by Jonah) Last week I posted here about the release of version 2.0.0 of the loo R package, but there have been a few other recent releases and updates worth mentioning. At the end of the post I also include some general thoughts on R package development with Stan and the growing number of […]

## Individual and aggregate causal effects: Social media and depression among teenagers

This one starts out as a simple story of correction of a statistical analysis and turns into an interesting discussion of causal inference for multilevel models. Michael Daly writes: I saw your piece on ‘Have Smartphones Destroyed a Generation’ and wanted to flag some of the associations underlying key claims in this debate (which is […]

## Postdoc opportunity at AstraZeneca in Cambridge, England, in Bayesian Machine Learning using Stan!

Here it is: Predicting drug toxicity with Bayesian machine learning models We’re currently looking for talented scientists to join our innovative academic-style Postdoc. From our centre in Cambridge, UK you’ll be in a global pharmaceutical environment, contributing to live projects right from the start. You’ll take part in a comprehensive training programme, including a focus […]

## Psychometrics corner: They want to fit a multilevel model instead of running 37 separate correlation analyses

Anouschka Foltz writes: One of my students has some data, and there is an issue with multiple comparisons. While trying to find out how to best deal with the issue, I came across your article with Martin Lindquist, “Correlations and Multiple Comparisons in Functional Imaging: A Statistical Perspective.” And while my student’s work does not […]

## You better check yo self before you wreck yo self

We (Sean Talts, Michael Betancourt, Me, Aki, and Andrew) just uploaded a paper (code available here) that outlines a framework for verifying that an algorithm for computing a posterior distribution has been implemented correctly. It is easy to use, straightforward to implement, and ready to be implemented as part of a Bayesian workflow. This type of […]

## Using partial pooling when preparing data for machine learning applications

Geoffrey Simmons writes: I reached out to John Mount/Nina Zumel over at Win Vector with a suggestion for their vtreat package, which automates many common challenges in preparing data for machine learning applications. The default behavior for impact coding high-cardinality variables had been a naive Bayes approach, which I found to be problematic due to its multi-modal output (assigning […]

## loo 2.0 is loose

This post is by Jonah and Aki. We’re happy to announce the release of v2.0.0 of the loo R package for efficient approximate leave-one-out cross-validation (and more). For anyone unfamiliar with the package, the original motivation for its development is in our paper: Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation […]

## Generable: They’re building software for pharma, with Stan inside.

Daniel Lee writes: We’ve just launched our new website. Generable is where precision medicine meets statistical machine learning. We are building a state-of-the-art platform to make individual, patient-level predictions for safety and efficacy of treatments. We’re able to do this by building Bayesian models with Stan. We currently have pilots with AstraZeneca, Sanofi, and University […]

## The Millennium Villages Project: a retrospective, observational, endline evaluation

Shira Mitchell et al. write (preprint version here if that link doesn’t work): The Millennium Villages Project (MVP) was a 10 year, multisector, rural development project, initiated in 2005, operating across ten sites in ten sub-Saharan African countries to achieve the Millennium Development Goals (MDGs). . . . In this endline evaluation of the MVP, […]

## Fitting a hierarchical model without losing control

Tim Disher writes: I have been asked to run some regularized regressions on a small N high p situation, which for the primary outcome has led to more realistic coefficient estimates and better performance on cv (yay!). Rstanarm made this process very easy for me so I am grateful for it. I have now been […]

## “The Internal and External Validity of the Regression Discontinuity Design: A Meta-Analysis of 15 Within-Study-Comparisons”

Jag Bhalla points to this post by Alex Tabarrok pointing to this paper, “The Internal and External Validity of the Regression Discontinuity Design: A Meta-Analysis of 15 Within-Study-Comparisons,” by Duncan Chaplin, Thomas Cook, Jelena Zurovac, Jared Coopersmith, Mariel Finucane, Lauren Vollmer, and Rebecca Morris, which reports that regression discontinuity (RD) estimation performed well in these […]