Archive of posts filed under the Bayesian Statistics category.

Interesting epi paper using Stan

Jon Zelner writes: Just thought I’d send along this paper by Justin Lessler et al. Thought it was both clever & useful and a nice ad for using Stan for epidemiological work. Basically, what this paper is about is estimating the true prevalence and case fatality ratio of MERS-CoV [Middle East Respiratory Syndrome Coronavirus Infection] […]

OK, sometimes the concept of “false positive” makes sense.

Paul Alper writes: I know by searching your blog that you hold the position, “I’m negative on the expression ‘false positives.’” Nevertheless, I came across this. In the medical/police/judicial world, false positive is a very serious issue: $2 is the cost of a typical roadside drug test kit used by police departments. Namely, is that white powder […]
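The arithmetic behind that concern is worth seeing once. Here is a minimal Bayes-rule sketch in R; the 95% sensitivity, 95% specificity, and 1% prevalence are made-up illustrative numbers, not figures from the article:

# Hypothetical numbers, for illustration only
sens <- 0.95  # Pr(test positive | drugs present)
spec <- 0.95  # Pr(test negative | drugs absent)
prev <- 0.01  # Pr(drugs present) among substances tested
# Pr(drugs present | positive test), by Bayes' rule:
sens * prev / (sens * prev + (1 - spec) * (1 - prev))  # about 0.16

With a low base rate, most positives are false positives, which is the sense in which the concept is doing real work here.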

Discussion on overfitting in cluster analysis

Ben Bolker wrote: It would be fantastic if you could suggest one or two starting points for the idea that/explanation why BIC should naturally fail to identify the number of clusters correctly in the cluster-analysis context. Bob Carpenter elaborated: Ben is finding that using BIC to select number of mixture components is selecting too many […]
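For anyone who wants to see the phenomenon directly, here is a minimal sketch using the mclust package, which fits normal mixtures and selects the number of components by BIC; the simulated two-component data are arbitrary, not from Ben’s application:

library(mclust)
set.seed(123)
# Data from a known 2-component normal mixture
x <- c(rnorm(200, 0, 1), rnorm(200, 4, 1))
fit <- Mclust(x, G = 1:9)  # candidate numbers of components
summary(fit)               # check whether the BIC-selected G overshoots 2

When the component densities are misspecified (say, the true clusters are skewed or heavy-tailed), the mixture needs extra components to approximate the density, and the BIC-selected G tends to grow with sample size.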

Abraham Lincoln and confidence intervals

Our recent discussion with mathematician Russ Lyons on confidence intervals reminded me of a famous logic paradox, in which equality is not as simple as it seems. The classic example goes as follows: Abraham Lincoln is the 16th president of the United States, but this does not mean that one can substitute the two expressions […]

How best to partition data into test and holdout samples?

Bill Harris writes: In “Type M error can explain Weisburd’s Paradox,” you reference Button et al. 2013. While reading that article, I noticed figure 1 and the associated text describing the 50% probability of failing to detect a significant result with a replication of the same size as the original test that was just significant. […]
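The 50% figure falls out of a one-line calculation: if the original z-statistic sat exactly at the significance threshold, and the true effect equals the original estimate, then a same-sized replication’s z-statistic is centered at that threshold and clears it only half the time. In R:

# Replication z is N(1.96, 1) if the true effect equals the just-significant estimate
pnorm(1.96, mean = 1.96, sd = 1)  # = 0.5: chance the replication is not significant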

Deep learning, model checking, AI, the no-homunculus principle, and the unitary nature of consciousness

Bayesian data analysis, as my colleagues and I have formulated it, has a human in the loop. Here’s how we put it on the very first page of our book: The process of Bayesian data analysis can be idealized by dividing it into the following three steps: 1. Setting up a full probability model—a joint […]

Stan Case Studies: A good way to jump in to the language

Wanna learn Stan? Everybody’s talking ’bout it. Here’s a way to jump in: Stan Case Studies. Find one you like and try it out. P.S. I blogged this last month but it’s so great I’m blogging it again. For this post, the target audience is not already-users of Stan but new users.

More on my paper with John Carlin on Type M and Type S errors

Deborah Mayo asked me some questions about that paper (“Beyond Power Calculations: Assessing Type S (Sign) and Type M (Magnitude) Errors”), and here’s how I responded: I am not happy with the concepts of “power,” “type 1 error,” and “type 2 error,” because all these are defined in terms of statistical significance, which I am […]
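That paper comes with a short R function for this sort of design analysis: given a hypothesized true effect A and a standard error s, it returns the power, the Type S error rate (the probability that a statistically significant estimate has the wrong sign), and the exaggeration ratio (Type M error). The sketch below is paraphrased from the paper from memory, so check the published version before relying on it:

retrodesign <- function(A, s, alpha = 0.05, df = Inf, n.sims = 10000) {
  z <- qt(1 - alpha/2, df)
  p.hi <- 1 - pt(z - A/s, df)  # Pr(significant, correct sign)
  p.lo <- pt(-z - A/s, df)     # Pr(significant, wrong sign)
  power <- p.hi + p.lo
  typeS <- p.lo / power        # Pr(wrong sign | significant)
  estimate <- A + s * rt(n.sims, df)
  significant <- abs(estimate) > s * z
  exaggeration <- mean(abs(estimate)[significant]) / A  # Type M
  list(power = power, typeS = typeS, exaggeration = exaggeration)
}
retrodesign(A = 0.1, s = 1)  # tiny true effect: low power, huge exaggeration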

Election forecasting updating error: We ignored correlations in some of our data, thus producing illusory precision in our inferences

The election outcome is a surprise in that it contradicts two pieces of information: Pre-election polls and early-voting tallies. We knew that each of these indicators could be flawed (polls because of differential nonresponse; early-voting tallies because of extrapolation errors), but when the two pieces of evidence came to the same conclusion, they gave us […]
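The mechanism is easy to see in miniature: averaging two noisy indicators cuts the standard error by sqrt(2) only if their errors are independent. A quick sketch with assumed numbers:

sigma <- 2          # error sd of each indicator, in percentage points
rho <- c(0, 0.8)    # independent errors vs. strongly correlated errors
sigma * sqrt((1 + rho) / 2)  # sd of the average: 1.41 vs. 1.90

If the polls and the early-vote extrapolations share a common error (say, differential nonresponse correlated with differential turnout), treating them as independent confirmations overstates the precision in roughly this way.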

What if NC is a tie and FL is a close win for Clinton?

On the TV they said that they were guessing that Clinton would win Florida in a close race and that North Carolina was too close to call. Let’s run the numbers, Kremp: > update_prob2(clinton_normal=list("NC"=c(50,2), "FL"=c(52,2))) Pr(Clinton wins the electoral college) = 95% That’s good news for Clinton. What if both states are tied? > update_prob2(clinton_normal=list("NC"=c(50,2), […]
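I won’t reproduce update_prob2’s internals, but the clinton_normal argument reads naturally as a noisy normal observation of Clinton’s vote share in each named state, which you can mimic by reweighting posterior simulation draws. A minimal sketch, where sims (a draws-by-states matrix of simulated Clinton vote shares in percent) and ec_win (a 0/1 indicator per draw of a Clinton electoral-college win) are assumed objects, not part of Kremp’s actual code:

# Weight each draw by how consistent it is with the new information
w <- dnorm(sims[, "NC"], mean = 50, sd = 2) *
     dnorm(sims[, "FL"], mean = 52, sd = 2)
sum(w * ec_win) / sum(w)  # Pr(Clinton wins EC | NC ~ 50 +/- 2, FL ~ 52 +/- 2)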

Election updating software update

When going through Pierre-Antoine Kremp’s election forecasting updater program, we saw that it ran into difficulties when we started to supply information from lots of states. It was a problem with the program’s rejection sampling algorithm. Kremp updated the program to allow an option where you could specify the winner in each state, and […]
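Conditioning on winners rather than on narrow vote-share windows is friendlier to simulation: the event has non-negligible probability, so simply subsetting the draws leaves plenty of samples. A sketch, reusing the assumed sims and ec_win objects from above:

# Condition on Clinton winning FL and Trump winning NC
keep <- sims[, "FL"] > 50 & sims[, "NC"] < 50
mean(ec_win[keep])  # Pr(Clinton wins EC | those state winners)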

Now that 7pm has come, what do we know?

(followup to this post) On TV they said that Trump won Kentucky and Indiana (no surprise), Clinton won Vermont (really no surprise), but South Carolina, Georgia, and Virginia were too close to call. I’ll run Pierre-Antoine Kremp’s program conditioning on this information, coding states that are “too close to call” as being somewhere between 45% […]

What might we know at 7pm?

To update our effort from 2008, let’s see what we might know when the first polls close. At 7pm, the polls will be closed in the following states: KY, GA, IN, NH, SC, VT, VA. Let’s list these in order of projected Trump/Clinton vote share: KY, IN, SC, GA, NH, VA, VT. I’ll use Kremp’s […]

Updating the Forecast on Election Night with R

Pierre-Antoine Kremp made this cool widget that takes his open-source election forecaster (it aggregates state and national polls using a Stan program that runs from R) and computes conditional probabilities. Here’s the starting point, based on the pre-election polls and forecast information: These results come from the fitted Stan model which gives simulations representing a […]

What is the chance that your vote will decide the election? Ask Stan!

I was impressed by Pierre-Antoine Kremp’s open-source poll aggregator and election forecaster (all in R and Stan with an automatic data feed!) so I wrote to Kremp: I was thinking it could be fun to compute probability of decisive vote by state, as in this paper. This can be done with some not difficult but […]
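The calculation in that paper factors the probability of a decisive vote into two pieces: the probability that your state’s vote is exactly tied, times the probability that your state’s electoral votes would swing the national outcome given such a tie. A rough simulation-based sketch for one state; share (simulated Clinton vote proportions in the state, on a 0-1 scale), pivotal (a per-draw indicator that the state’s electoral votes decide the election), and n_voters (projected turnout) are assumed objects for illustration:

near_tie <- abs(share - 0.5) < 0.005  # draws within half a point of a tie
# Pr(exact tie) ~ density of the vote share at 0.5, divided by turnout:
pr_tie <- mean(near_tie) / 0.01 / n_voters
pr_tie * mean(pivotal[near_tie])      # Pr(your vote is decisive)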

Different election forecasts not so different

Yeah, I know, I need to work some on the clickbait titles . . . Anyway, people keep asking me why different election forecasts are so different. At the time of this writing, Nate Silver gives Clinton a 66.2% [ugh! See Pedants Corner below] chance of winning the election while Drew Linzer, for example, gives […]
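One reason similar-looking forecasts can report different probabilities: the win probability is driven as much by the assumed forecast uncertainty as by the point estimate. Illustrative numbers, not either forecaster’s actual model:

# Same point forecast (Clinton +2), different uncertainty:
pnorm(2 / 1.5)  # sd of 1.5 points: Pr(Clinton wins) ~ 0.91
pnorm(2 / 3.0)  # sd of 3 points:   Pr(Clinton wins) ~ 0.75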

Why I prefer 50% rather than 95% intervals

I prefer 50% to 95% intervals for 3 reasons: 1. Computational stability, 2. More intuitive evaluation (half the 50% intervals should contain the true value), 3. A sense that in applications it’s best to get a sense of where the parameters and predicted values will be, not to attempt an unrealistic near-certainty. This came up […]
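Point 2 is easy to check by simulation: with calibrated inferences, about half of the 50% intervals should cover the truth. A quick fake-data sketch:

set.seed(1)
covered <- replicate(1e4, {
  y <- rnorm(20)  # data with true mean 0
  ci <- mean(y) + qt(c(0.25, 0.75), df = 19) * sd(y) / sqrt(20)
  ci[1] < 0 && 0 < ci[2]  # does the 50% interval cover?
})
mean(covered)  # should be close to 0.5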

Modeling statewide presidential election votes through 2028

David Leonhardt of the NYT asked a bunch of different people, including me, which of various Romney-won states in 2012 would be likely to be won by a Democrat in 2020, 2024, or 2028, and which of various Obama-won states would go for a Republican in any of those future years. If I’m going to […]

Michael Betancourt has made NUTS even more awesome and efficient!

In a beautiful new paper, Betancourt writes: The geometric foundations of Hamiltonian Monte Carlo implicitly identify the optimal choice of [tuning] parameters, especially the integration time. I then consider the practical consequences of these principles in both existing algorithms and a new implementation called Exhaustive Hamiltonian Monte Carlo [XHMC] before demonstrating the utility of these […]

Some modeling and computational ideas to look into

Can we implement these in Stan? Marginally specified priors for non-parametric Bayesian estimation (by David Kessler, Peter Hoff, and David Dunson): Prior specification for non-parametric Bayesian inference involves the difficult task of quantifying prior knowledge about a parameter of high, often infinite, dimension. A statistician is unlikely to have informed opinions about all aspects of […]