Archive of posts filed under the Bayesian Statistics category.

Gilovich doubles down on hot hand denial

A correspondent pointed me to this Freakonomics radio interview with Thomas Gilovich, one of the authors of that famous “hot hand” paper from 1985, “Misperception of Chance Processes in Basketball.” Here’s the key bit from the Freakonomics interview: DUBNER: Right. The “hot-hand notion” or maybe the “hot-hand fallacy.” GILOVICH: Well, everyone who’s ever […]

Prediction model for fleet management

Chang writes: I am working on a fleet management system these days: basically, I am trying to predict the usage ‘y’ of our fleet in a zip code in the future. We have some factors ‘X’, such as number of active users, number of active merchants etc. If I can fix the time horizon, the […]
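Chang’s setup, predicting usage y from features X over a fixed horizon, is at heart a regression problem. A minimal sketch, assuming a single predictor and ordinary least squares (the variable names and numbers are illustrative, not from Chang’s system):

```python
# Toy sketch: predict fleet usage y from one feature x by least squares.
def ols_fit(x, y):
    """Return (intercept, slope) minimizing the sum of squared errors."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def predict(intercept, slope, x_new):
    return intercept + slope * x_new

# e.g. active users in a zip code vs. observed fleet usage
users = [10, 20, 30, 40]
usage = [25, 45, 65, 85]          # exactly 5 + 2*users here
a, b = ols_fit(users, usage)
forecast = predict(a, b, 50)
```

A real system would add more predictors and, as the post discusses, handle the time horizon explicitly; this only shows the bare prediction step.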

Mortality rate trends by age, ethnicity, sex, and state (link fixed)

There continues to be a lot of discussion of the purported increase in mortality rates among middle-aged white people in America. Actually, there was an increase among women and not much change among men, but you don’t hear so much about this, as it contradicts the “struggling white men” story that we hear so much about in […]

Some natural solutions to the p-value communication problem—and why they won’t work

Blake McShane and David Gal recently wrote two articles (“Blinding us to the obvious? The effect of statistical training on the evaluation of evidence” and “Statistical significance and the dichotomization of evidence”) on the misunderstandings of p-values that are common even among supposed experts in statistics and applied social research. The key misconception has nothing […]
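One classic illustration of the dichotomization problem (the “difference between significant and not significant is not itself statistically significant” point; the numbers here are made up for illustration):

```python
# Two studies of the same effect, same standard error.
def z_stat(est, se):
    """Estimate divided by its standard error."""
    return est / se

se = 10.0
a, b = 25.0, 10.0                      # study A and study B estimates
za = z_stat(a, se)                     # 2.5 -> "significant"
zb = z_stat(b, se)                     # 1.0 -> "not significant"

# But the comparison between the two studies:
diff_se = (se ** 2 + se ** 2) ** 0.5   # standard error of the difference
z_diff = z_stat(a - b, diff_se)        # about 1.06 -> not significant
```

Treating A as real and B as null, just because one crossed 1.96 and the other didn’t, is exactly the dichotomization of evidence that McShane and Gal document.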

“Bias” and “variance” are two ways of looking at the same thing. (“Bias” is conditional, “variance” is unconditional.)

Someone asked me about the distinction between bias and noise and I sent him some links. Then I thought this might interest some of you too, so here it is: Here’s a recent paper on election polling where we try to be explicit about what is bias and what is variance: And here are some […]
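The conditional/unconditional distinction can be seen in a toy polling simulation (my construction, not the paper’s model): conditional on one election, a poll’s systematic error is a fixed bias; averaged over many elections, those same errors cancel on average and show up instead as extra variance.

```python
import random

random.seed(1)

def poll_errors(n_elections=2000, n_polls=50, sd_bias=2.0, sd_noise=1.0):
    """Each election gets its own systematic polling bias (fixed within
    that election); each poll adds independent sampling noise.
    Returns (per-election mean errors, all errors pooled)."""
    election_means, pooled = [], []
    for _ in range(n_elections):
        bias = random.gauss(0, sd_bias)   # conditional on the election: a bias
        errs = [bias + random.gauss(0, sd_noise) for _ in range(n_polls)]
        election_means.append(sum(errs) / n_polls)
        pooled.extend(errs)
    return election_means, pooled

means, pooled = poll_errors()
grand_mean = sum(pooled) / len(pooled)    # ~0: unconditionally unbiased
pooled_var = sum((e - grand_mean) ** 2 for e in pooled) / len(pooled)
# pooled_var ~ sd_bias**2 + sd_noise**2 = 5: the biases became variance
```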

“A blog post that can help an industry”

Tim Bock writes: I understood how to address weights in statistical tests by reading Lu and Gelman (2003). Thanks. You may be disappointed to know that this knowledge allowed me to write software, which has been used to compute many billions of p-values. When I read your posts and papers on forking paths, I always […]
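One standard way weights enter a test, not necessarily the Lu and Gelman approach Bock implemented, is through Kish’s effective sample size, which deflates n to reflect unequal weights before computing standard errors:

```python
def effective_sample_size(w):
    """Kish's approximation: n_eff = (sum w)^2 / (sum w^2).
    Equal weights give n_eff = n; unequal weights give n_eff < n."""
    return sum(w) ** 2 / sum(wi * wi for wi in w)

equal = effective_sample_size([1.0] * 100)               # 100.0
skewed = effective_sample_size([1.0] * 50 + [3.0] * 50)  # 80.0 < 100
```

Using n_eff in place of n in the usual test statistics is a common first-order correction for weighting.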

Ensemble Methods are Doomed to Fail in High Dimensions

Ensemble methods By ensemble methods, I (Bob, not Andrew) mean approaches that scatter points in parameter space and then make moves by interpolating or extrapolating among subsets of them. Two prominent examples are Ter Braak’s differential evolution and Goodman and Weare’s walkers. There are extensions and computer implementations of these algorithms. For example, […]
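The basic differential-evolution move takes a walker along the difference of two other ensemble members, x' = x_i + γ(x_j − x_k). A minimal sketch of just that proposal step (the step size γ and the walker positions are illustrative):

```python
import random

random.seed(0)

def de_proposal(ensemble, i, gamma=0.7):
    """Propose a move for walker i by extrapolating along the difference
    of two other randomly chosen walkers j and k (Ter Braak-style)."""
    j, k = random.sample([m for m in range(len(ensemble)) if m != i], 2)
    return [xi + gamma * (xj - xk)
            for xi, xj, xk in zip(ensemble[i], ensemble[j], ensemble[k])]

# three walkers in two dimensions
walkers = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
prop = de_proposal(walkers, 0)   # lies in the span of the walker differences
```

The post’s point is that these interpolation/extrapolation moves live in the subspace spanned by the current ensemble, which is what gets them into trouble in high dimensions.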

Expectation propagation as a way of life: A framework for Bayesian inference on partitioned data

After three years, we finally have an updated version of our “EP as a way of life” paper. Authors are Andrew Gelman, Aki Vehtari, Pasi Jylänki, Tuomas Sivula, Dustin Tran, Swupnil Sahai, Paul Blomstedt, John Cunningham, David Schiminovich, and Christian Robert. Aki deserves credit for putting this all together into a coherent whole. Here’s the […]
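For Gaussian approximations, the combination step in a partitioned-data scheme is simple: each site contributes a precision and a precision-weighted mean, and the global approximation sums them with the prior’s. A toy normal-normal sketch of that combination (my illustration of the idea, not the paper’s algorithm):

```python
def combine_gaussian_sites(prior, sites):
    """prior and each site are (mean, precision) pairs.  The combined
    Gaussian has summed precisions and a precision-weighted mean."""
    means, precs = zip(*([prior] + list(sites)))
    total_prec = sum(precs)
    mean = sum(m * p for m, p in zip(means, precs)) / total_prec
    return mean, total_prec

# prior N(0, 1) as (mean, precision); two data partitions' site approximations
post_mean, post_prec = combine_gaussian_sites((0.0, 1.0),
                                              [(2.0, 1.0), (4.0, 1.0)])
```

In the Gaussian-likelihood case this combination is exact; EP’s work goes into iteratively fitting the site approximations when the likelihood factors are not Gaussian.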

A fistful of Stan case studies: divergences and bias, identifying mixtures, and weakly informative priors

Following on from his talk at StanCon, Michael Betancourt just wrote three Stan case studies, all of which are must-reads: Diagnosing Biased Inference with Divergences: This case study discusses the subtleties of accurate Markov chain Monte Carlo estimation and how divergences can be used to identify biased estimation in practice. Identifying Bayesian Mixture […]

How to interpret confidence intervals?

Jason Yamada-Hanff writes: I’m a Neuroscience PhD reforming my statistics education. I am a little confused about how you treat confidence intervals in the book and was hoping you could clear things up for me. Through your blog, I found Richard Morey’s paper (and further readings) about confidence interval interpretations. If I understand correctly, the […]
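The textbook frequentist reading, that the interval-construction procedure covers the true value in about 95% of repeated samples, while saying nothing certain about any one realized interval, can be checked by simulation (a sketch, assuming a normal mean with known sd for simplicity):

```python
import random

random.seed(2)

def coverage(n_reps=2000, n=25, mu=0.0, sd=1.0, z=1.96):
    """Fraction of replications whose 95% interval for the mean covers mu."""
    hits = 0
    for _ in range(n_reps):
        sample = [random.gauss(mu, sd) for _ in range(n)]
        xbar = sum(sample) / n
        half = z * sd / n ** 0.5
        hits += (xbar - half <= mu <= xbar + half)
    return hits / n_reps

cov = coverage()   # close to 0.95
```

The long-run 95% is a property of the procedure; Morey and colleagues’ point is that transferring that 95% to a single observed interval is exactly the step that isn’t licensed.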

Yes, it makes sense to do design analysis (“power calculations”) after the data have been collected

This one has come up before but it’s worth a reminder. Stephen Senn is a thoughtful statistician and I generally agree with his advice but I think he was kinda wrong on this one. Wrong in an interesting way. Senn’s article is from 2002 and it is called “Power is indeed irrelevant in interpreting completed […]
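The kind of retrospective design analysis at issue (in the spirit of the type S / type M framing Gelman and Carlin use; the numbers below are purely illustrative) asks: given a plausible true effect size and the study’s standard error, how badly do statistically significant estimates exaggerate?

```python
import random

random.seed(3)

def design_analysis(true_effect=2.0, se=8.0, n_sims=20000, z=1.96):
    """Among significant results: power, sign-error rate (type S),
    and exaggeration ratio (type M)."""
    sig = [est for est in (random.gauss(true_effect, se) for _ in range(n_sims))
           if abs(est) > z * se]
    power = len(sig) / n_sims
    type_s = sum(e * true_effect < 0 for e in sig) / len(sig)
    exaggeration = sum(abs(e) for e in sig) / len(sig) / abs(true_effect)
    return power, type_s, exaggeration

power, type_s, exaggeration = design_analysis()
# With a small true effect and a large se, significant estimates
# must overshoot: the exaggeration ratio comes out well above 1.
```

Nothing in this calculation depends on when the data were collected, which is why doing it after the fact makes sense.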

Facebook’s Prophet uses Stan

Sean Taylor, a research scientist at Facebook and Stan user, writes: I wanted to tell you about an open source forecasting package we just released called Prophet: I thought the readers of your blog might be interested in both the package and the fact that we built it on top of Stan. Under the hood, […]

Theoretical statistics is the theory of applied statistics: how to think about what we do (My talk Wednesday—today!—4:15pm at the Harvard statistics dept)

Theoretical statistics is the theory of applied statistics: how to think about what we do Andrew Gelman, Department of Statistics and Department of Political Science, Columbia University Working scientists and engineers commonly feel that philosophy is a waste of time. But theoretical and philosophical principles can guide practice, so it makes sense for us to […]

Is Rigor Contagious? (my talk next Monday 4:15pm at Columbia)

Is Rigor Contagious? Much of the theory and practice of statistics and econometrics is characterized by a toxic mixture of rigor and sloppiness. Methods are justified based on seemingly pure principles that can’t survive reality. Examples of these principles include random sampling, unbiased estimation, hypothesis testing, Bayesian inference, and causal identification. Examples of uncomfortable reality […]

Looking for rigor in all the wrong places (my talk this Thursday in the Columbia economics department)

Looking for Rigor in All the Wrong Places What do the following ideas and practices have in common: unbiased estimation, statistical significance, insistence on random sampling, and avoidance of prior information? All have been embraced as ways of enforcing rigor but all have backfired and led to sloppy analyses and erroneous inferences. We […]

Blind Spot

X pointed me to this news article reporting an increase in the death rate among young adults in the United States: According to a study published on January 26 in the scientific journal The Lancet, the mortality rate of young Americans aged 25 to 35 rose between 1999 and 2014, whereas […]

Vine regression?

Jeremy Neufeld writes: I’m an undergraduate student at the University of Maryland and I was recently referred to this paper (Vine Regression, by Roger Cooke, Harry Joe, and Bo Chang), also an accompanying summary blog post by the main author) as potentially useful in policy analysis. With the big claims it makes, I am not […]

Krzysztof Sakrejda speaks in NYC on Bayesian hierarchical survival-type model for Dengue infection

Daniel writes: Krzysztof Sakrejda is giving a cool talk next Tuesday, 5:30-7pm downtown, on a survival model for Dengue infection using Stan. If you’re interested, please register asap; Google is asking for attendees’ names by tomorrow morning for security.

Combining results from multiply imputed datasets

Aaron Haslam writes: I have a question regarding combining the estimates from multiply imputed datasets. In the third edition of BDA on the top of page 452, you mention that with Bayesian analyses all you have to do is mix together the simulations. I want to clarify that this means you simply combine the posteriors […]
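The point being clarified, that in a fully Bayesian analysis you pool the posterior simulations across imputed datasets rather than applying combining rules to point estimates, is essentially one line of code (assuming, for the pooling to be an even mixture, the same number of draws per dataset):

```python
def combine_imputations(draws_per_dataset):
    """Pool posterior simulations from each imputed dataset into one set
    of draws representing the posterior averaged over the imputations."""
    return [d for draws in draws_per_dataset for d in draws]

# e.g. three imputed datasets, each with its own posterior draws of a parameter
combined = combine_imputations([[1.0, 1.2], [0.9, 1.1], [1.0, 1.3]])
post_mean = sum(combined) / len(combined)
```

Summaries (means, intervals) are then computed from the pooled draws exactly as from any other posterior sample.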

Lasso regression etc in Stan

Someone on the users list asked about lasso regression in Stan, and Ben replied: In the rstanarm package we have stan_lm(), which is sort of like ridge regression, and stan_glm() with family = gaussian and prior = laplace() or prior = lasso(). The latter estimates the shrinkage as a hyperparameter while the former […]
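In its penalized (mode-finding) form, the lasso’s double-exponential prior corresponds to soft-thresholding of coefficients. A sketch of that operator for intuition (this is the classical penalized view, not what rstanarm does, which is to sample the full posterior):

```python
def soft_threshold(x, lam):
    """Lasso proximal step: shrink x toward zero by lam, clipping at zero.
    This is what produces exact zeros in penalized lasso fits."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

shrunk = [soft_threshold(b, 1.0) for b in [3.0, 0.4, -2.5]]  # [2.0, 0.0, -1.5]
```

A Bayesian posterior under a Laplace prior shrinks coefficients but does not set them exactly to zero, which is one reason the penalized and fully Bayesian versions behave differently.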