Archive of posts filed under the Bayesian Statistics category.

We need to stop sacrificing women on the altar of deeply mediocre men (ISBA edition)

(This is not Andrew. I would ask you not to speculate in the comments who S is; this is not a great venue for that.) Kristian Lum just published an essay about her experiences being sexually assaulted at statistics conferences. You should read the whole thing because it’s important, but here’s a sample paragraph. I […]

The Night Riders

Gilbert Chin writes: After reading this piece [“How one 19-year-old Illinois man is distorting national polling averages,” by Nate Cohn] and this Nature news story [“Seeing deadly mutations in a new light,” by Erika Hayden], I wonder if you might consider blogging about how this appears to be the same issue in two different disciplines. […]

Ed Jaynes outta control!

A commenter points to a chapter of E. T. Jaynes’s book on probability and inference that contains the following amazing bit: The information we get from the TV evening news is not that a certain event actually happened in a certain way; it is that some news reporter has claimed that it did. Even seeing […]

Always crashing in the same car

“Hey, remember me?  I’ve been busy working like crazy” – Fever Ray. I’m at the Banff International Research Station (BIRS) for the week, which is basically a Canadian version of Disneyland where during coffee breaks a Canadian woman with a rake politely walks around telling elk to “shoo”. The topic of this week’s workshop isn’t […]

“Little Data” etc.: My talk at NYU this Friday, 8 Dec 2017

I’ll be talking at the NYU business school, in the department of information, operations, and management sciences, this Fri, 8 Dec 2017, at 12:30, in room KMC 4-90 (wherever that is): Little Data: How Traditional Statistical Ideas Remain Relevant in a Big-Data World; or, The Statistical Crisis in Science; or, Open Problems in Bayesian Data […]

Oooh, I hate all talk of false positive, false negative, false discovery, etc.

A correspondent writes: I think this short post on p-values, Bayes, and false discovery rate contains some misinterpretations. My reply: Oooh, I hate all talk of false positive, false negative, false discovery, etc. I posted this not because I care about someone, somewhere, being “wrong on the internet.” Rather, I just think there’s so […]

Computational and statistical issues with uniform interval priors

There are two anti-patterns* for prior specification in Stan programs that can be traced directly back to idioms developed for BUGS. One is the diffuse gamma priors that Andrew’s already written about at length. The second is interval-based priors, which brings us to today’s post. An interval prior is something like this in Stan […]
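The post is truncated here, but for readers who have not seen the pattern, a minimal sketch of the kind of interval prior it has in mind might look like this in Stan (variable names are illustrative, not taken from the post); the hard bounds on sigma act as an implicit uniform(0.1, 10) prior, and trouble starts when posterior mass piles up against an endpoint:

data {
  int<lower=0> N;
  vector[N] y;
}
parameters {
  real mu;
  real<lower=0.1, upper=10> sigma;  // the declared bounds imply a uniform(0.1, 10) prior
}
model {
  y ~ normal(mu, sigma);
  // no explicit prior on sigma: the interval itself is the prior
}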

Asymptotically we are all dead (Thoughts about the Bernstein-von Mises theorem before and after a Diamanda Galás concert)

“They say I did something bad, then why’s it feel so good” – Taylor Swift. It’s a Sunday afternoon and I’m trying to work myself up to the sort of emotional fortitude where I can survive the Diamanda Galás concert that I was super excited about a few months ago, but now, as I stare down the […]

Poisoning the well with a within-person design? What’s the risk?

I was thinking more about our recommendation that psychology researchers routinely use within-person rather than between-person designs. The quick story is that a within-person design is more statistically efficient because, when you compare measurements within a person, you should get less variation than when you compare different groups. But researchers often use between-person designs out […]
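To make the efficiency claim concrete, here is a back-of-the-envelope version under a simple additive model (my notation, not the post’s): write $y_{ij} = \alpha_i + \theta T_{ij} + \epsilon_{ij}$ with person effects $\alpha_i \sim \mathrm{N}(0, \sigma_\alpha^2)$ and measurement noise $\epsilon_{ij} \sim \mathrm{N}(0, \sigma_\epsilon^2)$. A within-person comparison differences out $\alpha_i$,
\[
\operatorname{var}(y_{i,\mathrm{treat}} - y_{i,\mathrm{control}}) = 2\sigma_\epsilon^2,
\]
while a between-person comparison of two different people carries the person-level variance as well,
\[
\operatorname{var}(y_{i,\mathrm{treat}} - y_{j,\mathrm{control}}) = 2(\sigma_\alpha^2 + \sigma_\epsilon^2),
\]
so the within-person design is more efficient whenever $\sigma_\alpha^2 > 0$; that gap is the “less variation” in the quick story above.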

Using output from a fitted machine learning algorithm as a predictor in a statistical model

Fred Gruber writes: I attended your talk at Harvard where, regarding the question of how to deal with complex models (trees, neural networks, etc.), you mentioned the idea of taking the output of these models and fitting a multilevel regression model. Is there a paper you could refer me to where I can read about […]
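I don’t know which paper Andrew pointed Fred to, but the basic move is straightforward to sketch: treat the fitted machine-learning prediction as one more column of data and let it enter a multilevel regression alongside the group structure. A minimal Stan sketch with hypothetical variable names (z holds the ML model’s predictions for each observation):

data {
  int<lower=0> N;
  int<lower=1> J;                        // number of groups
  array[N] int<lower=1, upper=J> group;
  vector[N] z;                           // output of the fitted ML algorithm
  vector[N] y;
}
parameters {
  real a;
  real b;                                // coefficient on the ML prediction
  vector[J] eta;                         // varying intercepts, non-centered
  real<lower=0> sigma_group;
  real<lower=0> sigma_y;
}
model {
  a ~ normal(0, 5);
  b ~ normal(0, 5);
  eta ~ normal(0, 1);
  sigma_group ~ normal(0, 1);
  sigma_y ~ normal(0, 5);
  y ~ normal(a + b * z + sigma_group * eta[group], sigma_y);
}

One appeal of this setup is diagnostic: the coefficient b and the group-level intercepts show how much, and where, the black-box predictions need correcting.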

Stan is a probabilistic programming language

See here: Stan: A Probabilistic Programming Language. Journal of Statistical Software. (Bob Carpenter, Andrew Gelman, Matthew D. Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, Allen Riddell) And here: Stan is Turing Complete. So what? (Bob Carpenter) And the pre-Stan version: Fully Bayesian computing. (Jouni Kerman and Andrew Gelman) Apparently […]

Wine + Stan + Climate change = ?

Pablo Almaraz writes: Recently, I published a paper in the journal Climate Research in which I used RStan to conduct the statistical analyses: Almaraz P (2015) Bordeaux wine quality and climate fluctuations during the last century: changing temperatures and changing industry. Clim Res 64:187-199.

Custom Distribution Solutions

I (Aki) recently made a case study that demonstrates how to implement user-defined probability functions in the Stan language (case study, git repo). As an example, I use the generalized Pareto distribution (GPD) to model extreme values of geomagnetic storm data from the World Data Center for Geomagnetism. Stan has had support for user-defined […]
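For readers who don’t click through: a user-defined density in Stan lives in the functions block, and giving it a name ending in _lpdf lets you use it with the ~ notation. A stripped-down sketch of a generalized Pareto log density, not the case study’s exact code and with argument checks omitted:

functions {
  // log density of the generalized Pareto distribution for y >= ymin,
  // with scale sigma > 0 and shape k (the k -> 0 limit is the exponential)
  real gpareto_lpdf(vector y, real ymin, real k, real sigma) {
    int N = rows(y);
    if (abs(k) > 1e-15) {
      return -(1 + 1 / k) * sum(log1p((y - ymin) * (k / sigma))) - N * log(sigma);
    } else {
      return -sum(y - ymin) / sigma - N * log(sigma);
    }
  }
}

With that definition in place, the model block can simply say y ~ gpareto(ymin, k, sigma).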

Spatial models for demographic trends?

Jon Minton writes: You may be interested in a commentary piece I wrote early this year, which was published recently in the International Journal of Epidemiology, where I discuss your work on identifying an aggregation bias in one of the key figures in Case & Deaton’s (in)famous 2015 paper on rising morbidity and mortality in […]

Computing marginal likelihoods in Stan, from Quentin Gronau and E. J. Wagenmakers

Gronau and Wagenmakers write: The bridgesampling package facilitates the computation of the marginal likelihood for a wide range of different statistical models. For models implemented in Stan (such that the constants are retained), executing the code bridge_sampler(stanfit) automatically produces an estimate of the marginal likelihood. Full story is at the link.
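The parenthetical about constants is doing real work. As I read the bridgesampling documentation, the marginal likelihood needs the full log density, so the Stan model should be written with explicit target += increments rather than the ~ shorthand, which is allowed to drop normalizing constants. A minimal sketch of what that looks like (the model and priors here are illustrative):

data {
  int<lower=0> N;
  vector[N] y;
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  target += normal_lpdf(mu | 0, 10);
  target += lognormal_lpdf(sigma | 0, 1);
  target += normal_lpdf(y | mu, sigma);  // rather than y ~ normal(mu, sigma)
}

Fit this with rstan and hand the resulting stanfit object to bridge_sampler(), as in the quote above.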

The Statistical Crisis in Science—and How to Move Forward (my talk next Monday 6pm at Columbia)

I’m speaking Mon 13 Nov, 6pm, at Low Library Rotunda at Columbia: The Statistical Crisis in Science—and How to Move Forward. Using examples ranging from elections to birthdays to policy analysis, Professor Andrew Gelman will discuss ways in which statistical methods have failed, leading to a replication crisis in much of science, as well as […]

Why won’t you cheat with me?

But I got some ground rules I’ve found to be sound rules and you’re not the one I’m exempting. Nonetheless, I confess it’s tempting. – Jenny Toomey sings Franklin Bruno. It turns out that I did something a little controversial in last week’s post. As these things always go, it wasn’t the thing I was […]

The king must die

“And then there was Yodeling Elaine, the Queen of the Air. She had a dollar sign medallion about as big as a dinner plate around her neck and a tiny bubble of spittle around her nostril and a little rusty tear, for she had lassoed and lost another tipsy sailor” – Tom Waits. It turns out I turned […]

Using Mister P to get population estimates from respondent driven sampling

From one of our exams: A researcher at Columbia University’s School of Social Work wanted to estimate the prevalence of drug abuse problems among American Indians (Native Americans) living in New York City. From the Census, it was estimated that about 30,000 Indians live in the city, and the researcher had a budget to interview […]
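The excerpt stops before the estimator, but the poststratification half of Mister P is standard enough to state in general form: fit a multilevel regression for the outcome given the demographic cells you can adjust for, then reweight the cell-level estimates by known population counts,
\[
\hat{\theta}^{\mathrm{PS}} = \frac{\sum_j N_j \hat{\theta}_j}{\sum_j N_j},
\]
where $\hat{\theta}_j$ is the model-based estimate in poststratification cell $j$ and $N_j$ is that cell’s population count (here built up from the Census figure of roughly 30,000). The respondent-driven-sampling part of the question is about how the sample reaches those cells, which the excerpt cuts off before addressing.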

My 2 talks in Seattle this Wed and Thurs: “The Statistical Crisis in Science” and “Bayesian Workflow”

For the Data Science Seminar, Wed 25 Oct, 3:30pm in Physics and Astronomy Auditorium – A102: The Statistical Crisis in Science. Top journals routinely publish ridiculous, scientifically implausible claims, justified based on “p < 0.05.” And this in turn calls into question all sorts of more plausible, but not necessarily true, claims that are supported […]