Skip to content
Archive of posts filed under the Bayesian Statistics category.

Laurie Davies: time series decomposition of birthday data

On the cover of BDA3 is a Bayesian decomposition of the time series of birthdays in the U.S. over a 20-year period. We modeled the data as a sum of Gaussian processes and fit it using GPstuff. Occasionally we fit this model to new data; see for example this discussion of Friday the 13th and […]

We’re hiring! hiring! hiring! hiring!

[insert picture of adorable cat entwined with Stan logo] We’re hiring postdocs to do Bayesian inference. We’re hiring programmers for Stan. We’re hiring a project manager. How many people we hire depends on what gets funded. But we’re hiring a few people for sure. We want the best best people who love to collaborate, who […]

To know the past, one must first know the future: The relevance of decision-based thinking to statistical analysis

We can break up any statistical problem into three steps: 1. Design and data collection. 2. Data analysis. 3. Decision making. It’s well known that step 1 typically requires some thought of steps 2 and 3: It is only when you have a sense of what you will do with your data, that you can […]

“A Conceptual Introduction to Hamiltonian Monte Carlo”

Michael Betancourt writes: Hamiltonian Monte Carlo has proven a remarkable empirical success, but only recently have we begun to develop a rigorous understanding of why it performs so well on difficult problems and how it is best applied in practice. Unfortunately, that understanding is con- fined within the mathematics of differential geometry which has limited […]

I’ve said it before and I’ll say it again

Ryan Giordano, Tamara Broderick, and Michael Jordan write: In Bayesian analysis, the posterior follows from the data and a choice of a prior and a likelihood. One hopes that the posterior is robust to reasonable variation in the choice of prior, since this choice is made by the modeler and is often somewhat subjective. A […]

Nooooooo, just make it stop, please!

Dan Kahan wrote: You should do a blog on this. I replied: I don’t like this article but I don’t really see the point in blogging on it. Why bother? Kahan: BECAUSE YOU REALLY NEVER HAVE EXPLAINED WHY. Gelman-Rubin criticque of BIC is *not* responsive; you have something in mind—tell us what, pls! Inquiring minds […]

Steve Fienberg

I did not know Steve Fienberg well, but I met him several times and encountered his work on various occasions, which makes sense considering his research area was statistical modeling as applied to social science. Fienberg’s most influential work must have been his books on the analysis of categorical data, work that was ahead of […]

Designing an animal-like brain: black-box “deep learning algorithms” to solve problems, with an (approximately) Bayesian “consciousness” or “executive functioning organ” that attempts to make sense of all these inferences

The journal Behavioral and Brain Sciences will be publishing this paper, “Building Machines That Learn and Think Like People,” by Brenden Lake, Tomer Ullman, Joshua Tenenbaum, and Samuel Gershman. Here’s the abstract: Recent progress in artificial intelligence (AI) has renewed interest in building systems that learn and think like people. Many advances have come from […]

Bayesian statistics: What’s it all about?

Kevin Gray sent me a bunch of questions on Bayesian statistics and I responded. The interview is here at KDnuggets news. For some reason the KDnuggets editors gave it the horrible, horrible title, “Bayesian Basics, Explained.” I guess they don’t waste their data mining and analytics skills on writing blog post titles! That said, I […]

Avoiding only the shadow knowing the motivating problem of a post.

Graphic From Given I am starting to make some posts to this blog (again) I was pleased to run across a youtube of Xiao-Li Meng being interviewed on the same topic by Suzanne Smith the Director of the Center for Writing and Communicating Ideas. One thing I picked up was to make the problem being addressed […]

Avoiding selection bias by analyzing all possible forking paths

Ivan Zupic points me to this online discussion of the article, Dwork et al. 2015, The reusable holdout: Preserving validity in adaptive data analysis. The discussants are all talking about the connection between adaptive data analysis and the garden of forking paths; for example, this from one commenter: The idea of adaptive data analysis is […]

“The Fundamental Incompatibility of Scalable Hamiltonian Monte Carlo and Naive Data Subsampling”

Here’s Michael Betancourt writing in 2015: Leveraging the coherent exploration of Hamiltonian flow, Hamiltonian Monte Carlo produces computationally efficient Monte Carlo estimators, even with respect to complex and high-dimensional target distributions. When confronted with data-intensive applications, however, the algorithm may be too expensive to implement, leaving us to consider the utility of approximations such as […]

Using Stan in an agent-based model: Simulation suggests that a market could be useful for building public consensus on climate change

Jonathan Gilligan writes: I’m writing to let you know about a preprint that uses Stan in what I think is a novel manner: Two graduate students and I developed an agent-based simulation of a prediction market for climate, in which traders buy and sell securities that are essentially bets on what the global average temperature […]

Interesting epi paper using Stan

Jon Zelner writes: Just thought I’d send along this paper by Justin Lessler et al. Thought it was both clever & useful and a nice ad for using Stan for epidemiological work. Basically, what this paper is about is estimating the true prevalence and case fatality ratio of MERS-CoV [Middle East Respiratory Syndrome Coronavirus Infection] […]

OK, sometimes the concept of “false positive” makes sense.

Paul Alper writes: I know by searching your blog that you hold the position, “I’m negative on the expression ‘false positives.’” Nevertheless, I came across this. In the medical/police/judicial world, false positive is a very serious issue: $2 Cost of a typical roadside drug test kit used by police departments. Namely, is that white powder […]

Discussion on overfitting in cluster analysis

Ben Bolker wrote: It would be fantastic if you could suggest one or two starting points for the idea that/explanation why BIC should naturally fail to identify the number of clusters correctly in the cluster-analysis context. Bob Carpenter elaborated: Ben is finding that using BIC to select number of mixture components is selecting too many […]

Abraham Lincoln and confidence intervals

Our recent discussion with mathematician Russ Lyons on confidence intervals reminded me of a famous logic paradox, in which equality is not as simple as it seems. The classic example goes as follows: Abraham Lincoln is the 16th president of the United States, but this does not mean that one can substitute the two expressions […]

How best to partition data into test and holdout samples?

Bill Harris writes: In “Type M error can explain Weisburd’s Paradox,” you reference Button et al. 2013. While reading that article, I noticed figure 1 and the associated text describing the 50% probability of failing to detect a significant result with a replication of the same size as the original test that was just significant. […]

Deep learning, model checking, AI, the no-homunculus principle, and the unitary nature of consciousness

Bayesian data analysis, as my colleagues and I have formulated it, has a human in the loop. Here’s how we put it on the very first page of our book: The process of Bayesian data analysis can be idealized by dividing it into the following three steps: 1. Setting up a full probability model—a joint […]

Stan Case Studies: A good way to jump in to the language

Wanna learn Stan? Everybody’s talking bout it. Here’s a way to jump in: Stan Case Studies. Find one you like and try it out. P.S. I blogged this last month but it’s so great I’m blogging it again. For this post, the target audience is not already-users of Stan but new users.