Archive of entries posted by

Mighty oaks from little acorns grow

Eric Loken writes: Do you by any chance remember the bogus survey that Augusta National carried out in 2002 to deflect criticism about not having any female members? I even remember this survey being ridiculed by ESPN who said their polls showed much more support for a boycott and sympathy with Martha Burke. Anyway, sure that’s […]

Frustration with published results that can’t be reproduced, and journals that don’t seem to care

Thomas Heister writes: Your recent post about Per Pettersson-Lidbom’s frustrations in reproducing study results reminded me of our own recent experience that we had in replicating a paper in PLOS ONE. We found numerous substantial errors but eventually gave up as, frustratingly, the time and effort didn’t seem to change anything and the journal’s editors quite […]

So little information to evaluate effects of dietary choices

Paul Alper points to this excellent news article by Aaron Carroll, who tells us how little information is available in studies of diet and public health. Here’s Carroll: Just a few weeks ago, a study was published in the Journal of Nutrition that many reports in the news media said proved that honey was no […]

Some U.S. demographic data at zipcode level conveniently in R

Ari Lamstein writes: I chuckled when I read your recent “R Sucks” post. Some of the comments were a bit … heated … so I thought to send you an email instead. I agree with your point that some of the datasets in R are not particularly relevant. The way that I’ve addressed that is […]

Survey weighting and that 2% swing

Nate Silver agrees with me that much of that shocking 2% swing can be explained by systematic differences between sample and population: survey respondents included too many Clinton supporters, even after corrections from existing survey adjustments. In Nate’s words, “Pollsters Probably Didn’t Talk To Enough White Voters Without College Degrees.” Last time we looked carefully […]
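
To make the sample-versus-population point concrete, here is a minimal sketch of how an unadjusted survey average can differ from one reweighted to population shares. Every number is made up for illustration and comes from no actual poll; the two-group split and the names `raw_mean` and `poststratified_mean` are assumptions of this sketch, not anything from the posts discussed above.

```python
# Hypothetical cells: a group's share of the population, its number of
# survey respondents, and the fraction of those respondents backing
# candidate A. All values are invented for illustration only.
cells = [
    {"group": "white, no college degree", "pop_share": 0.45, "n": 300, "support": 0.30},
    {"group": "everyone else", "pop_share": 0.55, "n": 700, "support": 0.60},
]


def raw_mean(cells):
    """Unadjusted estimate: each cell counts in proportion to its sample size."""
    total_n = sum(c["n"] for c in cells)
    return sum(c["n"] * c["support"] for c in cells) / total_n


def poststratified_mean(cells):
    """Adjusted estimate: each cell counts in proportion to its population share."""
    total_share = sum(c["pop_share"] for c in cells)
    return sum(c["pop_share"] * c["support"] for c in cells) / total_share


print(f"raw estimate:            {raw_mean(cells):.3f}")
print(f"poststratified estimate: {poststratified_mean(cells):.3f}")
```

In this toy setup the first group is 45% of the population but only 30% of respondents, so the raw average overstates support for candidate A. The adjustment only helps if the weighting variables capture the relevant difference between sample and population.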

How can you evaluate a research paper?

Shea Levy writes: You ended a post from last month [i.e., Feb.] with the injunction to not take the fact of a paper’s publication or citation status as meaning anything, and instead that we should “read each paper on its own.” Unfortunately, while I can usually follow e.g. the criticisms of a paper you might […]

An exciting new entry in the “clueless graphs from clueless rich guys” competition

Jeff Lax points to this post from Matt Novak linking to a post by Matt Taibbi that shares the above graph from newspaper columnist / rich guy Thomas Friedman. I’m not one to spend precious blog space mocking bad graphs, so I’ll refer you to Novak and Taibbi for the details. One thing I do […]

Interesting epi paper using Stan

Jon Zelner writes: Just thought I’d send along this paper by Justin Lessler et al. Thought it was both clever & useful and a nice ad for using Stan for epidemiological work. Basically, what this paper is about is estimating the true prevalence and case fatality ratio of MERS-CoV [Middle East Respiratory Syndrome Coronavirus Infection] […]

“A bug in fMRI software could invalidate 15 years of brain research”

About 50 people pointed me to this press release or the underlying PPNAS research article, “Cluster failure: Why fMRI inferences for spatial extent have inflated false-positive rates,” by Anders Eklund, Thomas Nichols, and Hans Knutsson, who write: Functional MRI (fMRI) is 25 years old, yet surprisingly its most common statistical methods have not been validated […]

OK, sometimes the concept of “false positive” makes sense.

Paul Alper writes: I know by searching your blog that you hold the position, “I’m negative on the expression ‘false positives.’” Nevertheless, I came across this. In the medical/police/judicial world, false positive is a very serious issue: $2 Cost of a typical roadside drug test kit used by police departments. Namely, is that white powder […]

An election just happened and I can’t stop talking about it

Some things I’ve posted elsewhere: The Electoral College magnifies the power of white voters (with Pierre-Antoine Kremp) I’m not impressed by this claim of vote rigging And, in case you missed it: Explanations for that shocking 2% shift Coming soon: What theories in political science got supported or shot down by the 2016 election? (with […]

Reminder: Instead of “confidence interval,” let’s say “uncertainty interval”

We had a vigorous discussion the other day on confusions involving the term “confidence interval,” what does it mean to have “95% confidence,” etc. This is as good a time as any for me to remind you that I prefer the term “uncertainty interval”. The uncertainty interval tells you how much uncertainty you have. That […]

Happiness formulas

Jazi Zilber writes: Have you heard of “the happiness formula”? Lyubomirsky et al. 2005. Happiness = 0.5 genetic, 0.1 circumstances, 0.4 “intentional activity” They took the 0.4 unexplained variance and argued it is “intentional activity” Cited hundreds of times by everybody. The absurd is, to you even explaining it is unneeded. For others, I do […]

Discussion on overfitting in cluster analysis

Ben Bolker wrote: It would be fantastic if you could suggest one or two starting points for the idea that/explanation why BIC should naturally fail to identify the number of clusters correctly in the cluster-analysis context. Bob Carpenter elaborated: Ben is finding that using BIC to select number of mixture components is selecting too many […]

“Breakfast skipping, extreme commutes, and the sex composition at birth”

Bhash Mazumder sends along a paper (coauthored with Zachary Seeskin) which begins: A growing body of literature has shown that environmental exposures in the period around conception can affect the sex ratio at birth through selective attrition that favors the survival of female conceptuses. Glucose availability is considered a key indicator of the fetal environment, […]

Abraham Lincoln and confidence intervals

Our recent discussion with mathematician Russ Lyons on confidence intervals reminded me of a famous logic paradox, in which equality is not as simple as it seems. The classic example goes as follows: Abraham Lincoln is the 16th president of the United States, but this does not mean that one can substitute the two expressions […]

I’m only adding new posts when they’re important . . . and this one’s really important.

Durf Humphries writes: I’m a fact-checker and digital researcher in Atlanta. Your blog has been quite useful to me this week. Your statistics and explanations are impressive, but the decision to ornament your articles with such handsome cats? That’s divine genius and it’s apparent that these are not random cats, but carefully curated critters that […]

How best to partition data into test and holdout samples?

Bill Harris writes: In “Type M error can explain Weisburd’s Paradox,” you reference Button et al. 2013. While reading that article, I noticed figure 1 and the associated text describing the 50% probability of failing to detect a significant result with a replication of the same size as the original test that was just significant. […]
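
The 50% figure can be checked with a short calculation. This is a sketch under stated assumptions, not a reproduction of Button et al.'s analysis: it assumes the true effect is exactly equal to the original estimate, the replication has the same standard error, and the test is a two-sided z-test at the 5% level. The function name `replication_power` is made up for this illustration.

```python
from statistics import NormalDist


def replication_power(z_original, alpha=0.05):
    """Probability that a same-size replication reaches two-sided significance,
    assuming the true effect equals the original estimate (an assumption)."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    # Under the assumption, the replication z-statistic is Normal(z_original, 1).
    upper = 1 - norm.cdf(z_crit - z_original)   # significant, same sign
    lower = norm.cdf(-z_crit - z_original)      # significant, opposite sign
    return upper + lower


# An original result that was "just significant": z equal to the critical value.
z_just_significant = NormalDist().inv_cdf(0.975)
print(round(replication_power(z_just_significant), 3))
```

If the original z-statistic sits right at the threshold and the true effect is taken to be that estimate, the replication's z-statistic is centered on the threshold, so it lands above it about half the time.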


Wow. P.S. In the comment thread, Peter Dorman has an interesting discussion of Carlsen’s errors so far during the tournament.

Deep learning, model checking, AI, the no-homunculus principle, and the unitary nature of consciousness

Bayesian data analysis, as my colleagues and I have formulated it, has a human in the loop. Here’s how we put it on the very first page of our book: The process of Bayesian data analysis can be idealized by dividing it into the following three steps: 1. Setting up a full probability model—a joint […]