Archive of posts filed under the Miscellaneous Statistics category.

How can you evaluate a research paper?

Shea Levy writes: You ended a post from last month [i.e., Feb.] with the injunction not to take the fact of a paper’s publication or citation status as meaning anything, and that instead we should “read each paper on its own.” Unfortunately, while I can usually follow e.g. the criticisms of a paper you might […]

“A bug in fMRI software could invalidate 15 years of brain research”

About 50 people pointed me to this press release or the underlying PPNAS research article, “Cluster failure: Why fMRI inferences for spatial extent have inflated false-positive rates,” by Anders Eklund, Thomas Nichols, and Hans Knutsson, who write: Functional MRI (fMRI) is 25 years old, yet surprisingly its most common statistical methods have not been validated […]

Reminder: Instead of “confidence interval,” let’s say “uncertainty interval”

We had a vigorous discussion the other day on confusions involving the term “confidence interval,” what does it mean to have “95% confidence,” etc. This is as good a time as any for me to remind you that I prefer the term “uncertainty interval”. The uncertainty interval tells you how much uncertainty you have. That […]
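As a concrete (and entirely made-up) illustration of the term: an uncertainty interval can be read off directly from simulation draws, with no appeal to repeated-sampling "confidence" language. The numbers below are invented for the sketch:

```python
# Illustrative sketch only: a central 95% interval computed from
# simulation draws and read as an "uncertainty interval", i.e. a
# summary of how much uncertainty we have about the quantity.
import random

random.seed(1)
draws = sorted(random.gauss(2.0, 0.5) for _ in range(10000))
lower = draws[int(0.025 * len(draws))]
upper = draws[int(0.975 * len(draws))]
print(f"95% uncertainty interval: [{lower:.2f}, {upper:.2f}]")
```

With draws centered at 2.0 with scale 0.5, the interval comes out close to [1.0, 3.0], which is exactly the "how much uncertainty do you have" reading of the term.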

Discussion on overfitting in cluster analysis

Ben Bolker wrote: It would be fantastic if you could suggest one or two starting points for the idea that/explanation why BIC should naturally fail to identify the number of clusters correctly in the cluster-analysis context. Bob Carpenter elaborated: Ben is finding that using BIC to select number of mixture components is selecting too many […]
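To make the bookkeeping concrete, here is a toy sketch (mine, not anything from the thread) of how BIC trades off likelihood against parameter count when comparing mixture sizes. The data and the two-component settings are made up for illustration:

```python
# Toy illustration (not from the discussion): BIC = k*ln(n) - 2*ln(L).
# Data come from a single Gaussian; we compare a fitted 1-component
# model against a hand-specified 2-component mixture.
import math
import random

random.seed(7)
data = [random.gauss(0.0, 1.0) for _ in range(500)]
n = len(data)

def norm_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

mu = sum(data) / n
sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n)

# One component: charge k = 2 parameters (mean, sd)
ll1 = sum(math.log(norm_pdf(x, mu, sigma)) for x in data)
bic1 = 2 * math.log(n) - 2 * ll1

# Two components with equal weights, shared sd, and means placed by
# hand at mu +/- 0.5; we charge k = 3 parameters (two means, one sd)
ll2 = sum(math.log(0.5 * norm_pdf(x, mu - 0.5, sigma) +
                   0.5 * norm_pdf(x, mu + 0.5, sigma)) for x in data)
bic2 = 3 * math.log(n) - 2 * ll2

print(bic1 < bic2)  # lower BIC preferred; here the simpler model should win
```

In this easy case the penalty does its job; the question raised in the thread is why, on real clustered data, the same criterion can still favor too many components.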

“Breakfast skipping, extreme commutes, and the sex composition at birth”

Bhash Mazumder sends along a paper (coauthored with Zachary Seeskin) which begins: A growing body of literature has shown that environmental exposures in the period around conception can affect the sex ratio at birth through selective attrition that favors the survival of female conceptuses. Glucose availability is considered a key indicator of the fetal environment, […]

Abraham Lincoln and confidence intervals

Our recent discussion with mathematician Russ Lyons on confidence intervals reminded me of a famous logic paradox, in which equality is not as simple as it seems. The classic example goes as follows: Abraham Lincoln is the 16th president of the United States, but this does not mean that one can substitute the two expressions […]

How best to partition data into test and holdout samples?

Bill Harris writes: In “Type M error can explain Weisburd’s Paradox,” you reference Button et al. 2013. While reading that article, I noticed figure 1 and the associated text describing the 50% probability of failing to detect a significant result with a replication of the same size as the original test that was just significant. […]
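The 50% figure can be checked with a short normal-distribution calculation; this is my own sketch of the arithmetic, not code from Button et al. If the true effect equals the estimate from a result that was just barely significant (z = 1.96), a same-sized replication has about 50% power:

```python
# Sketch of the arithmetic behind the ~50% replication figure
# (my illustration; assumes a normally distributed estimate).
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

z_true = 1.96   # assumed true effect, in standard-error units
z_crit = 1.96   # two-sided 5% significance threshold

# Replication z-score ~ Normal(z_true, 1); power = P(|z| > z_crit)
power = (1.0 - phi(z_crit - z_true)) + phi(-z_crit - z_true)
print(round(power, 3))
```

The upper-tail term is exactly 0.5 and the lower-tail term is negligible, so the replication detects a significant effect only about half the time.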

Deep learning, model checking, AI, the no-homunculus principle, and the unitary nature of consciousness

Bayesian data analysis, as my colleagues and I have formulated it, has a human in the loop. Here’s how we put it on the very first page of our book: The process of Bayesian data analysis can be idealized by dividing it into the following three steps: 1. Setting up a full probability model—a joint […]

Thinking more seriously about the design of exploratory studies: A manifesto

In the middle of a long comment thread on a silly Psychological Science paper, Ed Hagen wrote: Exploratory studies need to become a “thing.” Right now, they play almost no formal role in social science, yet they are essential to good social science. That means we need to put as much effort in developing standards, […]

“Men with large testicles”

Above is the title of an email I received from Marcel van Assen. We were having a discussion of PPNAS papers—I was relating my frustration about Case and Deaton’s response to my letter with Auerbach on age adjustment in mortality trends—and van Assen wrote: We also commented on a paper in PNAS. The original paper was […]

More on my paper with John Carlin on Type M and Type S errors

Deborah Mayo asked me some questions about that paper (“Beyond Power Calculations: Assessing Type S (Sign) and Type M (Magnitude) Errors”), and here’s how I responded: I am not happy with the concepts of “power,” “type 1 error,” and “type 2 error,” because all these are defined in terms of statistical significance, which I am […]
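For readers who want to see Type S and Type M errors concretely, here is a small simulation in the spirit of the Gelman and Carlin paper; the effect size and standard error below are my own illustrative choices for a low-power setting, not numbers taken from the paper:

```python
# Illustrative simulation of Type S (sign) and Type M (magnitude) errors.
# Assumed setting: true effect much smaller than the standard error.
import random

random.seed(42)
true_effect, se = 2.0, 8.1
sims = 100000

# Keep only the estimates that reach statistical significance
signif = [est for est in (random.gauss(true_effect, se) for _ in range(sims))
          if abs(est) > 1.96 * se]

type_s = sum(e < 0 for e in signif) / len(signif)                  # wrong sign
type_m = sum(abs(e) for e in signif) / len(signif) / true_effect   # exaggeration
print(round(type_s, 2), round(type_m, 1))
```

In a setting like this, conditioning on significance gives a substantial chance of getting the sign wrong and an estimate many times larger than the true effect, which is the paper's central point.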

Election surprise, and Three ways of thinking about probability

Background: Hillary Clinton was given a 65% or 80% or 90% chance of winning the electoral college. She lost. Naive view: The poll-based models and the prediction markets said Clinton would win, and she lost. The models are wrong! Slightly sophisticated view: The predictions were probabilistic. 1-in-3 events happen a third of the time. 1-in-10 […]
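The "slightly sophisticated view" is easy to check by simulation (an illustrative sketch, not from the post): if the forecasts are calibrated, a 65% favorite should lose roughly 35% of the time.

```python
# Illustrative calibration check: how often does a 65% favorite lose?
import random

random.seed(0)
p = 0.65
trials = 100000
losses = sum(random.random() > p for _ in range(trials)) / trials
print(round(losses, 2))
```

A single loss by a 65% favorite is therefore weak evidence against the forecast; the harder question, taken up in the post, is how to think about the probability statement itself.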

Postdoc opportunities with Jingchen Liu and Zhiliang Ying in Columbia statistics dept

My colleagues write: Multiple full-time postdoctoral researcher positions are available beginning July 1, 2017 (possibly sooner upon request) in the Department of Statistics at Columbia University working with Drs. Jingchen Liu and Zhiliang Ying. The main responsibilities of these positions consist of latent variable modeling and graphical modeling for innovative assessment and cognitive learning, designing […]

“Generic and consistent confidence and credible regions”

Christian Bartels sends along this paper, which begins: A generic, consistent, efficient and exact method is proposed for set selection. The method is generic in that its definition and implementation uses only the likelihood function. The method is consistent in that the same criterion is used to select confidence and credible sets making the two […]

How not to analyze noisy data: A case study

I was reading Jenny Davidson’s blog and came upon this note on an autobiography of the eccentric (but aren’t we all?) biologist Robert Trivers. This motivated me, not to read Trivers’s book, but to do some googling which led me to this paper from PLOS ONE, “Revisiting a sample of U.S. billionaires: How sample selection and […]

Advice on setting up audio for your podcast

Jennifer and I were getting ready to do our podcast, and in preparation we got some advice from Enrico Bertini and the Data Stories team: 1) Multitracking. The best way is to multitrack and have each person record locally (note: this is easier if you are in different rooms/locations). Multitracking gives you a lot of […]

“Marginally Significant Effects as Evidence for Hypotheses: Changing Attitudes Over Four Decades”

Kevin Lewis sends along this article by Laura Pritschet, Derek Powell, and Zachary Horne, who write: Some effects are statistically significant. Other effects do not reach the threshold of statistical significance and are sometimes described as “marginally significant” or as “approaching significance.” Although the concept of marginal significance is widely deployed in academic psychology, there […]

Handy Statistical Lexicon — in Japanese!

So, one day I get this email from Kentaro Matsuura: Dear Professor Andrew Gelman, I’m a Japanese Stan user and write a blog to promote Stan. (and translator of https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started-(Japanese)) I believe your post on “Handy statistical lexicon (http://andrewgelman.com/2009/05/24/handy_statistic/)” is so great that I’d like to translate and spread the post in my blog. Could […]

Why the garden-of-forking-paths criticism of p-values is not like a famous Borscht Belt comedy bit

People point me to things on the internet that they’re sure I’ll hate. I read one of these a while ago—unfortunately I can’t remember who wrote it or where it appeared, but it raised a criticism, not specifically of me, I believe, but more generally of skeptics such as Uri Simonsohn and myself who keep bringing […]

“Find the best algorithm (program) for your dataset.”

Piero Foscari writes: Maybe you know about this already, but I found it amazingly brutal; while looking for some reproducible research resources I stumbled onto the following at mlcomp.org (which would be nice if done properly, at least as a standardization attempt): Find the best algorithm (program) for your dataset. Upload your dataset and run existing programs on it to […]