Archive of posts filed under the Miscellaneous Statistics category.

The Pandora Principle in statistics — and its malign converse, the ostrich

The Pandora Principle is that once you’ve considered a possible interaction or bias or confounder, you can’t un-think it. The malign converse is when people realize this and then design their studies to avoid putting themselves in a position where they have to consider some potentially important factor. For example, suppose you’re considering some policy […]

I don’t like discrete models (hot hand in baseball edition)

Bill Jefferys points us to this article, “Baseball’s ‘Hot Hand’ Is Real,” in which Rob Arthur and Greg Matthews analyze a year of pitch-by-pitch data from Major League Baseball. There are some good things in their analysis, and I think a lot can be learned from these data using what Arthur and Matthews did, so […]

The Supreme Court can’t do statistics. And, what’s worse, they don’t know what they don’t know.

Kevin Lewis points us to this article by Ryan Enos, Anthony Fowler, and Christopher Havasy, who write: This article examines the negative effect fallacy, a flawed statistical argument first utilized by the Warren Court in Elkins v. United States. The Court argued that empirical evidence could not determine whether the exclusionary rule prevents future illegal […]

What readings should be included in a seminar on the philosophy of statistics, the replication crisis, causation, etc.?

André Ariew writes: I’m a philosopher of science at the University of Missouri. I’m interested in leading a seminar on a variety of current topics with philosophical value, including problems with significance tests, the replication crisis, causation, correlation, randomized trials, etc. I’m hoping that you can point me in a good direction for accessible readings […]

A stunned Dyson

Terry Martin writes: I ran into this quote and thought you might enjoy it. It’s from p. 273 of Segre’s new biography of Fermi, The Pope of Physics: When Dyson met with him in 1953, Fermi welcomed him politely, but he quickly put aside the graphs he was being shown indicating agreement between theory and […]

How does a Nobel-prize-winning economist become a victim of bog-standard selection bias?

Someone who wishes to remain anonymous writes in with a story: Linking to a new paper by Jorge Luis García, James J. Heckman, and Anna L. Ziff, economist Sue Dynarski makes this “joke” on Facebook—or maybe it’s not a joke: How does one adjust standard errors to account for the fact that N of […]

Classical statisticians as Unitarians

[cat picture] Christian Robert, Judith Rousseau, and I wrote: Several of the examples in [the book under review] represent solutions to problems that seem to us to be artificial or conventional tasks with no clear analogy to applied work. “They are artificial and are expressed in terms of a survey of 100 individuals expressing support […]

Clinical trials are broken. Here’s why.

Someone emailed me with some thoughts on systemic exertion intolerance disease, in particular, controversies regarding the Pace trial which evaluated psychological interventions for this condition or, should I say, set of conditions. I responded as follows: At one point I had the thought of doing a big investigative project on this, formally interviewing a bunch […]

You can read two versions of this review essay on systemic exertion intolerance disease (chronic fatigue syndrome)

Julie Rehmeyer wrote a book, “Through the Shadowlands: A Science Writer’s Odyssey into an Illness Science Doesn’t Understand,” and my review appeared in the online New Yorker, much shortened and edited, and given the title, “A memoir of chronic fatigue illustrates the failures of medical research.” My original was titled, “Systemic exertion intolerance disease: The […]

My unpublished papers

My oldest unpublished paper dates from my sophomore year in college. I can’t remember the title or all the details, but it was a solution to a differential-difference equation. The story of how it came about is here. A couple years after figuring out the proof, I wrote it up and submitted it to a […]

Statisticians and economists agree: We should learn from data by “generating and revising models, hypotheses, and data analyzed in response to surprising findings.” (That’s what Bayesian data analysis is all about.)

Kevin Lewis points us to this article by economist James Heckman and statistician Burton Singer, who write: All analysts approach data with preconceptions. The data never speak for themselves. Sometimes preconceptions are encoded in precise models. Sometimes they are just intuitions that analysts seek to confirm and solidify. A central question is how to revise […]

“The Null Hypothesis Screening Fallacy”?

[non-cat picture] Rick Gerkin writes: A few months ago you posted your list of blog posts in draft stage and I noticed that “Humans Can Discriminate More than 1 Trillion Olfactory Stimuli. Not.” was still on that list. It was about some concerns I had about a paper in Science. After talking it through […]

Let’s stop talking about published research findings being true or false

I bear some of the blame for this. When I heard about John Ioannidis’s paper, “Why Most Published Research Findings Are False,” I thought it was cool. Ioannidis was on the same side as me, and Uri Simonsohn, and Greg Francis, and Paul Meehl, in the replication debate: he felt that there was a lot […]

Problems with the jargon “statistically significant” and “clinically significant”

Someone writes: After listening to your EconTalk episode a few weeks ago, I have a question about interpreting treatment effect magnitudes, effect sizes, SDs, etc. I studied Econ/Math undergrad and worked at a social science research institution in health policy as a research assistant, so I have a good amount of background. At the institution […]

Ride a Crooked Mile

Joachim Krueger writes: As many of us rely (in part) on p values when trying to make sense of the data, I am sending a link to a paper Patrick Heck and I published in Frontiers in Psychology. The goal of this work is not to fan the flames of the already overheated debate, but […]

Criminology corner: Type M error might explain Weisburd’s Paradox

[silly cartoon found by googling *cat burglar*] Torbjørn Skardhamar, Mikko Aaltonen, and I wrote this article to appear in the Journal of Quantitative Criminology: Simple calculations seem to show that larger studies should have higher statistical power, but empirical meta-analyses of published work in criminology have found zero or weak correlations between sample size and […]

Why I’m not participating in the Transparent Psi Project

I received the following email from psychology researcher Zoltan Kekecs: I would like to ask you to participate in the establishment of the expert consensus design of a large scale fully transparent replication of Bem’s (2011) ‘Feeling the future’ Experiment 1. Our initiative is called the ‘Transparent Psi Project’. Our aim is to develop […]

Financial anomalies are contingent on being unknown

Jonathan Falk points us to this article by Kewei Hou, Chen Xue, and Lu Zhang, who write: In retrospect, the anomalies literature is a prime target for p-hacking. First, for decades, the literature is purely empirical in nature, with little theoretical guidance. Second, with trillions of dollars invested in anomalies-based strategies in the alone, […]

“Bombshell” statistical evidence for research misconduct, and what to do about it?

Someone pointed me to this post by Nick Brown discussing a recent article by John Carlisle regarding scientific misconduct. Here’s Brown: [Carlisle] claims that he has found statistical evidence that a surprisingly high proportion of randomised controlled trials (RCTs) contain data patterns that cannot have arisen by chance. . . . the implication is that […]

No conf intervals? No problem (if you got replication).

This came up in a research discussion the other day. Someone had produced some estimates, and there was a question: where are the conf intervals. I said that if you have replication and you graph the estimates that were produced, then you don’t really need conf intervals (or, for that matter, p-values). The idea is […]