A few days ago I shared my reactions to an op-ed by developmental psychologist Alison Gopnik. Gopnik replied: As a regular reader of your blog, I thought you and your readers might be interested in a response to your very fair comments. In the original draft I had an extra few paragraphs (below) that speak […]

**Decision Theory**category.

## Some natural solutions to the p-value communication problem—and why they won’t work.

John Carlin and I write: It is well known that even experienced scientists routinely misinterpret p-values in all sorts of ways, including confusion of statistical and practical significance, treating non-rejection as acceptance of the null hypothesis, and interpreting the p-value as some sort of replication probability or as the posterior probability that the null hypothesis […]

## How to interpret “p = .06” in situations where you really really want the treatment to work?

We’ve spent a lot of time during the past few years discussing the difficulty of interpreting “p less than .05” results from noisy studies. Standard practice is to just take the point estimate and confidence interval, but this is in general wrong in that it overestimates effect size (type M error) and can get the […]

## A completely reasonable-sounding statement with which I strongly disagree

From a couple years ago: In the context of a listserv discussion about replication in psychology experiments, someone wrote: The current best estimate of the effect size is somewhere in between the original study and the replication’s reported value. This conciliatory, split-the-difference statement sounds reasonable, and it might well represent good politics in the context […]

## 7th graders trained to avoid Pizzagate-style data exploration—but is the training too rigid?

[cat picture] Laura Kapitula writes: I wanted to share a cute story that gave me a bit of hope. My daughter who is in 7th grade was doing her science project. She had designed an experiment comparing lemon batteries to potato batteries, a 2×4 design with lemons or potatoes as one factor and number of […]

## What hypothesis testing is all about. (Hint: It’s not what you think.)

From 2015: The conventional view: Hyp testing is all about rejection. The idea is that if you reject the null hyp at the 5% level, you have a win, you have learned that a certain null model is false and science has progressed, either in the glamorous “scientific revolution” sense that you’ve rejected a central […]

## The Bolt from the Blue

Lionel Hertzog writes: In the method section of a recent Nature article in my field of research (diversity-ecosystem function) one can read the following: The inclusion of many predictors in statistical models increases the chance of type I error (false positives). To account for this we used a Bernoulli process to detect false discovery rates, […]

## “The earth is flat (p > 0.05): Significance thresholds and the crisis of unreplicable research”

Valentin Amrhein, Fränzi Korner-Nievergelt, and Tobias Roth write: The widespread use of ‘statistical significance’ as a license for making a claim of a scientific finding leads to considerable distortion of the scientific process. We review why degrading p-values into ‘significant’ and ‘nonsignificant’ contributes to making studies irreproducible, or to making them seem irreproducible. A major […]

## Blue Cross Blue Shield Health Index

Chris Famighetti points us to this page which links to an interactive visualization. There are some problems with the mapping software—when I clicked through, it showed a little map of the western part of the U.S., accompanied by huge swathes of Canada and the Pacific Ocean—and I haven’t taken a look at the methodology. But […]

## Using prior knowledge in frequentist tests

Christian Bartels send along this paper, which he described as an attempt to use informative priors for frequentist test statistics. I replied: I’ve not tried to follow the details but this reminds me of our paper on posterior predictive checks. People think of this as very Bayesian but my original idea when doing this research […]

## Would you prefer three N=300 studies or one N=900 study?

Stephen Martin started off with a question: I’ve been thinking about this thought experiment: — Imagine you’re given two papers. Both papers explore the same topic and use the same methodology. Both were preregistered. Paper A has a novel study (n1=300) with confirmed hypotheses, followed by two successful direct replications (n2=300, n3=300). Paper B has […]

## Reputational incentives and post-publication review: two (partial) solutions to the misinformation problem

So. There are erroneous analyses published in scientific journals and in the news. Here I’m not talking not about outright propaganda, but about mistakes that happen to coincide with the preconceptions of their authors. We’ve seen lots of examples. Here are just a few: – Political scientist Larry Bartels is committed to a model of […]

## Stacking, pseudo-BMA, and AIC type weights for combining Bayesian predictive distributions

This post is by Aki. We have often been asked in the Stan user forum how to do model combination for Stan models. Bayesian model averaging (BMA) by computing marginal likelihoods is challenging in theory and even more challenging in practice using only the MCMC samples obtained from the full model posteriors. Some users have […]

## Beyond subjective and objective in statistics: my talk with Christian Hennig tomorrow (Wed) 5pm in London

Christian Hennig and I write: Decisions in statistical data analysis are often justified, criticized, or avoided using concepts of objectivity and subjectivity. We argue that the words “objective” and “subjective” in statistics discourse are used in a mostly unhelpful way, and we propose to replace each of them with broader collections of attributes, with objectivity […]

## Tech company wants to hire Stan programmers!

Ittai Kan writes: I started life as an academic mathematician (chaos theory) but have long since moved into industry. I am currently Chief Scientist at Afiniti, a contact center routing technology company that connects agent and callers on the basis of various factors in order to globally optimize the contact center performance. We have 17 […]

## Gilovich doubles down on hot hand denial

[cat picture] A correspondent pointed me to this Freaknomics radio interview with Thomas Gilovich, one of the authors of that famous “hot hand” paper from 1985, “Misperception of Chance Processes in Basketball.” Here’s the key bit from the Freakonomics interview: DUBNER: Right. The “hot-hand notion” or maybe the “hot-hand fallacy.” GILOVICH: Well, everyone who’s ever […]

## Let’s accept the idea that treatment effects vary—not as something special but just as a matter of course

Tyler Cowen writes: Does knowing the price lower your enjoyment of goods and services? I [Cowen] don’t quite agree with this as stated, as the experience of enjoying a bargain can make it more pleasurable, or at least I have seen this for many people. Some in fact enjoy the bargain only, not the actual […]

## This could be a big deal: the overuse of psychotropic medications for advanced Alzheimer’s patients

I received the following email, entitled “A research lead (potentially bigger than the opioid epidemic,” from someone who wishes to remain anonymous: My research lead is related to the use of psychotropic medications in Alzheimer’s patients. I should note that strong cautions have already been issued with respect to the use of these medications in […]

## Some natural solutions to the p-value communication problem—and why they won’t work

Blake McShane and David Gal recently wrote two articles (“Blinding us to the obvious? The effect of statistical training on the evaluation of evidence” and “Statistical significance and the dichotomization of evidence”) on the misunderstandings of p-values that are common even among supposed experts in statistics and applied social research. The key misconception has nothing […]