Archive of posts filed under the Miscellaneous Statistics category.

“Which curve fitting model should I use?”

Oswaldo Melo writes: I have learned many curve-fitting models in the past, including their technical and mathematical details. Now I have been working on real-world problems and I face a great shortcoming: deciding which method to use. As an example, I have to predict the demand of a product. I have a time series […]
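One generic answer to the which-model question is to let held-out predictive performance decide. Here is a minimal sketch of rolling-origin evaluation on a synthetic monthly demand series; the data and the candidate polynomial-trend models are hypothetical stand-ins, not Melo's actual setup:

```python
import numpy as np

# Hypothetical monthly demand series (synthetic: trend + seasonality + noise).
rng = np.random.default_rng(0)
t = np.arange(120)
y = 100 + 0.5 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 5, t.size)

def fit_predict(train_t, train_y, test_t, degree):
    """Fit a polynomial trend of the given degree; predict at test_t."""
    coefs = np.polyfit(train_t, train_y, degree)
    return np.polyval(coefs, test_t)

# Rolling-origin evaluation: always test on points after the training window.
for degree in (1, 2, 5):
    errors = []
    for origin in range(60, 119):
        pred = fit_predict(t[:origin], y[:origin], t[origin:origin + 1], degree)
        errors.append((pred[0] - y[origin]) ** 2)
    print(f"degree {degree}: out-of-sample RMSE = {np.sqrt(np.mean(errors)):.2f}")
```

The point is not the particular models but the protocol: whichever candidate forecasts held-out future points best is the one with the strongest claim on the job.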

When you add a predictor the model changes so it makes sense that the coefficients change too.

Shane Littrell writes: I’ve recently graduated with my Master of Science in Research Psych, but I’m currently trying to get better at my stats knowledge (in psychology, we tend to learn a dumbed-down, “Stats for Dummies” version of things). I’ve been reading about “suppressor effects” in regression recently and it got me curious about […]
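For readers who want to see the coefficient change in action, here is a small simulated sketch of a suppression-style setup (all numbers invented for illustration): the coefficient on x1 shrinks dramatically once a correlated predictor x2 enters the model, because it then answers a different question.

```python
import numpy as np

# Simulated setup (all numbers hypothetical): x1 and x2 are positively
# correlated; y depends strongly on x2 and only weakly on x1.
rng = np.random.default_rng(1)
n = 10_000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)
y = 0.1 * x1 + 1.0 * x2 + rng.normal(size=n)

# Coefficient on x1 alone (simple regression): picks up x2's effect.
b_simple = np.polyfit(x1, y, 1)[0]

# Coefficient on x1 with x2 in the model (multiple regression).
X = np.column_stack([np.ones(n), x1, x2])
b_multiple = np.linalg.lstsq(X, y, rcond=None)[0][1]

print(f"x1 alone: {b_simple:.2f}; x1 adjusting for x2: {b_multiple:.2f}")
# The coefficient changes because it now means the association of x1
# with y holding x2 fixed, not the marginal association.
```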

Field Experiments and Their Critics

Seven years ago I was contacted by Dawn Teele, who was then a graduate student and is now a professor of political science, and asked for my comments on an edited book she was preparing on social science experiments and their critics. I responded as follows: This is a great idea for a project. My […]

Fragility index is too fragile

Simon Gates writes: Here is an issue that has had a lot of publicity and Twittering in the clinical trials world recently. Many people are promoting the use of the “fragility index” (paper attached) to help with the interpretation of “significant” results from clinical trials. The idea is that it gives a measure of how robust the […]
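For concreteness, here is a minimal sketch of the fragility index as it is usually defined: the smallest number of flipped outcomes needed to push a significant Fisher's exact test past 0.05. The trial numbers below are hypothetical:

```python
from scipy.stats import fisher_exact

def fragility_index(events_a, n_a, events_b, n_b, alpha=0.05):
    """Smallest number of outcome flips (non-event -> event in the arm
    with fewer events) that pushes Fisher's exact p past alpha.
    Returns None if the result is not significant to begin with."""
    table = lambda ea, eb: [[ea, n_a - ea], [eb, n_b - eb]]
    if fisher_exact(table(events_a, events_b))[1] >= alpha:
        return None
    flips, ea, eb = 0, events_a, events_b
    while fisher_exact(table(ea, eb))[1] < alpha:
        if ea <= eb:          # flip an outcome in the arm with fewer events
            ea += 1
        else:
            eb += 1
        flips += 1
    return flips

# Hypothetical trial: 10/100 vs. 25/100 events, significant at p < 0.05,
# yet only a few flipped outcomes away from non-significance.
print(fragility_index(10, 100, 25, 100))
```

The critique in the post's title follows directly: the index inherits all the arbitrariness of the 0.05 threshold it is built on.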

Two unrelated topics in one post: (1) Teaching useful algebra classes, and (2) doing more careful psychological measurements

Kevin Lewis and Paul Alper send me so much material, I think they need their own blogs. In the meantime, I keep posting the stuff they send me, as part of my desperate effort to empty my inbox. 1. From Lewis: “Should Students Assessed as Needing Remedial Mathematics Take College-Level Quantitative Courses Instead? A Randomized […]

“The Pitfall of Experimenting on the Web: How Unattended Selective Attrition Leads to Surprising (Yet False) Research Conclusions”

Kevin Lewis points us to this paper by Haotian Zhou and Ayelet Fishbach, which begins: The authors find that experimental studies using online samples (e.g., MTurk) often violate the assumption of random assignment, because participant attrition—quitting a study before completing it and getting paid—is not only prevalent, but also varies systematically across experimental conditions. Using […]
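A tiny simulation makes the mechanism clear. In this hedged sketch there is no true treatment effect at all, but dropout depends on both the condition and a trait that predicts the outcome, so the completers in the two conditions are no longer comparable (all numbers invented):

```python
import numpy as np

# No true effect: the outcome has the same distribution in both conditions.
rng = np.random.default_rng(2)
n = 100_000
patience = rng.normal(size=n)                # trait correlated with the outcome
outcome = patience + rng.normal(size=n)      # identical across conditions
treated = rng.integers(0, 2, size=n).astype(bool)

# Selective attrition: the treatment condition is more tedious, and
# low-patience participants drop out of it more often.
p_drop = np.where(treated, 0.5 - 0.3 * np.tanh(patience), 0.1)
completed = rng.random(n) > p_drop

diff = outcome[treated & completed].mean() - outcome[~treated & completed].mean()
print(f"apparent 'effect' among completers: {diff:.2f}")  # clearly nonzero
```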

Ethics and statistics

For a few years now, I’ve been writing a column in Chance. Below are the articles so far. This is by no means an exhaustive list of my writings on ethics and statistics but at least I thought it could help to collect these columns in one place. Ethics and statistics: Open data and open […]

Christmas special: Survey research, network sampling, and Charles Dickens’ coincidences

It’s Christmas so what better time to write about Charles Dickens . . . Here’s the story: In traditional survey research we have been spoiled. If you work with atomistic data structures, a small sample looks like a little bit of the population. But a small sample of a network doesn’t look like the whole. […]
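A quick sketch of the "small sample of a network doesn't look like the whole" point; the random graph and the 10% sampling fraction are arbitrary choices for illustration:

```python
import numpy as np

# A uniform node sample of a network does not look like the network.
rng = np.random.default_rng(8)
n, mean_degree = 20_000, 10
m = n * mean_degree // 2
edges = rng.integers(0, n, size=(m, 2))      # random graph, ~10 edges per node

sample = np.zeros(n, dtype=bool)
sample[rng.choice(n, size=n // 10, replace=False)] = True  # 10% of nodes

both_in = sample[edges[:, 0]] & sample[edges[:, 1]]
deg_full = 2 * m / n
deg_sample = 2 * both_in.sum() / sample.sum()
print(f"mean degree: full graph {deg_full:.1f}, induced sample {deg_sample:.1f}")
# A 10% node sample keeps only ~1% of edges: the sampled network is far
# sparser than the real one, unlike an i.i.d. sample of survey respondents.
```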

Steve Fienberg

I did not know Steve Fienberg well, but I met him several times and encountered his work on various occasions, which makes sense considering his research area was statistical modeling as applied to social science. Fienberg’s most influential work must have been his books on the analysis of categorical data, work that was ahead of […]

Low correlation of predictions and outcomes is no evidence against hot hand

Josh Miller (of Miller & Sanjurjo) writes: On correlations, you know, the original Gilovich, Vallone, and Tversky paper found that the Cornell players’ “predictions” of their teammates’ shots correlated with the outcomes only 0.04, on average. No evidence they can see the hot hand, right? Here is an easy correlation question: suppose Bob shoots with probability p_h = .55 when he […]
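To see why a near-zero correlation is weak evidence, consider this simulation sketch. It assumes, hypothetically, that Bob hits 55% of his shots when hot and 45% otherwise, and that the predictor knows his state perfectly:

```python
import numpy as np

# Assumed numbers: Bob hits with p = .55 when hot, p = .45 otherwise,
# and is hot on half his shots.
rng = np.random.default_rng(3)
n = 1_000_000
hot = rng.random(n) < 0.5
made = rng.random(n) < np.where(hot, 0.55, 0.45)

# An omniscient predictor who always knows Bob's state and predicts
# "make" exactly when he is hot -- the best prediction possible here.
prediction = hot

print(f"correlation: {np.corrcoef(prediction, made)[0, 1]:.3f}")
# Even perfect knowledge of the hot hand yields a correlation of only
# about 0.10, so an observed 0.04 is hardly evidence against it.
```

The arithmetic behind the 0.10: cov(prediction, outcome) = 0.5 × 0.55 − 0.5 × 0.5 = 0.025, and each binary variable has standard deviation 0.5, so the correlation is 0.025 / 0.25 = 0.10.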

How can time series information be used to choose a control group?

This post is by Phil Price, not Andrew. Before I get to my question, you need some background. The amount of electricity that is provided by an electric utility at a given time is called the “electric load”, and the time series of electric load is called the “load shape.” Figure 1 (which is labeled […]
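One simple approach, offered here only as a sketch and not necessarily what Phil has in mind, is to rank candidate control sites by how closely their pre-period load shapes track the treated site's, then keep the best matches (all series below are synthetic):

```python
import numpy as np

# Synthetic hourly load shapes: a shared daily cycle plus site noise.
rng = np.random.default_rng(9)
hours = 24 * 365
base = np.sin(2 * np.pi * np.arange(hours) / 24)
treated = base + 0.3 * rng.normal(size=hours)
candidates = (base[None, :] * rng.uniform(0.2, 1.5, (50, 1))
              + rng.uniform(0.1, 1.0, (50, 1)) * rng.normal(size=(50, hours)))

# Rank candidate controls by pre-period correlation with the treated site.
corr = np.array([np.corrcoef(treated, c)[0, 1] for c in candidates])
best = np.argsort(corr)[-5:]                 # keep the top-5 matches
print("control group:", best, "correlations:", np.round(corr[best], 2))
```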

Applying statistical thinking to the search for extraterrestrial intelligence

Thomas Basbøll writes: A statistical question has been bugging me lately. I recently heard that Yuri Milner has donated 100 million dollars to a 10-year search for extraterrestrial intelligence. I’m not very practiced in working out probability functions, but I thought maybe you or your readers would find it easy and fun to do this. Here’s […]

The social world is (in many ways) continuous but people’s mental models of the world are Boolean

Raghu Parthasarathy points me to this post and writes: I wrote this after seeing one too many talks in which someone bases Boolean statements about effects “existing” or “not existing” (infuriating in itself) on “p < 0.05” or “p > 0.05”. Of course, you’ve written tons of great things on the pitfalls, errors, and general […]
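The familiar numerical illustration: two studies with nearly identical estimates can land on opposite sides of the threshold, even though the difference between them is trivially small. A minimal sketch with invented numbers:

```python
from scipy import stats

# Two hypothetical studies of the same effect with nearly identical
# estimates and standard errors.
studies = [("A", 2.0, 1.0),   # z = 2.00, p ~ 0.046 -> "effect exists"
           ("B", 1.9, 1.0)]   # z = 1.90, p ~ 0.057 -> "effect does not exist"?

for name, est, se in studies:
    p = 2 * stats.norm.sf(abs(est / se))
    print(f"study {name}: estimate {est}, p = {p:.3f}")

# The Boolean reading flips between the studies, yet the difference
# between them is 0.1 with a standard error of about 1.4 -- the
# difference between "significant" and "not significant" is not
# itself statistically significant.
```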

How to think about the p-value from a randomized test?

Roahn Wynart asks: Scenario: I collect a lot of data for a complex psychology experiment. I put all the raw data into a computer. I program the computer to do 100 statistical tests. I assign each statistical test to a key on my keyboard. However, I do NOT execute the statistical test. Each key will […]
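A simulation sketch of the scenario, assuming pure-null data: pre-committing to one randomly chosen key preserves the nominal 5% false-positive rate, while scanning all 100 results and keeping the best one does not.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n_sims, n_tests, n_obs = 2_000, 100, 50

one_key_rejects = 0   # press ONE of the 100 keys at random, run only that test
min_p_rejects = 0     # run all 100 tests and keep the smallest p-value

for _ in range(n_sims):
    # Null data: 100 independent outcomes with no real effect.
    data = rng.normal(size=(n_tests, n_obs))
    pvals = stats.ttest_1samp(data, 0.0, axis=1).pvalue
    one_key_rejects += pvals[rng.integers(n_tests)] < 0.05
    min_p_rejects += pvals.min() < 0.05

print(f"one pre-chosen test:   {one_key_rejects / n_sims:.2f}")  # ~0.05
print(f"best of all 100 tests: {min_p_rejects / n_sims:.2f}")    # ~0.99
```

Whether the unexecuted tests "count" is exactly what separates a pre-registered analysis from a garden of forking paths.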

fMRI clusterf******

Several people pointed me to this paper by Anders Eklund, Thomas Nichols, and Hans Knutsson, which begins: Functional MRI (fMRI) is 25 years old, yet surprisingly its most common statistical methods have not been validated using real data. Here, we used resting-state fMRI data from 499 healthy controls to conduct 3 million task group analyses. […]

best algorithm EVER !!!!!!!!

Someone writes: On the website https://odajournal.com/ you can find a lot of material on Optimal (or “optimizing”) Data Analysis (ODA), which is described as follows: In the Optimal (or “optimizing”) Data Analysis (ODA) statistical paradigm, an optimization algorithm is first utilized to identify the model that explicitly maximizes predictive accuracy for the sample, and then the resulting […]
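A hedged sketch of why "explicitly maximizes predictive accuracy for the sample" should raise an eyebrow: with a pure-noise predictor, the sample-optimal cutpoint looks impressive in-sample and useless out of sample. This is a generic overfitting demonstration, not the ODA algorithm itself:

```python
import numpy as np

# Pure noise: the predictor x carries no information about the label y.
rng = np.random.default_rng(5)
n_train, n_test = 50, 10_000
x_train, y_train = rng.normal(size=n_train), rng.integers(0, 2, n_train)
x_test, y_test = rng.normal(size=n_test), rng.integers(0, 2, n_test)

def accuracy(x, y, cut, flip):
    pred = (x > cut) != flip
    return (pred == y).mean()

# Search every cutpoint and direction for the one maximizing TRAIN accuracy.
best = max(((c, f) for c in x_train for f in (False, True)),
           key=lambda cf: accuracy(x_train, y_train, *cf))

print(f"train accuracy: {accuracy(x_train, y_train, *best):.2f}")  # well above 0.5
print(f"test accuracy:  {accuracy(x_test, y_test, *best):.2f}")    # ~0.5: noise
```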

How can you evaluate a research paper?

Shea Levy writes: You ended a post from last month [i.e., Feb.] with the injunction to not take the fact of a paper’s publication or citation status as meaning anything, and instead that we should “read each paper on its own.” Unfortunately, while I can usually follow e.g. the criticisms of a paper you might […]

“A bug in fMRI software could invalidate 15 years of brain research”

About 50 people pointed me to this press release or the underlying PPNAS research article, “Cluster failure: Why fMRI inferences for spatial extent have inflated false-positive rates,” by Anders Eklund, Thomas Nichols, and Hans Knutsson, who write: Functional MRI (fMRI) is 25 years old, yet surprisingly its most common statistical methods have not been validated […]

Reminder: Instead of “confidence interval,” let’s say “uncertainty interval”

We had a vigorous discussion the other day on confusions involving the term “confidence interval,” what does it mean to have “95% confidence,” etc. This is as good a time as any for me to remind you that I prefer the term “uncertainty interval”. The uncertainty interval tells you how much uncertainty you have. That […]
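For concreteness, a minimal sketch of computing a 95% uncertainty interval from posterior draws; the draws here are simulated stand-ins for MCMC output:

```python
import numpy as np

# Stand-in for posterior draws from a fitted model.
rng = np.random.default_rng(6)
draws = rng.normal(loc=1.2, scale=0.4, size=4_000)

lo, hi = np.percentile(draws, [2.5, 97.5])
print(f"95% uncertainty interval: [{lo:.2f}, {hi:.2f}]")
# Reading: given the model and data, the parameter lies in this range
# with 95% posterior probability -- a direct statement about uncertainty,
# which is what the "confidence" phrasing tends to obscure.
```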

Discussion on overfitting in cluster analysis

Ben Bolker wrote: It would be fantastic if you could suggest one or two starting points for the idea that/explanation why BIC should naturally fail to identify the number of clusters correctly in the cluster-analysis context. Bob Carpenter elaborated: Ben is finding that using BIC to select number of mixture components is selecting too many […]
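Here is a small sketch of the failure mode Bob describes, assuming a deliberately misspecified setup: three well-separated clusters with heavy-tailed, non-Gaussian noise, scored by BIC over the number of Gaussian components:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Three true clusters, but with heavy-tailed (Student-t) noise, so no
# single Gaussian fits a cluster exactly.
rng = np.random.default_rng(7)
centers = np.array([[0, 0], [5, 0], [0, 5]])
X = np.vstack([c + rng.standard_t(df=4, size=(300, 2)) for c in centers])

for k in range(1, 9):
    gm = GaussianMixture(n_components=k, n_init=5, random_state=0).fit(X)
    print(f"k = {k}: BIC = {gm.bic(X):.0f}")
# With heavy tails, BIC often keeps improving past k = 3: the extra
# Gaussian components are spent approximating the tails of each cluster,
# not finding real clusters.
```

The design point: BIC is consistent for the number of components only when the component family is correctly specified, which in cluster analysis it essentially never is.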