Archive of posts filed under the Miscellaneous Statistics category.

To know the past, one must first know the future: The relevance of decision-based thinking to statistical analysis

We can break up any statistical problem into three steps: 1. Design and data collection. 2. Data analysis. 3. Decision making. It’s well known that step 1 typically requires some thought of steps 2 and 3: It is only when you have a sense of what you will do with your data, that you can […]

Frank Harrell statistics blog!

Frank Harrell, author of an influential book on regression modeling and currently both a biostatistics professor and a statistician at the Food and Drug Administration, has started a blog. He sums up “some of his personal philosophy of statistics” here: Statistics needs to be fully integrated into research; experimental design is all important. Don’t be […]

Problems with “incremental validity” or more generally in interpreting more than one regression coefficient at a time

Kevin Lewis points us to this interesting paper by Jacob Westfall and Tal Yarkoni entitled, “Statistically Controlling for Confounding Constructs Is Harder than You Think.” Westfall and Yarkoni write: A common goal of statistical analysis in the social sciences is to draw inferences about the relative contributions of different variables to some outcome variable. When […]
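To make the Westfall and Yarkoni point concrete, here is a minimal simulation sketch. Everything in it is our own assumption, not their setup: a standard-normal construct drives both a predictor x and the outcome y, x has no direct effect on y, and the construct is observed only through a measure with reliability 0.5. The helper names `ols_two_predictors` and `simulate` are hypothetical.

```python
import random

def ols_two_predictors(x1, x2, y):
    """Slopes (b1, b2) from least-squares regression of y on x1 and x2 with an intercept."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Centered sums of squares and cross-products.
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    # Solve the 2x2 normal equations by Cramer's rule.
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

def simulate(n=20000, reliability=0.5, seed=1):
    """Coefficient on x when adjusting for a noisy measure vs. the true construct (hypothetical setup)."""
    rng = random.Random(seed)
    construct = [rng.gauss(0, 1) for _ in range(n)]
    x = [c + rng.gauss(0, 1) for c in construct]  # x tracks the construct; no direct effect on y
    y = [c + rng.gauss(0, 1) for c in construct]  # y depends on the construct only
    # Measurement error sd chosen so that var(construct) / var(measure) = reliability.
    err_sd = ((1 - reliability) / reliability) ** 0.5
    measure = [c + rng.gauss(0, err_sd) for c in construct]
    b_x_measure, _ = ols_two_predictors(x, measure, y)
    b_x_construct, _ = ols_two_predictors(x, construct, y)
    return b_x_measure, b_x_construct

b_measure, b_construct = simulate()
print(f"x coefficient adjusting for the noisy measure:  {b_measure:.2f}")
print(f"x coefficient adjusting for the true construct: {b_construct:.2f}")
```

Under these assumptions the x coefficient should land near 1/3 when we adjust for the noisy measure and near 0 when we adjust for the construct itself, so a “significant” x coefficient here reflects residual confounding rather than incremental validity.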

A small, underpowered treasure trove?

Benjamin Kirkup writes: As you sometimes comment on such things, I’m forwarding you a journal editorial (in a society journal) that presents “lessons learned” from an associated research study. What caught my attention was the comment on the “notorious” design, the lack of “significant” results, and the “interesting data on nonsignificant associations.” Apparently, the work […]

When do stories work, Process tracing, and Connections between qualitative and quantitative research

Jonathan Stray writes: I read your “when do stories work” paper (with Thomas Basbøll) with interest—as a journalist stories are of course central to my field. I wondered if you had encountered the “process tracing” literature in political science? It attempts to make sense of stories as “case studies” and there’s a nice logic of […]

We fiddle while Rome burns: p-value edition

Raghu Parthasarathy presents a wonderfully clear example of disastrous p-value-based reasoning that he saw in a conference presentation. Here’s Raghu: Consider, for example, some tumorous cells that we can treat with drugs 1 and 2, either alone or in combination. We can make measurements of growth under our various drug treatment conditions. Suppose our measurements […]

“Which curve fitting model should I use?”

Oswaldo Melo writes: I have learned many curve-fitting models in the past, including their technical and mathematical details. Now I have been working on real-world problems and I face a great shortcoming: which method to use. As an example, I have to predict the demand of a product. I have a time series […]

When you add a predictor the model changes so it makes sense that the coefficients change too.

Shane Littrell writes: I’ve recently graduated with my Masters in Science in Research Psych but I’m currently trying to get better at my stats knowledge (in psychology, we tend to learn a dumbed down, “Stats for Dummies” version of things). I’ve been reading about “suppressor effects” in regression recently and it got me curious about […]
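A small simulated suppressor example may help with this question. It is a sketch under assumptions of our own: x1 mixes a signal that drives y with nuisance variation, x2 measures only the nuisance part (so it is essentially uncorrelated with y), and all components are standard normal. The helper names are hypothetical.

```python
import random

def slope(x, y):
    """Least-squares slope of y on a single predictor x (with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)

def ols_two_predictors(x1, x2, y):
    """Slopes (b1, b2) from least-squares regression of y on x1 and x2 with an intercept."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2  # Cramer's rule for the 2x2 normal equations
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

rng = random.Random(4)
n = 20000
signal = [rng.gauss(0, 1) for _ in range(n)]
nuisance = [rng.gauss(0, 1) for _ in range(n)]
x1 = [t + s for t, s in zip(signal, nuisance)]  # x1 = signal + nuisance
x2 = nuisance                                   # x2 = nuisance only; corr(x2, y) is about 0
y = [t + rng.gauss(0, 1) for t in signal]       # y depends on the signal only

b_alone = slope(x1, y)
b1, b2 = ols_two_predictors(x1, x2, y)
print(f"y ~ x1:      b1 = {b_alone:.2f}")
print(f"y ~ x1 + x2: b1 = {b1:.2f}, b2 = {b2:.2f}")
```

Under these assumptions the coefficient on x1 is about 0.5 on its own and about 1 once x2 is added, while x2 picks up a coefficient near -1 despite its near-zero correlation with y. The two regressions answer different questions, so there is no reason to expect the x1 coefficient to stay put.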

Field Experiments and Their Critics

Seven years ago I was contacted by Dawn Teele, who was then a graduate student and is now a professor of political science, and asked for my comments on an edited book she was preparing on social science experiments and their critics. I responded as follows: This is a great idea for a project. My […]

Fragility index is too fragile

Simon Gates writes: Here is an issue that has had a lot of publicity and Twittering in the clinical trials world recently. Many people are promoting the use of the “fragility index” (paper attached) to help interpretation of “significant” results from clinical trials. The idea is that it gives a measure of how robust the […]
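For readers who haven’t met it, here is a minimal sketch of the fragility index as it is commonly defined: starting from a two-arm trial whose two-sided Fisher exact test gives p < 0.05, count how many patients in the arm with fewer events would have to switch from non-event to event before p reaches 0.05 or more. The function names and the example counts are hypothetical, and the Fisher test is written out by hand with `math.comb` to keep the block self-contained.

```python
from math import comb

def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x, with all margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over tables no more probable than the observed one (small tolerance for float ties).
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))

def fragility_index(events1, n1, events2, n2, alpha=0.05):
    """Non-event -> event switches in the arm with fewer events needed to reach p >= alpha.

    Returns None if the trial is not "significant" to begin with.
    """
    if fisher_two_sided(events1, n1 - events1, events2, n2 - events2) >= alpha:
        return None
    if events1 > events2:  # always modify the arm with fewer events
        events1, n1, events2, n2 = events2, n2, events1, n1
    flips = 0
    while events1 < n1:
        events1 += 1
        flips += 1
        if fisher_two_sided(events1, n1 - events1, events2, n2 - events2) >= alpha:
            return flips
    return None

print(fragility_index(1, 100, 15, 100))  # hypothetical trial: 1/100 vs 15/100 events
```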

Two unrelated topics in one post: (1) Teaching useful algebra classes, and (2) doing more careful psychological measurements

Kevin Lewis and Paul Alper send me so much material, I think they need their own blogs. In the meantime, I keep posting the stuff they send me, as part of my desperate effort to empty my inbox. 1. From Lewis: “Should Students Assessed as Needing Remedial Mathematics Take College-Level Quantitative Courses Instead? A Randomized […]

“The Pitfall of Experimenting on the Web: How Unattended Selective Attrition Leads to Surprising (Yet False) Research Conclusions”

Kevin Lewis points us to this paper by Haotian Zhou and Ayelet Fishbach, which begins: The authors find that experimental studies using online samples (e.g., MTurk) often violate the assumption of random assignment, because participant attrition—quitting a study before completing it and getting paid—is not only prevalent, but also varies systemically across experimental conditions. Using […]
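The mechanism Zhou and Fishbach describe can be sketched with a toy simulation. The setup is hypothetical: both arms draw outcomes from the same distribution (no treatment effect), but in arm B participants with below-average outcomes quit half the time, while nobody quits arm A. Comparing completers only then produces a spurious difference.

```python
import random
from statistics import mean

def simulate_attrition(n_per_arm=20000, quit_prob_if_low=0.5, seed=2):
    """Completion rates and completer mean difference (B minus A) under a hypothetical attrition rule."""
    rng = random.Random(seed)
    completers = {"A": [], "B": []}
    for arm in ("A", "B"):
        for _ in range(n_per_arm):
            y = rng.gauss(0, 1)  # same outcome distribution in both arms: no treatment effect
            # Hypothetical rule: only in arm B do low scorers sometimes quit before finishing.
            if arm == "B" and y < 0 and rng.random() < quit_prob_if_low:
                continue
            completers[arm].append(y)
    rates = {arm: len(v) / n_per_arm for arm, v in completers.items()}
    diff = mean(completers["B"]) - mean(completers["A"])
    return rates, diff

rates, diff = simulate_attrition()
print(rates, round(diff, 2))
```

Under these assumptions arm B completes at about 75% and its completers average roughly 0.27 higher than arm A’s, even though assignment was random and the treatment does nothing.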

Ethics and statistics

For a few years now, I’ve been writing a column in Chance. Below are the articles so far. This is by no means an exhaustive list of my writings on ethics and statistics but at least I thought it could help to collect these columns in one place. Ethics and statistics: Open data and open […]

Christmas special: Survey research, network sampling, and Charles Dickens’ coincidences

It’s Christmas so what better time to write about Charles Dickens . . . Here’s the story: In traditional survey research we have been spoiled. If you work with atomistic data structures, a small sample looks like a little bit of the population. But a small sample of a network doesn’t look like the whole. […]
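Here is a quick illustration of why a small sample of a network does not look like a little bit of the whole. It assumes a simple random graph (1,000 people, each pair tied with probability 0.01), which is our choice and not anything from the post: a 10% sample of people keeps only the ties among sampled people, so each sampled person appears to have about a tenth of their real connections.

```python
import random
from itertools import combinations

def random_graph(n, p, rng):
    """Erdős–Rényi graph: each pair of nodes is linked independently with probability p."""
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def mean_degree(adj, nodes):
    """Average number of ties per node, counting only ties to other nodes in `nodes`."""
    nodes = set(nodes)
    return sum(len(adj[v] & nodes) for v in nodes) / len(nodes)

rng = random.Random(3)
graph = random_graph(1000, 0.01, rng)
full_degree = mean_degree(graph, graph)
sample = rng.sample(range(1000), 100)  # a 10% sample of people
sample_degree = mean_degree(graph, sample)
print(f"whole network: {full_degree:.1f} ties per person; 10% sample: {sample_degree:.1f}")
```

Under these assumptions a network with about 10 ties per person looks, in the sample, like one with about 1.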

Steve Fienberg

I did not know Steve Fienberg well, but I met him several times and encountered his work on various occasions, which makes sense considering his research area was statistical modeling as applied to social science. Fienberg’s most influential work must have been his books on the analysis of categorical data, work that was ahead of […]

Low correlation of predictions and outcomes is no evidence against hot hand

Josh Miller (of Miller & Sanjurjo) writes: On correlations, you know, the original Gilovich, Vallone, and Tversky paper found that the Cornell players’ “predictions” of their teammates’ shots correlated 0.04, on average. No evidence they can see the hot hand, right? Here is an easy correlation question: suppose Bob shoots with probability ph=.55 when he […]
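The arithmetic behind this point can be written down directly. Suppose a predictor knows perfectly whether Bob is in the hot state. If he makes shots with probability p_hot when hot and p_not otherwise, and a fraction q of his shots are taken while hot, the correlation between the hot indicator and a make is (p_hot - p_not) * sqrt(q(1 - q) / (p(1 - p))), where p = q * p_hot + (1 - q) * p_not is his overall make rate. The excerpt gives ph = .55 and is cut off before the rest, so the values p_not = .45 and q = .5 below are hypothetical.

```python
from math import sqrt

def max_prediction_correlation(p_hot, p_not, q_hot):
    """Correlation between a perfect 'is he hot?' indicator and a single shot's outcome.

    p_hot: make probability in the hot state; p_not: make probability otherwise;
    q_hot: fraction of shots taken in the hot state.
    """
    p = q_hot * p_hot + (1 - q_hot) * p_not  # overall make rate
    return (p_hot - p_not) * sqrt(q_hot * (1 - q_hot) / (p * (1 - p)))

print(round(max_prediction_correlation(0.55, 0.45, 0.5), 3))  # 0.1
```

Under these made-up numbers even a perfect hot-state detector would correlate only about 0.1 with shot outcomes, which puts an observed correlation of 0.04 in perspective.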

How can time series information be used to choose a control group?

This post is by Phil Price, not Andrew. Before I get to my question, you need some background. The amount of electricity that is provided by an electric utility at a given time is called the “electric load”, and the time series of electric load is called the “load shape.” Figure 1 (which is labeled […]
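One simple way to use the time series, offered as a hypothetical sketch rather than the approach Phil settles on, is nearest-neighbor matching: for a treated customer, pick as controls the k candidates whose pre-treatment load shapes are closest in squared distance. The function name, the toy load values, and the choice of distance are all assumptions.

```python
from math import fsum

def nearest_controls(treated_pre, candidates_pre, k):
    """Ids of the k candidates whose pre-period load shapes are closest to treated_pre.

    treated_pre: list of loads for the treated customer over the pre-treatment period.
    candidates_pre: dict mapping candidate id -> list of loads over the same period.
    """
    def distance(shape):
        if len(shape) != len(treated_pre):
            raise ValueError("load shapes must cover the same time points")
        return fsum((a - b) ** 2 for a, b in zip(treated_pre, shape))

    ranked = sorted(candidates_pre, key=lambda cid: distance(candidates_pre[cid]))
    return ranked[:k]

# Hypothetical hourly loads (kW) over four pre-treatment hours.
treated = [1.0, 1.2, 2.5, 3.0]
candidates = {
    "a": [1.1, 1.2, 2.4, 3.1],
    "b": [3.0, 3.0, 3.0, 3.0],
    "c": [0.9, 1.3, 2.6, 2.9],
    "d": [0.0, 0.0, 0.0, 0.0],
}
print(nearest_controls(treated, candidates, 2))  # ['a', 'c']
```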

Applying statistical thinking to the search for extraterrestrial intelligence

Thomas Basbøll writes: A statistical question has been bugging me lately. I recently heard that Yuri Milner has donated 100 million dollars to a 10-year search for extraterrestrial intelligence. I’m not very practiced in working out probability functions but I thought maybe you or your readers would find it easy and fun to do this. Here’s […]

The social world is (in many ways) continuous but people’s mental models of the world are Boolean

Raghu Parthasarathy points me to this post and writes: I wrote it after seeing one too many talks in which someone bases Boolean statements about effects “existing” or “not existing” (infuriating in itself) on “p < 0.05” or “p > 0.05”. Of course, you’ve written tons of great things on the pitfalls, errors, and general […]

How to think about the p-value from a randomized test?

Roahn Wynart asks: Scenario: I collect a lot of data for a complex psychology experiment. I put all the raw data into a computer. I program the computer to do 100 statistical tests. I assign each statistical test to a key on my keyboard. However, I do NOT execute the statistical test. Each key will […]