Archive of posts filed under the Miscellaneous Statistics category.

“Data sleaze: Uber and beyond”

Interesting discussion from Kaiser Fung. I don’t have anything to add here; it’s just a good statistics topic. Scroll through Kaiser’s blog for more:
Dispute over analysis of school quality and home prices shows social science is hard
My pre-existing United boycott, and some musing on randomness and fairness
etc.

Using prior knowledge in frequentist tests

Christian Bartels sent along this paper, which he described as an attempt to use informative priors for frequentist test statistics. I replied: I’ve not tried to follow the details but this reminds me of our paper on posterior predictive checks. People think of this as very Bayesian but my original idea when doing this research […]

The next Lancet retraction? [“Subcortical brain volume differences in participants with attention deficit hyperactivity disorder in children and adults”]

[cat picture] Someone who prefers to remain anonymous asks for my thoughts on this post by Michael Corrigan and Robert Whitaker, “Lancet Psychiatry Needs to Retract the ADHD-Enigma Study: Authors’ conclusion that individuals with ADHD have smaller brains is belied by their own data,” which begins: Lancet Psychiatry, a UK-based medical journal, recently published a […]

Teaching Statistics: A Bag of Tricks (second edition)

Hey! Deb Nolan and I finished the second edition of our book, Teaching Statistics: A Bag of Tricks. You can pre-order it here. I love love love this book. As William Goldman would say, it’s the “good parts version”: all the fun stuff without the standard boring examples (counting colors of M&M’s, etc.). Great stuff […]

My proposal for JASA: “Journal” = review reports + editors’ recommendations + links to the original paper and updates + post-publication comments

[cat picture] Whenever they’ve asked me to edit a statistics journal, I say no thank you because I think I can make more of a contribution through this blog. I’ve said no enough times that they’ve stopped asking me. But I’ve had an idea for a while and now I want to do it. I think […]

My talk this Friday in the Machine Learning in Finance workshop

[cat picture] This is kinda weird because I don’t know anything about machine learning in finance. I guess the assumption is that statistical ideas are not domain specific. Anyway, here it is:
What can we learn from data?
Andrew Gelman, Department of Statistics and Department of Political Science, Columbia University
The standard framework for statistical […]

The Efron transition? And the wit and wisdom of our statistical elders

[cat picture] Stephen Martin writes: Brad Efron seems to have transitioned from “Bayes just isn’t as practical” to “Bayes can be useful, but EB is easier” to “Yes, Bayes should be used in the modern day” pretty continuously across three decades.
http://www2.stat.duke.edu/courses/Spring10/sta122/Handouts/EfronWhyEveryone.pdf
http://projecteuclid.org/download/pdf_1/euclid.ss/1028905930
http://statweb.stanford.edu/~ckirby/brad/other/2009Future.pdf
Also, Lindley’s comment in the first article is just GOLD: “The […]

Beyond subjective and objective in statistics: my talk with Christian Hennig tomorrow (Wed) 5pm in London

Christian Hennig and I write: Decisions in statistical data analysis are often justified, criticized, or avoided using concepts of objectivity and subjectivity. We argue that the words “objective” and “subjective” in statistics discourse are used in a mostly unhelpful way, and we propose to replace each of them with broader collections of attributes, with objectivity […]

Probability and Statistics in the Study of Voting and Public Opinion (my talk at the Columbia Applied Probability and Risk seminar, 30 Mar at 1pm)

Probability and Statistics in the Study of Voting and Public Opinion Elections have both uncertainty and variation and hence represent a natural application of probability theory. In addition, opinion polling is a classic statistics problem and is featured in just about every course on the topic. But many common intuitions about probability, statistics, and voting […]

Some natural solutions to the p-value communication problem—and why they won’t work

Blake McShane and David Gal recently wrote two articles (“Blinding us to the obvious? The effect of statistical training on the evaluation of evidence” and “Statistical significance and the dichotomization of evidence”) on the misunderstandings of p-values that are common even among supposed experts in statistics and applied social research. The key misconception has nothing […]

Lady in the Mirror

In the context of a report from a drug study, Stephen Senn writes: The bare facts they established are the following: The International Headache Society recommends the outcome of being pain free two hours after taking a medicine. The outcome of being pain free or having only mild pain at two hours was reported by […]

“Beyond Heterogeneity of Effect Sizes”

[cat picture] Piers Steel writes: One of the primary benefits of meta-analytic syntheses of research findings is that researchers are provided with an estimate of the heterogeneity of effect sizes. . . . Low values for this estimate are typically interpreted as indicating that the strength of an effect generalizes across situations . . . […]

How is preregistration like random sampling and controlled experimentation

In the discussion following my talk yesterday, someone asked about preregistration and I gave an answer that I really liked, something I’d never thought of before. I started with my usual story that preregistration is great in two settings: (a) replicating your own exploratory work (as in the 50 shades of gray paper), and […]

How to do a descriptive analysis using regression modeling?

Freddy Garcia writes: I read your post Vine regression?, and your phrase “I love descriptive data analysis!” make me wonder: How to do a descriptive analysis using regression models? Maybe my question could be misleading to an statistician, but I am a economics student. So we are accustomed to think in causal terms when we […]

Advice when debugging at 11pm

Add one feature to your model and test and debug with fake data before going on. Don’t try to add two features at once.
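Here is a minimal sketch of what a fake-data check can look like, assuming a simple linear regression as the model. The function names (`simulate_fake_data`, `fit_line`) and the parameter values are made up for illustration; they are not from any particular package. The idea is to simulate data from known parameters, fit, and see whether the fit recovers what you put in, before adding the next feature.

```python
import random


def simulate_fake_data(n, intercept, slope, sigma, seed=0):
    """Simulate y = intercept + slope * x + noise with known (assumed) parameters."""
    rng = random.Random(seed)
    x = [rng.uniform(0, 10) for _ in range(n)]
    y = [intercept + slope * xi + rng.gauss(0, sigma) for xi in x]
    return x, y


def fit_line(x, y):
    """Ordinary least squares for a single predictor, closed form."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical "true" values chosen only for this check.
true_intercept, true_slope = 1.0, 2.0
x, y = simulate_fake_data(2000, true_intercept, true_slope, sigma=1.0)
est_intercept, est_slope = fit_line(x, y)

print("intercept: true", true_intercept, "estimated", round(est_intercept, 2))
print("slope:     true", true_slope, "estimated", round(est_slope, 2))
```

If the estimates land far from the true values, the bug is in the one piece you just added, which is the whole point of adding only one feature at a time.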

Checkmate

Sandro Ambuehl writes: As an avid reader of your blog, I thought you might like (to hate) the attached PNAS paper with the following findings: (i) sending two flyers about the importance of STEM fields to the parents of 81 kids improves ACT scores by 12 percentile points (intent-to-treat effect… a bit large, perhaps?) and […]

Yes, it makes sense to do design analysis (“power calculations”) after the data have been collected

This one has come up before but it’s worth a reminder. Stephen Senn is a thoughtful statistician and I generally agree with his advice but I think he was kinda wrong on this one. Wrong in an interesting way. Senn’s article is from 2002 and it is called “Power is indeed irrelevant in interpreting completed […]

Theoretical statistics is the theory of applied statistics: how to think about what we do (My talk Wednesday—today!—4:15pm at the Harvard statistics dept)

Theoretical statistics is the theory of applied statistics: how to think about what we do
Andrew Gelman, Department of Statistics and Department of Political Science, Columbia University
Working scientists and engineers commonly feel that philosophy is a waste of time. But theoretical and philosophical principles can guide practice, so it makes sense for us to […]

Ethics and the Replication Crisis and Science (my talk Tues 6pm)

I’ll be speaking on Ethics and the Replication Crisis and Science tomorrow (Tues 28 Feb) 6-7:30pm at room 411 Fayerweather Hall, Columbia University. I don’t plan to speak for 90 minutes; I assume there will be lots of time for discussion. Here’s the abstract that I whipped up: Busy scientists sometimes view ethics and philosophy […]

Forecasting mean and sd of time series

Garrett M. writes: I had two (hopefully straightforward) questions related to time series analysis that I was hoping I could get your thoughts on: First, much of the work I do involves “backtesting” investment strategies, where I simulate the performance of an investment portfolio using historical data on returns. The primary summary statistics I generate […]