What readings should be included in a seminar on the philosophy of statistics, the replication crisis, causation, etc.?

André Ariew writes:

I’m a philosopher of science at the University of Missouri. I’m interested in leading a seminar on a variety of current topics with philosophical value, including problems with significance tests, the replication crisis, causation, correlation, randomized trials, etc. I’m hoping that you can point me in a good direction for accessible readings for the syllabus. Can you? While the course is at the graduate level, I don’t assume that my students are expert in the philosophy of science and likely don’t know what a p-value is (that’s the trouble—need to get people to understand these things). When I teach a course on inductive reasoning I typically assign Ian Hacking’s An Introduction to Probability and Inductive Logic. I’m familiar with the book and he’s a great historian and philosopher of science.

He’d like to do more:

Anything you might suggest would be greatly appreciated. I’ve always thought that issues like these are much more important to the philosophy of science than much of what passes as the standard corpus.

My response:

I’d start with the classic and very readable 2011 article by Simmons, Nelson, and Simonsohn, False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant.

And follow up with my (subjective) historical overview from 2016, What has happened down here is the winds have changed.

You’ll want to assign at least one paper by Paul Meehl; here’s a link to a 1985 paper, and here’s a pointer to a paper from 1967, along with the question, “What happened? Why did it take us nearly 50 years to [hear] what Meehl was saying all along? This is what I want the intellectual history to help me understand,” and 137 comments in the discussion thread.

And I’ll also recommend my own three articles on the philosophy of statistics:

The last of these is the shortest so it might be a good place to start—or the only one, since it would be overkill to ask people to read all three.

Regarding p-values etc., the following article could be helpful (sorry, it’s another one of mine!):

And, for causation, I recommend these two articles, both of which should be readable for students without technical backgrounds:

OK, that’ll get you started. Perhaps the commenters have further suggestions?

P.S. I’d love to lead a seminar on the philosophy of statistics; unfortunately, I suspect that here at Columbia this would attract approximately 0 students. I do cover some of these issues in my class on Communicating Data and Statistics, though.

41 thoughts on “What readings should be included in a seminar on the philosophy of statistics, the replication crisis, causation, etc.?”

  1. Andre: I’ve taught philstat around 16 times in many different ways (sometimes jointly with A. Spanos); syllabi can be found on my blog errorstatistics.com. I’ve often used Barnett, Comparative Statistical Inference (1982), many original readings of statisticians and philosophers (e.g., Hacking, Popper, Peirce), my own Error and the Growth of Experimental Knowledge (1996), Mayo and Spanos (eds and authors) Error and Inference (2010), and nowadays my Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (in press stage with CUP).

    Andrew: I’ll bet if we taught it together, there’d be lots of interested students! Maybe next year.

  2. For historical context on publication bias, I like Theodore Sterling’s 1959 Publication Decisions and Their Possible Effects on Inferences Drawn from Tests of Significance–Or Vice Versa, published in J Am Stat Assoc.

    “One thing is clear, however. The author’s stated risk cannot be accepted at its face value once the author’s conclusions appear in print.” In other words, the experimenter does not need to condition on publication when she updates her beliefs — but the reader does need to condition thusly.

    Going back another 32 years, this very short letter is going around Twitter right now: http://jamanetwork.com/journals/jama/article-abstract/244730

    Cheers,
    Carl

  3. Statistical Models and Shoeleather.

    http://www.sas.rochester.edu/psc/clarke/405/Freedman91.pdf

    Unlike most contributions to the “be wary of statistical significance” arguments, this one is based in observational thinking and not experimental thinking. And since much (¿most?) social science is and will remain observational, I think it is an important bridge between the purely mathematical p-value-based critiques of Ioannidis and the P-curve folks, on the one hand, and the kind of papers people interested in social science will actually read on the other.

    I think that one of the most important philosophical concepts in causal inference in social science at the moment is probably the idea of quasi-experimental variation that is “as good as random”. And I think that the Shoeleather paper does a good job of linking the “as good as random” idea to actual research. Beyond that, I suspect that there are real contributions to be made by philosophers linking the concept of “as good as random” to causal inference (maybe I just suspect that we are sweeping more under the rug than we think).

  4. I know Andre’s class is inevitably different, but my vote is to try to get as much of this sort of thing into the normal stats course as possible rather than isolated into an independent course. (I know that’s already a popular opinion around here).

    I like the idea of including some modern resources on philstats alongside the usual epistemology, ontology, etc. Trying to teach p-values in that context (just to smack them down?) sounds tricky, though. For instance, I quite like the Gelman/Carlin paper about p-value communication, but I’m not sure it would mean much to me if I had _just_ learned what a p-value even was. Maybe more time on introduction and discussion of the relation of probability distributions and “reality” would be more effective.

  5. I really like Ed Leamer’s “Let’s Take the Con Out of Econometrics”:

    https://www.jstor.org/stable/pdf/1803924.pdf

    He talks about economics but the ideas translate broadly. There are some technical parts that require a basic familiarity with regression, but these are couched in real world examples. He also makes some claims about the relationship between “fact” and “opinion” that plenty of readers will take issue with – but that’s a good thing for a philosophy class.

  6. I think a philosophy course might be a great place to address the different frameworks of Neyman-Pearson, Fisher, and Bayes. You could get pretty far working through the different questions each was constructed for and how that affects the design and interpretation of the procedures. I think you could do it without getting too deep into the technical details. The arguments among the parties were/are largely philosophical and have a nice historical bent, so it’s fitting for the course. For me, just the mere fact that “classical frequentist” was more than one thing was a revelation.

    My primary guide to the Neyman-Pearson/Fisher split is Michael Lew (e.g., nice commentary and links to approachable articles here: https://stats.stackexchange.com/a/4567/10506). Deborah Mayo, Jaynes, and Richard Morey et al. (https://learnbayes.org/papers/confidenceIntervalsFallacy/CItheory.html) also come to mind as interesting sources.

  7. Thanks everyone here for their suggestions. One excellent question that keeps appearing in the comments is what constitutes a philosophy of stat course? A slightly different question is what role do these issues in statistics and inductive inference play in a philosophy of science course? Traditional philosophy of science courses cover things like realism and anti-realism, explanation, laws, causation, demarcation between science and non-science, confirmation, to name a few. One approach is to supplement the readings for each topic with a paper in statistics. Another is to see how these issues are handled within the field of statistics. When I teach causation, for example, typical papers are: Hume, counterfactual accounts (Lewis), maybe flow of energy accounts (Fair), the pragmatics of explanation (van Fraassen), maybe a manipulation approach (Woodward). Point is, I wouldn’t typically teach the methods of regression or correlation or any of the other topics you all have cited. Part of the reason the philosophy of scientific causation doesn’t bother much with statistical techniques is that philosophers are interested (in part) in the metaphysics of causation. But there’s more to philosophy of science than the metaphysical questions. There are methodological questions; there are issues concerning evaluating scientific practice (which I suppose the replication crisis fits under).

    • Andre, you may (or may not) be interested in my comments on Keith O’Rourke’s latest post here, in which I say that a lot of what I read at this blog leads me to think that Data Analysis or Data Science should be an undergraduate major. Your comment is another example, as you describe how neither stats nor philosophy courses get all the way to the core of the methodological issues here.

      But as they say, once you notice your own confirmation bias, you start seeing it everywhere. :-)

      [All: this will be my last post on this hobbyhorse.]

    • It’s a good point that what fits in a philosophy of stats course and the role of stats in a philosophy of science course are different, but related, questions. For what it’s worth, my suggestion about examining the various statistical frameworks was intended for a stats extension to a standard phil. sci. course, because I think it’s the main place where many of the (to me) interesting philosophical issues meet practical methodology. I don’t see that the replication crisis itself has a lot of interesting philosophical content, except as it may ultimately lead to the larger questions about statistical frameworks and their underlying philosophical grants.

      I’m not totally sure if, when you say you “wouldn’t typically teach the methods…”, you mean that you haven’t traditionally but plan to or won’t in any case. I took your original post to mean that you would need to work through the logic of some methods, even while avoiding the procedural and math bits.

    • Malcolm Forster and Elliott Sober are philosophers of science who apply statistical ideas (they talk about AIC a lot) to phil sci topics like realism, parsimony, unification, and Bayesianism. See Forster and Sober’s “How to tell when simpler, more unified, or less ad hoc theories will provide more accurate predictions”, or Sober’s recent book “Ockham’s Razors”.

      If you do any philosophy of probability, Alan Hajek is great. I think you can do better than Jaynes – philosophers (including Hajek, Forster, and Sober) have more nuanced views on the subjects Jaynes tackles. (Though MaxEnt seems worth knowing about.)

      Maybe some de Finetti.

  8. Meehl: “I am making a claim much stronger than that, which is I suppose the main reason that students and colleagues have trouble hearing it, since they might not know what to do next if they took it seriously!”

  9. Andrew
    I think the book by Westfall and Young (1993) on resampling-based methods for multiple testing is an excellent starting point.
    http://eu.wiley.com/WileyCDA/WileyTitle/productCd-0471557617.html
    Aside from the many problems they cite with the incorrect applications of p-values, they give a number of humorous references including one letter to the editor of The Lancet on the Munchhausen framework for making all tests significant (which fits in exactly with the XKCD cartoon on significance).

    W-Y is also a very well cited work in the econometrics and finance literature, with the papers by White, by Romano-Wolf, and by Hansen among others (including Campbell Harvey in his various addresses, equating the problem in science to finding “Jesus on toast” apophanies). These are serious attempts to use the bootstrap to estimate correlations in order to make more powerful corrections to multiple tests than the standard Bonferroni, Holm, and Benjamini et al. (BHY) adjustments. This is one of the several strands of literature that attempt to make reasoned corrections to the standard, overly abused p-value framework.
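
For readers who have not met the "standard" adjustments mentioned above, here is a minimal Python sketch of the Bonferroni and Holm corrections. It is only the baseline that the resampling literature tries to improve on, not the Westfall-Young bootstrap procedure itself; the function names and the four p-values are made up for illustration.

```python
def bonferroni(pvalues):
    """Bonferroni adjustment: multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def holm(pvalues):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(pvalues)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is multiplied by m - k + 1.
        candidate = min(1.0, (m - rank) * pvalues[i])
        # Enforce monotonicity: an adjusted value never falls below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


if __name__ == "__main__":
    pvals = [0.01, 0.04, 0.03, 0.005]  # hypothetical p-values, not from any study
    print("Bonferroni:", bonferroni(pvals))
    print("Holm:      ", holm(pvals))
```

Holm's adjusted p-values are never larger than Bonferroni's, which is why it is the more powerful of the two. Neither uses the correlation between tests, and that is the gap the bootstrap-based methods aim to fill.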
