André Ariew writes:

I’m a philosopher of science at the University of Missouri. I’m interested in leading a seminar on a variety of current topics with philosophical value, including problems with significance tests, the replication crisis, causation, correlation, randomized trials, etc. I’m hoping that you can point me in a good direction for accessible readings for the syllabus. Can you? While the course is at the graduate level, I don’t assume that my students are expert in the philosophy of science and likely don’t know what a p-value is (that’s the trouble—need to get people to understand these things). When I teach a course on inductive reasoning I typically assign Ian Hacking’s An Introduction to Probability and Inductive Logic. I’m familiar with the book and he’s a great historian and philosopher of science.

He’d like to do more:

Anything you might suggest would be greatly appreciated. I’ve always thought that issues like these are much more important to the philosophy of science than much of what passes as the standard corpus.

My response:

I’d start with the classic and very readable 2011 article by Simmons, Nelson, and Simonsohn, False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant.

And follow up with my (subjective) historical overview from 2016, What has happened down here is the winds have changed.

You’ll want to assign at least one paper by Paul Meehl; here’s a link to a 1985 paper, and here’s a pointer to a paper from 1967, along with the question, “What happened? Why did it take us nearly 50 years to hear what Meehl was saying all along? This is what I want the intellectual history to help me understand,” and 137 comments in the discussion thread.

And I’ll also recommend my own three articles on the philosophy of statistics:

- [2017] Beyond subjective and objective in statistics (with discussion). *Journal of the Royal Statistical Society*. (Andrew Gelman and Christian Hennig)
- [2013] Philosophy and the practice of Bayesian statistics (with discussion). *British Journal of Mathematical and Statistical Psychology* 66, 8–38. (Andrew Gelman and Cosma Shalizi)
- [2013] Rejoinder to discussion. *British Journal of Mathematical and Statistical Psychology* 66, 76–80. (Andrew Gelman and Cosma Shalizi)
- [2011] Induction and deduction in Bayesian data analysis. *Rationality, Markets and Morals*, special topic issue “Statistical Science and Philosophy of Science: Where Do (Should) They Meet in 2011 and Beyond?”, ed. Deborah Mayo, Aris Spanos, and Kent Staley. (Andrew Gelman)

The last of these is the shortest so it might be a good place to start—or the only one, since it would be overkill to ask people to read all three.

Regarding p-values etc., the following article could be helpful (sorry, it’s another one of mine!):

- [2017] Some natural solutions to the p-value communication problem—and why they won’t work. *Journal of the American Statistical Association*. (Andrew Gelman and John Carlin)

And, for causation, I recommend these two articles, both of which should be readable for students without technical backgrounds:

- [2011] Experimental reasoning in social science. In *Field Experiments and Their Critics*, ed. Dawn Teele. Yale University Press. (Andrew Gelman)
- [2011] Causality and statistical learning. *American Journal of Sociology*. (Andrew Gelman)

OK, that’ll get you started. Perhaps the commenters have further suggestions?

**P.S.** I’d love to lead a seminar on the philosophy of statistics; unfortunately, I suspect that here at Columbia this would attract approximately 0 students. I do cover some of these issues in my class on Communicating Data and Statistics, though.

Andre: I’ve taught philstat around 16 times in many different ways (sometimes jointly with A. Spanos); syllabi can be found on my blog, errorstatistics.com. I’ve often used Barnett, Comparative Statistical Inference (1982); many original readings by statisticians and philosophers (e.g., Hacking, Popper, Peirce); my own Error and the Growth of Experimental Knowledge (1996); Mayo and Spanos (eds. and authors), Error and Inference (2010); and nowadays my Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (in press with CUP).

Andrew: I’ll bet if we taught it together, there’d be lots of interested students! Maybe next year.

I found Barnett very helpful. I also found it very expensive. $300 for the 3rd edition, $189 “discounted” on Amazon.

The interwebs have an electronic copy.

Thanks for reference to your blog, Deborah. Excellent resource.

For historical context on publication bias, I like Theodore Sterling’s 1959 paper, Publication Decisions and Their Possible Effects on Inferences Drawn from Tests of Significance–Or Vice Versa, published in the Journal of the American Statistical Association.

“One thing is clear, however. The author’s stated risk cannot be accepted at its face value once the author’s conclusions appear in print.” In other words, the experimenter does not need to condition on publication when she updates her beliefs, but the reader does.
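Sterling’s point can be sketched with a small simulation. All numbers below (the share of true hypotheses, the power) are illustrative assumptions, not from the paper: if journals print only “significant” results, the reader’s error rate among published findings can far exceed the author’s stated alpha.

```python
import random

random.seed(1)

# Assumed (illustrative) research environment:
alpha = 0.05        # the author's stated risk of a false positive
prior_true = 0.10   # fraction of tested hypotheses that are actually true
power = 0.50        # chance a real effect comes out significant
n_studies = 200_000

false_pos = true_pos = 0
for _ in range(n_studies):
    effect_is_real = random.random() < prior_true
    if effect_is_real:
        significant = random.random() < power
    else:
        # Under the null, p-values are uniform, so p < alpha w.p. alpha.
        significant = random.random() < alpha
    if significant:
        if effect_is_real:
            true_pos += 1
        else:
            false_pos += 1

published = true_pos + false_pos  # only "significant" results get printed
print(f"Author's stated risk: {alpha}")
print(f"Share of published findings that are false: {false_pos / published:.2f}")
```

Under these made-up numbers, roughly 47% of published “significant” findings are false, even though every individual author honestly reported alpha = 0.05: the reader, unlike the experimenter, must condition on publication.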

Going back another 32 years, this very short letter is going around Twitter right now: http://jamanetwork.com/journals/jama/article-abstract/244730

Cheers,

Carl

David Freedman’s “Statistical Models and Shoe Leather”:

http://www.sas.rochester.edu/psc/clarke/405/Freedman91.pdf

Unlike most contributions to the “be wary of statistical significance” genre, this one is based in observational thinking rather than experimental thinking. And since much (most?) social science is and will remain observational, I think it is an important bridge between the purely mathematical p-value-based critiques of Ioannidis and the p-curve folks, on the one hand, and the kind of papers people interested in social science will actually read, on the other.

I think that one of the most important philosophical concepts in causal inference in social science at the moment is probably the idea of quasi-experimental variation that is “as good as random”. And I think that the Shoeleather paper does a good job of linking the “as good as random” idea to actual research. Beyond that, I suspect that there are real contributions to be made by philosophers linking the concept of “as good as random” to causal inference (maybe I just suspect that we are sweeping more under the rug than we think).

Somewhat less rigorous but no less caustic a critique of the pitfalls of multiple regression is Richard Nisbett’s Edge.org interview, The Crusade Against Multiple Regression Analysis. https://www.edge.org/conversation/richard_nisbett-the-crusade-against-multiple-regression-analysis

Seconding this thread: Paul Meehl’s 1978 paper, “Theoretical Risks and Tabular Asterisks: Sir Karl, Sir Ronald, and the Slow Progress of Soft Psychology” (Journal of Consulting and Clinical Psychology 46, 806–834), foresaw many of the issues bedeviling the field today, including publication biases that favor papers reporting “significant” results, p-hacking, etc.

At the recent Society for the Improvement of Psychological Science (SIPS) meeting, one of the hackathon groups worked on creating modular content for graduate research methods syllabi. A number of modules touch on these topics:

https://osf.io/zbwr4/wiki/home/

@Andrew and @Mayo: I’d sign up for a seminar on the philosophy of statistics!

+1!

As would I!

Vapnik’s The Nature of Statistical Learning Theory is an interesting read and has a chapter on Popper/falsifiability and connections to VC dimension, etc.

Articles to stimulate discussion:

Breiman’s “Two Cultures” paper

Dawid’s ‘Beware of the DAG’

i before e except after B

I know Andre’s class is inevitably different, but my vote is to try to get as much of this sort of thing into the normal stats course as possible rather than isolated into an independent course. (I know that’s already a popular opinion around here).

I like the idea of including some modern resources on philstats alongside the usual epistemology, ontology, etc. Trying to teach p-values in that context (just to smack them down?) sounds tricky, though. For instance, I quite like the Gelman/Carlin paper about p-value communication, but I’m not sure it would mean much to me if I had _just_ learned what a p-value even was. Maybe more time on introducing and discussing the relation of probability distributions to “reality” would be more effective.

+1

There’s plenty of room for philosophy in the traditional intro stat course, especially once you’ve removed the tedious and pointless bits, such as calculating everything by hand, looking up numbers on a table, and distinguishing between (for example) 1.96 standard errors and 2.04 standard errors.

I honestly think if you taught an intro stats course that was about 80% philosophy and 20% using R or Octave or whatever to do simulations and plots and a little linear regression, you’d be doing the world a huge favor. The reason things go wrong in stats is typically application of ‘standard techniques’ without any thought on the question of meaning and appropriateness, which are both philosophical issues.

Meaning and appropriateness are core scientific issues!

http://existentialcomics.com/comic/196

“By the end of my 12-week program, you won’t know a single G-d Damned Thing!” – Socrates

Daniel: The main work on probabilistic inference in philosophy, by Howson and Urbach (several editions)–subjective Bayesians–badly misinterprets all of error statistics, yet they have deeply influenced at least a generation of philosophers. We’ve had a number of critical exchanges, some published (e.g., 1997): http://www.phil.vt.edu/dmayo/personal_website/(1997)%20Error%20Statistics%20and%20Learning%20from%20Error%20Making%20a%20Virtue%20of%20Necessity.pdf

Shameless but entirely relevant plug: http://www.denisecummins.com/good-thinking.html

I would second the recommendations for Hacking. Jaynes’ Probability Theory: The Logic of Science is also great:

http://cdn.preterhuman.net/texts/science_and_technology/Probability%20Theory%20-%20The%20Logic%20Of%20Science.pdf

For history, Polya’s early work is interesting:

https://archive.org/details/Induction_And_Analogy_In_Mathematics_1_

https://archive.org/details/Patterns_Of_Plausible_Inference_2_

I really like Ed Leamer’s “Let’s Take the Con Out of Econometrics”:

https://www.jstor.org/stable/pdf/1803924.pdf

He talks about economics but the ideas translate broadly. There are some technical parts that require a basic familiarity with regression, but these are couched in real world examples. He also makes some claims about the relationship between “fact” and “opinion” that plenty of readers will take issue with – but that’s a good thing for a philosophy class.

I think a philosophy course might be a great place to address the different frameworks of Neyman-Pearson, Fisher, and Bayes. You could get pretty far working through the different questions each was constructed for and how that affects the design and interpretation of the procedures. I think you could do it without getting too deep into the technical details. The arguments among the parties were/are largely philosophical and have a nice historical bent, so it’s fitting for the course. For me, the mere fact that “classical frequentist” was more than one thing was a revelation.

My primary guide to the Neyman-Pearson/Fisher split is Michael Lew (e.g. nice commentary and links to approachable articles here: https://stats.stackexchange.com/a/4567/10506). Deborah Mayo, Jaynes, and Richard Morey et al (https://learnbayes.org/papers/confidenceIntervalsFallacy/CItheory.html), also come to mind as interesting sources.
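One way to make “more than one thing” concrete is to run the same toy dataset through all three lenses. The numbers below are invented for illustration; the point is that each framework answers a differently-shaped question.

```python
import math

def Phi(z):
    """Standard normal CDF, built from math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Illustrative data (assumed): estimated effect 1.2 with standard error 0.5.
ybar, se = 1.2, 0.5
z = ybar / se  # 2.4

# Fisher: a continuous measure of evidence against H0: mu = 0.
p_value = 2 * (1 - Phi(abs(z)))

# Neyman-Pearson: a pre-specified decision rule with fixed error rates;
# the output is a decision, not a measure of evidence.
alpha = 0.05
reject = abs(z) > 1.96

# Bayes (flat prior on mu): a posterior probability statement about mu itself.
post_prob_positive = Phi(z)  # P(mu > 0 | data)

print(f"Fisher p-value:            {p_value:.4f}")   # 0.0164
print(f"N-P decision (alpha=0.05): {'reject' if reject else 'accept'}")
print(f"P(mu > 0 | data):          {post_prob_positive:.4f}")  # 0.9918
```

Same data, three different outputs: an evidence summary, a decision, and a probability about the parameter. Much of the historical argument is over which of these is the right kind of answer.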

I put together a sample syllabus on the replication crisis for some job applications last fall. It’s available here: https://docs.google.com/document/d/1aTjJ8cnQcKZ7g0Zlu2INc4q-29gqgtXtXcMfU55vl10/edit?usp=sharing

This is great. Thanks for sharing.

Not an article, but an excellent introduction to the topic of p-hacking:

https://xkcd.com/882/

Except for the fact that the last panel repeats the inverse probability fallacy, that cartoon should be taught in every high school.

And for more fun, follow up with the simulation at http://www.jerrydallal.com/LHSP/jellybean.htm.
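The cartoon’s arithmetic can also be checked with a quick simulation; this is a minimal sketch, with the 20-colors / alpha = 0.05 setup taken from the comic itself.

```python
import random

random.seed(2)

# xkcd 882 setup: 20 jelly-bean colors, none of which actually causes acne.
n_colors = 20
alpha = 0.05
n_repeats = 100_000

hits = 0
for _ in range(n_repeats):
    # Under the null, each color's p-value is uniform on (0, 1).
    if any(random.random() < alpha for _ in range(n_colors)):
        hits += 1

prob = hits / n_repeats
# Theory: 1 - (1 - alpha)**n_colors = 1 - 0.95**20, about 0.64
print(f"P(at least one 'significant' color): {prob:.3f}")
```

So even with every null hypothesis true, there is about a 64% chance of at least one headline-worthy “finding.”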

The Open Science Collaboration has a small database of course syllabi on these topics. You might find something useful here by looking at the reading lists on these course outlines:

https://osf.io/vkhbt/

Thanks, everyone, for the suggestions. One excellent question that keeps appearing in the comments is: what constitutes a philosophy of statistics course? A slightly different question is: what role do these issues in statistics and inductive inference play in a philosophy of science course? Traditional philosophy of science courses cover things like realism and anti-realism, explanation, laws, causation, demarcation between science and non-science, and confirmation, to name a few. One approach is to supplement the readings for each topic with a paper in statistics. Another is to see how these issues are handled within the field of statistics.

When I teach causation, for example, typical papers are: Hume, counterfactual accounts (Lewis), maybe flow-of-energy accounts (Fair), the pragmatics of explanation (van Fraassen), maybe a manipulation approach (Woodward). The point is, I wouldn’t typically teach the methods of regression or correlation or any of the other topics you all have cited. Part of the reason the philosophy of scientific causation doesn’t bother much with statistical techniques is that philosophers are interested (in part) in the metaphysics of causation. But there’s more to philosophy of science than the metaphysical questions: there are methodological questions, and there are issues concerning evaluating scientific practice (which I suppose the replication crisis fits under).

Andre, you may (or may not) be interested in my comments on Keith O’Rourke’s latest post here, in which I say that a lot of what I read at this blog leads me to think that Data Analysis or Data Science should be an undergraduate major. Your comment is another example, as you describe how neither stats nor philosophy courses get all the way to the core of the methodological issues here.

But as they say, once you notice your own confirmation bias, you start seeing it everywhere. :-)

[All: this will be my last post on this hobbyhorse.]

It’s a good point that what fits in a philosophy of statistics course and the role of statistics in a philosophy of science course are different, but related, questions. For what it’s worth, my suggestion about examining the various statistical frameworks was intended for a stats extension to a standard phil. sci. course, because I think it’s the main place where many of the (to me) interesting philosophical issues meet practical methodology. I don’t see that the replication crisis itself has a lot of interesting philosophical content, except as it may ultimately lead to the larger questions about statistical frameworks and their underlying philosophical grounds.

I’m not totally sure if, when you say you “wouldn’t typically teach the methods…”, you mean that you haven’t traditionally but plan to or won’t in any case. I took your original post to mean that you would need to work through the logic of some methods, even while avoiding the procedural and math bits.

Malcolm Forster and Elliott Sober are philosophers of science who apply statistical ideas (they talk about AIC a lot) to phil sci topics like realism, parsimony, unification, and Bayesianism. See Forster and Sober’s “How to tell when simpler, more unified, or less ad hoc theories will provide more accurate predictions”, or Sober’s recent book “Ockham’s Razors”.

If you do any philosophy of probability, Alan Hajek is great. I think you can do better than Jaynes – philosophers (including Hajek, Forster, and Sober) have more nuanced views on the subjects Jaynes tackles. (Though MaxEnt seems worth knowing about.)

Maybe some de Finetti.

This fall, my boy Aubrey is teaching a Jaynes class:

https://www.extension.harvard.edu/academics/courses/inquiries-probability-statistics/15472

There exists an online option. If you click through, his syllabus does not (yet?) list the readings, but I would imagine that will change.

Meehl: “I am making a claim much stronger than that, which is I suppose the main reason that students and colleagues have trouble hearing it, since they might not know what to do next if they took it seriously!”

I think you wrote about it on this blog a while back but I really like this article by Stanley Klein, which seriously engages with the philosophy of science

http://journals.sagepub.com/doi/abs/10.1177/0959354314529616

Great book/article resources posted. Thank you.

Andrew

I think the book by Westfall and Young (1993) on resampling-based methods for multiple testing is an excellent starting point.

http://eu.wiley.com/WileyCDA/WileyTitle/productCd-0471557617.html

Aside from the many problems they cite with incorrect applications of p-values, they give a number of humorous references, including a letter to the editor of The Lancet on the Munchausen framework for making all tests significant (which fits exactly with the xkcd cartoon on significance).

W-Y is also a very well-cited work in the econometrics and finance literature, with papers by White, by Romano-Wolf, and by Hansen, among others (including Campbell Harvey in his various addresses, equating the problem in science to finding “Jesus on toast” apophenia). These are serious attempts to use the bootstrap to estimate correlations in order to make more powerful corrections to multiple tests than the standard Bonferroni, Holm, and Benjamini–Hochberg–Yekutieli (BHY) adjustments. This is one of several strands of literature that attempt to make reasoned corrections to the standard, overly-abused p-value framework.
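For readers curious about the core idea, here is a minimal sketch of a single-step maxT permutation adjustment in the spirit of Westfall and Young. The data-generating setup, sample sizes, and function names are illustrative assumptions, not from the book; the point is that permuting group labels preserves the correlation among the outcomes, which is exactly the information Bonferroni throws away.

```python
import random
import statistics

random.seed(3)

n_per_group, m = 30, 5

def simulate_subject(group):
    # Outcomes share a common noise term, so the m tests are correlated.
    shared = random.gauss(0, 1)
    effect = 0.8 if group == 1 else 0.0  # a real effect on outcome 0 only
    return [effect * (j == 0) + 0.7 * shared + 0.7 * random.gauss(0, 1)
            for j in range(m)]

labels = [0] * n_per_group + [1] * n_per_group
rows = [simulate_subject(g) for g in labels]

def abs_mean_diffs(labs):
    # |mean difference| between groups, for each of the m outcomes
    diffs = []
    for j in range(m):
        g1 = [rows[i][j] for i in range(len(rows)) if labs[i] == 1]
        g0 = [rows[i][j] for i in range(len(rows)) if labs[i] == 0]
        diffs.append(abs(statistics.mean(g1) - statistics.mean(g0)))
    return diffs

observed = abs_mean_diffs(labels)

# Permutation null: shuffle the labels, record the MAX statistic over all m
# outcomes. Comparing each observed statistic to this max distribution
# controls the familywise error rate (single-step maxT).
B = 2000
perm = labels[:]
max_null = []
for _ in range(B):
    random.shuffle(perm)
    max_null.append(max(abs_mean_diffs(perm)))

adj_p = [sum(mx >= observed[j] for mx in max_null) / B for j in range(m)]
print("maxT-adjusted p-values:", [round(p, 3) for p in adj_p])
```

Because the permutation distribution reflects the actual dependence among the tests, the adjustment is less conservative than Bonferroni when the outcomes are correlated, which is the practical selling point of the resampling approach.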