Theoretical statistics is the theory of applied statistics: how to think about what we do

Andrew Gelman, Department of Statistics and Department of Political Science, Columbia University

Working scientists and engineers commonly feel that philosophy is a waste of time. But theoretical and philosophical principles can guide practice, so it makes sense for us to keep our philosophical foundations up to date. Much of the history of statistics can be interpreted as a series of expansions and inclusions: formalizations of procedures and ideas which had been previously considered outside the bounds of formal statistics. In this talk we discuss several such episodes, including the successful (in my view) incorporations of hierarchical modeling and statistical graphics into Bayesian data analysis, and the bad ideas (in my view) of null hypothesis significance testing and attempts to compute the posterior probability of a model being true. I’ll discuss my own philosophy of statistics and also the holes in my current philosophical framework.

It’s 4:15pm Wed 1 Mar 2017, Science Center lecture hall D. Or maybe hall E. I’m not sure as the official announcement says Hall D in one place and Hall E in another. It’s definitely in the Science Center, though, and Halls D and E must not be far from each other!

And, yes, this is the talk that I couldn’t give at Michigan because of snow at the airport.

**P.S.** Some things you might want to read ahead of time: this, this, and this.

Looks like a great topic.

Very interesting! Is there a link for a live stream or something of the sort?

Hall D is on the ground floor whereas Hall E is in the basement.

It would be appropriate for a talk on the philosophical foundations to be in the basement.

Indeed it would be! Was it actually in the basement [Hall E]?

I’m glad you promo’d this. I’m in a center and had heard nothing of it! I’ll be there.

The most recent email from the stats department clarifies that the talk is in Hall D.

Andrew:

“It’s 4:15pm Wed 1 Mar 2017, Science Center lecture hall D. Or maybe hall E. I’m not sure as the official announcement says Hall D in one place and Hall E in another.”

I find it ironic that a highly technical talk with its excruciating, subtle distinctions can elide (ground floor) “hall D” and (basement) “hall E.”

Andrew – In the interest of open science, will you be arranging for somebody to record your talk so that you can post it online? I think a great many people would find this helpful (including me!)

Along those lines, I practice this myself; I gave a talk (split into 2 weeks) about problems with statistical practice in psychology and ideas about how to fix it. My talks can be viewed at https://www.youtube.com/watch?v=JgZZkMJhPvI&list=PLvPJKAgYsyoKcGOCKEYT2GyzK0yLVXvzN as a presentation of the ideas in my working manuscript at https://osf.io/preprints/psyarxiv/hp53k/

If anybody takes the time to watch my videos or read my paper, please let me know if I’ve said anything incorrect–I’m currently a grad student and wish to learn more about these topics!

+1

The problem with p-values is not that they are treated dichotomously but, rather, that a p-value, by itself, contains no quantitative information concerning the truth or falsity of the null. In places, you come close to “inverting the conditional probability.”

Why don’t you mention single-subject designs as an alternative?

Glen,

I think you’re right as well. Both are problems: the p-value doesn’t speak to the truth or falsity of the null, AND people treat it dichotomously (“p < .05?! Yay, I can get published now, because I have evidence for the alternative hypothesis that I was trying to support!…”)

I still have to go back over the paper and substantially re-write it for submission; I'll give more care to avoid inverting the conditional. I appreciate the input!

Regarding single-subject designs, are you referring to repeated measurements from a single participant? Or something more along the lines of qualitative research?

“Regarding single-subject designs, are you referring to repeated measurements from a single participant? Or something more along the lines of qualitative research?”

GS: The former. The term “single-subject design” is misleading, though, as it implies that there is necessarily only one subject in the study. There can be only one subject (and there often is in applied work – i.e., in applied behavior analysis) but generally speaking there is more than one. Three to six is common. One of my favorite experiments has two pigeons in it. My dissertation took more than two years to complete, has hundreds of thousands of data points in it, and used four pigeons. In the simplest “prototypical” laboratory experiment, some property of behavior is measured (often rate of response) frequently during (usually) daily experimental sessions. This procedure is carried out until the measurement shows no systematic trends (i.e., neither decreasing nor increasing, often judged as stable “by eye”). When this is accomplished, one has a very good idea of the range of variation that obtains under the circumstances. At this point, one may change one of the variables and repeat the measurement process as in the previous “phase.” If the stable state under these conditions has a range of variation different from that of the previous phase, one suspects that the manipulated variable is important in the control of the phenomenon. At this point, the original experimental conditions are reintroduced (in the simplest version of such experiments) and the data are allowed to stabilize again. If the measurement returns to former levels, it is generally taken as strong evidence that the manipulated variable is an important independent variable. This is often referred to as an “ABA design” (i.e., the first set of experimental conditions constitutes the “A” phase, the second the “B” phase, followed by a return to the “A” phase). If the effect in question is unusual, or otherwise difficult to believe, one may repeat the sequence as often as one likes (i.e., ABABAB…).
It is generally thought that the method was introduced by Claude Bernard (the “father of experimental medicine”). It was adopted by B. F. Skinner in the ’30s and was the foundation for the first (and only) natural science of behavior, now called “behavior analysis” (Skinner called it “the experimental analysis of behavior”).

Regards,

Glen
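For readers who think better in code, the ABA procedure described above can be sketched as a small simulation. This is only an illustration under loud assumptions: the response rates, the drift toward a new steady state, the noise level, and especially the numeric stability rule (means of the last two windows of sessions within 5% of each other) are all hypothetical stand-ins for what the comment says is often judged “by eye.”

```python
import random
import statistics

WINDOW = 5  # sessions per comparison window (hypothetical choice)


def run_phase(start, target, rng, tol=0.05, noise_sd=1.0, max_sessions=500):
    """Run sessions under one condition until the measurement looks stable.

    Assumed stability rule: the mean of the last WINDOW sessions differs from
    the mean of the WINDOW sessions before it by at most tol * that mean.
    """
    level = start
    data = []
    for _ in range(max_sessions):
        # Assumed dynamics: behavior drifts toward the new steady state.
        level += 0.3 * (target - level)
        data.append(rng.gauss(level, noise_sd))
        if len(data) >= 2 * WINDOW:
            recent = statistics.mean(data[-WINDOW:])
            previous = statistics.mean(data[-2 * WINDOW:-WINDOW])
            if abs(recent - previous) <= tol * abs(previous):
                break
    return data


def aba_experiment(rate_a=60.0, rate_b=40.0, seed=1):
    """Simulate A, B, A phases; return (label, low, high) of each stable state."""
    rng = random.Random(seed)
    level = rate_a
    phases = []
    for label, target in (("A", rate_a), ("B", rate_b), ("A", rate_a)):
        data = run_phase(level, target, rng)
        stable = data[-WINDOW:]
        phases.append((label, min(stable), max(stable)))
        level = statistics.mean(stable)  # next phase starts from here
    return phases


if __name__ == "__main__":
    for label, low, high in aba_experiment():
        print(f"phase {label}: stable range {low:.1f} to {high:.1f}")
```

The comparison of interest is between the stable ranges: if the B range is separated from both A ranges and the second A range overlaps the first, that is the pattern the comment describes as strong evidence for the manipulated variable.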

Just wanted to say I really enjoyed the talk, the tone was quite engaging and the jokes were pretty funny. At some points it was a bit unclear whether you were being serious or flippant (e.g. the default prior of N(0,1)), but it was largely clear by the end of the talk. I particularly enjoyed the bit where you explained your rationale for using a Gamma(0, 2.5) prior in that epidemiological study and what you said about Bayesians becoming more Bayesian and frequentists becoming more frequentist, and everyone striving to be both.

Had a question I didn’t think of at the time – you spoke about wanting to investigate the frequency properties of “Bayesian” methods. I don’t have a ton of experience with hierarchical models, etc. but based on some of the analysis I’ve seen on this blog/in papers, these models can be fairly tailored to the problem at hand, so how would one structure a study of the frequency properties of the multi-level modelling framework? What would theoretical frequency guarantees even look like? Most of the stuff I see online about frequency properties of Bayesian methods deals with stuff like Bayes’ factors, which seem simpler to investigate.
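One rough way to make the question concrete, as a minimal sketch and not anything taken from the talk: fix a true parameter value, simulate many datasets, and record how often a posterior interval covers the truth. The toy model below (a normal mean with known standard deviation and an N(0,1) prior) and all function names are assumptions chosen for illustration; a real multilevel model would need the same loop wrapped around a full model fit.

```python
import math
import random


def posterior_interval(ybar, n, sigma=1.0, prior_mean=0.0, prior_sd=1.0, z=1.96):
    """Central 95% posterior interval for a normal mean with known sigma
    under a conjugate N(prior_mean, prior_sd^2) prior."""
    prior_prec = 1.0 / prior_sd ** 2
    data_prec = n / sigma ** 2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * ybar)
    half_width = z * math.sqrt(post_var)
    return post_mean - half_width, post_mean + half_width


def coverage(true_theta, n=10, sigma=1.0, reps=20000, seed=0):
    """Fraction of simulated datasets whose interval contains true_theta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # The sample mean of n observations is N(true_theta, sigma^2 / n).
        ybar = rng.gauss(true_theta, sigma / math.sqrt(n))
        low, high = posterior_interval(ybar, n, sigma)
        hits += low <= true_theta <= high
    return hits / reps


if __name__ == "__main__":
    for theta in (0.0, 1.0, 3.0):
        print(f"true theta = {theta}: coverage = {coverage(theta):.3f}")
```

In this toy setup the interval covers somewhat more often than 95% when the true value sits at the prior mean and less often as the true value moves away from it, which is one simple picture of what a frequency evaluation of a Bayesian procedure might report: coverage as a function of the true parameter instead of a single guarantee.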

I agree. The talk was quite engaging once you got past the “is he, isn’t he” being sardonic. A small correction: the 2008 default was Cauchy(0,2.5) I think.

Yes, you are right, of course. Rather silly typo – Gamma(0, 2.5) is obviously not what was used since it is nonsense.

Aftab, Michael:

Every word in my talk was sincere.

Young (or young at heart) statisticians likely could make a good career out of trying to understand you ;-)