I’m speaking at the Electronic Conference on Teaching Statistics on Mon 16 May at 11am.

I’ve given many remote talks but this is the first time I’ve spoken at an all-electronic conference. It will be a challenge. In a live talk, everyone’s just sitting in the room staring at you, but in an electronic conference everyone will be reading their email and surfing the web. So the bar for “replacement level” (as they say in baseball) is a lot higher.

At the very least, I have to be more lively than my own writing, or people will just tune me out and start reading old blog entries.

Here’s my title and abstract:

Changing everything at once: Student-centered learning, computerized practice exercises, evaluation of student progress, and a modern syllabus to create a completely new introductory statistics course

It should be possible to improve the much-despised introductory statistics course in several ways: (1) altering the classroom experience toward active learning, (2) using adaptive software to drill students with questions at their level, repeating until students attain proficiency in key skills, and (3) standardized pre-tests and post-tests, both for measuring individual students’ progress and for comparing the effectiveness of different instructors and different teaching strategies. All these ideas are well established in the education literature but do not seem to be part of the usual statistics course. We would like to implement all these changes in the context of (4) a restructuring of the course content, replacing hypothesis testing, p-values, and the notorious “sampling distribution of the sample mean” with ideas closer to what we see as good statistical practice. We will discuss our struggles in this endeavor. This work is joint with Eric Loken.

I’m planning to start as follows:

An important characteristic of a good scientist is the capacity to be *upset*, to recognize anomalies for what they are, and to track them down and figure out what in our understanding is lacking. This sort of unsettled-ness—an unwillingness to sweep concerns under the rug, a scrupulousness about acknowledging one’s uncertainty—is, I would argue, particularly important for a statistician.

For a teacher, maybe not. Some of the most effective teachers I’ve known do not push and push; rather, they have a clean understanding of the world which they can convey to their students.

What about researchers like myself who also teach? What about textbook writers? We need to walk the line, to present a clear structure for students to learn, while acknowledging the dragons that lurk just outside the borders of our well-mapped territory. And this balance is particularly difficult in statistics, a practice which is full of approximations and judicious choices which cannot easily be codified.

As I said, I strongly believe that each of you needs to cultivate your capacity for being upset. And, toward this goal, I’d like to begin today’s talk by upsetting as many of you as possible.

I have soooo many different ways to upset you. I’d like to upset you from all directions. I could upset you with a simple example demonstrating the serious, serious failings of textbook Bayesian inference (and, yes, I include our textbook here as one of the failures). I could upset you by reminding you that we preach the virtues of controlled experimentation yet do not follow any such protocol when evaluating any aspects of our own work. Or I could upset you by arguing—convincingly, I think—that many of the worst misconceptions of statistical practitioners arise not from a lack of statistical education but because the field of statistics has conveyed some of its pernicious messages all too well. But today I’ll do my best to upset you in another way . . .

Andrew writes,

“I have soooo many different ways to upset you. I’d like to upset you from all directions.”

One way to upset me temporally today, June 6, 2016, is to begin with

“I’m speaking at the Electronic Conference on Teaching Statistics on Mon 16 May at 11am.”

An invitation to something which has already occurred is disturbing.

And I’m upset because I want to know more about 1–4, but I hate talks. Talking has such a poor bandwidth (in information/sec). Text is vastly superior.

YouTube at 1.5x or 2x speed improves the bandwidth.

There is a recorded version online, which is nice: https://www.causeweb.org/cause/ecots/ecots16/keynotes/gelman

A recording is available at

https://www.causeweb.org/cause/ecots/ecots16/keynotes/gelman

(I haven’t had time to listen to it yet)

Today is D-Day…in more ways than one…

Why then do you suppose the teaching profession relies so heavily upon the classroom/lecture “Talking” method?

Do students generally revere that “Talking” mode of top-down learning?

(And H.L. Mencken had an unusual opinion of professional lecturing:

“I never lecture … not because I am shy or a bad speaker, but simply because I detest the sort of people who go to lectures– and don’t want to meet them.” )

– H.L. Mencken

Which teaching profession?

Perloff:

You should ask the organizers of the conference; they’re the ones who asked me to talk!

Andrew mentioned that perhaps Intro Stats classes try to cover too much ground and I think that’s right. Teachers always say that they’d be happy if students just learn a few things well… but often teach a slew of things with little depth or time to practice.

I disagree, however, that the Intro Physics material is harder — statistical inference is much subtler and more difficult for people to wrap their heads around than Newton’s laws (I say this as a high school Physics/Chemistry/Stats teacher who has taught some of the same students across subjects). Maybe inference should be left for a 3rd or even 4th course in statistics. It might be the dessert but does anyone with only one or two stats courses stand any chance of doing it well?

With the excuse that I may not know either what statisticians value or what social scientists need, what about a sequence like:

1. Data Analysis (plotting, summarizing and wrangling data in R with some stats content: measures of center/spread/covariation, best-fit lines, regression towards the mean)

2. Probability, Random Variables and Markov Chains

3. Regression Models

4. Bayesian Data Analysis (I’m thinking about McElreath’s Statistical Rethinking here)

The Intro course, Data Analysis, just has kids trying to make sense of data without much theory (although maybe they could use resampling to get a sense of uncertainty).
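To make the resampling idea concrete, here is a minimal sketch of a bootstrap interval for a mean. It is only an illustration under assumptions: the data values are made up, the function names are hypothetical, and Python stands in for the R mentioned in the course outline above.

```python
import random
import statistics


def bootstrap_means(data, n_boot=2000, seed=0):
    """Resample the data with replacement and return the mean of each resample."""
    rng = random.Random(seed)
    n = len(data)
    return [statistics.fmean(rng.choices(data, k=n)) for _ in range(n_boot)]


def percentile_interval(values, lower=0.025, upper=0.975):
    """Rough percentile interval read off the sorted simulated values."""
    ordered = sorted(values)
    lo = ordered[int(lower * (len(ordered) - 1))]
    hi = ordered[int(upper * (len(ordered) - 1))]
    return lo, hi


# Hypothetical data: made-up commute times in minutes
commute_minutes = [12, 15, 9, 22, 30, 18, 14, 25, 11, 17, 40, 16]

means = bootstrap_means(commute_minutes)
low, high = percentile_interval(means)
print(f"sample mean = {statistics.fmean(commute_minutes):.1f}")
print(f"rough 95% bootstrap interval: ({low:.1f}, {high:.1f})")
```

The point for students is that the spread of the resampled means gives a feel for uncertainty without any formula for a standard error.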

I really agree with this. The temptation is to squeeze in one more thing. But it’s better to just get students used to thinking about data and working with data, graphing data and that is it.

Lots of good points here. Thanks.

I like this progression too, but replacing (2) with “probability and simulation”. The class would basically be learning about uncertainty in estimates by simulating little experiments and comparisons. If you start simply you can do basic probability this way too. I think that actually going through the process of generating data, analyzing it, and doing it again and again is a better way to get students to understand uncertainty than having them memorize equations or learn a whole bunch of “tests”. Then in (3) you can combine that with the basics of regression, and that leads directly to bootstraps, randomization tests, and (lastly) to analytic solutions for standard errors and p-values.

But basically, you never “define” some abstract concept (like a sampling distribution) without showing them the thing it is capturing (histograms of BetaHat). And then you make them generate and display one.
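A minimal sketch of what "generate and display one" could look like, assuming a simple straight-line model with made-up parameter values (true slope 2, noise sd 3, 30 points per fake experiment). The function names are hypothetical, and the text histogram is only a stand-in for a real plot.

```python
import random
import statistics


def simulate_slope(n, true_slope, noise_sd, rng):
    """Generate one fake dataset y = 1 + true_slope * x + noise; return the least-squares slope."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [1.0 + true_slope * x + rng.gauss(0, noise_sd) for x in xs]
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def text_histogram(values, bins=10, width=40):
    """Crude text histogram: one row per bin, bar length proportional to the count."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / step), bins - 1)
        counts[idx] += 1
    peak = max(counts)
    rows = [f"{lo + i * step:6.2f} | " + "#" * round(width * c / peak)
            for i, c in enumerate(counts)]
    return "\n".join(rows)


rng = random.Random(0)
# Repeat the "experiment" many times and collect the estimated slopes
slopes = [simulate_slope(30, 2.0, 3.0, rng) for _ in range(1000)]

print(text_histogram(slopes))
print(f"mean of slopes = {statistics.fmean(slopes):.3f}, sd of slopes = {statistics.stdev(slopes):.3f}")
```

The histogram that comes out is the sampling distribution of BetaHat, shown as a thing students produced rather than a definition they memorized.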

jrc: You might wish to look at what David Spiegelhalter has done on this http://nrich.maths.org/probability

> you never “define” some abstract concept (like a sampling distribution) without showing them the thing it is capturing

I do think it is capturing a historical curiosity (the need to work with a very small number of numbers, hoping that convenient probability assumptions give sufficiency or that one has made it to asymptopia).

It makes no sense not to use every individual data value when learning about uncertainty, and in fact you have to if you understand that you need to check the fit of the model. Move on and just show how the posterior concentrates with increasing sample sizes (if not fully dependent) and, if you must, point out that conditioning on just summary statistics tends to give you close to the same posterior as conditioning on the raw data.
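A small sketch of the concentration point, under assumptions of my own choosing: a binomial proportion, a uniform prior, and an observed rate of 30% at every sample size. With that setup the posterior is Beta(k + 1, n - k + 1), so its mean and sd can be written down directly.

```python
import math


def beta_posterior_summary(k, n):
    """Mean and sd of the Beta(k + 1, n - k + 1) posterior for a proportion,
    given k successes in n trials and a uniform prior."""
    a, b = k + 1, n - k + 1
    mean = a / (a + b)
    sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean, sd


sample_sizes = [10, 100, 1000, 10000]
# Assumed data: 30% successes at every sample size
summaries = [beta_posterior_summary(round(0.3 * n), n) for n in sample_sizes]

for n, (mean, sd) in zip(sample_sizes, summaries):
    print(f"n = {n:5d}: posterior mean = {mean:.3f}, posterior sd = {sd:.4f}")
```

In this particular model the posterior depends on the data only through (k, n), so conditioning on the summary and conditioning on the raw 0/1 values give exactly the same answer; in other models the agreement would only be approximate.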

One idea is to try out a strict separation of roles between the teacher & the tester.

I wonder how well or badly it’d work out if Andrew & Eric let some third person design both the pre- and post-assessments, with the test designer being told only that this is an Intro Stats course.

Naive question: When we speak of “standardized pre-tests” how exactly do we perform the standardization?

A comment on your mention of a discrete approach to Bayes – I believe Jim Albert was doing this at one early point, even in Minitab.

I do think he was just doing calculation on grids.

If instead you do two-stage rejection sampling (or naive ABC: draw a parameter from the prior, generate fake data, and keep only the parameters whose fake data equal the actual data, which gives a posterior sample), the route to continuity is not too indirect and seems to work well.

Start with discrete data but continuous parameter space and a small sample size (e.g. binomial example).

First note, in rejection sampling the percentage of a particular value kept is an estimate of the likelihood – p(x|parameter value).

Then note, the posterior is simply the prior re-weighted by these estimated percentages.
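Here is a minimal sketch of the two-stage rejection scheme for a binomial example. The specifics are assumptions for illustration: a uniform prior on the success probability, 3 successes observed in 10 trials, and hypothetical function names.

```python
import math
import random
import statistics


def abc_rejection(k_obs, n, n_draws, seed=0):
    """Two-stage rejection sampling: draw theta from the prior, generate fake
    binomial data, and keep theta only when the fake data match the real data."""
    rng = random.Random(seed)
    drawn, kept = [], []
    for _ in range(n_draws):
        theta = rng.random()                                  # stage 1: uniform prior draw
        fake_k = sum(rng.random() < theta for _ in range(n))  # stage 2: fake data
        drawn.append(theta)
        if fake_k == k_obs:
            kept.append(theta)
    return drawn, kept


def binomial_pmf(k, n, theta):
    """Exact likelihood p(k | theta), used only for comparison."""
    return math.comb(n, k) * theta ** k * (1 - theta) ** (n - k)


k_obs, n = 3, 10
drawn, kept = abc_rejection(k_obs, n, n_draws=50_000)
print(f"kept {len(kept)} of {len(drawn)} draws")
print(f"posterior mean estimate = {statistics.fmean(kept):.3f}, "
      f"exact Beta(4, 8) mean = {(k_obs + 1) / (n + 2):.3f}")

# The fraction kept near a given theta estimates the likelihood at that theta
for centre in (0.1, 0.3, 0.5, 0.7):
    in_band = [t for t in drawn if abs(t - centre) < 0.025]
    kept_in_band = [t for t in kept if abs(t - centre) < 0.025]
    print(f"theta near {centre}: fraction kept = {len(kept_in_band) / len(in_band):.3f}, "
          f"likelihood = {binomial_pmf(k_obs, n, centre):.3f}")
```

The kept draws are a posterior sample, and the per-band fractions show the "prior re-weighted by the likelihood" reading directly.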

If you have a formula for the likelihood you can skip the generation of fake data and simply re-weight the prior (sampled) by the likelihood (aka importance sampling).

Show the challenge of sampling from posteriors by showing how simple importance sampling works for small samples but will eventually fail for large sample sizes (the posterior is too far away from the prior sample you drew).
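A sketch of that failure, under assumed specifics: the same uniform prior sample of 2000 values is re-weighted by a binomial likelihood with 30% successes at growing sample sizes. The effective sample size used here is the usual 1 / sum(w^2) for normalized weights; it is my choice of diagnostic, not something specified above.

```python
import math
import random


def importance_weights(thetas, k, n):
    """Normalized weights proportional to the binomial likelihood at each prior draw."""
    log_w = [k * math.log(t) + (n - k) * math.log(1 - t) for t in thetas]
    top = max(log_w)                       # subtract the max to avoid underflow
    w = [math.exp(lw - top) for lw in log_w]
    total = sum(w)
    return [x / total for x in w]


def effective_sample_size(weights):
    """Roughly how many equally weighted draws the weighted sample is worth."""
    return 1.0 / sum(w * w for w in weights)


rng = random.Random(0)
# Prior sample, nudged away from exactly 0 or 1 so the logs are defined
prior_sample = [min(max(rng.random(), 1e-12), 1 - 1e-12) for _ in range(2000)]

ess_by_n = {}
for n in (10, 100, 1000, 10000):
    k = round(0.3 * n)                     # assumed data: 30% successes
    w = importance_weights(prior_sample, k, n)
    post_mean = sum(wi * t for wi, t in zip(w, prior_sample))
    ess_by_n[n] = effective_sample_size(w)
    print(f"n = {n:5d}: posterior mean = {post_mean:.3f}, effective sample size = {ess_by_n[n]:.0f}")
```

As the data grow, almost all of the weight lands on the handful of prior draws that happen to sit near the posterior, which is the practical sense in which the prior sample is too far away.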

Now do sequential importance sampling – i.e. divide the data up into small sequential samples and walk through prior -> posterior.1 -> posterior.2 -> posterior.3, or use the nth roots of the likelihood.

Sequential importance sampling allows a lot of Bayesian analyses to be done – so now do a few real problems.
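One way the batch-by-batch walk could be sketched is below. This is a resample-and-move variant that I am assuming for illustration (re-weight by each batch, resample, then take a few Metropolis steps to restore diversity); it is one common flavor and not necessarily the exact scheme intended above. The data (20 batches of 500 trials with 150 successes each) and the function names are made up.

```python
import math
import random
import statistics


def log_lik(theta, k, n):
    """Binomial log-likelihood up to a constant; -inf outside (0, 1)."""
    if not 0.0 < theta < 1.0:
        return float("-inf")
    return k * math.log(theta) + (n - k) * math.log(1.0 - theta)


def sequential_importance(batches, n_particles=2000, n_moves=3, seed=0):
    """Walk from a uniform prior to the posterior one batch of data at a time."""
    rng = random.Random(seed)
    particles = [rng.random() for _ in range(n_particles)]   # prior sample
    k_total = n_total = 0
    for k_batch, n_batch in batches:
        # 1. Re-weight the current sample by the likelihood of the new batch only
        log_w = [log_lik(t, k_batch, n_batch) for t in particles]
        top = max(log_w)
        weights = [math.exp(lw - top) for lw in log_w]
        # 2. Resample in proportion to the weights
        particles = rng.choices(particles, weights=weights, k=n_particles)
        k_total += k_batch
        n_total += n_batch
        # 3. Metropolis moves targeting the posterior given all data so far
        step = statistics.pstdev(particles) or 1e-3
        for _ in range(n_moves):
            for i, theta in enumerate(particles):
                proposal = theta + rng.gauss(0.0, step)
                log_ratio = log_lik(proposal, k_total, n_total) - log_lik(theta, k_total, n_total)
                if rng.random() < math.exp(min(0.0, log_ratio)):
                    particles[i] = proposal
    return particles


batches = [(150, 500)] * 20   # assumed data: 10000 trials in total, 30% successes
particles = sequential_importance(batches)
print(f"posterior mean = {statistics.fmean(particles):.4f}, "
      f"posterior sd = {statistics.stdev(particles):.4f}")
```

Each intermediate posterior is close to the one before it, so no single re-weighting step has to bridge the whole distance from prior to final posterior.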

Finally describe MCMC as another way of walking to the posterior to sample from it, admit you can always miss it and not realize it (and say that is material for a more advanced course.)

Start redoing the previously done examples in Stan.

McElreath’s Statistical Rethinking does conceptually use the two-stage sampling approach (the tracing out of all the ways something that did happen could have happened given a joint model to get posterior probabilities) but it was done in a full-term graduate course with the use of great metaphors like small/big worlds, golems, and my favorite one – the wall of China. So it’s going to be a lot of work to get anything that does work!

I agree with all of this. The only major thing I would add is to train in situ. Have them work through problems researchers had to solve from real world studies when learning these concepts. Biology and Chemistry are good generic fields to pull research from.

I personally like the idea of going overboard on the stats in Physics lab. Measurement technique is ultimately about how much is left uncertain after you’ve taken the measurements. Often physics labs are set up so that uncertainty is really small, and people learn a first-order independent errors, second-moment approximation, but uncertainty could be more moderate sized and actually make things more interesting and more realistic.

Here are some possible example experiments:

1) Gravitational acceleration. Make a PVC tube with 5 photo-interrupter detectors, an absolute pressure gauge, and a port for a vacuum pump. Place a solenoid activated ball dropping device at the top. Drop golf balls and pingpong balls at various pressures, infer from the data both gravitational acceleration and drag effects.

2) Temperature dependent friction properties. Slide different materials down an aluminum ramp which is held at different temperatures by heating coils behind the ramp. Determine not only a coefficient of friction, but how that coefficient changes at different temperatures.

3) Linear momentum: use a standard air-hockey table with a video camera above it to observe collisions between air hockey pucks with various unknown additional masses on them (coins or washers or similar small mass perturbations). Infer from positions at multiple time points simultaneously what the velocities, directions, and masses are for complex collisions (i.e., one puck into a stationary puck, one puck into two stationary pucks, one puck into a stationary puck which then hits a second stationary puck…)

4) Mass on a spring: From observations of the position taken off a video camera, and measurements of the mass and of the spring, infer the dynamics of the system using differential equations, infer the spring constant of the spring, and include the mass of the spring itself in the dynamics.

5) Vibration of a piano wire: A real world piano wire is not infinitely thin. The wire-wrapped kind has a resistance to bending. Write down the differential equations assuming the non-zero thickness of the wire. Tension the wire to different tensions, measure its length and the sound it produces, Fourier transform the sound, find the deviations from the ideal Fourier spectrum, and infer the implied bending resistance using a Bayesian model.

Walk the students through the process of modeling these simple systems with real-world effects like drag and slight inelasticity of collisions, and unknown masses, and measurement errors from taking positions off pixel coordinates on videos, and so on. Have them actually code those effects into Stan programs and run Bayesian models that let them infer the unknowns with uncertainties. Let them run models in which they ignore certain effects and compare them to the models where the small effects aren’t ignored.
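As a rough sketch of the simplest version of experiment 1, here is the "ignore drag" model fit by a grid posterior in plain Python instead of Stan (my substitution, to keep the example self-contained). Everything numeric is assumed: the detector positions, the 2 ms timing noise, the flat prior over the grid, and the simulated data standing in for real measurements.

```python
import math
import random

TRUE_G = 9.81                          # m/s^2, used only to fake the data
TIMING_SD = 0.002                      # assumed detector timing noise, seconds
HEIGHTS = [0.3, 0.6, 0.9, 1.2, 1.5]    # hypothetical detector distances below release, metres


def fall_time(distance, g):
    """Drag-free time to fall a given distance from rest."""
    return math.sqrt(2.0 * distance / g)


def simulate_drops(n_drops, rng):
    """Fake (distance, measured time) pairs for n_drops drops past all detectors."""
    return [(d, fall_time(d, TRUE_G) + rng.gauss(0.0, TIMING_SD))
            for _ in range(n_drops) for d in HEIGHTS]


def grid_posterior(observations, g_grid):
    """Posterior over g on a grid: flat prior, independent Gaussian timing errors."""
    log_post = [sum(-0.5 * ((t - fall_time(d, g)) / TIMING_SD) ** 2 for d, t in observations)
                for g in g_grid]
    top = max(log_post)
    weights = [math.exp(lp - top) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(1)
observations = simulate_drops(10, rng)
g_grid = [9.0 + 0.002 * i for i in range(801)]   # 9.0 to 10.6 m/s^2
posterior = grid_posterior(observations, g_grid)

post_mean = sum(p * g for p, g in zip(posterior, g_grid))
post_sd = math.sqrt(sum(p * (g - post_mean) ** 2 for p, g in zip(posterior, g_grid)))
print(f"g = {post_mean:.3f} +/- {post_sd:.3f} m/s^2")
```

A natural next step in the spirit of the comparison suggested above would be to add a drag term to `fall_time` and see how much the inferred g and its uncertainty move.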

We can make things nearly perfectly idealized, and then teach them that we sort of “know everything” or we can intentionally not be perfectly ideal, and teach them that their ingenuity and modeling skills are what is needed to really understand how things actually work. Which would we rather do?

I am skeptical about “changing everything at once”–for many ongoing reasons, but here because I truly enjoy and learn from your lectures (the ones I have listened to online). The big potential loss of “student-centered learning” is that students do not yet have perspective on the subject. Of course they need lots of practice, but should that be the focus of class? Shouldn’t there be at least some lecture so that students can see how a statistician works with the subject?

Sure, there’s benefit in a combination of lecture and focused practice; but why “change everything at once”? There is simply too much good in the lecture, if the students know how to listen to it. (That’s the thing: listening doesn’t have to be passive at all. It’s easy to set yourself a challenge: for instance, when listening, come up with a relevant question that the lecture does *not* address and that could help clarify and extend the topic.)

You point out in your talk that students aren’t even getting the basics–but that may have to do with the way they study. They may not be in the practice of asking themselves, “Do I really understand this? Can I solve this kind of problem and explain each step as I go along?” That’s an essential kind of self-discipline; if students don’t have it, they can develop it. I have had my students explain why they don’t understand a homework question or assignment. The practice of identifying where the understanding breaks down has often helped them get a handle on the assignment.

“Active learning” in the classroom has its own pitfalls and drawbacks. For one, the room can get noisy, and it can be hard to think. For another, students who find the material challenging may need to ponder it on their own before solving problems with others. As a student, I generally liked (and still like) to take classes a bit above my level; this meant that in the first third or so of the course, I might not be able to answer questions on the spot. Later on, I could do so, because of the extra work I had put in and the fluency I had gained.

But beyond all this, I believe that to be a better teacher, one does not have to do everything differently. (“Change everything” is a current mantra in education, and it brings out my skepticism.) One finds the combination of old and new that suits the subject matter, course, students, and teacher. The lecture has much to offer; it just needs proper use and place, and students need to know how to listen to it.

+1

There are lots of things between the extremes of “pure lecture” and “in class group work” that can be helpful to students’ learning.

For example, one thing that I have found useful in teaching statistics is to give students, as homework, a list of questions to think about (e.g., a list of possible definitions of a concept to classify as “gets it,” “partly gets it,” or “doesn’t get it”), then give them a few minutes in class to go over their answers with someone sitting near them, then take a “class vote,” followed by choosing someone with each answer to justify their answer. This seems to help more students to understand complex ideas, without putting any one person too much on the spot. (If not too many students “vote”, then I also ask for a show of hands on the additional option, “I don’t have a clue.”)