Thoughts on Teaching Regression

I recently finished my first semester of teaching. I was a TA in grad school, but this was my first time being “the professor.” I was teaching a regression course, and there are several things I’d like to do differently should I teach the same class again in the future. I just have to figure out how.

Teaching the first part of the course was the easiest: least squares, maximum likelihood, testing, lots of theory, and not a lot of data. Sounds dry and boring, I know, but I wouldn’t have become a statistician if I didn’t at least sort of like the theory and the mathematics underlying basic statistics. But more than that, the theoretical part of the regression course was the easiest to teach because it’s basically non-negotiable. Hypothesis tests and p-values might require a lot of explanation because the thinking is kind of counterintuitive, but the content is pretty cut-and-dried: If the data in a regression analysis meet the various assumptions (independence, normality, etc.), then the results on testing, confidence intervals, prediction, etc., follow. Done.

Then come the actual data: the interesting stuff, the examples, the reason we’re doing statistics in the first place (the reason I’m doing statistics, anyway). And it all gets a lot messier and harder to teach, because real data don’t tend to fit all the assumptions that regression theory is based on. Residual plots aren’t always the patternless clouds of points shown as examples in textbooks. Normal probability plots of the residuals aren’t always 45-degree lines. Measurement error happens. Model selection is hard. I found myself saying things like “It’s an art as much as it is a science” in response to questions like “How much collinearity is too much?” I think data analysis really is an art, and decisions about which variables to include or which transformations to make (etc., etc., etc.) don’t happen in a vacuum: You might make different decisions based on the research question you’re trying to answer, how much data you have, how much time you have, how complex a model you’re willing to use. And so on. All of which can be kind of overwhelming to students learning regression analysis for the first time.

So I think what I need to do is figure out how better to give general guidance without resorting to dogmatic rules (“If any predictors have correlation greater than .8 you must exclude some of them or use ridge regression!”), and I think a good starting point is probably to find better examples. I’ve had a hard time finding real data that illustrate important points while being neither contrived nor hopelessly complex, but they must be out there. I’d really appreciate any thoughts/advice.
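
One way to talk about collinearity as a trade-off rather than a rule is to show what a given correlation costs. The sketch below assumes the simplest case, a model with exactly two predictors whose correlation is r, where the variance inflation factor for either coefficient is 1 / (1 - r^2). The function names are made up for illustration, and the correlations in the table are arbitrary choices, not recommended cutoffs.

```python
import math


def vif_two_predictors(r: float) -> float:
    """Variance inflation factor for either of two predictors with correlation r.

    Assumes a regression with exactly two predictors; with more predictors
    the VIF depends on the multiple correlation, not a single pairwise r.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("correlation must be strictly between -1 and 1")
    return 1.0 / (1.0 - r * r)


def collinearity_table(correlations):
    """Return (r, VIF, standard-error multiplier) for each correlation."""
    rows = []
    for r in correlations:
        vif = vif_two_predictors(r)
        # The coefficient's standard error grows by sqrt(VIF)
        # relative to the uncorrelated case.
        rows.append((r, vif, math.sqrt(vif)))
    return rows


if __name__ == "__main__":
    for r, vif, se_ratio in collinearity_table([0.0, 0.5, 0.8, 0.9, 0.95, 0.99]):
        print(f"r = {r:4.2f}   VIF = {vif:6.2f}   SE multiplier = {se_ratio:5.2f}")
```

The table gives students a number to weigh against their sample size and research question: whether a given widening of the standard errors is tolerable depends on how precise the estimates need to be, which is the "art" part.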

9 thoughts on “Thoughts on Teaching Regression”

  1. The concept of art is something most people can wrap their mind around. Telling people that regression analysis contains a bit of it is definitely a welcome epiphany.

  2. I think data analysis can be like any exploration – there's fun, surprises, disappointments, puzzles, and difficult decisions to make. Students who experience this and struggle through it will probably learn more.

    Thanks

  3. I presume you're familiar with the Statlib DASL datasets?

    Teaching data analysis is quite a bit different than teaching regression, as you've found. I do try to give rules of thumb, but mostly I try to impart principles. Do you cook? Some people cook from the recipes in a cookbook. They follow the recipes exactly and if you do that nothing is way out of whack and you avoid ending up with indigestible garbage. But if you taste the food while you're cooking sometimes you'll find that the apples are a bit tart and you'd be better off adding a bit more sugar, or the salad dressing needs more fish sauce, or that cumin would really help the pumpkin soup. Then the recipe is a framework and a guideline within which you balance and adjust things to get the best result. If you know what you're doing you get better results if you taste and adjust and improvise. If you don't know what you're doing and you just improvise, the dinner can end up a disaster and you'd have been better off following the recipe. Teaching data analysis is like teaching students to taste the food, but you want them to taste within the framework of the recipe; that's the maximin strategy. Asking how much collinearity you can tolerate is like asking how sweet the apple pie should be. Teach them to work within a recipe, but to taste often. As they gain experience, they'll learn how much off-the-cuff improvisation the recipe can tolerate.

  4. As a student learning regression, I would actually encourage students to create datasets. In order to understand heteroskedasticity, and some time series models, I used Excel and Stata to create datasets with the properties/problems being studied.

    I actually think the main benefit from this is its independence of applied context. When confronted with a dataset with real variables, the perceptions the student approaches the issues with can overwhelm the statistical idea.

    Obviously, applications are necessary, and this technique works for those who are computer-savvy enough, but it was very useful to me.

  5. Take a look at Ramsey and Schafer's book, The Statistical Sleuth. I wouldn't recommend it for a regression course (too broad), but it does have some interesting data sets. You can get them online at http://www.duxbury.com/statistics_d/ (search for "Sleuth"). Another trick is to send your students to the library to hunt up good data sets in the journals–why should you do all the scholarly work?

  6. I agree with dpegan. I took a time series course which relied heavily on simulating data (in MATLAB) so you could know the true structural model and then compare it to what Stata spits out when you do different things. It made a lot of things a lot clearer.
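
The simulation idea raised in comments 4 and 6 can be sketched in a few lines. This is only an illustration, not what either commenter actually ran: it uses Python rather than Excel, Stata, or MATLAB, and the true intercept and slope, the range of x, and the rule that the error standard deviation is 0.5 times x are all arbitrary assumptions chosen to plant heteroskedasticity on purpose. All function names are hypothetical.

```python
import random


def simulate_heteroskedastic(n, intercept, slope, seed=0):
    """Generate (xs, ys) from a known line with errors whose spread grows with x."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.uniform(1.0, 10.0)
        # Planted problem (assumed for illustration): error sd is 0.5 * x.
        noise = rng.gauss(0.0, 0.5 * x)
        xs.append(x)
        ys.append(intercept + slope * x + noise)
    return xs, ys


def fit_simple_ols(xs, ys):
    """Closed-form least squares for one predictor; returns (intercept, slope)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def residual_spread_by_half(xs, ys, intercept, slope):
    """Root-mean-square residual for the lower and upper halves of x."""
    pairs = sorted(zip(xs, ys))
    half = len(pairs) // 2

    def rms(chunk):
        return (sum((y - intercept - slope * x) ** 2 for x, y in chunk) / len(chunk)) ** 0.5

    return rms(pairs[:half]), rms(pairs[half:])


if __name__ == "__main__":
    xs, ys = simulate_heteroskedastic(n=2000, intercept=2.0, slope=3.0, seed=1)
    b0, b1 = fit_simple_ols(xs, ys)
    low, high = residual_spread_by_half(xs, ys, b0, b1)
    print(f"true line: 2.0 + 3.0 x   fitted: {b0:.2f} + {b1:.2f} x")
    print(f"residual RMS, small x: {low:.2f}   large x: {high:.2f}")
```

Because the true model is known, students can compare the fitted coefficients against it and see the fan-shaped residual pattern as a consequence of something they built, free of any applied context.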
