Class-participation activities in a regression class

I’m trying to integrate class-participation activities into the Applied Regression and Multilevel Modeling course I’m teaching this semester. We have a whole bunch of these activities for introductory statistics (in my intro class I have at least one demo and one other activity per lecture) but I’ve never before tried to consistently use them for a more advanced class.

I’ll tell you how things have been going so far and then update occasionally.

The first 2 weeks: what they need to learn

The first several weeks of the class are a review of classical (non-multilevel) regression, with a focus on understanding the model, particularly the deterministic part (that is, y=a+bx, with less of a focus on the distribution of epsilon). This is also a time for the students to get familiar with R, which they’ll have to use more of when working with more complicated models, especially when trying to make inferences that go beyond simply looking at parameter estimates and standard errors. The first two homework assignments involve fitting simple regressions in R, graphing the data and the fitted regression lines, and building a multiple regression to fit Hamermesh’s beauty and teaching evaluations data. Jouni, as T.A., has to spend a lot of time helping students get started with R. The main mathematical difficulties are learning and understanding linear and logarithmic transformations.

The first 2 weeks: in the classroom

Lecture 1 starts with some motivating examples, including roaches, rodents, and red/blue states. I stop and give the students a few minutes to work in pairs to come up with explanations for the patterns of income and voting within and between states. I describe the roach study and the rodent study and then give the students a minute to discuss in pairs to see if they can figure out the key difference between the two studies. (The difference is that the roach study has the goal of estimating a treatment effect, namely integrated pest management compared to usual practice, while the rodent study is descriptive: its goal is to understand the differences between rodent levels in apartments occupied by whites, blacks, hispanics, and others. We’ll get back to causal inference in a few weeks.) I yammer on a bit about the skills they’ll learn by the time the course is over, and how I expect them to teach themselves these skills. Analogies between statistics and child care, sports, and policy analysis. Cautionary examples of Dan Marino and Cal Ripken. The beauty and teaching evaluations example. I give the equation of the regression line, and the students have to work in pairs to draw it. I then use the computer to fit some regressions in R and plot the data and fitted regression lines. (No residual plot for now, no q-q plot: we’re focusing on the important things first.)

Lecture 2 starts with the cancer-rate example. I hand out Figure 2.7 from BDA and give the students a few minutes to work in pairs to come up with explanations for why the 10% of counties with highest kidney-cancer deaths are mostly in the middle of the country. I write various explanations on the blackboard and then hand out Figure 2.8. We discuss: this is a motivator for multilevel models. I was going to bring up the example of the test with 1 or 100 questions but forgot to mention it–maybe I’ll do it in class in a few weeks. I then give them the regression of earnings (in 1993) on height (in inches): y = -61000 + 1300*height + error, with residual sd of 19000. In pairs, they must draw the line and hypothetical data that would lead to this estimated regression. This is a toughie–the students have to realize that heights are mostly between 60 and 75 inches, and that the data must be skewed to all fit above the y=0 line. We talk transformations for a bit–some more activities in pairs (for example, what’s the equation of the regression line if we first normalize x by subtracting its mean and dividing by its sd). Discussion of appropriate scale of the measurements and how much to round off. Comparisons of men to women: adding sex into the regression model. In pairs: what’s the difference in earnings between the avg man and the avg woman (it’s not the coef for sex, since the two sexes differ in height). Why it’s better to create a variable called “male” than one called “sex.”
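Here is a minimal sketch of the arithmetic behind the drawing exercise, written in Python rather than the R we use in class. The intercept and slope come from the regression above; the mean and sd of height used for the standardized version are hypothetical numbers I made up for illustration, not estimates from the data.

```python
# Fitted line quoted in the text: earnings = -61000 + 1300 * height (inches).
INTERCEPT = -61000.0
SLOPE = 1300.0

# Hypothetical summary statistics for height (assumed, not from the data).
ASSUMED_MEAN_HEIGHT = 66.5
ASSUMED_SD_HEIGHT = 4.0


def predicted_earnings(height_in):
    """Predicted earnings for a person of the given height in inches."""
    return INTERCEPT + SLOPE * height_in


def standardized_line(mean_x, sd_x):
    """Return (intercept, slope) of the same line in terms of
    z = (x - mean_x) / sd_x, using x = mean_x + sd_x * z."""
    return INTERCEPT + SLOPE * mean_x, SLOPE * sd_x


# Predictions at the ends of the plausible range of heights.
for h in (60, 75):
    print(h, predicted_earnings(h))

# Same line after standardizing height with the assumed mean and sd.
print(standardized_line(ASSUMED_MEAN_HEIGHT, ASSUMED_SD_HEIGHT))
```

The predicted value at 60 inches is 17000, which is less than one residual sd (19000) above zero. Symmetric errors around the line would put a lot of points below y=0, which is why the hypothetical data have to be skewed.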

Lecture 3 starts with answering questions. What are outliers and should we care about them? (My answer: outliers are overrated as a topic of interest.) Why is it helpful to standardize input variables before including interactions? Long discussion using the earnings, height, and sex example. Standardize earnings by subtracting mean and dividing by 2*sd. Standardize sex by recoding as male=1/2, female=-1/2. Lots of working in pairs drawing regression lines and figuring out regression slopes. Understanding coefficients of main effects and interactions. Categorized predictors: for example, modeling age as continuous, with a quadratic term, or using discrete categories. Start talking about the logarithm. The amoebas example: at time 1, there is 1 amoeba; at time 2, 2 amoebas; at time 3, 4 amoebas; etc. In pairs: give the equation of #amoebas as a function of time. Then give the linear relation on the log scale. (I should have had this example starting at time 0. Having to subtract time=1 is a distraction that the students didn’t need.) Graph of world population vs. time since year 1, graph on log scale. Interpreting exponential growth as a certain percentage per year, per 100 years (in pairs again).
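The amoebas exercise can be sketched in a few lines of Python (again, not the R we use in class). The function names are my own, and the 1% growth rate in the last line is a hypothetical number chosen only to show the per-year to per-100-years conversion; it is not an estimate from the world population data.

```python
import math


def amoebas(t):
    """Number of amoebas at time t: 1 at t = 1, doubling every time step."""
    return 2 ** (t - 1)


def log_amoebas(t):
    """The same relation on the (natural) log scale, which is linear in t:
    log(count) = (t - 1) * log(2)."""
    return (t - 1) * math.log(2)


def growth_over(rate_per_step, steps):
    """Total proportional growth after `steps` periods of compound growth
    at `rate_per_step` per period."""
    return (1 + rate_per_step) ** steps - 1


for t in range(1, 6):
    print(t, amoebas(t), round(log_amoebas(t), 3))

# Hypothetical rate: 1% per year compounds to roughly 170% per 100 years.
print(round(growth_over(0.01, 100), 3))
```

On the log scale the slope is log(2) per time step, and the "t - 1" is exactly the nuisance that would disappear if the example started at time 0.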

Lecture 4: all about logarithms. On the blackboard I give the equation for a cube’s volume V as a function of its length L. Then also log V = 3 log L. Then, in pairs, they have to figure out the corresponding formulas for surface area S as a function of volume. It’s not so easy for students who haven’t used the log in a while. Then we discuss the example of metabolic rate and body mass of animals. We then go to interpreting log regression models. Log earnings vs. height. Log earnings vs. log height. Interpreting log-regression coefficients as multiplicative factors (if the coef is 0.20, then a 1-unit difference in x corresponds to an approximate 20% difference in y). Interpreting log-log coefficients as elasticities (if the coef is 0.6, then a 1% increase in x corresponds to an approximate 0.6% increase in y). All these are special cases of transformations. Also discuss indicator variables, combinations of inputs, and model building. How to interpret statistical significance of regression coefficients. We did some more activities in pairs but I can’t quite remember what they were.
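For anyone who wants to check the algebra, here is a rough sketch in Python (not R) of the cube exercise and the two coefficient interpretations. The function names are mine. For a cube, V = L^3 and S = 6 L^2, so S = 6 V^(2/3) and log S = log 6 + (2/3) log V.

```python
import math


def cube_volume(length):
    """Volume of a cube with side `length`: V = L^3."""
    return length ** 3


def cube_surface_from_volume(volume):
    """Surface area as a function of volume: S = 6 * V^(2/3)."""
    return 6 * volume ** (2 / 3)


def log_surface_from_log_volume(log_volume):
    """The same relation on the log scale: log S = log 6 + (2/3) log V."""
    return math.log(6) + (2 / 3) * log_volume


def multiplicative_factor(coef, diff=1.0):
    """Log-y regression: a difference `diff` in x multiplies y by exp(coef * diff)."""
    return math.exp(coef * diff)


def elasticity_effect(coef, pct_increase):
    """Log-log regression: proportional change in y for a given % increase in x."""
    return (1 + pct_increase / 100) ** coef - 1


print(cube_surface_from_volume(cube_volume(2)))  # side 2: V = 8, S close to 24
print(multiplicative_factor(0.20))               # exact factor for coef 0.20
print(elasticity_effect(0.6, 1))                 # coef 0.6, 1% increase in x
```

The exact factor for a coefficient of 0.20 is exp(0.20), about 1.22, so the "20%" reading is an approximation that works best for small coefficients. The elasticity reading is much closer: a 1% increase in x with a coefficient of 0.6 gives very nearly a 0.6% increase in y.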

How do I have time to cover the material?

People have often told me that they’d like to do group activities but they can’t spare the class time. I disagree with that line of thinking. My impression is that students learn by practicing. A lecture can be good because it gives students a template for their own analyses, or because it motivates students to learn the material (for example, by demonstrating interesting applications or counterintuitive results), or because it gives students tips on how to navigate the material (e.g., telling them what sections in the book are important and what they can skip, helping them prepare for homework and exams, etc.). The lecture room also can be a great place to answer questions, since when one student has a question, others often have similar questions, and the feedback is helpful as the class continues.

But I don’t see the gain in “covering” material. I don’t need to do everything in lecture. It’s in the book, and they’re only going to learn it if it’s in the homeworks and exams anyway. The class-participation activities allow the students to confront their problem-solving difficulties in an open setting, where I can give them immediate feedback and help them develop their skills. And having them work in pairs keeps all of them (well, most of them) focused during my 9-10:30am class.

Summary (so far)

This has been pretty exciting so far. We’ll see how it works for the whole semester. At this point, I don’t even think I’m capable of doing straight lectures, so it’s good that the activities are working. But maybe . . . maybe . . . this could transform the teaching of statistics! It’s a hope (or distant goal).