## Students don’t know what’s best for their own learning

[This post is by Aki]

This is my first blog posting.

Arthur Poropat at Griffith University has a great posting, “Students don’t know what’s best for their own learning,” about two recent studies that came to the same conclusion: university students evaluate their teachers more positively when they learn less.

My favorite part is:

> That is why many students assume that reading or highlighting passages in their text-book, or merely listening to a lecture, is enough to produce learning. They mistake the ease of the task with greater knowledge. Time-consuming and effortful tasks, like self-testing their knowledge, are consequently seen by students as less efficient for their learning, despite the fact that the more difficult tasks produce the most learning.

Years ago I used to give the usual lectures (me talking for 2 × 45 minutes), and I got great evaluations. Then I spent five years on a research grant without lecturing. When I started teaching again, I decided to go with a more interactive approach, which requires more work from the students, and my evaluation scores were lower than before. Now I’m not worried anymore. Now I just need to convince the students that they want to learn rather than pass the course as easily as possible.

1. Rahul says:

I find the Arthur Poropat post rather distressing. It’s essentially anti-accountability. I dislike that he concludes “Given the evidence, student evaluations are a distraction from the responsibility to provide the best possible education for the nation.”

Learning is important to students but so are actual grades. Employers look at GPAs. It isn’t entirely irrational that students award high evaluations to teachers who gave better grades.

What good is fantastic learning if you end up with a 2.5 GPA & the hiring managers set a 3.5 minimum for screening interviews?

Maybe if universities get rid of grades entirely, then you’ll see students reward pure learning. Are we willing to do that?

• James Cunningham says:

> What good is fantastic learning if you end up with a 2.5 GPA & the hiring managers set a 3.5 minimum for screening interviews?

Does this actually happen? I was looking for employment for a few months about a year ago, and I only came across two organizations that asked for my GPA — and both followed a highly test-philic, Googly hiring process. (One wanted not only my GRE and my GPA, but also my college transcripts and my ACT results. That was a pretty crazy outlier.) I empathize with those who put in extra effort without seeing results, but if the emphasis on fairly effortless As and Bs changes then so will the attitude of employers (if indeed that attitude is prevalent).

All of that said, I don’t agree with the proposition that evaluations are objectively useless or worse than useless. Some teachers are mediocre or just plain bad for reasons not directly related to the difficulty of the material — putting in minimal effort, missing class sessions and office hours, failing to grade in a timely manner (or grade at all), allowing class discussions to be derailed, being *too* easy, getting facts wrong … it happens!

• Rahul says:

80% of the online applications I’ve filled in asked for a GPA & quite a few companies listed a minimum too.

• James Cunningham says:

It’s possible I was asked my GPA more than I recall — my GPAs were sufficiently high that I wouldn’t think twice about reporting them — but it wasn’t common, and I don’t remember ever seeing a minimum GPA. For the most part I applied to fairly entry level software development and business analyst jobs.

2. Aki, among your students, do those students who do well in subsequent classes give you higher ratings in teaching evaluations? The lower ratings might be coming from those who didn’t learn much and/or didn’t enjoy being forced to work on problems. I suppose it’s hard to know, since these evaluations are anonymous. But my experience is that the top-performing students in my courses tend to write emails to me (before they learn their grades) saying that they got a lot out of my courses. It’s often the poor performers who complain. My impression is that the latter group looks everywhere but within themselves to find a reason for their poor performance. On the other hand, it’s easy to become complacent as a teacher and just blame the students for everything. There is always room to improve one’s teaching; I can’t say I’ve converged to the (or an) optimal state. It would be cool if one could hire a coach to watch one’s teaching and give one dispassionate advice on what one’s doing wrong that leads to poor evaluations from a segment of the class. In theory, it must be possible to get even the so-called poor performers to do the best they can.

3. There is often resistance on the part of students to “active learning,” as opposed to being passive audiences at a traditional lecture, even though the former helps most of them learn more. I’ve found that *explaining* to them the reasoning behind active learning, as well as presenting evidence supporting its use, helps a lot in getting students on board. This isn’t surprising: given their prior experiences, why *should* we expect them to “know what’s best for their own learning?”

This is separate from the issue of how course-evaluations are structured, but I’ll skip that for now!

• Rahul says:

What’s the difference between active & passive?

• Elin says:

Passive learning is like watching TV: the prof talks, and students listen and presumably absorb something via hearing (tell me) or via hearing and seeing some demonstrations (so sometimes it involves both tell me and show me).

Active learning is anything that involves the students actively doing something: discussing, working a problem, writing-to-learn, doing a lab, analyzing data. What it is depends a lot on the subject. For example, in my very basic research class, where the students are not math people at all, last week I was talking about why random assignment is considered so great. Everyone took the same set of people and randomly split them up, and we put the data for the two groups on a few variables in a Google sheet. Then we could actually see that it works really well. I also learned that the way I tried to explain that I was betting the groups would almost always be almost, but not exactly, the same was really confusing to some smart ELL students, although it was fantastic that they were noticing that the groups were not identical.
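The random-assignment exercise can be sketched in a few lines of Python. This is only a minimal sketch, not the Google sheet used in class: the roster, the two variables (age and hours of sleep), and the helper name `random_split` are all made up for illustration.

```python
import random
import statistics


def random_split(people, rng):
    """Shuffle a copy of the roster and cut it into two halves."""
    shuffled = list(people)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical roster: (age, hours of sleep) for 200 invented people.
rng = random.Random(0)
roster = [(rng.randint(18, 65), rng.uniform(4.0, 10.0)) for _ in range(200)]

group_a, group_b = random_split(roster, rng)

# Compare the two groups on each variable, as in the spreadsheet exercise.
for index, name in enumerate(["age", "sleep hours"]):
    mean_a = statistics.mean(person[index] for person in group_a)
    mean_b = statistics.mean(person[index] for person in group_b)
    print(f"{name}: group A mean = {mean_a:.2f}, group B mean = {mean_b:.2f}")
```

Rerunning with a different seed gives a different split; the group means should usually land close together without being identical, which is the point the students noticed.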

4. Keith O'Rourke says:

Making sense of student evaluations is even harder than making sense of most epidemiology studies!

A colleague told me that in their department, for important issues, they invite a random subset of the class for confidential interviews with senior faculty. The colleague suggested that these students are actually very good at assessing the value of instructors, being quite able to set aside the easy-or-hard-marker aspect. One thing that is confounded here is that the assessment comes after the final exam has been written and anxiety about it has passed.

(If student assessments were done after the final exam, an instructor could encourage learning with stern marking before the final and then not be “penalised” versus easy markers for doing this.)

5. Emma Tosch says:

“The first study involved students at the United States Air Force Academy, among the top few American engineering colleges, while the second was conducted at Bocconi University in Italy, one of Europe’s top ten business schools. Both are highly esteemed institutions.”

From the first (couldn’t access the second):

“Approximately 40 percent of classroom instructors at USAFA have terminal degrees, as one might find at a university where introductory course work is often taught by graduate student teaching assistants. However, class sizes are very small (average of 20), and student interaction with faculty members is encouraged. In this respect, students’ learning experiences at USAFA more closely resemble those of students who attend small liberal arts colleges.”

“Students at USAFA are required to take a core set of approximately 30 courses in mathematics, basic sciences, social sciences, humanities, and engineering.”

“Professor data include academic rank, gender, education level (master of arts or doctorate), years of teaching experience at USAFA, and scores on subjective student evaluations.”

USAFA is hardly representative of your average academic institution, let alone a[n elite?] liberal arts college. Furthermore, the stakes for the students are much higher, and one could argue that it would be reasonable to suppose that the students who choose to go to a place like USAFA are more risk-averse (no debt, high probability of medium-high earning employment after graduation) than your average college student. With no choice given in the curriculum and all curricula being equal, it seems the decision-making calculus isn’t really generalizable.

Anecdotally, in a school where I was more heavily involved in teaching, a person who had some of the highest teaching evaluations was one of the toughest teachers. She required frequent quizzes and homeworks, and required class participation. She gave extra time to students who were really struggling, provided that those students worked. Other professors who also ran tough classes did not get as high teaching evaluations. Why? Some came across as jerks. Others were clearly pandering. None gave the time that she gave. These are variables that aren’t represented in the USAFA study (as far as I can tell by skimming).

While it may be possible that some students really do want easy-graders, the conclusion of the blog post in the link (“students do not understand what helps them to learn…Given the evidence, student evaluations are a distraction from the responsibility to provide the best possible education for the nation.”) is dangerous and does a disservice to students *and* teachers. *People* are bad at assessing all sorts of things, and any person, whether student or teacher, who has sufficient introspection and is willing to question themselves will understand how to better themselves. Maybe it’s the tools, rather than the students themselves, that are at fault.

6. In my opinion, the only valid assessment of the quality of a teacher’s course design and teaching method is a before-after standardized test designed by a third party. Student assessments shouldn’t be considered in determining whether the professor did a good job of teaching the necessary material, but can be more valid as an assessment of whether the professor did a good job of *mentoring* (i.e., helping students make decisions about their academic careers, choose appropriate projects, pursue extra investigation of additional materials, etc.).

Obviously a before-after 3rd party test requires that the course objectives be pretty darn clear, which isn’t always the case, especially at upper-division / seminar type courses, where I think the function of the course is often not really “teaching material” but “mentoring and providing opportunities to explore a topic”. Course evals would obviously be more appropriate in such an open-ended course.

One thing I certainly found was that in middle level highly standardized courses in engineering, the vast majority of students didn’t want to be given opportunities to learn the material, but simply wanted to know how to do the textbook type problems that would appear on the test (I was a TA and had no control over tests). I did things like create example design problems to illustrate how concepts might be used in future courses… and found that less than 10% of the class even clicked the link to get it, and essentially no-one did the projects.

So, my general inclination is to believe the conclusion even though as Emma Tosch describes above, the conclusions are not really justified entirely by the studies.

• Andrew says:

+1

Even though I don’t do such evaluations in my own classes. But I agree they should be done, and at least I feel bad about it. So there’s that.

7. Christian Hennig says:

Perhaps we shouldn’t be so statistical about it?
I think that student evaluation can be extremely helpful if you ask the students to tell you in free text form what they like and what they don’t like. Some of what they say makes a lot of sense and the lecturer can learn much from it. The lecturer can decide what to ignore, with good reasons. The outcome can also be used to demonstrate to the students how heterogeneous their wishes can be.

This doesn’t measure though how well the lecturer performs on a one-dimensional scale. Too bad.

8. Jim says:

Why not just run student feedback a year or so after the class, once the lecturer’s charisma (or lack of) wears off, and the student has a better judgement of how well they learned?

• Martha says:

I agree with Jim that “delayed” feedback could be informative. For example, I’ve had students who tell me a semester or several years after they’ve had a class from me that, although they didn’t appreciate what I did at the time, they later saw how it helped them in a later class, or when they were teaching, etc. (And, of course, some students who rated the course highly at the time might have a lesser opinion a couple of years later.)

I recall that Rice University used to (maybe still does?) have a teaching award based on later feedback — I think post-graduation.

But, in the opposite direction, getting feedback *during* the course (rather than at the end) can be useful for detecting when you’re doing something wrong that can be improved for the rest of the semester. (I think the ed folks call this “formative assessment”)

On the Chance article Andrew linked to:

1. I don’t think medical treatments are based on as rigorous testing as the article seems to imply — in addition to all the p-hacking and other questionable stuff that goes on, medical practitioners to a great extent rely on their own judgment/preferences, and on what is fed to them by drug company representatives.

2. There is little if any funding or other enablement for experiments in teaching. In fact, randomization and other desiderata for good design are rarely if ever feasible.

• Andrew says:

Martha:

I’ve found that feedback during the semester can help, although not always as much as I’d hope, as sometimes it can be easier to diagnose the problem than to find a solution!

9. polymath says:

I can easily believe that “all else being equal” the _average student_ would favor (and reward, through scores) classes that teach less.

But all else ought not be equal. Interactive learning is much more FUN than sitting in most lectures. Interaction done right increases the social payoff, for the student, of knowing her stuff. And interaction done right creates a compelling story and gives the leader a way to make the development of the material more relevant.

“Interaction done right,” in my view, requires seminar-size classes. In lecture-size classes, interaction often makes for a worse learning environment. The questions are forced, the answerers are called on to transparently regurgitate material they’re supposed to have read, and the flow of the class is constantly interrupted.

If you’re really trying to get at the effectiveness of a teacher, I wouldn’t look at average evaluation scores, I’d look at the 9th decile score. I hypothesize that a bimodal distribution, as long as there aren’t troubling demographic correlates with the modalities, is an indicator of a good teacher who makes people work hard.
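To illustrate the difference between the two summaries, here is a rough sketch. The two sets of 1-to-5 scores are invented for illustration, and the 9th decile is computed with the default (exclusive) method of Python’s `statistics.quantiles`; other quantile conventions could give slightly different values.

```python
import statistics

# Hypothetical evaluation scores on a 1-5 scale, invented for illustration.
bimodal = [1, 1, 2, 2, 2, 5, 5, 5, 5, 5]    # class split into two camps
middling = [3, 3, 3, 3, 3, 3, 3, 3, 4, 4]   # everyone mildly satisfied


def ninth_decile(scores):
    """Return the 9th decile: the last of the nine cut points from n=10."""
    return statistics.quantiles(scores, n=10)[8]


mean_bimodal = statistics.mean(bimodal)
mean_middling = statistics.mean(middling)
d9_bimodal = ninth_decile(bimodal)
d9_middling = ninth_decile(middling)

print(f"bimodal:  mean = {mean_bimodal:.1f}, 9th decile = {d9_bimodal:.1f}")
print(f"middling: mean = {mean_middling:.1f}, 9th decile = {d9_middling:.1f}")
```

In this made-up case the means are nearly the same (3.3 versus 3.2), while the 9th decile separates the two classes (5.0 versus 4.0).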

• Elin says:

Actually as mentioned above, students often resist active learning, because it is much harder work than sitting and passively listening. Not to mention that you have to actually attend class if there is work that doesn’t involve hearing the textbook restated going on. It was kind of amazing to see the Harvard data on lecture attendance for any class not for premeds. I would be so embarrassed.

In large classes it also needs to be “done right.” There are huge variations in this and in the kinds of approaches you can take. There is no reason for this to be true: “The questions are forced, the answerers are called on to transparently regurgitate material they’re supposed to have read, and the flow of the class is constantly interrupted.” I mean, in a large class you actually have a decent sample size for collecting data or for use as a sampling frame, just to name two things helpful for teaching statistics or research methods.

I mean, I love a good lecture, but I have a Ph.D., which means I am in a group selected for being successful at learning stuff (people who probably got good grades in all the lecture courses in their disciplines as undergraduates) and for being interested in and knowledgeable about the subject matter of the lectures they go to. And I would say the lectures I love are not textbooks regurgitated. It’s hard that as a teacher you have to think about what was going on with your classmates who didn’t go to doctoral programs.

• Andrew says:

Elin:

That’s one reason I have students work in pairs during class. If you’re supposed to work on your own, it’s easy to get distracted or to stare into space, but in pairs you can stay focused.

10. Steve Sailer says:

When I read RateMyProfessors.com, I try to pay attention to the quality of the prose style of the comments and give more weight to people making intelligent, well-written comments. There often seems to be disagreement between the ratings given by smart commenters and the ratings given by the “Sux, LOL” commenters.

• Andrew says:

Steve: In my experience, a class can be effective with students at the high range and not with students at the low range (or vice-versa, or both, or neither). I take your point that more thoughtful students give more information in the ratings, but the interpretation of such data can be tricky because it’s possible that the disagreement is arising because the course really is working better for some students than others.

Typical comments on teaching evaluation forms:

Student A: The course is fine but it went too slow, I wanted to see more.
Student B: The course is fine but it went too fast, the prof tried to do too much.

Unless you’re an amazing teacher, it’s hard to satisfy both students A and B.

11. Kai says:

I think a lot depends on what you want from an evaluation.
Do you want to improve your teaching? Do formative evaluation of the kind described above: Gather a few people and hold a group discussion about the class. Give the whole class a sheet of paper with just four fields: 1) Please give some general comments on the class. 2) I learnt this: 3) You should keep this: 4) You should change this: (or something along those lines, depending on what exactly you want to know.)
Or do you want something standardized to compare people or departments, or just something you could put into a job application to show others you are a good teacher? I think achieving valid results here is much harder. There are tough and good teachers who get bad grades, maybe because they are not only tough but also not funny. Then there are tough and good teachers who get good grades, maybe because they _are_ funny. There are the ones who get good grades because they really do good mentoring (as was mentioned above) or have some other way to demonstrate to students that they are really interested in what happens with their students and their learning. Then there are the ones who are neither tough nor good teachers (and maybe do not really care what, if anything, students are learning), who get good evaluations in exchange for good grades and easy work. And then there are …
To cut this long story short: There are almost as many types of teachers as there are types of students. Can we possibly believe it is an easy task to validly evaluate something as complex as teaching and learning with some standardized questions in a questionnaire?
(I love standardized questionnaires, and I even do them for evaluation. I just do not expect them to be able to do things they are not good for.)

• Martha says:

Yes, diagnosing the problem is usually easier than finding a solution — but it’s a necessary step. Ya can’t fix it if ya don’t know it’s broken.

• Martha says:

Oops — this was supposed to be a reply to Andrew’s comment, “I’ve found that feedback during the semester can help, although not always as much as I’d hope, as sometimes it can be easier to diagnose the problem than to find a solution!”

• Martha says:

+1 to Kai

12. dab says:

I am neither a statistician nor a social scientist, so most likely I’m missing something big here. But from skimming the first paper providing evidence for the claims made in the blog post by Arthur Poropat (namely, http://www.jstor.org/stable/10.1086/653808), it looks like their story hinges on the covariance of two parameters in their linear mixed effects model being negative. One is meant to measure the value-added by the professor of the intro course in the intro course ($\lambda^1_{j^1}$) and the other is meant to measure the value-added by the professor of the intro course in the subsequent course for which the intro course is a prereq ($\lambda^2_{j^1}$). But according to table 4 of the paper, the covariance between those two parameters is estimated to be $-0.0004$ with a 95% CI of $(-0.0051,0.0043)$. So, it doesn’t look to me like there is much evidence for the covariance being negative…which suggests (assuming I’m right in thinking that the whole story hinges on that estimate) that the paper doesn’t show much at all. I feel like I must be missing something….
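A quick back-of-the-envelope check of those numbers, as a sketch only. It assumes the interval is a symmetric, normal-based 95% CI (estimate ± 1.96 standard errors); the quoted interval is centered on the estimate, which is consistent with that, but the paper may have constructed it differently.

```python
# Numbers quoted in the comment above (table 4 of the paper).
estimate = -0.0004
ci_low, ci_high = -0.0051, 0.0043

# The interval spans zero, so the sign of the covariance is not pinned down.
contains_zero = ci_low <= 0.0 <= ci_high

# ASSUMPTION: symmetric normal-based interval, estimate +/- 1.96 * SE.
# The paper's actual interval construction is not checked here.
approx_se = (ci_high - ci_low) / (2 * 1.96)
approx_z = estimate / approx_se

print(f"interval contains zero: {contains_zero}")
print(f"approximate standard error: {approx_se:.4f}")
print(f"approximate z-score: {approx_z:.2f}")
```

Under that assumption the standard error is roughly 0.0024 and the estimate sits about 0.17 standard errors below zero, which fits the reading that this covariance estimate alone gives little evidence of a negative relationship.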

• I agree, they don’t do a great job of describing their results.

• The bigger question is whether value added in the follow on course correlates positively or negatively with student assessment of the teacher in the introductory course. That doesn’t seem to be actually a part of their model. But they do describe that as the case, so it seems to be something evaluated separately.