My new Chance ethics column (cowritten with Eric Loken).
Click through and take a look. It’s a short article and I really like it.
And here’s more Chance.
I’m pretty sure I’m going to have the phrase “cheeseburger-snarfing diet gurus” rolling around in my head all day. Excellent work.
Great article! It takes courage to admit that things can be done better.
That one is excellent! I constantly think about how we don't really use our knowledge in teaching. And I'm in psychology. I try! (Quantifying exams, not learning the names of students so I won't be biased.)
I was told, and have read on multiple occasions, that learning students' names and using them in the classroom is beneficial to students' learning, and I believe that there are some well-done studies backing this up. Not to mention that in my experience, both as a student and as a teacher, classes are much more enjoyable when the teacher knows who he/she is talking to. While too much personal contact can bias evaluations, it certainly improves the teaching aspect of being a teacher. So not learning the names of the students seems way too drastic to me, and not a good idea overall. Should doctors refrain from getting to know their patients so it doesn't affect their diagnoses? I'm a little shocked, and curious about how you came to this decision.
Perhaps other approaches would be better suited to reducing bias when grading. Maybe asking students to write their name only on the last page of their paper, or to use their student ID, would help?
A more extreme solution may be to separate the teaching and the evaluating parts of the teaching process and have someone other than the teacher evaluate the students.
I’m bad at names, great at faces. (There are studies on why names are hard to learn.) I usually know my students well, but have found that my badness with names kind of pays off. We are working on getting a system for anonymizing the tests, but it takes a long time.
This is why most (if not all) UK universities use anonymised exams and coursework submissions – students do not use their name, but their matric or a special exam number. One of my former lecturers once mentioned that with the introduction of anonymous marking, female students were suddenly doing just as well as males, whereas previously males received better grades. That seems to me like a good compromise, although it requires quite a bit of administration.
I’m a bit more pessimistic. Is it really true that there hasn’t been an extensive amount of randomized experimentation in the teaching field? What do Education Ph.D.’s do, then? Maybe the returns from this kind of activity just aren’t worth the effort.
One might even wonder the same thing about Science. If we were to put all of our major scientific advancements in a space capsule and shoot it off into space for some alien civilization to find, how many of those advancements would have been discovered by randomized experiments? The Double Helix? Schrödinger’s Equation? Inoculations? The Law of Supply and Demand?
How many that were discovered by randomized experiments were effects so large they would have been just as easy to spot without it?
How much have all the false results from randomized experiments (~50%) slowed the progress of science?
In which period was randomized experimentation used most heavily: 1900-1950 or 1950-2000? Which of those periods saw the most rapid scientific and technological advancements?
I wish your point of view in the paper was right, but it just isn’t that obvious to me.
You write, “In which period was randomized experimentation used most heavily: 1900-1950 or 1950-2000? Which of those periods saw the most rapid scientific and technological advancements?”
The key, I believe, is not randomization so much as experimentation and measurement. It’s harder to learn when pretests, treatments, and outcomes are unmeasured.
Ok, which period saw the most measurement and experimentation? By any proxy that I can think of, there were far more measurements and experiments in the later period than in the earlier one. Yet there is a growing consensus that the first period saw the most rapid advance.
Every scientist probably has a different theory of what it takes to make science advance. But the one thing all these theories have in common is that they are “theories” of advancement. So it is useful to compare our pet theories of advancement to reality sometimes. And when I do, the results are nowhere near that clear cut.
For example, when Euler published the equations of motion for a rigid body, they were not the result of experimentation, nor were they experimentally tested. At least, not in the way that you suggest. Yet hundreds of years later, when man put satellites and people into space for the first time, those equations were used without even thinking about it.
Compare the process Euler actually used and the robustness of the results he achieved to say, any statistics laden experimental/applied Economics paper you’ve ever seen. Which comes off the better?
For that matter, how many of these statistics laden econ papers do you think anyone will be reading and using 200 years from now?
I love Statistics and want it to be supremely useful. But love doesn’t make it so.
Increased use of measurement, experimentation, and statistics is endogenous: when you run out of low-hanging fruit or hit a dead end, it’s time for experimentation, etc.
Could be. Then again, it could be that the kind of approach Gelman was suggesting is actually a very weak tool. It can only succeed on some very low-hanging fruit, and everything after that is just wasted effort.
I don’t know what all those Ph.D.s in Education are doing, but they give the impression of doing the kind of research Gelman was talking about. Is there anyone out there who knows how much of an impact all that research has had on real-world learning? Has there been a big jump over the last 100 years in the literacy rate of Americans with the same total years of education, for example?
A more likely scenario is that there will be no real improvement until there’s a breakthrough somewhere else. For example, the invention of a “magical brain imager” which can tell when someone has really learned the Central Limit Theorem. Note that it’s unlikely such a breakthrough will itself be the result of anything like “Randomized Experiments”.
Still, it would be nice to see a randomized, controlled experiment testing the teaching of Bayesian credible intervals against confidence intervals to undergrads with no previous exposure to statistics. Maybe there’s some low-hanging fruit if no one’s done it.
“…and a flawed measurement is better than taking no measurement at all”
I guess it depends on how flawed the measurement is. If the flawed measurement means you change from a standard treatment to a new treatment when the standard treatment was better, then it would have been better not to measure.
Problem-based Learning @ Republic Polytechnic in Singapore, a very new school, highly computerized
CEWD @ RP, i.e. they study what’s going on.
I was on an external review team for this a few years ago.
They use an extreme version of Problem-Based Learning, 3-year program for 17-19-year-olds.
They expect 2nd/3rd-quartile students, expect most to go into jobs and be useful, some will go on to university.
A student takes 5 courses/semester, where course A is all day Monday, course B all day Tuesday, etc.
Class = 25 students, 5 teams of 5 students each.
They get problem sets and discussion at start of day, then go off and solve the problems, write PPT slides to show what they did, and in mid afternoon, each team presents, with the other teams as harshest critics.
Day ends either with an individual quiz, or a review by the instructor of areas that caused trouble.
The first week of first semester, classes include use of Web search engines and how to assess credibility of what they find.
I sat in on two classes:
17-year-olds learning about Poisson arrival processes in their probability & statistics class.
19-year-olds doing systems design for a Web application.
Most curriculum design is done by permanent staff, including subject experts, and it is all heavily computerized/shared.
Instructors include many part-timers, who need not be domain experts and don’t create the materials, although they sometimes end up getting involved later. They are picked more for skills in communication and coaching, with enough expertise to teach using the standard materials. The courses are constantly being diddled in response to feedback from the instructors, and administrators teach a course occasionally to keep their hands in.
When I was teaching computing long ago, we usually thought that if we created good programming problems, students would learn a lot regardless of our lectures, and we did some team projects as well.
The RP faculty said that was of course crucial in a PBL setup.
There were no disengaged students that I could see, and it seemed to work well for the limited sample I had. Of course, this was Singapore, not a random sample.
Just a couple of points: most kids who do well in school in Singapore don’t go to the polys, they go to junior college. It’s quite tough for poly grads to get into the local universities and they tend to struggle when they get there because they’re competing with the JC kids and scholarship holders from PRC. The RP course you reviewed should be seen in that context.
‘They expect 2nd/3rd-quartile students, expect most to go into jobs and be useful, some will go on to university.’
The first day they told us their goal was not to attract the top students (who of course all wanted to go to university directly), but to help average ones perform the best they could … so they had 3rd-quartile 17-year-olds doing Poisson processes and statistical techniques useful in manufacturing, for example. They had 19-year-olds that I might have imagined hiring as junior Web programmers for teams that needed such. Students were used to working in teams, which of course were different by class, and sometimes changed during a semester.
We probed them on what happened to the students who later wished to go on to NUS, etc. At that point, RP was so new that there was relatively little history, so they said all they could do was give us a few anecdotal examples. They thought students were doing OK, but they didn’t make strong claims. In any case, sending a lot of their graduates to university was a non-goal, but they wanted to make sure that students for whom that was appropriate would at least have a chance.
One reason for mentioning this was a general concern: I’ve often thought that many high school students would be better served by spending more time on probability and statistics than some other math areas, whether or not they were going on to university education.
At my institution, each application for admission is read & scored on a 4-pt scale by 4 or 5 faculty members. Believe it or not, the official rationale is that this makes the process responsive to the variety of understandings that exist among the faculty on what makes someone qualified for admission. I once (probably more than once) tried to explain why using a spinning pointer that could come to rest at any of 4 sectors would be just as responsive & save us a lot of time, if *in fact* we don’t have shared criteria for admission (I’m pretty sure we don’t). Maybe I just didn’t explain the concept of reliability well enough …
That sounds a lot like the work of Richard Hake, Eric Mazur, and Arnold Arons. See, for example, http://www.physics.indiana.edu/~hake/. Mazur and his colleagues have recently formed the Peer Instruction Network for just this work.
It’s interesting that in an article in the ethics column you don’t really talk about the ethics of experimentation on students. Medical interventions are assessed on consenting patients, who (are supposed to) understand and accept that they might be randomised to the null treatment—placebo or standard of care. Should students not also give informed consent if you wish to include them in a systematic evaluation of your pedagogy? If so, what would you do with non-consenters? You presumably still have to teach them. If consent is not sought, the only way the experimentation could be fair would be if no one genuinely knew which method were best, and even then I can easily imagine the kind of complaints you might get from ‘consumers’ allocated to what in retrospect was the poorer treatment.
I’m also concerned about the practicalities of randomising students to different pedagogies. If you have multiple groups of students reading the same topic, and you can control for teacher-related confounders (e.g. you teach all the groups), timetabling issues might prohibit random allocation of individual students to classes. If you instead let students be allocated to groups and then randomise groups, would you not need to repeat the experiment on lots of groups before you could be sure the confounding was overcome by the randomisation?
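The worry about randomising a handful of intact groups can be sketched with a small simulation (effect sizes and variance components are invented for illustration): group-level confounders don't average out when only a few whole groups are assigned to each arm, so the estimated effect of a pedagogy wobbles far more than with many groups.

```python
import random
import statistics

random.seed(2)

TRUE_EFFECT = 5.0        # exam points added by the new pedagogy (assumed)
GROUP_SD = 8.0           # group-level confounders (timetable, chemistry, ...)
STUDENT_SD = 10.0        # individual student variation
STUDENTS_PER_GROUP = 25

def estimate_effect(n_groups_per_arm):
    """Randomise intact groups to control/treatment; return estimated effect."""
    arm_means = []
    for treated in (False, True):
        scores = []
        for _ in range(n_groups_per_arm):
            group_effect = random.gauss(0, GROUP_SD)
            for _ in range(STUDENTS_PER_GROUP):
                s = 70 + group_effect + random.gauss(0, STUDENT_SD)
                if treated:
                    s += TRUE_EFFECT
                scores.append(s)
        arm_means.append(statistics.mean(scores))
    return arm_means[1] - arm_means[0]

# How much do effect estimates wobble with 2 groups per arm vs 20?
few = [estimate_effect(2) for _ in range(500)]
many = [estimate_effect(20) for _ in range(500)]
print(f"2 groups/arm:  sd of estimates = {statistics.stdev(few):.1f}")
print(f"20 groups/arm: sd of estimates = {statistics.stdev(many):.1f}")
```

With only two groups per arm, the estimate's spread is comparable to the true effect itself, which is exactly the "need lots of groups" concern above.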
I think there are some good reasons why we don’t practice in education what we preach for other fields.
We’re already experimenting on students, in the sense of trying different things in different classes in different semesters. We’re just not recording what we’re doing, nor are we generally getting good pre- and post- measurements. I don’t see why gathering and analyzing such data (obviously keeping students’ names confidential in any reports) is worse than the current practice of trying out ideas haphazardly and grading students based on casually-conducted exams. On the contrary, I think it would be more ethical to consider our interventions with more care and to work harder to measure their effects.
Putting myself into the perspective of a student, I’d feel quite okay if my prof said s/he was “trying out a new way to teach ANOVA, say, because last year it hadn’t worked out so well,” but I’d be at least a bit indignant to find out that a separate group of students reading for the same exam as me were being taught with the new approach while I was in the control group (and if you’re not tenured, the last thing you want is indignant students). I think that’s my issue with experimentation in the educational context. I doubt anyone would complain with putting more thought into measuring or recording our approaches, and analysing them as observational studies, but to experiment on students just, somehow, feels wrong.
There’s an enormous contradiction in how young adults on campus are viewed when considered as “research subjects” or “students”. Want to ask a bunch of intro stats students how they FEEL about statistics? First fill out pages and pages of human subjects applications justifying your measures, methods and credentials. Want to ask them what they KNOW about statistics (i.e. give an exam or test)? Go ahead and make up the questions, even if you’ve never done it before.
By the way, the kids (“research subjects”) answering the questions about how they feel about statistics are doing it to earn a coupon for a slice of free pizza and they’re hardly concerned about the consent forms and debriefing and paperwork you’ve showed them. The kids (“students”) answering test questions are paying the school tens of thousands of dollars, and they are very very concerned about the rewards they’ll earn.
You’re right to connect this to the issue of research ethics. But Andrew’s right in that the whole college teaching thing is a big unmonitored experiment already. We know very little about the best principles for delivering our product (even though the basic model for instruction and assessment is many decades old). And in many places we’re making bold ventures with almost no empirical basis. What about the stampede to online and blended education? What about the new mantra, “hey, let’s ‘flip’ the classroom”?
Interesting comment Andrew, and I agree with you. But, let’s extend your comments to another context. I am a medical doctor, and I firmly believe that there is a serious disconnect between the research and non-research “worlds”. An interesting thought experiment is to replace “students” with “patients”, and “classes/semesters” with “hospitals/countries” in your comment above. For example, I can choose to anesthetize patient A with a combination of drugs that I believe (based on zero evidence) is superior to my colleague’s technique (who anesthetizes patient B). I can actually do this for a whole career under the “usual clinical care” moniker, and that apparently is ethical in the medical world, even though I would have exposed thousands of patients to a potentially inferior technique!
However, if I used my technique and wanted to capture data (to be responsible and actually prove that my technique is better), and if I didn’t obtain consent, then this is somehow unethical. This attitude is absolute rubbish and treats all of our patients with profound disrespect. In fact, clinical medicine today should be ashamed that many clinicians never consider doing research because of the barriers erected in front of them. We can perform “quality assurance” research internally, but this cannot be published—thereby forcibly limiting the population who could potentially benefit from such research.
There is absolutely no question that research must be ethical, and the oversight needs to be there. But there is also no question that (a) not all research that lacks ethics approval is necessarily unethical, and (b) ethics committee approval does not in any way mean that research is ethical (one need only look at multi-centre trials, where one centre finds the protocol to be ethical whilst another, less than 100 km away, does not, to prove this point).
Gelman & Loken http://bit.ly/H3BaI9 point out that statisticians rarely: (a) use standardized tests of high reliability and validity, (b) use pre/post testing to assess what has been learned in courses, (c) perform systematic comparisons of treatments, (d) utilize randomized control trials.
They may be surprised to learn that all of the above except for “d” have been utilized in physics education for about a quarter of a century – see e.g.: “The initial knowledge state of college physics students” [Halloun & Hestenes (1985) http://bit.ly/b1488v ]; “Interactive-engagement vs traditional methods: A six-thousand-student survey of mechanics test data for introductory physics courses” [Hake (1998) http://bit.ly/9484DG ]; “Peer Instruction: Ten years of experience and results” [Crouch & Mazur (2001) http://bit.ly/ppm3Bm ].
Why the absence of randomized control trials? See e.g. “Re: Should Randomized Control Trials Be the Gold Standard of Educational Research?” [Hake (2005) http://bit.ly/GRLXEX ]
Conflating issues of instructional efficiency with social ethics … just adds confusion, rather than enlightenment.
Normal instructional methods at all levels of formal American education are highly inefficient… and have not changed much since medieval times. The teacher/master lecturing a neatly grouped package of subservient students is the standard model everywhere. The normal testing/grading system is horrific in design & practice.
These huge shortcomings have been recognized & well-documented for at least half a century. But they persist due to ritual, custom, and strong economic interference into the natural market relationship between the sellers of instructional-services… and the core consumers of those services.
Things don’t change because the sellers (professors, teachers, administrators, government education bureaucrats, etc) have a very strong artificial-structural advantage over buyers — and thus have little practical incentive to change things merely to improve outcomes for buyers.
Medical Doctors get their fee whether they cure their patients or kill them. Likewise, professors’ paychecks look the same … no matter what the outcome of individual students. Self-interest always rules, if not tempered by market forces. Status Quo is the easiest path for all established doctors and professors. Change is uncomfortable.
Rare, noble professors who ethically seek to seriously improve things for their students — will soon find themselves scorned & minimized by their colleagues and employers. Boat-Rockers are not appreciated by those ensconced in cushy educational jobs.
Interesting food for thought, but I’m very very sceptical. To begin with, I don’t buy that what I want to achieve as a teacher can be appropriately captured by a one-dimensional number (probably not even a multi-dimensional one). I’d be more happy with a student who gets a B in the exam and is still interested in statistics and in learning further about it three years later than with one who gets an A and then forgets all stuff after leaving university.
I don’t buy that examinations measure an existing but unobserved “true” quality of the student’s learning in some biased/unreliable or unbiased/reliable fashion; the examination itself defines what it measures. And by the way, examinations are not only there to measure students’ abilities but also (and in my understanding more importantly) to make them work beforehand. What is good for one aim isn’t always good for the other.
And I think that some of the most important things that happen in the classroom happen between the individual lecturer and bright or not-so-bright students making comments, because good teaching implies involving the group. This cannot and should not be standardised.
The more you standardise examinations, the more you stop teachers from being spontaneous and flexible and the more you will put off students from thinking creatively outside the box.
I see your point. There can be too much measurement. But I think that, at least at the universities I’ve studied at and taught at, we’re too far over at the other extreme. Setting aside all the intangibles, we don’t even have a sense if the students are learning the basics. But, yes, standardized tests should only be part of the story; there should be room for creativity.
That said, why can’t we as statisticians give the same message regarding other fields? When we talk about statistical methods for medicine or policy analysis or even for education, we talk all about bias and variance and probability, and we criticize studies for not being randomized, etc. If it’s really true that these ideas aren’t so important, I think we should be changing how we teach the stuff.