Who falls for the education reform hype?

Phillip Middleton writes:

My wife is a 5th grade teacher, in Texas, in a Title I school (free lunch, other subsidies, poor and emotionally disturbed kids, CPS cases, you name it) on the west side of San Antonio. There are a number of things I’ve been exposed to as a result, the net of which tells me that we are ALL (including the effervescent Gates Foundation) thinking about how we deploy and measure education incorrectly (though I at least think Common Core was a fundamental step in the right direction for curriculum).

On the deployment side, there is much ado about “pushing down” curriculum to younger ages. I’m not sure what to think about this as, developmentally speaking, I would think that “social learning” is probably much more acutely important than say, learning integral calculus (parents of the trophy kid phenoms would probably disagree w/ me here). Learning is a VERY messy process, and I wonder whether or not latent benefits of advancing curriculum at an earlier age are even observable in high school/college/vocational matriculation.

Following from that to the measurement side of education, it’s fairly clear to me that the race to teacher standards and evaluation has somehow obscured understanding of the student. The ‘silver bullet’ measure seems to be standardized test scores, among other things. However given that standardized tests have little/no accountability (think Pearson here – owners of media, curricula, and exams ….and now GED), are we sure we trust that such standard exams pass the ‘do they measure what they intend’ sniff test? My gut tells me no.

So far as I can tell, measures on standard multiple choice examinations, particularly over broad topics, do not consider what the student brings to the table. For a single multiple choice question where there is a “best” answer, “best” is left to interpretation. A “thoughtful” answer may not be one that agrees with a test taking strategy, and may therefore be incorrect (though a response may be incorrect for other reasons), given a question is dichotomized to right/wrong. So by understanding the strategy to take the exam, one may be able to “game” the exam. And there appears to be some evidence that test “gaming” tends to trump more pure application of knowledge on test scores.

If that weren’t the case, Kaplan, Princeton Review, others, wouldn’t find rather large revenue from selling services and products which essentially teach one how to game the test, not necessarily demonstrate learned knowledge. Add into this a student’s SES, cultural upbringing, experiences, etc, and the measurement issue becomes further complicated. So again, do the tests measure what are intended? I’m not a psychometrician or sociologist… so I can only ask the Q.

Now take policy on evaluating teachers using these measures and all sorts of other questions come up. A tool called EVAAS is a SAS (aka $$A$$) product designed to eval teachers….and AFAIK only inputs std test scores. Without understanding local economic, social, and movement dynamics of students (movement – transience in/out of different schools), how are we even sure teachers are being measured justly and national institutes of education aren’t being sold very expensive bills of goods?

My reply:

Many prominent statisticians including Don Rubin, Howard Wainer, and Jennifer Hill have objected to the excesses of the “value added” movement in education for more than a decade now. I suspect that statisticians are less likely than the usual suspects to fall for the education reform hype. I agree that the whole thing is frustrating, in part because often it seems to be done in the name of statistics. I suppose the analogy would be if, in the name of nutrition, the U.S. government were encouraging us to eat expensive unhealthy foods that made some people a ton of money. Hey, wait a minute . . . !

My own understanding of what works in elementary education is not informed by any research. If it were up to me, I’d be spending more time teaching kids stuff like art, music, sports, and, especially, foreign languages, which they can learn so well as kids, and less time teaching things like science which is so crude at that age. Why not teach kids 5 languages? But evidently this view is not popular in the U.S., so I expect I’m missing something here.

82 thoughts on “Who falls for the education reform hype?”

  1. I agree that it would make a lot of sense to make better use of children’s (and infants’) language-acquiring capacity – and it’s a damn shame that we don’t. But I think that would be hard to do under the current school model. For acquisition to work, children need several hours of exposure every day; and it must be exposure to interesting, natural input – they gotta use language for stuff they want to do, not sit in a chair watching some adult speak for a prescribed set time.

      • I think that one of the most valuable things a young person can learn is a second language; and the best time to do this is in the preteen years (the younger the better).

        • Why is it so valuable (especially if their first language is English)? I’ve personally never felt like I’m missing out on much.

        • I think that early learning of a second language really helps a child to develop his or her brain, while the brain is most receptive to language learning. I had a childhood friend who left the US and went to France with her parents when she was six or seven. I met her again when she was a teenager; she was fluent in both languages. I always envied her for that opportunity.

          I was just discussing this this morning with a Vietnamese-American. He came to this country quite young and was not fluent in the language of his parents. He really regretted that…he found it a distinct disadvantage when he visited family in Vietnam as an adult, he told me this morning.

  2. “If that weren’t the case, Kaplan, Princeton Review, others, wouldn’t find rather large revenue from selling services and products which essentially teach one how to game the test, not necessarily demonstrate learned knowledge.”

    I don’t doubt that standardized tests are game-able, but test prep company revenues are pretty lousy evidence for that claim. There’s plenty of money to be made selling products and services that don’t work.

  3. I’m sure I’m not the only PhD student who was shocked by the lack of training in how to teach. Moreover, from what I know as a result of conversations with friends and others who have gone to school for teaching degrees, there isn’t much substance there either. As far as I can tell there is no consensus and no good research at all about good ways to teach anything to anyone, let alone how to reform the way we teach now. Even the most basic introductory classes at the college level have no established best practices. In some fields the basic content has been the same for years, if not decades. Why isn’t there any research showing the best way to teach it? Of course there might not be one best method in the end, but right now there doesn’t appear to even be a beginning. The closest I’ve seen is the idea of the flipped classroom, and that’s more of an untested gimmick that only a few fields can use than a real attempt to find good teaching practice.
    Yes, elementary education has no research support, but to me it seems that no education has research support. Should we be surprised that nobody can agree on what outcomes to measure or how to measure them? The only commonality in how we view education is dissatisfaction. I suspect that the quality of education research would be at least as bad as nutrition or social psychology if there were as many people working on it as in other fields.
    Don’t forget that many of the assumptions and concepts in education come from the same psychology of personality and culture that gets so much criticism on this blog. Ask an education or psych undergrad about what they learned about children: mostly, it’s about memorizing lists of arbitrary “stages of development” that various psychologists have proposed over the past century or so. It’s no surprise that there’s a lack of educational research, then, because we don’t even have anything close to a testable model for how children learn or develop. It’s a sad state of affairs.

    • I think much of the problem is that while we do know quite a bit about how children learn, and adults too for that matter, little or none of this knowledge actually is taught to prospective teachers.

      I was reading a blog article (currently untraceable due to a failure to back up my recent Bookmarks list) that discussed some basic modern learning theory and pointed out that essentially none of it appeared in education textbooks.

      I suspect that much of the problem is that a lot of what we know about teaching/learning is coming out of the cognitive psych/cognitive science labs. Those researchers are not all that interested in teaching kids and the educationalists do not even know about much of the literature or if they do they may not have the background (and funding) to convert fairly rarefied technical papers into classroom instruction modules.

      And from my somewhat casual reading, the entire US educational system (from K to 12 anyway) seems to have gone raving mad so trying to implement new and improved teaching methods there is not likely to be a priority.

      • I guess I fell victim to the same trap, then, because I didn’t even know this literature existed. I assumed that cognitive psych would be the field to bring this stuff to light, but they don’t seem to be connecting with educators. It also isn’t clear to me that teachers, and those who teach them, are interested in hearing about this.

        • Well I am not really all that aware of the literature either but I occasionally run into it.

          I suspect that teachers would be very interested in hearing about the research if it could be packaged and delivered in a useable classroom format. Most teachers would not have the background and certainly not the time to go from basic learning research to teaching material and teaching plan.

          From the one untraceable article I read it does not appear that the teaching/curriculum development profs are all that interested or knowledgeable.

          For an interesting and very different way to teach math to children you might be interested in looking at Jump Mathematics.
          This is not out of a lab BTW, rather a serendipitous discovery by John Mighton, a mathematician/playwright. It seems to work though I’d like to see a bit more research. From the bits and pieces I have picked up about it, it seems to incorporate a lot of what we know about human learning. http://www.jumpmath.org/

          In contrast, from what I have heard about normal current practices in the US and here in Canada we seem to be ignoring some basic learning theory in teaching math these days especially in the area of distributed practice.

  4. What makes learning art and music better than science? I don’t see the fundamental premise here.

    What makes learning 5 languages more desirable than being good at understanding the physical world?

    • I think Andrew’s idea is that you can learn a lot of art and music and languages at age 5, and indeed you can learn it faster per hour at that age than at a later age. So spend those early years taking advantage of the fact that young kids learn that sort of thing especially well, and save until later the things that they can just as well learn later.

      Makes sense to me.

      • Also, there is the hypothesis that learning a second language helps cognitive functions generally; in particular, when you know two languages, you become very aware that there are things you can say readily in one language that you can’t say so easily in the other, so you don’t get trapped in the perspective of one language. (Have you ever tried to ask a bilingual five-year-old to translate from one language to the other?) And this presumably might transfer to other learning (e.g., the language of mathematics).

        • Interesting hypothesis. But is there evidence for it?

          Has anyone systematically compared bilingual kids’ math scores to others?

        • “Paap started looking into bilingualism in 2009, having spent 30 years studying the psychology of language. He began by trying to replicate some seminal experiments, including a classic 2004 paper by Bialystok involving the Simon task. In that task, volunteers press two keys in response to colored objects on a screen—for example, right key for red objects, left for green. People react faster if the position of the keys and objects match (red object on right half of the screen) than if they don’t (red object on left). But Bialystok found that twenty Tamil-English bilinguals from India were faster and more accurate at these mismatched trials than twenty English-speaking monolinguals from Canada. They were better at suppressing the location of the objects and focusing on their color—a sign of superior executive function.”

          20 bilinguals eh?

        • “Bialystok echoed his sentiments in an interview. For starters, the charge of publication bias is “utter nonsense,” she says. “Not every study coming out of every lab will get published. But is there insidious bias? Or suppression of relevant information. I see absolutely no evidence that there is.””

          Bialystok’s hunkering down response is the reason linguistics and psychology are in the mess they are in. She should have said: yes, this is a possibility. I will redo my key studies with higher power and see if I can replicate them, and my detractors should do the same. The problem is that money and reputation are on the line, and it is unacceptable that a researcher may have been barking up the wrong tree all this time.

          PS I must say that my son goes to a bilingual school here in Berlin, and I have noticed a significant improvement in his cognitive abilities over the last four years. Also, I speak 5-6 languages, depending on how you count a language, and we all know in this forum just how good my cognitive function is, I don’t need to say anything more. So it’s really possible there is a cognitive advantage. The question is whether p<0.05 or not.

        • I can’t resist adding this quote:

          ““Paap is not a bilingualism researcher and he doesn’t understand the field,” says Bak. “He thinks that if you have an experiment, you should do it everywhere and get the same result. When you have something as complicated as bilingualism interacting with so many variables, you’d expect varying results, depending on the circumstances and populations.””

        • Interesting link, Shravan, from the point of view of replicability, but not directly relevant to the more specific point I was bringing up above, since it seems to be focusing on the different area of “executive function.” So to elaborate on what I was talking about:

          My comment “in particular, when you know two languages, you become very aware that there are things you can say readily in one language that you can’t say so easily in the other, so you don’t get trapped in the perspective of one language,” was based on my own experience learning a second language in high school. I found the awareness that language can influence our thought to be an important part of my education.

          My speculation, “this presumably might transfer to other learning (e.g., the language of mathematics),” is something that I have often mused on as a teacher of mathematics.

        • I think learning a second language at high school level in the US is more like learning about languages and linguistics (a sort of mathy / computer science topic) than it is like learning two languages as a 4 year old.

          So, learning about language and its structure as a young adult could enhance math ability in a way that learning a language as a child wouldn’t.

        • Yes, agreed; knowing multiple languages furnishes many social advantages. I don’t really see how the language-math connection that you and Daniel mention could work. It feels like the same argument that you can improve general short term/working memory by doing some memory drills—those drills would only improve your ability to do those specific tasks, not lead to a general improvement in memory… Maybe you mean that the person’s general attitude to life changes: “there is more than one way of looking at everything”. Anecdotally, coming from a multilingual society where everyone is trilingual (at least), I didn’t ever see any evidence for that.

        • @Shravan

          My point was not similar to the argument on improving general memory by memory drills, but it also is not at the other extreme of “there is more than one way of looking at everything.” It was more like realizing that language can restrict our thinking.

        • I think learning a language in high school introduces you to the study of language as a construct, thinking logically about classifying things and how they fit together, which could for example make you better able to do certain tasks that involve logical processing of linguistic concepts and might carry over to something like parsing “math word problems” more carefully so that you can take a “word problem” and build an equation to solve for example.

          That being said, simply naturally “acquiring” a language as a 4 year old I don’t think would have the same effect (and you’ve mentioned this in some other comment on the page as well).

      • So is there evidence that 5th graders etc. have better acquisition skills compared to later in life for something like art than arithmetic or other subjects?

        I’m not even sure how people measure this.

        Just like it is hard to learn music or art later in life I’m sure it’s damn hard to teach calculus to an adult whose adding and counting skills were neglected in early life.

        • This is very well supported for learning languages. The breaking point is around 14 years old, after which it becomes very very hard to learn a language with native fluency. Andrew’s kids have native French fluency, whereas Andrew could spend the rest of his life studying French and never get to a fraction of their level of fluency.

          But, it’s very hard to do this taking Spanish classes at school in the US. You really need immersion. Andrew’s oldest kids spent two years going to school in Paris. They’d not be nearly as good if they’d taken two years of French class in NY.

        • Bob:

          My wife and I put our two kids into the Ontario French school system (4 and 6 years old) so that they would be in the Francophone community in Ottawa rather than in an Anglophone community doing French immersion (that also starts at around 5 years old). In Canada about 5% of the population is actually bilingual, in Ottawa it’s about 20% with about 20% speaking primarily French in the home.

          We think it made a big difference (my wife works mostly in French) but they still would get caught out as not having native French fluency (Quebecois). Not sure how spies succeed at fitting in – might require a special talent.

          And then in many parts of Europe a large percentage are multi-lingual…

        • Yes, but do we think that if you neglect arithmetic or something else very basic, then that can nevertheless be learnt well later in life?

  5. I think multiple choice questions get too much of a bad rep.

    Sure it is possible to have vague or subjective questions where the “best” answer is open to interpretation but in practice on well designed tests such questions are quite rare. Especially in the hard sciences.

    Of course deep understanding matters as brought out by essay type questions, numeric problems etc. but there’s nothing wrong with Multiple Choice Questions to drill and verify fundamental concepts.

    You can’t ever solve a deep, nuanced problem unless you have the simple fundamentals right in the first place. And MCQs do a decent job measuring that.

    • +1

      To which I would add:

      Multiple choice questions _can_ be designed to also test deeper levels of understanding than the basics. It is more difficult to do, and the expense of keeping people with that kind of expertise engaged in the process discourages the more widespread use of such questions.

      Test questions that genuinely have more than one reasonable answer (one keyed, the other based on a plausible “creative” interpretation) usually have a statistical signature: the people who get them “wrong” are not just the people who otherwise perform poorly on the test. Such questions can be identified in this way and removed from scoring, and then either eliminated from use or modified to eliminate the lack of a unique substantively correct response. Well curated item pools managed by teams of content experts and psychometricians can easily deal with this problem.
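As a rough illustration of the "statistical signature" idea in the paragraph above, here is a minimal sketch that computes, for each item, the correlation between getting that item right and the score on the rest of the test, and flags items where that correlation is low or negative. The function names, the 0/1 response matrix, and the 0.1 cutoff are all assumptions made for the example; real item pools use more elaborate psychometric models.

```python
import math

def pearson(x, y):
    """Plain Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def flag_items(responses, threshold=0.1):
    """Return indices of items whose item-rest correlation is below threshold.

    responses: one list of 0/1 item scores per test taker (hypothetical format).
    threshold: illustrative cutoff, not a standard value.
    """
    n_items = len(responses[0])
    flagged = []
    for j in range(n_items):
        item = [r[j] for r in responses]
        # Score on all other items, so the item is not correlated with itself.
        rest = [sum(r) - r[j] for r in responses]
        if pearson(item, rest) < threshold:
            flagged.append(j)
    return flagged

if __name__ == "__main__":
    # Made-up data: the last item is missed mostly by the stronger test takers.
    data = [
        [1, 1, 1, 0],
        [1, 1, 1, 0],
        [1, 1, 0, 1],
        [1, 0, 0, 1],
        [0, 0, 0, 1],
        [0, 0, 0, 0],
    ]
    print(flag_items(data))
```

An item flagged this way is only a candidate for review by content experts; a low correlation can also come from a small sample or a mis-keyed answer.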

      That said, from everything I have read about it, the rush to use “value added measurement” to evaluate teachers is quite premature. The method is simply not ready for prime time. While I strongly support the concept of evaluating teachers in large part on how their students’ knowledge and skills develop, it seems that we still have a long way to go before we can adequately measure that.

      • RE “value added measurement”

        From what I have read it is not so much premature but insane as currently implemented.

        It seems to assume the only variable affecting student achievement is the teacher. It seems to totally ignore socio-economic, native language spoken, family structure, and so on. And, IIRC, it seems to expect an improvement in student performance over some nebulous baseline. Of course if you already have a very high achieving class (say at the 90th percentile of some standard), it is going to be a bit hard to get a lot of improvement.

    • As someone who has worked as a psychometrician I don’t have any problem with standardized tests of which multiple choice test can be a subset, if used appropriately.

      The thing is that a good multiple choice test, or any other standardized test, needs to be well-grounded in content analysis, written (if a written test) at an appropriate reading level and properly piloted (there’s a lot more but that will do for a start). Except for a few psychologists who routinely run item analyses, I doubt that most MCQ tests in academia are analyzed let alone piloted, but at least, they should be sound on content analysis.

      On the other hand, to measure some skills I have, in collaboration with subject-matter experts, written elaborate scenarios and hired experienced professional actors so that we could run quite realistic simulations. It’s a matter of the right tool for the job.

      I was reading an article yesterday in Diane Ravitch’s blog entitled “Rachel Rich: Everything you need to know about the /%#₤€% test” about a test that seems to be being used in parts of the USA as part of the Value Added initiative (a misnomer if ever I saw one). https://dianeravitch.net/2016/05/25/rachel-rich-everything-you-need-to-know-about-the-e-test/

      It seemed to violate most of the principles of good test construction although, since it appears that there may be no publicly available technical manual, who can know about some issues?

      However if the author of the piece (Rich) is correct, the test for Grade 3 is 7 hours long and is answered using computers. Duh, looks like total incompetence here. Asking a pupil in Gr. 3 to navigate software while taking an apparently stressful test is little short of criminal. Actually where I live one might have Children’s Aid staging a raid.

      • All good points.

        “I doubt that most MCQ tests in academia are analyzed let alone piloted, but at least, they should be sound on content analysis.”
        The one area where I have direct knowledge of this is the examinations used in awarding board certification to medical specialists. These exams are all subject to ongoing item analysis, and the item pools are carefully managed to weed out poorly performing questions. New tests are piloted before being put into use, and new items in the recurrent exams are initially used on a non-scoring basis until their psychometric properties can be assessed. In fact, the individual specialty boards are required to do this as a condition of maintaining their authorization (from the American Board of Medical Specialties) to certify specialists. The National Board of Medical Examiners also follows these practices in developing, maintaining, and administering the US Medical Licensing Exam.

        I don’t know if you consider these things to be part of “academia” or not.

        • “I don’t know if you consider these things to be part of “academia” or not.”

          No. This is professional certification.

          I was thinking in the more limited area of instructor built and administered instruments for tests and examinations within an academic course.

          I would expect any well-run certifying boards to do what you describe though it is good to hear that they actually do so.

      • Part of the issue is … they put them on computers to make it possible to grade a high volume of students quickly and at low cost. If you were to use this kind of thing to assess a school or teacher, better to take a random sample of students and assess their work carefully and in depth. In fact in my department we have one class with several hundred students a semester where everyone writes a research proposal, and last year we randomly selected a set to evaluate. It was scary because what if the ones selected from your sections were all terrible … but it actually was really great and helped us understand one thing in particular that was not getting through to students in all of the sections. But this is much harder work. And yes you can write a great multiple choice question too.

        • I think there’s too much focus placed on testing against “requirements.” Just design one massive adaptive testing system, and grade each child on a continuous scale, from say 0 to 50 where 0 is “worse than a random number generator selecting multiple choice questions” and 50 is “top candidates to graduate school.”

          A pool of 5000 questions should be able to bridge that spectrum pretty well, estimate their skill level using a prior based on typical for their grade level, and use Bayesian updating at each question. Continuously ask them questions within a small margin of your current estimate of their skill, after 100 questions you should have a pretty tight posterior distribution over their skill level I’d think.
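The scheme described in this comment can be sketched roughly as follows. This is only an illustration under stated assumptions, not a worked-out testing system: the logistic response model, its `scale` parameter, the prior spread, the evenly spaced question difficulties, and all function names are invented for the example. The sketch keeps a posterior over skill on a 0 to 50 grid, always asks the unused question whose difficulty is closest to the current estimate, and updates with Bayes' rule after each answer.

```python
import math
import random

def p_correct(skill, difficulty, scale=3.0):
    """Assumed response model: logistic in (skill - difficulty)."""
    return 1.0 / (1.0 + math.exp(-(skill - difficulty) / scale))

def make_pool(n=5000):
    """Hypothetical pool: n question difficulties spread evenly over 0..50."""
    return [50.0 * i / (n - 1) for i in range(n)]

def adaptive_test(answer_fn, pool, prior_mean, prior_sd=8.0, n_questions=100):
    """Return (posterior mean, posterior sd) of skill after n_questions.

    answer_fn(difficulty) -> True/False stands in for the student's response.
    """
    grid = [0.5 * i for i in range(101)]  # candidate skill levels 0, 0.5, ..., 50
    # Prior centered on what is typical for the grade level (assumed normal shape).
    post = [math.exp(-0.5 * ((s - prior_mean) / prior_sd) ** 2) for s in grid]
    total = sum(post)
    post = [p / total for p in post]
    remaining = list(pool)
    for _ in range(n_questions):
        estimate = sum(s * p for s, p in zip(grid, post))
        # Ask the unused question closest to the current skill estimate.
        d = min(remaining, key=lambda q: abs(q - estimate))
        remaining.remove(d)
        correct = answer_fn(d)
        # Bayesian update: multiply by the likelihood of the observed answer.
        post = [p * (p_correct(s, d) if correct else 1.0 - p_correct(s, d))
                for s, p in zip(grid, post)]
        total = sum(post)
        post = [p / total for p in post]
    mean = sum(s * p for s, p in zip(grid, post))
    sd = math.sqrt(sum((s - mean) ** 2 * p for s, p in zip(grid, post)))
    return mean, sd

if __name__ == "__main__":
    random.seed(0)
    true_skill = 30.0  # simulated student, for illustration only
    student = lambda d: random.random() < p_correct(true_skill, d)
    print(adaptive_test(student, make_pool(), prior_mean=25.0))
```

How tight the posterior gets after 100 questions depends heavily on the assumed response model; a real system would also have to estimate each question's difficulty from data rather than assume it.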

  6. Since nobody has yet mentioned this, what about the research showing that power poses improve teaching outcomes?

    Seriously, I think the points about multiple choice tests are somewhat irrelevant. Sure, you can create “good” multiple choice tests, but unless you permit test-takers to use books and other resources and take the tests without a time constraint, you are inevitably measuring a particular type of knowledge and thinking. I’ve been teaching for 35 years, and not given a multiple choice test for the last 34 years. If you are going to give one, leave a space for students to explain their answers. When I did that, I was surprised to learn that students with the wrong answer often had better reasoning than those with the right answer. If you believe learning is multidimensional and individual (which I do), then it is unlikely that standardized tests will be adequate (as one recent reference, take a look at the book by Todd Rose, The End of Average).

        • Yes, but if you assume some stability of the teacher effect through time and you are tracking individual students through time you could tease things apart. You have to have realistic models though. Looking at averages across classes is a joke, you need details of each student and you need things like individual knowledge of the student’s background and socioeconomic status and parent’s education etc to figure out how teacher ability interacts with student ability and soforth.

        • “if you assume some stability of the teacher effect through time”

          One would hope that there is improvement in teacher effect through time!

          “you need details of each student and you need things like individual knowledge of the student’s background and socioeconomic status and parent’s education etc to figure out how teacher ability interacts with student ability and soforth.”

          And the “soforth” includes details like what the class was (e.g., you need to distinguish between Algebra I for eighth graders, Algebra I for students who are taking it for the first time as ninth graders, “repeater Algebra I” (i.e., for students who failed it the first time) and Honors Algebra I). And you need details such as the time of the class (that can make a difference) and the physical classroom (details such as classroom arrangement, equipment, acoustics, lighting can make a difference), class size, composition of class (the presence of some particular students can make a difference), the students’ previous teachers, and probably more that neither you nor I have mentioned or thought of.

          So the upshot is that there are so many variables that noise is likely to overwhelm the possibility of detecting any credible “teacher effect.”

        • “One would hope that there is improvement in teacher effect through time!”

          Yes but one would hope that this is a slowly varying function of time, fluctuations on a timescale of say 3 to 5 years not 3 months for example.

          I agree with you about the detailed picture. I think it’s easier to assess this stuff between say 3rd and 6th grade where typically you have 1 class not changing through multiple teachers. When it starts to get into specialty teachers by topic (Algebra, Chem, English etc) then you need to be careful about details of the student history.

          While I agree that there can be significant effects of the physical classroom environment one hopes that they are not primary effects. If the main thing keeping your kid from learning algebra isn’t the ability of the teacher but the dimness of the old fluorescent bulbs… you probably have problems that can be detected and fixed without fancy tests. (but, with dollars that schools may not have). At least some of those effects might be common to teachers within a school and so could be partially accounted for.

          The question is, are teacher effects really pretty small, so that with a good set of details on students it’s still hard to detect them? Or, can they be big? So that the “best” teachers in a district are really a lot better than the worst? That’s an empirical question that could be addressed by an honest modeling and experimenting scenario, if it weren’t so BS and political as the real system we have for value added.

  7. One point that seems to be entirely missed here is that without these standardized testing methodologies, school districts will just declare (or fake) success. Does anyone remember those stories in Time or Newsweek from the 70’s showing a large 18-year-old (usually black) student behind an elementary school desk surrounded by 6-year-olds in their desks? And the accompanying article would describe how the pictured student was suing his school district because he had been promoted every year and given his degree and he wasn’t able to read? And the 18-year-old was of (at least) average intelligence and quite articulate but could, alas, not read. I’m not saying that standardized testing is the answer but there needs to be accountability past someone checking a form and sending it into the Board stating that 90% of the students are above average (and don’t forget the jail terms handed out to administrators across the nation for simply forging test results).

    • Numeric:

      I’m a big believer in standardized testing. I think one reason college statistics classes are so crappy is that we have no measure of success, it’s not really clear what our students are supposed to learn. That said, if we did have standardized testing for college statistics classes, you can bet I’d be screaming about it on a regular basis, cos I know the standards would include all sorts of horrible practices based on p-values, etc.

  8. And learning does not require teachers, schools, tests, nor formal government institutions.

    Most of what we know was learned outside of schools.

    Despite widespread beliefs to the contrary, education & schooling cannot be conducted in some objective, sterile process. In America’s vast institutionalized education system… Someone must always decide what will be taught and how it will be taught…. these are always highly subjective/arbitrary, human decisions.
    Who should make those fundamental decisions for you and yours ?

      • If the language they learn is not being spoken around them outside of school, it’s probably a waste of time. It seems unfair to a child in elementary school to make them learn something mechanically the way an adult would. This is a very different situation from acquiring language naturally pre school. To expect a child to learn a language through school lectures seems crazy to me. Next year my son will start French, which will be a true foreign language for him (unlike English and German), but this happens in class 5, not in the early elementary school years. He will have to do two years of French, and if it were not for the fact that my wife and I both speak French and go to France often, I think that even at 11 he wouldn’t be able to learn anything useful in class lectures.

        I used to live in Japan in the 1990s, as a student, and used to teach English to elementary school children to earn some money. With English spoken only in the class, it didn’t bring anything useful in the long run. And it’s not just because I wasn’t an expert teacher; Japanese kids get a lot of English in school, but they don’t learn enough to be functional (this has changed since the 1990s, I meet a lot more fluent English-speaking Japanese, but the majority seems to be still stuck in a non-English speaking universe).

        Some of these children I taught long ago are grown up now and have children of their own, and when I meet them when visiting Japan, they still can’t speak any English. I think that’s because it is just not used around them much.

  9. Here we are discussing great concepts of learning, and yet the blog begins with a jarring typo:

    “My wife is a 5th great teacher, in Texas…”

    Clearly, Middleton meant “grade” instead of “great” but should this reduce his argument in the eyes of those he wishes to influence? If not, why not? Is a mistaken homophone trivial or does this undercut his plea for better educational practices?

  10. But if you, for argument’s sake, got rid of MCQs and standardized tests, you’d replace them with ad hoc evaluations by individual teachers, which hardly do any better by the yardsticks of proper piloting, content analysis, validation etc.

    I think people hold MCQs to an arbitrarily high standard which their alternatives don’t have to meet.

  11. If science is all about reading from a text book and getting graded on an essay/test then I’m all for giving that a miss.

    But if science is about watching what happens to tadpoles, butterflies, seeds; keeping track of the moon or weather and measuring rain/timing sunrise; playing with magnets and iron sand, lenses/mirrors and periscopes; snap lock electronic boards; building things that move and then measuring how far they travel, how long it took; building structures and seeing how they break; baking bread, making hokey pokey (a sweet with key ingredients vinegar and baking soda), e.g. learning algorithms. Dropping things off buildings or ladders and seeing what happens. Seeing what things float. Going on field trips to see how things are made, how water is made clean, what happens to waste.

    If Science is all those kinds of things then I am all for it. (These are things that probably happen in all our homes anyway *but* for many kids these aren’t things they get exposed to.)

    Kids come in all kinds – learning a second language as a kid is going to help them learn more languages as a teen/adult, but kids who aren’t that literate aren’t going to get much joy out of sitting down and learning 5 languages. And those are probably the hands-on kids who would like to see, do and make “real” things, i.e. science/engineering.

  12. As a father of a 10 year old, my firm conclusion, based on n=1, but p<0.05, is that kids in elementary school need to have fun above all. Most of the real teaching happens at home.

  13. @mpledger @shravan I had a fantastic science teacher in 1st and 2nd grade in the Detroit Public Schools. Mrs. Meyers. And she did exactly what mpledger said. And my math teachers weren’t half bad: they let me work ahead at my own pace and answered the occasional question. Wish I could say the same about teachers in junior high and high school after my family moved to the suburbs (though I did have an awesome physics teacher and a teacher who managed to teach a little programming with mark-sense punch cards and a mini-computer hooked up through a phone modem).

    My mom was an English teacher, so I learned a lot from her, too, particularly how to do research on my own. But I probably would’ve never gotten so interested in science without the great intro in school—my parents simply don’t do math or science. Not that they’re opposed to it, they just don’t know anything about it. Mom and dad took me to a model rocketry club when I got into rockets in 3rd grade, where I could spec out engines, fins, and nose cones and build them from parts, then paint and launch them. Exactly the kind of practical stuff mpledger was talking about.

    I would not have wanted to learn five languages. And I hated music class. I never wanted to learn a language at all in school; there was always something more fun to do, like electronics, TV and radio production, computer programming, gym, independent studies, and shop. Ironically, I got a Ph.D. in computational linguistics (though without ever learning a language other than English). Shop was great in the lower income Detroit suburbs in the 1970s: we learned woodworking (like bandsaws, jigsaws, fiberglass work, lathes, etc.) and everything from spot welding to aluminum molding in metal shop, and even small engine repair (completely took a lawnmower engine apart and put it back together, gapping the spark plugs (accurate small measurement), adjusting the head gaskets (torque wrenches), tapping stuck bolts, using a micrometer to measure replacement parts, etc.). Great practical stuff that’s fun for a 12 year old.

    • Looks like you had a great educational experience. Up until grade 3, my son had a pathetic math teacher, she’d be absent for weeks without any explanation (and no math would be taught). When she taught, she spent all her time berating the “lazy” students and prescribed rote work. The math textbook is actually really great, but she didn’t do the exercises provided there with real objects.

      My own math education in Delhi was pretty lousy too. Statistics in particular was a joke; memorized formulas and incorrect proofs. It was only in 2011, in my late 40s, that I really started to get interested in cool things like linear algebra and “continuous math” (can one say that?) via statistics.

  14. “The kids are going to school anyway, they might as well learn a few foreign languages, that’s my attitude!”

    And that — “The kids are going to school anyway” is exactly the point.

    Compulsory education is precisely designed to force other peoples’ subjective values & educational choices/methods upon the general population. That authoritarian political approach to learning is incompatible with a free society, but unfortunately is blindly endorsed by most people.

  15. Learning a second language is a lot of work and must be maintained in order to be utilized. English by itself is hard enough. ESL students tend to struggle in school.

    Value-added is being embraced, I think, partially as a means to fire teachers. It’s easier to fire teachers if the administrators can claim to be objective in their methods. The alternative is not having a means to get rid of any teachers.

    I think the biggest issue in education right now is a lack of tailored instruction. All students are trained in the same way regardless of current skill set or learning rate. Under such a system, a few students are going to fall hopelessly behind, and others are going to be perpetually held back. Improvements from good teachers are going to disappear after a few years.

    There have been improvements in special education with the caveat that I’m not aware of any recent RCTs lasting more than 5 years. If there are effects for special education, why not students just barely high functioning enough to be classified as normal?

    Similarly, there has been a generational increase in IQ, and this effect has tended to concentrate among individuals at the lower end of the intelligence spectrum.

    Learning outcomes tend to be highly specific. You learn the thing you’re taught, and it’s difficult to work even slightly outside that area. Generalization gradients are a good way to visualize this, I think. The less similar an object is to what was originally taught, the less likely an individual is to be able to respond appropriately. There is a ton of experimental research in concept formation trying to explore the nature of this issue. The evidence that music training makes people smart is weak.

    Good teachers appear to be associated with increased earnings even though their effect on test scores eventually disappears, which is weird, but it happens. It suggests that either there’s an innate trait being overlooked, or there’s an effect of good teaching which isn’t being measured but is correlated with briefly higher test scores.

    There does appear to be an effect on GDP of increased education spending.

      • I think that widely held belief is overrated. Consider the fact that a kid is devoting almost all his day to language acquisition & is essentially “immersed” in the language all the time and in spite of that after 4-5 years is still speaking in “kid-speak” with a very limited vocabulary.

        How many adults do you know who gave up everything to learn a new language full-time for 4-5 years in a totally immersive environment?

        • I disagree with your assessment of 4-5 year olds. My 4-5 year old kids talk about the role of siphonophores in the ocean ecosystem. Kids at 1.5 years old are very different from kids at 3 years old, are very different from kids at 5 years old.

          That being said, kids acquiring more than one language are well known to take longer at pretty much everything at least for the first say 5 to 6 years of life.

        • Daniel said “kids acquiring more than one language are well known to take longer at pretty much everything at least for the first say 5 to 6 years of life.”

          Reference?

        • Good question. I may be overstating my memory of this topic. My half-siblings were bilingual and so my family had done some research on this, but it was maybe 20 years back, so the thinking on this may be changing. As I remember there was typically a delay, but it was also typically smallish (say 6 months) and generally didn’t have any long term adverse consequences. Since it was so long ago I don’t have any specific references. Perhaps someone else here has a better handle on this topic.

        • I have no reference, but I have heard the same observation from several teachers. Parents of bilingual kids have to be reassured that it’s not that their kids are slow but it’s just the additional cognitive load.

        • @Rahul: “I have heard the same observation from several teachers.”

          I am wondering whether there might be a confound with SEC here: If the teachers are thinking of their experience with bilingual students from poor families, or if there are studies that show the effect independent of SEC.

          (This is partly prompted by an anecdote: A professor came to visit here for a year; the school asked what his children’s primary language spoken at home was. It was not English, so the kids were put in the bilingual class. But that class was for native Spanish speakers; these kids spoke an Indian language at home, and were fluent in English, which was the language in which they had been schooled in England. But the father thought maybe it would be good for them to learn Spanish.)

        • Rahul:

          5-year-olds learning a new language don’t “give up everything” or “devote all their day to language acquisition.” They just live their everyday lives and learn the language as extra.

      • “Learning a second language is a lot of work for an adult.”
        My understanding is that if someone learns a second language when young enough, then learning a third as an adult is much, much easier than for an adult who has not learned a second language.

        “It’s easy for a 4-year-old.”
        My nephew lived in the Netherlands for nine months when he was five/six, and at the end of that time was speaking Dutch like a Dutch kid his age. I don’t think he had any instruction — just playing with the other kids in kindergarten and playing soccer in the parking lot. (But he couldn’t translate from Dutch to English.)

        • Personally, I grew up bi-lingual, then started learning my third language (English) in school at the age of 6, my fourth language at 10 (Hindi), my fifth language at 13, & then German my sixth at 16. And currently, I’m informally picking up a seventh due to where my work is, but that’s not going too well.

          Perhaps it makes sense that the last two languages I acquired are the ones I’m crappiest in. But yes, anecdotally, I think I had a far easier time picking up German than kids with me who had never studied a second language before. Some of the languages being cognates made it easier.

          But I’m not sure how much I buy the young-kids-learn-languages-specifically-well hypothesis because I suspect I’d be struggling as hard if I had to learn calculus from scratch in my 30’s.

      • I tend to make a distinction between “learning” a language (an agentive act, requiring volition) and “acquiring” it (an automatic process that happens when a child is exposed to a language around them). So I wouldn’t say that a 4 year old is learning a language when they are just surrounded by one in their environment and are acquiring it that way. After all, you won’t see your kids poring over conjugation tables of French verbs.

        • OK, I see the distinction. My own experience fits into your category of “learning,” but my nephew’s into “acquiring”. Still, I don’t recall poring over conjugation tables of French verbs — I just memorized (repeating in my mind or out loud) what form of the verb went with what pronoun. It was somewhat like learning songs or poems (which actually was part of learning French). I also remember memorizing sentences. But those things then got patterns into my brain which facilitated generalizing to other verbs and other aspects of grammar.

    • Was waiting til I got to a computer so I could easily just grab the links, but some citations:

      Scott Alexander on Value-added measures, earnings effect of good teachers, and how ESL kids struggle in school: https://slatestarcodex.com/2016/05/19/teachers-much-more-than-you-wanted-to-know/

      Effects of modern special education interventions relative to alternatives: http://www.cdzjesenik.cz/Autismus__complex_trerapy_fulltext_pdf.pdf

      James Flynn on IQ: https://www.ted.com/talks/james_flynn_why_our_iq_levels_are_higher_than_our_grandparents?language=en

      One example of how learning can be highly specific: http://gcpsx.coeps.drexel.edu/mted601/week3/carraher.pdf

      One analysis of the relationship between education spending and GDP: http://www.appg-popdevrh.org.uk/Publications/Population%20Hearings/Evidence/IMF%20report%206.pdf

  16. Value added models are problematic for a lot of reasons, but the point is that they control for income, race, gender, incoming test scores, and a lot of other variables. Not saying it’s perfect or even very well done, but when they did it in NYC there was an uproar among the high income Manhattan parents when they found out that teachers in the Manhattan Gifted and Talented schools had low value added (when of course that makes sense to anyone who thinks about it, all those kids born on third base) and lots of teachers in low income schools turned out to be having a large impact (again, if your parents don’t speak English and didn’t graduate from high school, attending school is high impact, there’s lots of room for impact).

    http://www.nytimes.com/2012/02/25/education/teacher-quality-widely-diffused-nyc-ratings-indicate.html?_r=0
    http://www.wnyc.org/story/301783-teacher-data-reports-are-released/

    This too was exciting news
    “At a briefing on Friday morning, an Education Department official said that over the five years, 521 teachers were rated in the bottom 5 percent for two or more years, and 696 were repeatedly in the top 5 percent.”

    “In New York City, a curve dictated that each year 50 percent of teachers were ranked “average,” 20 percent each “above average” and “below average,” and 5 percent each “high” and “low.” ”

    http://www.chalkbeat.org/posts/ny/2012/02/29/why-its-no-surprise-high-and-low-rated-teachers-are-all-around/#.V1MChZMrKqA
    http://hechingerreport.org/the-worst-eighth-grade-math-teacher-in-new-york-city/
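
    To make the quoted curve concrete, here is a minimal sketch of how a forced distribution hands out labels. The function name `curve_labels` and the mid-rank percentile rule are assumptions for illustration, not the city's actual procedure; only the 5/20/50/20/5 split comes from the quote.

```python
# Hypothetical sketch: assign forced-curve labels from value-added scores.
# Cut points follow the quoted split: 5% low, 20% below average,
# 50% average, 20% above average, 5% high.

def curve_labels(scores):
    """Return one label per score, based on its rank among all scores."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])  # indices, lowest score first
    labels = [None] * n
    for rank, i in enumerate(order):
        pct = (rank + 0.5) / n  # mid-rank percentile, strictly between 0 and 1
        if pct < 0.05:
            labels[i] = "low"
        elif pct < 0.25:
            labels[i] = "below average"
        elif pct < 0.75:
            labels[i] = "average"
        elif pct < 0.95:
            labels[i] = "above average"
        else:
            labels[i] = "high"
    return labels
```

    Because the labels depend only on rank, 5 percent of teachers are "low" every year no matter how well everyone teaches, which is one reason it is no surprise to find high and low ratings in every kind of school.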

    • The appropriate measurement is not “value added” but “value added as a fraction of (or maybe difference between) expected value added predicted by observed class population incoming attributes”

      Then, if your “3rd base” kids come in and gain 3% on a standardized test score and you would have predicted that they’d gain 3% from an average teacher, then your teacher is average… if they gain 5% then the value added is 5/3 of an average teacher, and if they gain nothing it’s -3/3 or whatever, think it through so that the scale depends on what the expectation is *for that class*.

      Or at least that’s the basic concept. It should probably be adjusted further to deal with the compression towards the top of the scale (you can’t start out answering 99% of the questions correctly and go up from there very much)… though I’d argue that since the kids are on computer tests anyway you should adapt the test upward and downward towards different grade level material to give a better sense of not whether the kids are “meeting fixed requirements” but overall how well they’re doing on a scale that goes from “illiterate and unable to speak or do any math” up to “in the top 1% of GRE scores”. I mean, why not? As long as the test is adaptive and you’re not wasting time asking them GRE questions when they aren’t ready for them, or asking questions like “what is the name of the letter that starts the word Apple?” when they’re performing at a typical 7th grade level, etc.
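
      The arithmetic in the 3% example can be sketched in a few lines. The function name `relative_value_added` is hypothetical, and the expected gain is simply taken as given here; in practice it would have to come from a model of the incoming class.

```python
# Hypothetical sketch of "value added relative to expectation" for one class.

def relative_value_added(actual_gain, expected_gain):
    """Return (ratio, difference) of the actual gain versus the gain
    predicted for an average teacher with the same incoming students."""
    if expected_gain == 0:
        raise ValueError("expected_gain must be nonzero to form a ratio")
    return actual_gain / expected_gain, actual_gain - expected_gain

# The three cases from the example: expected gain is 3 points in each.
for actual in (3.0, 5.0, 0.0):
    ratio, diff = relative_value_added(actual, expected_gain=3.0)
    print(f"gain {actual}: ratio {ratio:.2f}, difference {diff:+.1f}")
```

      On the ratio scale a class that gains nothing scores 0, and on the difference scale it scores -3 points; either way the yardstick is the expectation for that class, not a fixed target.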

      • But don’t you still end up with the problem that teachers start hoping that the previous grade teacher was crap (or at least not stellar)? That is, if your new students were taught horribly in the previous grade in a particular subject (say math) you have a lot more room for improvement. On the other hand, if the previous grade teacher was stellar and managed to get the pupils well beyond where they would have been with a normal teacher when entering your class, you’re screwed because it becomes far more challenging to improve their scores on the standardized exam.

        • This becomes far more of a problem when you’re measuring averages across the class, rather than tracking individual students, which is the appropriate method. To do this right I think you need to track every individual student’s progress, and see how that progress was or was not helped by the teacher they had relative to what you’d expect for a student with those incoming skills. You can then average the *degree of helping* over the individual students to get a summary provided you’ve put those onto a common scale.
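
          A rough sketch of what averaging the degree of helping over individual students might look like. Everything here is an assumption made for illustration: the names `expected_gain` and `mean_relative_help`, and especially the rule that an average teacher closes 10% of each student's gap to the test ceiling.

```python
# Hypothetical sketch: per-student gains relative to a per-student expectation.

def expected_gain(incoming, ceiling=100.0, rate=0.1):
    """Assumed expectation: an average teacher closes `rate` of the gap
    between the student's incoming score and the test ceiling."""
    return rate * (ceiling - incoming)

def mean_relative_help(students):
    """students: list of (incoming_score, outgoing_score) pairs for one teacher.
    Returns the mean of actual gain / expected gain, or None if undefined."""
    ratios = []
    for incoming, outgoing in students:
        exp = expected_gain(incoming)
        if exp <= 0:
            continue  # already at the ceiling; this test cannot register progress
        ratios.append((outgoing - incoming) / exp)
    return sum(ratios) / len(ratios) if ratios else None

# Two classes with very different intakes and very different raw gains.
after_stellar_teacher = [(90.0, 91.0), (95.0, 95.5)]
after_weak_teacher = [(40.0, 46.0), (50.0, 55.0)]
print(mean_relative_help(after_stellar_teacher))
print(mean_relative_help(after_weak_teacher))
```

          Under this assumed expectation both teachers come out as average even though the raw gains differ by several points, so inheriting a well-taught class is not a penalty by itself. How good the answer is depends entirely on how good the expectation model is.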

        • Put another way, think of some measure of knowledge and ability, call it S for student skill; then presumably S is a function of time t. And presumably S is affected by some kind of learning at time t which comes from several sources, including their teacher.

          On an appropriate scale, you could imagine a differential equation for S

          dS/dt = A(t, Lt(t) + Lh(t) + Le(t))

          where A(t,L) is ability of the student to respond to the learning they experience at a given time (a nonlinear function in L that probably saturates at the high end, and is also affected by things like home stressors, nutrition, physical environment of the school, social environment of the school etc), and Lt is teacher based learning, Lh is home based learning and Le is experienced based learning from other contexts.

          Now, we evaluate S at various time points (standardized testing days), which includes measurement error (problems with test design) and variations due to unimportant short term issues (like whether the child had a good meal for breakfast, whether they had a cold last week, whether their friend told them they were stinky that morning… etc). Put together, these constitute the measurement error.

          So, we have an unobserved state S(t) which we measure at a few time points with measurement error. We measure it for multiple students who all have the same Lt for a year but who have very different A, Lh, Le functions, and we try to infer something about Lt and its interaction with the A functions for the students the teacher has been given.

          We can learn something about A for each student by looking at the students through time, we can learn something about A(,Lt) by looking at the teachers across multiple students.

          And A(,Lt) is all that matters, another way of saying this is there’s no “absolute” level of teacher ability, only how well does the teacher adapt to what the actual students need in order to learn. But across very different contexts, say the difference between fancy upper-class Manhattan schools, and poor Brooklyn schools… the teaching skills needed are very different and it’s inappropriate to compare them between teachers.

          The goal should not be to “maximize S” for the students, but rather “maximize dS/dt”. S is determined by two things, an initial condition S(0) at entry to school, and then a combination of their Ability and Learning A(,L). Since A saturates at the upper end, no matter how brilliant a teacher is, they can’t make students learn faster than their saturated ability. For students with low ability saturation levels (for example because they’re abused, poor, have bad nutrition, have limited language abilities due to ESL issues, etc) a lack of large dS/dt doesn’t indicate a lack of teacher ability. If you don’t track students you can’t estimate an A for each student, and you can’t tease out the differences between bad teaching and low peak student learning rate.
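
          A toy simulation of the model above, to show what saturation does. The specific form of A (an exponential that levels off at a per-student `saturation`), the constant inputs, and all the numbers are assumptions chosen for illustration; the time dependence of A is dropped for simplicity.

```python
import math

def learning_rate(total_input, saturation):
    """Assumed form of A(L): grows with learning input L, levels off at `saturation`."""
    return saturation * (1.0 - math.exp(-total_input / saturation))

def simulate_skill(s0, teacher, home, other, saturation, years=1.0, steps=180):
    """Forward-Euler integration of dS/dt = A(Lt + Lh + Le) with constant inputs."""
    dt = years / steps
    s = s0
    for _ in range(steps):
        s += learning_rate(teacher + home + other, saturation) * dt
    return s

# Same teacher input, two hypothetical students with different saturation levels.
supported = simulate_skill(0.0, teacher=1.0, home=1.0, other=0.5, saturation=5.0)
stressed = simulate_skill(0.0, teacher=1.0, home=0.2, other=0.1, saturation=0.5)

# Doubling the teacher input barely helps the student who is already saturated.
stressed_2x = simulate_skill(0.0, teacher=2.0, home=0.2, other=0.1, saturation=0.5)
print(supported, stressed, stressed_2x)
```

          In this sketch the stressed student's yearly gain can never exceed the saturation level, whatever the teacher does, which is the sense in which a small dS/dt need not indicate a lack of teacher ability.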

        • I don’t know if you’re being ironic or not, but this is pretty close to literally what the value-added literature does. You’re describing known stuff.

        • I’m glad to hear it, but based on conversations with teachers and administrators that wasn’t my impression of how things are actually implemented. It’s a different thing to say that some academics have published how to do things vs some business administrators are actually doing those things.

          Certainly, I’ve looked into the California school rating system scores (API) and found that it’s a worthless or even counter-productive system. So I don’t have much faith in how things are being done.
