SAT stories

I received a bunch of interesting comments on my blog post on adjusting SAT scores. Below I have a long comment from a colleague with experience in the field.

But first, this hilarious (from a statistical perspective) story from Howard Wainer:

Some years ago when we were visiting Harvard [as a parent of a potential student, not in Howard’s role as educational researcher], an admissions director said two things of relevance (i) the SAT hasn’t got enough ‘top’ for Harvard — it doesn’t discriminate well enough at the high end. To prove this she said (ii) that Harvard had more than 1500 ‘perfect 1600s’ apply. Some were rejected. I mentioned that there were only about 750 1600s from HS seniors in the US — about 400 had 1600 in their junior year (and obviously didn’t retake) and about 350 from their senior year. So, I concluded, she must be mistaken.

Then I found out that they allowed applicants to pick and choose their highest SAT-V score and their highest SAT-M score from separate administrations, and so constructed their 1500. I stopped talking at that point, deciding against discussing the probability of throwing snake eyes if you could throw dice many times and pick out a one from one toss and the other one from another.
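Wainer's dice analogy can be made concrete. The sketch below is only an illustration and is not part of the original story: it assumes each attempt is an independent throw of two fair dice, and the helper name `prob_superscored_snake_eyes` is made up here. It computes the chance of assembling a pair of ones when you may keep the best face of each die from any of several throws.

```python
from fractions import Fraction


def prob_superscored_snake_eyes(n_throws: int) -> Fraction:
    """Chance of collecting a 1 on each die when the result for each die
    may be picked from any of n_throws independent throws of two fair dice.

    Hypothetical helper, for illustration only.
    """
    if n_throws < 1:
        raise ValueError("n_throws must be at least 1")
    # Probability that one particular die shows a 1 at least once.
    p_one_die = 1 - Fraction(5, 6) ** n_throws
    # The two dice are independent, so the probabilities multiply.
    return p_one_die ** 2


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        p = prob_superscored_snake_eyes(n)
        print(f"{n} throw(s): {p} (about {float(p):.3f})")
```

With a single throw the chance is 1/36; with two throws it rises to 121/1296, a bit more than three times as large. SAT sections are not dice, of course. The point is only that picking the best of each component across attempts inflates the count of "perfect" totals.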

My other colleague sent in the following thoughts:

1. I [my colleague who has worked in this area] thought that the result of the coaching studies was that they did not show significant improvement when the control was serious self-study (e.g., going through ’10 Real SATs,’ a relatively inexpensive publication available from the College Board). Thus, the money spent on Kaplan and Princeton Review is for somebody to encourage you to do the work. I can’t put my finger on exact studies, so I could be quoting myth or misquoting the actual work.

2. The approach I [my colleague] have always favored is to set the cut point for the test scores (SAT, GRE) relatively low (although not so low that you are admitting people who are unprepared): low enough that you get half again as many qualifying applicants as you have vacancies. Then look at other factors in statements and background to create the incoming class. Diversity could be one of those factors. (So could prejudice and discrimination, though.)

3. One of the roles that the SAT plays is in helping to “equate” the high school grades which reflect local grading practices.

My quick responses:

1. According to the famous 8-schools study (reproduced in chapter 5 of BDA), the effect of coaching is less than 10 points. Ben Hansen reports estimates of about 20 points. And, indeed, I’ve heard that the effect of coaching is about the same as the effect of spending X hours studying the material.

2. The low cut point idea could work. I certainly think this can make sense in graduate admissions. Setting too high a cut point on GRE can rule out some potentially excellent applicants.

3. Yes, I agree. That’s why I suggested adjusting the SAT rather than abandoning it.
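To give a rough sense of where "less than 10 points" comes from, here is a minimal sketch that pools the eight schools' estimates with a precision-weighted average. This is complete pooling, a deliberate simplification and not the hierarchical analysis in BDA. The effect estimates and standard errors are the ones tabulated there, transcribed from memory for this sketch, so check them against the book; the function name `pooled_estimate` is made up here.

```python
import math

# Estimated coaching effects (SAT-V points) and their standard errors for
# the eight schools, as tabulated in BDA. Transcribed from memory for this
# sketch: verify against the book before relying on them.
EFFECTS = [28, 8, -3, 7, -1, 1, 18, 12]
STD_ERRORS = [15, 10, 16, 11, 9, 11, 10, 18]


def pooled_estimate(effects, std_errors):
    """Precision-weighted mean and its standard error (complete pooling)."""
    weights = [1.0 / se**2 for se in std_errors]
    total = sum(weights)
    mean = sum(w * y for w, y in zip(weights, effects)) / total
    return mean, math.sqrt(1.0 / total)


if __name__ == "__main__":
    mean, se = pooled_estimate(EFFECTS, STD_ERRORS)
    print(f"pooled effect: {mean:.1f} points (s.e. {se:.1f})")
```

Under these assumptions the pooled effect comes out a bit under 8 points with a standard error of about 4, which is consistent with the "less than 10 points" summary but says nothing about any individual school.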

16 thoughts on “SAT stories”

  1. This (low minimum cutoff) seems the right approach. Otherwise, you'd admit a lot of people who just test well, and exclude many better candidates who don't.

    Parenthetically, the first anecdote fits well with the thirdhand impression I have of the Harvard culture.

  2. On your colleague's first thought, I have been inclined to think that in general one of the biggest reasons people pay others to teach them is in order to motivate themselves to put in effort by making themselves accountable to someone.

    I found Harvard's method of determining scores from multiple tests positively scandalous. The only reason I can think of for it is to inflate the number of 1600s they can claim for marketing purposes.

  3. I had the impression that that is primarily how the GRE is used, though perhaps not the SAT as much; programs will look for a score above maybe 700 on math and 400 on verbal and, unless you're close to the cutoff, it basically becomes a binary variable.

    In general, I don't think adding or subtracting points, as you suggested in the other post, is as good an idea as attempting to acquire that information and allow different schools to use them as they see fit. The best schools will use them less formulaically than a simple additive formula.

  4. If I remember properly from my GRE results in 2002, a quarter of those who took the GRE received perfect scores on the quantitative section, and half of takers who majored in a quantitative science or engineering received perfect scores.

  5. Andrew,

    Your comments on how to adjust SAT scores sound like something published 50 years ago describing a future dystopia. Proposals such as subtracting points from a student's SAT score for attending private school or taking a commercial preparation course sound like some kind of satire. In fact I think the whole post is not meant to be taken seriously. But just in case it is, I offer the following.

    1. What's the point of a university system? Is it not to educate students with the most potential regardless of where they come from and who their parents are? Or is it really supposed to be some kind of social program to elevate the underclass?

    2. Being born with a high IQ is certainly an advantage. You will score higher on all heavily g-loaded tests such as the SAT. Do you somehow think this is "unfair"? Of all the factors that explain the variation in SAT scores, I suspect that IQ explains more of the variance than anything else. Surely we have good research on this. If my assertion is true, then are you suggesting we penalize people for having a high IQ?

    3. In the spirit of Swift's Modest Proposal, why don't we levy a special tax on university professors' salaries to pay for SAT coaching for the "disadvantaged"?

    4. Let's go back to private schools. Do you really think parents have a duress-free choice? If they live in an affluent suburb, then they don't need a private school. If they live in a place like Chicago, then they either send their children to a chaotic and dangerous public school or pay for a private school. Thus they have to pay twice for schooling. Now you propose to levy a third penalty. Note that virtually all our politicians avoid sending their children to the DC school system. They either move to places like Oakton Virginia, or send their kids to places like St. Albans. You seem to think that they should be punished for doing so.

    5. If any of your proposals were implemented, then people would game the system. Instead of hiring commercial outfits like Kaplan, they would opt for a private tutor. Where do you think the Kaplan instructors come from? From high-scoring test takers. Put Kaplan out of business, and the instructors would go private to meet the new demand. This is a market with virtually no entry barriers and no sunk costs. Just put an ad on Craigslist for "confidential" SAT tutoring.

    Finally, to repeat myself, I think the whole post was satirical.

  6. This is a big subject and I'll probably be commenting in dribs and drabs, but I had to get this point off immediately.

    I don't want to slam someone for a blog comment, but the faulty reasoning here is Howard's, not Harvard's. Here's why.

    A die has a uniform distribution. Each SAT subtest has a heavily (and I do mean heavily) skewed normal where we are primarily interested in the mode. It is difficult to score that much above that mode but because time is so important, it is easy to score far below it.

    Given this distribution and the limited number of opportunities to take the SAT, the pick-the-best approach is a good way of getting a representative score.

    I have more on this at Observational Epidemiology. Do a keyword search on SAT.

  7. Zarkov: If you take the SAT twice, it's likely your score will go up. If you take the SAT with extra time, it's likely your score will go up. If you get coaching, it's likely your score will go up. I don't see any of these as evidence of higher "g"; rather, I see them as factors that will artificially inflate your score. Nowhere did I recommend penalizing kids who have a high IQ.

    Mark: I'll have to think more about this. My quick thought is that taking the best scores, as Harvard does, throws away information, and that it would be better to take an average. But maybe there's something I'm missing here.

  8. I don't see how this could work…if I were an admissions officer looking at the student's portfolio, I would just say, oh he got 1210 SAT but he went to a private school so he really got a 1260…etc etc. For this to work the admissions officers would have to be prevented from matching the score to the student characteristics used to adjust the score, which doesn't make sense.

    Unless the weighting formula were kept secret, but then it would only take a couple years to figure out what the penalties were and we would be back at square one.

  9. Cody must be talking about Harvard applicants :-)

    The Wikipedia report is in line with what I remember when we did admissions at CMU, where a perfect (800) score in math puts you in the 94th percentile:

    http://en.wikipedia.org/wiki/Graduate_Record_Exam

    Still, that's not very discriminative. I certainly don't remember the GRE math section being at all challenging.

    One problem is that because there are word problems, language skills bleed over into the math scores. If you put the bar at a perfect math score, you won't reject many qualified native English speakers. But you might very well reject non-native speakers with better math ability because they're not as competent in English. I wonder if you could adjust the math scores based on TOEFL and English GRE scores.

  10. If test-to-test variation is low, then the information loss is low. (Although taking the maximum rather than the average is just silly).

    The real problem is that this encourages people to waste their time and money retaking the test, and gives another advantage to families with time and money to spare.

  11. Andrew,

    1. You proposed subtracting SAT points from students who attend private school. That would have a disparate impact on higher IQ students. We know income and IQ are correlated (see Jensen's The g Factor). We know families need a higher than average income to afford private schools. We also know that IQ is heritable. Again, see Jensen for details.

    2. I suspect the average of fixed-time SAT scores would provide the best estimate in the sense of predicting school success. In any case, this is a technical problem that I suspect has been solved. Why not just go with whatever ETS recommends?

    3. No doubt practice will raise a student's SAT scores a little. You can practice by yourself or hire a tutor. Why do we need to punish students who practice with a tutor? What is to be gained?

    I guess your proposals were serious.

  12. Presumably you need a high IQ to do a high level job. If you do a high level job then you earn lots of money and can send your kids to private schools.

    So are the high (average) scores of the private school kids the result of the private school or genetics? And if it's a little of both, how do you work it out (in an ethical way)?

  13. There is a good reason not to average the scores from (or, as you suggested in your earlier post, penalize students for) multiple SAT sittings: It gives an unfair advantage to those who took the ACT multiple times and submitted only their best scores–not an option for the SAT. It also may not have the effect you are going for, because poor students can take the SAT twice for free.

    I can't find any evidence of "an option to take as long as you want" on the SAT, and it would be absurd if there were. Students with documented disabilities that require extra time for tests in their high-school classes can apply for extended time on the SAT and ACT. But colleges have no way of distinguishing these scores from those achieved under standard timing conditions, and obviously docking them would harm rather than help disadvantaged students.

  14. Let me try to elaborate. There are a lot of moving parts here so though this may be a bit disjointed I'm just going to list these in the order I think of them.

    In analogous situations where occasional but severe underreporting is suspected, there are obvious concerns with using the mean rather than the median, but since here we are frequently faced with sample size 2, the median doesn't do us much good.

    Let's say we can assume that when the variable is not underreported, its variability is relatively small. You can then make the case for the larger of the two numbers being a better estimate for the population median than the mean would be.

    Now look specifically at the SAT. All of the following applies to the math section. Most of it applies to the verbal section. None of it applies to the essay section (which is a train wreck).

    [http://observationalepidemiology.blogspot.com/2010/05/how-to-ace-essay-section-of-sat.html]

    The SAT is all lightning round. The questions are all on a ninth grade level but they are presented in tricky ways* and thrown at you at an alarming speed. If you break your concentration or lose track of time you can easily see your score drop one or two hundred points (bye-bye Ivy League). Any supporter of the SAT (and I am very much in that group) will admit that it's a tremendously artificial situation.

    Think about the sources of variation in intra-student test scores. They basically fall into two groups. The first consists of things like guessing or getting a question you're good at. These things are well behaved and well understood and are as likely to raise a score as to lower it (remember, on the SAT the expected value of a guess is zero). The second group consists of things that keep a student from getting credit for answers he or she knows. These include discomfort, distraction, poor mouse skills on computerized tests, skipping a line or picking a bad box of pencils on the paper test, losing track of time while focusing on a problem or simply getting flustered.

    It's true that you lose information when you drop the low score (you lose information whenever you go from a vector to a scalar, even if it's the mean), but I'd argue that the high score is a better (though biased) estimate** of the mode and the information added by the low score is often of little interest.

    There's room for disagreement here but when you factor in the need for simplicity, Harvard is definitely taking a reasonable position.

    * I've actually seen an alphametic in an SAT question.

    ** To paraphrase Mr. Twain, I'd prefer the biased estimate of a watch that's five minutes slow to the unbiased estimate of one that's stopped dead.
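The disagreement between taking the maximum and taking the average can be explored with a small simulation. The sketch below is not from any commenter: it assumes a simple contamination model in which a sitting is usually the student's typical score plus symmetric noise, but with some probability is a "bad day" that loses a large number of points. Every number in it (typical score 700, noise of 20 points, a 50 to 200 point loss) and the name `simulate_mae` are illustrative assumptions, not SAT facts.

```python
import random
import statistics


def simulate_mae(p_bad_day, n_students=20000, seed=0):
    """Mean absolute error of max-of-two and mean-of-two sittings as
    estimates of a student's typical score, under an assumed model.

    Returns (mae_of_max, mae_of_mean). All parameters are illustrative.
    """
    rng = random.Random(seed)
    typical = 700.0   # assumed typical score on one section
    noise_sd = 20.0   # assumed well-behaved, symmetric noise

    def sitting():
        score = rng.gauss(typical, noise_sd)
        if rng.random() < p_bad_day:
            # Assumed "bad day": lose between 50 and 200 points.
            score -= rng.uniform(50, 200)
        return score

    err_max, err_mean = [], []
    for _ in range(n_students):
        a, b = sitting(), sitting()
        err_max.append(abs(max(a, b) - typical))
        err_mean.append(abs((a + b) / 2 - typical))
    return statistics.fmean(err_max), statistics.fmean(err_mean)


if __name__ == "__main__":
    for p in (0.0, 0.1, 0.3):
        mae_max, mae_mean = simulate_mae(p)
        print(f"p_bad_day={p}: max {mae_max:.1f}, mean {mae_mean:.1f}")
```

Under these assumptions the higher score is the closer estimate when bad days are common, while the average does better when there are no bad days at all, which is the situation Andrew has in mind when he says that taking the best score throws away information. Which regime real SAT sittings fall into is an empirical question this sketch cannot answer.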

  15. The primary purpose of the SAT is to predict how you'll perform in college. I'd argue that keeping the highest score explicitly helps towards that end. If I get a 1400 based on natural aptitude (no retakes, no studying), does that make me more likely to graduate with good marks than someone who scored 1200, 1320, 1300, 1400, 1330, taking a prep course and taking sample tests on their own? Has anyone measured that?

    To my line of thinking, they're fairly equivalent situations. I've demonstrated ability, but not necessarily motivation. The other student spent 4 Saturdays taking a test, and other hours preparing. They've demonstrated less natural ability than me, but they seem more likely to put in hours studying and doing homework to get a good grade. If they got a tutor for the SAT, maybe they'll get a tutor for Calculus too.

  16. I don't really understand the cutoff idea with GREs at the graduate level. I'm actually not quite sure why GREs are relevant at all at that point. I had a friend at university who was, I almost want to say, allergic to math tests. She chalked it up to some childhood experiences. I can't really explain it, but she could get As in advanced statistics courses, and had no problem getting an 800 in verbal, but her math was something like 350. Bear in mind that she did plenty of statistics in her undergraduate work and research and was always successful. She just choked up when she had to take a simple math test. She got into U Toronto's extremely selective clinical Psychology program (they don't require GRE scores), and is now successful and productive.

    I think when you see an applicant who has a single terrible GRE score when everything else about them checks out, it's apparent that there was something wrong with that measure in particular, rather than something wrong with every other measure.
