Taking a standardized test multiple times

When we have a grad school applicant who’s taken the GRE or TOEFL multiple times, we typically just look at the highest score. It’s my impression that pretty much everybody does this, even though basic statistical principles would suggest taking the average: the maximum of several noisy measurements systematically overstates the underlying level. Eric Rasmusen reminded me of this point in the context of the SAT, which has apparently changed its policy to encourage multiple test-taking even more by allowing students to report only their highest score. Throwing away information: that doesn’t sound like a good idea! But, as Rasmusen points out, it might make more money for the organizations that administer the test.
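To make the statistical point concrete, here is a minimal Python sketch under a deliberately simple, hypothetical model. Each applicant has a fixed true score, and every sitting adds independent normal noise with no practice effect. The function name `simulate_reported_scores` and the numbers (true score 600, noise SD 40) are illustrative assumptions, not real GRE or SAT parameters. Averaging the attempts stays centered on the true score, while reporting only the best attempt drifts upward as the number of attempts grows.

```python
import random
import statistics


def simulate_reported_scores(true_score, noise_sd, n_attempts, n_applicants, seed=0):
    """Return (mean of each applicant's best score, mean of each applicant's average).

    Hypothetical model: every observed score is true_score plus independent
    Normal(0, noise_sd) noise, and all applicants share the same true score.
    """
    rng = random.Random(seed)
    maxima = []
    means = []
    for _ in range(n_applicants):
        scores = [rng.gauss(true_score, noise_sd) for _ in range(n_attempts)]
        maxima.append(max(scores))               # what a "report your best" policy shows
        means.append(statistics.fmean(scores))   # what averaging would show
    return statistics.fmean(maxima), statistics.fmean(means)


for n in (1, 2, 3, 5):
    avg_max, avg_mean = simulate_reported_scores(
        true_score=600, noise_sd=40, n_attempts=n, n_applicants=20_000
    )
    print(f"{n} attempt(s): mean of best = {avg_max:.1f}, mean of average = {avg_mean:.1f}")
```

Under this model the expected inflation of the best score is about 0.56 noise SDs for two attempts, 0.85 for three, and 1.16 for five. With an SD of 40 points, a five-time taker's best score sits roughly 46 points above their average. Real retakes also involve studying and learning, which this sketch ignores.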

According to the linked news article, students “will have the option of choosing which scores to send to colleges while hiding those they do not want admissions officials to see.” My question is: will their score report state whether they’ve chosen this option? If so, it should be possible to at least try to correct for the bias.
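Here is a rough sketch of what such a correction could look like. It makes three hypothetical simplifications:

- the report discloses the number of attempts n;
- measurement errors are independent and normal, with a known SD;
- there is no learning between attempts.

Under those assumptions, the sketch subtracts the expected inflation of the best of n draws. That inflation is E[max of n standard normals] times the noise SD, and the code computes it by numerical integration of x · n · φ(x) · Φ(x)^(n−1). The function names are made up for this example.

```python
import math


def expected_max_std_normal(n, lo=-10.0, hi=10.0, steps=20_000):
    """E[max of n independent standard normals], via a midpoint-rule
    integral of x * n * phi(x) * Phi(x)**(n - 1) over [lo, hi]."""
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * width
        phi = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)  # standard normal density
        cdf = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))         # standard normal CDF
        total += x * n * phi * cdf ** (n - 1) * width
    return total


def deflate_best_score(best_score, n_attempts, noise_sd):
    """Rough estimate of an applicant's underlying score from the best of
    n_attempts, assuming independent Normal(0, noise_sd) errors and no
    learning between attempts (both hypothetical assumptions)."""
    if n_attempts < 1:
        raise ValueError("n_attempts must be at least 1")
    return best_score - noise_sd * expected_max_std_normal(n_attempts)
```

For instance, `deflate_best_score(700, 3, 40)` returns about 666, since the best of three attempts is expected to exceed the underlying score by roughly 0.85 noise SDs. A real adjustment would need an estimate of the test's measurement error. It would also have to account for the practice effects raised in the comments.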

In any case, all this discussion makes me think we should be more careful about just looking at the maximum when our applicants take the GRE multiple times. And then there’s the possibility of cheating. . . . I guess the real lesson is that these admissions decisions aren’t going to be perfect, and we should think more about how to incorporate this perspective into our admissions process.

16 thoughts on “Taking a standardized test multiple times”

  1. Will they insist that the person who has taken the test state how many times they took it? Then at least you can have fun with the inferential problem of working out their average score.

  2. I don't follow the bias… _everyone_ can take the tests as many times as he or she wants (or up to a maximum number), right? So if someone didn't do well on the first try, he or she can take it again. What's the problem with that? So why not consider just the maximum score, instead of assuming that every time the person took the test he or she was perfectly familiar with it, feeling well, had slept the night before, blah, blah, blah? Anyway, these tests reflect previous training, whether explicitly for the test or from similar high school exercises or whatever, right? We cannot control for how much time each test taker actually "practiced" for the test, and it is of course silly to assume that everybody has had similar previous training. So what is the point of the average? Let people take it as many times as they want and see what they can do.

  3. If you consider the question being answered by the test to be "can the student pass this test?", then the answer is "no" if he's taken it multiple times and failed every time, and "yes" if he's taken it multiple times and passed once. If as an admissions body you are trying to make fine distinctions between this student who passed with 91% and that student who passed with 92%, then you're misusing the test.

    After all, don't statisticians always wag the finger disapprovingly when we try to say that results with higher significance are "more significant"? They say you must only call a result "significant" if it passed the pre-agreed threshold, and "not significant" if it did not. No attempt should be made to draw fine distinctions between different results.

  4. Bob: I think the answer is no; that's part of the problem.

    Antonio: The point is that, for reasons of scheduling, $, etc., some people take the test more times than others. You don't really get unlimited tries.

    Derek: Sometimes people's scores vary by more than 100 points when they take a test multiple times. If differences were really so minor, then, sure, just let everybody take it once and that'll solve the problem.

    Here's the point: if you took the test and didn't do so well, sure, retake it. But I agree that it's only fair for the admissions offices to see the multiple scores too; then they can come to their own judgment (possibly following Antonio's reasoning above). But keeping the information hidden doesn't seem like a good idea to me, from the perspective of either inference or fairness.

  5. Taking the average isn't exactly fair either, right? Say I get a 400/400 my first time taking the test because I was nervous, or hadn't practiced enough, or hadn't seen the math in a long time. 6 months later, I take the test again, and get 700/700.
    550/550 doesn't seem like a fair compromise here. I shouldn't have to take the test several more times to bump my average up. Why not just take the most recent score?
    Maybe we should just stop pretending that the GRE actually measures something? Not that I have a better solution.

  6. OK, but for other reasons, some people perform better on the first try than others, holding abilities and skills constant… I don't see the advantage in trading one set of reasons for the other…

  7. To Antonio – there's a particular bias against international students (outside the US, tests are more expensive and offered less frequently) and students from underprivileged backgrounds if you fail to account for the multiple tests. When I worked as an admissions officer at a top-tier business school, we made sure to take into consideration the number of times an applicant took the test (in this case the GMAT). There's something to be said about someone who would be willing to spend time, effort, and money to achieve marginal (if any) gains on a standardized test.

    On top of that, can we say that GRE scores predict success on the academic job market? probably not. In fact, standardized test scores can typically only predict achievement in the first year or two of instruction.

    So I'd agree that scores won't be perfect and should be considered among a series of other markers of achievement and potential.

  8. Taking the average of all tests is not statistically meaningful, because a student's knowledge can be considered a non-stationary process.
    A student's knowledge of subject X can change tremendously, especially for a subject like English for international students (who typically start studying hard for the TOEFL a year or two before applying).

    I am not suggesting that the previous results should be hidden from the admission committee; instead, I suggest that the admission committee needs to put much more weight on the recent history of scores.
    Of course, the problem is how to implement this idea in a consistent manner. The usual minimum requirements are well defined: just compare one scalar with another. Doing the same for a history of scores is not obvious.

  9. For the GRE, a perfect score on the quantitative portion puts you at the 94th percentile, and that population includes people applying to grad school in non-quantitative areas. In my experience, GREs are not very helpful beyond identifying people who clearly are not up to the challenge.

    The math subject GRE is a different story. I find that people who score in the top few percent on that test are typically very strong, but this is almost always confirmed by transcripts and letters.

    With the TOEFL, I think averaging would be a mistake. If an applicant takes it one year, scores poorly, and then studies hard for the next year, why should the bar be higher that time around? In any case, TOEFL tends to be a threshold test. Nobody will be admitted to a program based on their stellar English, but they may be denied based on low scores. Here, I feel the most recent scores are most relevant by far.

  10. "can we say that GRE scores predict success on the academic job market? probably not. In fact, standardized test scores can typically only predict achievement in the first year or two of instruction."

    So what's the point?

    If we are dealing with a measure that's got reliability but low validity, does it really matter which way we do the scores (other than to the revenue of the testing firms)?

    For some reason, this reminds me of an old college friend who graduated with a 2.4 GPA (out of 4) from a low-grade commuter school … skipping ahead a few years he's now got a chaired professorship.

  11. I think the question is "what exactly is it that you think the GRE is measuring?"

    Perhaps the student didn't study sufficiently the first time, was having a bad day, or whatever else. I remember taking the GRE Math Subject Test cold and being completely unprepared for it, not because I didn't know anything on it but because it was asking me questions from Calculus III that I hadn't seen in 4 years and I couldn't figure them out in under 3 minutes.

    So the student does poorly for whatever reason. They then go back, study, and retake the test. They score better, perhaps significantly better. They have now demonstrated knowledge at that score level, and thus that is the only score that matters.

    It's like my objection to averaging "retake" exams. The purpose of an exam is to demonstrate knowledge at a particular level: if you take an exam three times and score a 50, a 70, and a 90, I would say only the 90 should count, because you have demonstrated (after some work, but demonstrated nonetheless) a 90% understanding of the material being tested. So it took you a little more time, a little more study, or a few failures to get the material. At the end of the day, you still grasped the material, and that's what matters, right?

    This is certainly how many professional certification exams work. For the actuarial exams you are also encouraged to sit multiple times, and while all scores are reported, no one cares about anything beyond whether you hit the magic "pass" mark once. The same is true of the PRM: you either pass it, or you don't.

  12. Here's a suggestion to please statisticians: why not report the whole series (only a few numbers anyway) and let the admissions officer interpret all the scores? He/she can use the max or the average or the min or a self-named formula and then let us know about it in a psychometrics journal.

    btw, Andrew is right again about psychometricians. I'm sure they already have the answer.

  13. Andrew,

    I agree: from the perspective of fairness, ETS ought to give out all the information about test takers.

    But is taking an average the best solution? Re-taking a test introduces selection bias. But allowing the applicant to report only the highest score certainly makes matters even worse.

    Mike

  14. "Re-taking a test introduces selection bias": sorry, but I don't see why this "bias" is worse than the other problems mentioned, such as assuming that each test taker knows the test equally well, is feeling OK at exam time, and blah, blah, blah.

    I don't think the test is useless at all, though it's probably not very useful either. However, it seems people like to make heroic assumptions in order to believe the test is measuring something like IQ or some type of intelligence or whatever. Let's accept it for what it is: a test of whether you are good at very basic (and not useful) math, at writing compositions quickly about not very substantive topics, and… well, the verbal part I really don't know…
