Standardized tests are useful, say researchers

In their article, “High-Stakes Testing in Higher Education and Employment: Appraising the Evidence for Validity and Fairness,” Paul Sackett, Matthew Borneman, and Brian Connelly write:

As young adults complete high school in the United States, they typically pursue one of three options: continue their education, enter the civilian work force, or join the military. In all three settings, there is a long history of using standardized tests of developed cognitive abilities for selection decisions. In these domains, the tests themselves often are very similar. For example, Frey and Detterman (2004) reported a correlation of .82 between scores on the SAT, widely used for college admissions, and a composite score on the Armed Services Vocational Aptitude Battery.

The question is: are these tests any good? The authors say yes:

The authors review criticisms commonly leveled against cognitively loaded tests used for employment and higher education admissions decisions, with a focus on large-scale databases and meta-analytic evidence. They conclude that (a) tests of developed abilities are generally valid for their intended uses in predicting a wide variety of aspects of short-term and long-term academic and job performance, (b) validity is not an artifact of socioeconomic status, (c) coaching is not a major determinant of test performance, (d) tests do not generally exhibit bias by underpredicting the performance of minority group members, and (e) test-taking motivational mechanisms are not major determinants of test performance in these high-stakes settings.

Their key methodological point:

The notion of preponderance of evidence is central to our position. Again, we have focused on large-scale studies, on national samples, and on meta-analytic syntheses of the literature in drawing our conclusions. Given the extensive research in this area, it is certainly the case that individual studies can be found with contrary findings. We suggest, though, that readers should be wary of attempts to draw broad conclusions from small studies and that the broader literature should be used as the basis for conclusions.

And their conclusion:

Although research has answered the question “Will high scorers generally perform better than low scorers?” with an affirmative, the question “Should high scorers always be preferred to low scorers?” is one to be answered by test users rather than test creators. It is important to differentiate between technical questions, such as those about how well and under what conditions tests predict various criteria, and values-based questions, such as those about the relative importance of one criterion versus another. Whether an organization chooses to emphasize, say, maximizing mean task performance rather than minimizing turnover as the goal of its selection system is a matter of values.

Although our overall assessment is positive, we emphasize that we are not offering a blanket endorsement of any and all tests. There are certainly bad tests . . . We do, though, concur [that] “Educational and psychological testing and assessment are among the most important contributions of behavioral science to our society. . . . There is extensive evidence documenting the effectiveness of well-constructed tests for uses supported by validity evidence.”

I found out about this article from this note by Edgar Kausel in the often-interesting Judgment and Decision Making listserv.

My own perspective on this

What are my own thoughts here? I think my reaction to this is overdetermined, in that I’m predisposed to support these conclusions as a statistician, as someone who’s personally benefited from standardized tests, and as a colleague of several statisticians who’ve worked at the Educational Testing Service. I’d have preferred if Sackett et al. had put some graphs in their paper; other than that, I don’t have much to add on the topic. I’d like it if we had some useful standardized tests of statistical knowledge and understanding.

One Comment

  1. ZBicyclist says:

    If you are a very bright kid working your way through a weak commuter college, you are going to get good grades, but nobody will much care and your recommendations will be from faculty who are also at the weak commuter college.

    The best hope is to do well on the GRE or some similar instrument, which then validates the other pieces of the package and gives you the best chance of going to a first-class graduate program, and where you get your graduate degree clearly matters in lots of ways.

    That's my story, anyway.