Test scores and grades predict job performance (but maybe not at Google)

Eric Loken writes:

If you’re used to Google upending conventional wisdom, then yesterday’s interview with Laszlo Bock in the New York Times did not disappoint. Google has determined that test scores and transcripts are useless because they don’t predict performance among its employees. . . .

I [Loken] am going to assume they’re well aware of the limits of their claim, and instead I’m going to say that as readers of the interview we should not lose sight of a fundamental fact –

Across a wide variety of employment settings, one of the most robust findings in organizational psychology is that tests of cognitive ability are strong predictors of job performance. If Google has found otherwise, what they have found is that grades and test scores are not predictive of performance at Google. In general, in the workplace tests are still highly predictive of success.

If all the research says that test scores and grades predict performance, why would the people at Google want to ignore this information?

Loken continues:

There are at least two factors in play here (and again I’m assuming the folks at Google are well aware of both of these points). First, when a company has built its brand on attracting only the brightest prospective employees, through self-selection and through sheer volume of applicants the pool will be extremely competitive. Google likely doesn’t have much variability among those hired with respect to test scores and grades. And when there is no variability, there is no correlation with anything. It’s a similar argument to MIT saying that the SAT is useless for their admissions. Their applicant pool is so vast and highly qualified that the incoming class is largely homogeneous on those measures.

The second point is a bit more subtle. We of course need to wonder how all those people with lower test scores and grades would have fared at Google had they been hired. But furthermore, when an organization has used a certain instrument to select their employees, the correlation with job performance goes down. It’s not just that the range has been reduced (through selection). It’s also that the information has already been acted upon. If someone is hired despite their lower test scores, it usually means some compelling compensating characteristics made that person look like a good bet. That is why the correlation between a valid selection instrument and job performance can be dramatically depressed when only looking at the hired sample.
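
To make both of these points concrete, here is a minimal simulation sketch. The generative model and every number in it are hypothetical illustrations, not anything from Google or from Loken: a test that clearly predicts performance in the full applicant pool nearly stops predicting among the top few percent hired on a composite that already includes the test.

```python
# Toy model: performance depends on a test score and on "other qualities"
# that the hiring process also observes. All parameters are made up.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

test = rng.standard_normal(n)    # test scores / grades
other = rng.standard_normal(n)   # compensating qualities (interviews, etc.)
perf = 0.5 * test + 0.5 * other + rng.standard_normal(n)

# In the full applicant pool, the test is clearly predictive.
print(round(np.corrcoef(test, perf)[0, 1], 2))  # about 0.41

# Hire the top 5% on a composite that already uses the test, so a low
# test score survives only alongside compensating strengths.
composite = test + other
hired = composite >= np.quantile(composite, 0.95)

# Among the hired, the same test barely predicts performance.
print(round(np.corrcoef(test[hired], perf[hired])[0, 1], 2))  # roughly 0.1
```

Both mechanisms Loken names show up here: the hired group has far less variance in test scores, and conditioning on the composite induces a negative correlation between the test and the compensating qualities, which further cancels the test’s predictive signal.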

Loken concludes:

It’s common to misinterpret that low correlation as a sign of poor prediction. Again – thousands of research studies have confirmed the predictive validity of tests of cognitive ability for job performance. Google may well say, “Not here.” – but they cannot (and did not) say, “Not anywhere.”

21 thoughts on “Test scores and grades predict job performance (but maybe not at Google)”

  1. You have to consider how good a test of cognitive ability grades are across schools and majors. It may be, for example, that Computer Science degrees don’t test social or verbal intelligence (valuable in small group work), or that other classes are poor predictors of success as programmers. GPA may also be confounded by environmental stressors or mood (temporally inconsistent, especially in today’s world of ready access to antidepressants).

    Additionally, “how they performed at their job” is a similarly noisy measure, and may be undermined at Google by perverse politics.

  2. > It’s common to misinterpret that low correlation as a sign of poor prediction. Again – thousands of research studies have confirmed the predictive validity of tests of cognitive ability for job performance. Google may well say, “Not here.” – but they cannot (and did not) say, “Not anywhere.”

    Moreover, they cannot say that tests of cognitive ability would not predict job performance at Google if Google went from taking those tests into account in hiring to not taking them into account – which is counterintuitive and important.
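
    A quick way to see this counterintuitive point, using the same kind of toy model as the sketch above (again, every number is hypothetical): the moment hiring stops conditioning on the test, the test’s correlation with performance among hires comes back.

    ```python
    # Compare two hypothetical hiring policies on the same applicant pool.
    import numpy as np

    rng = np.random.default_rng(1)
    n = 100_000
    test = rng.standard_normal(n)
    other = rng.standard_normal(n)
    perf = 0.5 * test + 0.5 * other + rng.standard_normal(n)

    def top5(x):
        # boolean mask selecting the top 5% of x
        return x >= np.quantile(x, 0.95)

    hired_with_test = top5(test + other)  # policy A: the test counts
    hired_blind = top5(other)             # policy B: the test is ignored

    # Policy A: the test barely predicts performance among hires (~0.1).
    print(round(np.corrcoef(test[hired_with_test], perf[hired_with_test])[0, 1], 2))
    # Policy B: the test predicts performance among hires again (~0.44).
    print(round(np.corrcoef(test[hired_blind], perf[hired_blind])[0, 1], 2))
    ```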

  3. There are a few reasons why Google should ignore these scores.

    #1) Integrity of data. There is a disproportionate number of cheaters with “reported” perfect scores/GPAs. Too many of the wrong ones end up at the top of the list.

    #2) Google is in the business of selling products. It is difficult to sell products with rigid employment standards… Maybe the same case can be applied to MIT…

    #3) Does Google actually hire skilled workers? Most appear to have joined through merger activity.

  4. “If all the research says that test scores and grades predict performance, why would the people at Google want to ignore this information?”

    Perhaps because it’s not orthogonal to the other information being gathered.

  5. Binarized “testing” of an ill-posed question strikes again. Loken is wrong to cast the question as whether tests of cognitive ability for job performance are “valid”. Cognitive tests are neither “valid” nor “invalid” predictors of job performance. As the discussion highlights, it depends on 1) the relevant applicant pool and 2) the job.

  6. Applicant pool effect: I suspect the same effect would have been seen at Bell Labs.
    I.e., would people’s success at jobs correlate with high-school grades?
    A: No.

    I was in a group of 11 BTL members of technical staff (actually a representative sample; MTS needed a technical MS or PhD), plus one AT&T guy.
    We’d visited a telephone operator site that day that was due to be closed down, and the people were unhappy, even though the company was finding them jobs elsewhere. BTL folks at dinner talked about how anyone could stand to do such a boring job all day.

    AT&T guy observed that it was a job, not a career; the operators (almost all women) worked in a clean, safe place in their town where they could trade off schedules with each other, so it fit their lives.

    Bell Labs folks: but it is sooo boring, we’d all go nuts.

    AT&T guy: well, how representative do you think you are?
    BTL folks: oh, we’re pretty typical (this is when BTL was ~25,000 people in R&D, and like its own country; people not only worked together and ate in company cafeterias, but there were lots of clubs, ski trips, canoe trips, etc. This group indeed was typical of Bell Labs MTS.)

    AT&T guy: well, let’s see how representative you really are. How about everybody raise their hand if they were high school valedictorian.

    BTL folks: 11 of 11 hands went up. Oh. Perhaps not a random sample of the population. :-)

  7. An MIT science professor said that the most important GRE score for his department’s admissions committee is the verbal one. They expect that all competitive candidates will have 790 or 800 (out of 800) scores on the math and analytical reasoning sections. The subject GRE was heavily weighted towards a sub-specialty where the department was not particularly active, so that wasn’t much help.

    They discovered that verbal GRE correlated better than the others and conjectured that:
    1. Verbal scores tend to go up the more one reads.
    2. Students that have time to read literature, or things not needed for science classes, must be either better time managers or just able to complete their science work faster. They’ve got something “left over” after excelling in science classwork.
    3. Those are the students they want.

    • It sounds like they emphasized the verbal portion just because their applicants were maxing out their score on the math and analytical sections, so of course you’d expect the correlation to be 0 if everyone is getting the same score…?

  8. “thousands of research studies” indicates to me the original poster is blowing smoke about something he thinks is true, and would like to be true. Somehow I managed to do a dissertation on the relation between education and income and missed those thousands of studies. Instead, what I found, consistent with what Google found, was that academic performance has little predictive value apart from highest grade completed.

    If someone wants to produce some of these thousands of studies I’d be curious to see them.

    • I don’t know if there are thousands of studies, but meta-analyses do suggest that general cognitive ability is the best predictor of job performance across a very wide range of jobs. See here.

      • Actually, the linked study says that “On the basis of instruments of this sort [referring to the Wonderlic IQ test], thousands of validity studies have accumulated since the early part of the 20th century.”

        • I think it’s worth pointing out that there are, indeed, if not thousands, at least many hundreds of studies linking “general cognitive ability” – framed in some particular operationalization, such as “ability”, “achievement”, or “aptitude” tests – to many performance-related outcomes (summarized in meta-analyses by Schmidt & Hunter, as above; their 1998 Psychological Bulletin paper on a variety of predictors of job performance; Hunter & Hunter, 1984; and Kuncel, Hezlett, & Ones, which demonstrates considerable overlap in the predictive power of the MAT for both academic performance and job performance; among many others).
          Most of this work is published in Journal of Applied Psychology, Personnel Psychology, Psychological Bulletin, and other outlets for industrial/organizational psychologists and educational psychologists. It’s not surprising that this domain would be missed in a literature search on education and income.

          I use quotation marks above, not as scare quotes, but to reinforce the idea that these psychological “constructs”–a term I do not like, but will accept here–do not necessarily mean what they sound like. For instance, let’s focus on the “job performance” side of the bivariate relationship. This is a difficult term to precisely define, operationalize, and measure–the literature here goes back to at least Thorndike in the late ’30s, and was most controversial among industrial psychologists in the late ’50s and ’60s.

          For convenience, it is often operationally defined by getting ratings from an employee’s direct supervisors. This can be useful, but is subject to many cognitive and affective biases. Further, in cases where there is some more “objective” measure of performance – widgets produced, items repaired, billable hours, sales figures, etc. – AND these more “subjective” ratings, the correlations with cognitive ability measures can be very, very different.

          Furthermore, cognitive ability test scores and grades do not perfectly coincide, so they shouldn’t be conflated. In general, people who score higher on cognitive tests do have higher GPAs, but GPA is itself a kind of performance metric, and will be affected by the individual’s motivation (i.e., “brilliant, but lazy”, “dumb, but hard-working”, or some other combination), other personality characteristics, and external influences (friends, jobs, family obligations, etc.).

          Additionally, “cognitive ability,” while a broad construct, probably does not include ALL cognitive operations that may be important for a given performance domain – it should be relatively more important in areas where new learning, information search and synthesis, and solving novel problems are germane; less, but still, important in bureaucratic areas that involve use of complex but relatively fixed rules; and least important in areas that are primarily physical or repetitive. It’s probably fair to characterize cognitive ability tests as indicators of those aspects of cognitive processing that are relevant to living/thriving in a modern, industrial, hierarchically structured environment.

          But, to address the original post, all I did was read the title and I immediately thought “restriction of range” – which would serve to attenuate the observed correlation. If the restriction is tight enough, as it plausibly is given both self-selection and the ultra-competitive selection by Google itself, it would tend to drive the observed correlation toward zero.

        • Yes, restriction of range would be my guess, just like in the MIT GRE comment above. Very high quant and analytic GREs are to be expected in technical majors, but there’s variance on the verbal component, and it’s useful. (Ditto, I suspect, for more humanities-oriented programs, where folks might look at the quant score, not for a really high value but for a not-awful value. But I don’t know that.) If the test lacks variance in the applicant pool it can’t correlate with much of anything. That means you’d want to look elsewhere.

          But Google can be fingered for their stupid trick questions. Those likely have very low validity for predicting much of anything besides willingness to research the heck out of trick questions and acquiesce to the questioner. Like a lot of interview practices, they have a certain face validity but are of dubious value beyond that. It’s not dissimilar to the overused three-item cognitive quiz all the JDMers use, with the bat-and-ball price question.
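
          For reference, the attenuation this thread describes has a textbook formula. Under the simplifying assumption of direct selection on the predictor alone (a standard correction in the validity-generalization literature), the applicant-pool correlation $R$ relates to the correlation $r$ observed among the hired through the ratio of the predictor’s standard deviation in the pool ($S$) to that among the hired ($s$):

          $$R = \frac{r\,(S/s)}{\sqrt{1 - r^{2} + r^{2}\,(S/s)^{2}}}$$

          As $s/S \to 0$, the observed $r$ is forced toward zero no matter how large $R$ is. (Real hiring also selects indirectly, through composites and interviews, which changes the algebra but not the qualitative point.)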

  9. Following the link to the original post shows it is on a commercial website that sells assessment services. Not exactly a place one would expect to find unbiased information on the value of those services.

    There have been examples of strikingly successful tests for predicting job performance, but they are not tied to general cognitive ability. For example, Seligman was able to identify who would be a successful insurance salesman far better than the insurance company could, by identifying psychological traits that led to persistence in the face of adversity. This had cognitive elements in how rejection was framed, but these were not cognitive in the SAT or grades-at-school sense.

    • The problem with tests of character is that they tend to be easier to game once word gets around about how to outsmart them: just figure out the answer that makes you look persistent and choose that one. In contrast, the easiest way to outsmart tests of smartness is to be smart.

      This is not to say that cognitive tests can’t be gamed. A look at what’s going on in South Korea, where the SAT was recently cancelled nationwide because of massive cheating, suggests that a diligent, intelligent, and cooperative group of people can game cognitive tests. But it takes a lot of work.

