Gladwell vs Pinker

I just happened to notice this from last year. Eric Loken writes:

Steven Pinker reviewed Malcolm Gladwell’s latest book and criticized him rather harshly for several shortcomings. Gladwell appears to have made things worse for himself in a letter to the editor of the NYT by defending a manifestly weak claim from one of his essays – the claim that NFL quarterback performance is unrelated to the order in which they were drafted out of college. The reason we [Loken and his colleagues] are implicated is that Pinker identified an earlier blog post of ours as one of three sources he used to challenge Gladwell (yay us!). But Gladwell either misrepresented or misunderstood our post in his response, and admonishes Pinker by saying “we should agree that our differences owe less to what can be found in the scientific literature than they do to what can be found on Google.”

Well, here’s what you can find on Google. Follow this link to request the data for NFL quarterbacks drafted between 1980 and 2006. Paste the data into a spreadsheet and make a simple graph of touchdowns thrown (as of 2008) versus order of selection in the draft to create the picture below.

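[Figure: scatterplot of touchdowns thrown (as of 2008) versus order of selection in the draft, for NFL quarterbacks drafted between 1980 and 2006]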

The graph includes 373 QBs with a correlation of -.40. If you take the log of TDs the correlation increases to -.57. But correlation can be misleading here because the data are heavily skewed and stacked at zero. Instead, just focus on the perfectly transparent visual display. What is the probability that a quarterback throws 50 or more touchdowns if picked early in the draft? Is the probability lower for QBs picked later in the draft? If you were going to predict performance, would you want to know the draft position of the QB before you made your prediction? The answer to this last question is an unequivocal yes.
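For readers who want to redo this sort of calculation, here is a minimal Python sketch of the two summaries described above: the correlation of draft position with raw and log-transformed touchdown counts, and the share of quarterbacks reaching 50 touchdowns in an early and a late slice of the draft. The data are synthetic and exist only so the code runs. They are not the Pro Football Reference numbers, and the generating formula is an assumption. Because many quarterbacks have zero touchdowns, the sketch uses log(1 + TDs), which is also an assumption, since the post does not say how zeros were handled. The function names and the pick ranges (1-32 and 151-250) are illustrative choices.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def share_at_least(picks, tds, lo, hi, threshold=50):
    """Share of QBs drafted at picks lo..hi with at least `threshold` TDs."""
    group = [t for p, t in zip(picks, tds) if lo <= p <= hi]
    return sum(t >= threshold for t in group) / len(group)


# SYNTHETIC data (assumption): 373 QBs, skewed and stacked at zero.
# Both the chance of having a career at all and its expected size
# are made to fall with pick number; this is not the real data.
rng = random.Random(0)
picks = [rng.randint(1, 250) for _ in range(373)]
tds = []
for p in picks:
    has_career = rng.random() < 0.9 * math.exp(-p / 120)
    mean_tds = 150 * math.exp(-p / 100)
    tds.append(int(rng.expovariate(1 / mean_tds)) if has_career else 0)

r_raw = pearson(picks, tds)
r_log = pearson(picks, [math.log1p(t) for t in tds])  # log(1 + TDs)
early = share_at_least(picks, tds, 1, 32)
late = share_at_least(picks, tds, 151, 250)

print(f"r (raw TDs):      {r_raw:.2f}")
print(f"r (log(1 + TDs)): {r_log:.2f}")
print(f"share with 50+ TDs, picks 1-32:    {early:.2f}")
print(f"share with 50+ TDs, picks 151-250: {late:.2f}")
```

The two shares at the end are the numerical version of reading the scatterplot by eye: they answer the "50 or more touchdowns" question directly and are not distorted by the pile of zeros the way a single correlation coefficient can be.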

So how do you make this plain-as-day-association disappear? You can eliminate some of the data by declaring it off limits. For example, an economist named David Berri has recently published an article claiming that the correct way to look at the above data is by filtering some observations and making some transformations. (I am working from his blog post here as the journal article is not yet available at my library.) On his blog, Berri says he restricts the analysis to QBs who have played more than 500 downs, or for 5 years. He also looks at per-play statistics, like touchdowns per game, to counter what he considers an opportunity bias. Because early draft picks are given more opportunity to play, there is a natural correlation between draft order and playing time which might inflate the career statistics like total touchdowns.

Fair enough, but you have to be careful about writing off one source of covariance as a bias in need of correction. Longevity in the NFL is a function of opportunity and success. To attribute all the covariance between playing time and draft order as some sort of opportunity bias is to dramatically redefine the logic of the question. Does anyone believe that NFL owners and coaches are just “socially promoting” their early draft picks to run up these gaudy production stats, while equally able QBs with the misfortune of being selected later in the draft sit idly by and watch? Yes, there are Tom Bradys sitting on the bench… but very very few quarterbacks picked 199th in the draft are remotely as good as Brady proved to be, whereas several QBs picked in the early rounds are as good. You can’t look at the above graph and not agree that there is some association between draft order and probability of being a high producer. It doesn’t make sense to say that graph is an illusion due to uncorrected factors.

Even when I [Loken] do take a few chops at the above data, I can’t eliminate the strong correlation. The correlation is still there when I do TDs per game. It’s there when I restrict the data for at least 100 pass attempts. The correlation is even bigger when I do TD per game for QBs picked in the first 100 positions of the draft. I can’t get the association to go away, and I’m going to let these graphs stand as a challenge to Gladwell’s statement that no prediction is possible regarding the future success of NFL quarterbacks. The consensus of the predictive information reflected in draft order out of college unambiguously does predict future performance.

I don’t have anything to add here, except to note that Loken’s blog entry had a lot of internal links that I was too lazy to cut-and-paste over–I think there must be a way to do this automatically but I don’t know how.

Also, here’s my Q-and-A with David Berri from a few years ago. We just talk about basketball, not football.

26 thoughts on “Gladwell vs Pinker”

  1. What I see in the scatterplot is potentially a small group of outliers that constitute particularly touchdown-prone low draft choices, a small group that towers distractingly from about 200 on the vertical axis, over a hodgepodge of scattered data points from 1st to 250th. If I were analyzing this, I'd be tempted to treat the 8 or so top producers as outliers, filter them out, and see how the correlation emerges then. From observation, I'd say it would be nearly flat. Loken also doesn't mention the ominous cluster of data points at the bottom left, high draft choices that bombed. From a statistical viewpoint, I'd want to see a far stronger regression on draft order, and perhaps a better dependent variable than touchdowns thrown, because, as he acknowledges, clubs are pressured to play high draft choices early, and it introduces bias. He discounts the bias, but can't quantify it. Another factor he doesn't even mention is whether the QBs were drafted for their passing or running skills.

    Lacking better analysis, I'd still be happy to go with Gladwell's conclusion that draft order is a very poor predictor of performance.

  2. Sounds like there's an easy fix to that bias, just look at the ratio, touchdowns/downs_played and account for the uncertainty when downs_played is low.

  3. Good post, especially about Berri's claim: as far as I could tell, he conditioned on being a successful quarterback, and then found that draft order could not predict being a successful quarterback. Well, yes, we already conditioned on that!

    I thought a two-stage/multilevel procedure would have helped a lot – jointly evaluate playing time and performance conditional on playing time. I haven't seen anyone do that though.

  4. I think you are dramatically underestimating the bias among coaches and GMs to give opportunities to high draft choices over low draft choices or free agents. GMs' and coaches' #1 goal is to keep their job. Generally, winning is a good way to go about doing just that, but not everyone can win. So, in general they try to hedge against not winning by making decisions that are non-controversial and less subject to second guessing by media and fans. It's a different flavor of the "no one ever got fired for hiring IBM" phenomenon.

    A great example this year would be in Carolina where they have a young un-drafted free agent QB(Matt Moore) who has gone 6-2 as a starter in his limited playing time and posted an above average QB rating. What did the Panthers do in the 2010 draft ? Take a QB in the 2nd round, immediately igniting a QB controversy. I guarantee you that if the Panthers aren't winning this year, Moore will be benched in favor of the 2nd round pick.

    Take two equally gifted, mediocre QB's and if one is drafted in the 1st round and the other in the 6th round, the 1st round pick will always end up with higher career "counting" stats, not because he was better, but because he got more chances.

  5. Gladwell has had problems with science before. His claim on "influentials" in buying decisions in society was debunked by Duncan Watts.

    http://www.picnet.net/blog/2008/02/13/the-power-o

    On the other hand, he did make some important analysis, such as "one size fits all" solutions do not work in product design, marketing;

    http://www.ted.com/talks/malcolm_gladwell_on_spag

    I personally skim over his stuff looking for insight, but aware that he can come short on science, sometimes.

  6. I read Loken's blog entry and the essential problem is that he isn't actually measuring performance.

    Career TD's is obviously correlated to playing time. The greatest QB's in the game normally throw 1 TD for every 20 to 30 attempts, so to throw even 25 TD's requires 500+ attempts which is the equivalent of starting for a full season. Additionally, number of TD's thrown is not a preferred method of measuring QB quality, even within a single season, even when focusing solely on starting QB's. As a metric, the raw number of TD's thrown is heavily influenced by factors beyond the QB's control: injuries, offensive philosophy, and the surrounding talent all have huge impacts on the raw number of TD's thrown.

    TD's per game isn't much better as a metric. In fact, it's probably worse. Given the low percentage of passes that result in TD's what are the chances of a QB who averages 5 attempts per game being a leader in TD's per game ? Practically nil. What type of QB only gets 5 passing attempts per game ? A backup QB. What type of player is typically a backup QB ? A free agent or late round pick, or a failed high round pick. Of course the failed high round pick has already failed, and in doing so has already racked up some career TD's, which underscores why Career TD's is a horrible measure of QB performance.

    Finally, look at how the NFL measures QB performance: the passer rating. It's certainly not a perfect measure, but it's been used for a long time and must surely reflect at least some expert domain knowledge. What are the components of passer rating ? Completion percentage, Yards per Attempt, TD Pct, Int Pct. It's all rate stats. Counting stats are not used.

    If Mr. Loken wants to see the correlation disappear he simply has to use meaningful measures of performance. I ran the numbers from the site he linked to and when I add a trend line to the scatter plot I get r-squares of .025 and .013 for Yds/Att and TD Pct respectively. For career TD's it's .135.

  7. Berri didn't condition on being a successful QB, he conditioned on playing time. There is a huge difference. Not every QB who gets playing time is successful(there is a long list of failed 1st round QB picks) and likewise not every successful QB gets playing time(see Kurt Warner who didn't even get an NFL job until he was 28 or Tom Brady who didn't play at all as a rookie and only got a chance when the starter was injured during his second season ).

  8. I'm sorry, I should have been clearer. I know that Berri is conditioning on playing time. However, I'm pretty sure one of the biggest outcomes to describe QB success is getting a lot of NFL playing time.

    This is exactly Loken's point: Berri is overcompensating for opportunity bias by ignoring one of the major forms of QB success – that playing for the NFL for several games is a drastic improvement over most draft picks. If the claim is that draft order predicts success in two ways: it's a good predictor of playing in the NFL, but not good at predicting performance conditional on being an NFL player, that's fine. But say it that way!

    Relatedly, citing Tom Brady and Kurt Warner as evidence FOR opportunity bias is very weird, since they both have had ridiculous amounts of playing time and great numbers. I.e., the system worked and NFL caliber QBs played while inferior ones sat. Instead, you should cite some QBs who would be successful if they got opportunities but haven't. That is the right test of whether this bias exists.

  9. Pinker was citing, among others, a blog post by me responding to Gladwell's claim that there was "no way to know who will succeed at it and who won't."

    http://isteve.blogspot.com/2009/01/can-you-predic

    I looked at all the QBs drafted from 1980-1999 and found, unsurprisingly, that there was a moderate correlation between draft choice position and career accomplishment. The predictive power glass was part full and part empty.

    I wrote:

    "In conclusion, contra Gladwell, the NFL teams can predict quarterback performance in the NFL a lot better than random chance would dictate. And yet, considering the huge amount of effort that goes into selecting the most promising college quarterbacks in the NFL draft, there is much that remains delightfully unpredictable, as Kurt Warner's career demonstrates."

    My finding wouldn't be controversial except that Malcolm Gladwell doesn't really understand statistical concepts:

    One of Malcolm's biggest problems is that he has very little sense of where he is on a bell curve. He looks at people on the 99.999th percentile (top 50 draftees) and says that nobody can predict who will make it to the 99.9999th percentile, and, therefore, we should throw out prediction methods.

    Well, swell, but that doesn't mean that you can't predict ahead of time with some degree of accuracy who will wind up at roughly the 10th, 50th, and 90th percentiles out of the general population. But, Malcolm just doesn't get it.

  10. And here's my response to Gladwell's letter in the New York Times denouncing Pinker for citing me:

    http://isteve.blogspot.com/2009/11/gladwell-strik

    Here are the correlations for draft position and career accomplishments for the 278 college quarterbacks drafted 1980-1999:

    Draft and Pro Bowls: r = -0.33
    Draft and Touchdown Passes: r = -0.45
    Draft and Passing Yards: r = -0.48
    Draft and Years Starting: r = -0.48
    Draft and Games Played: r = -0.52

    The Pro Bowl metric (all star selection) is the most favorable to Gladwell and Berri, and 0.33 is still a long way from zero correlation.

  11. And here's David Berri's post supporting Gladwell, with my answers to Berri in the comments:

    http://dberri.wordpress.com/2009/11/19/steven-pin

    Essentially, Berri is comparing virtually all the top draft choice quarterbacks, almost all of whom get enough playing time to make his cutoff, to just the lower draft choice quarterbacks who turned out to be better than expected. The low draft choice quarterbacks who turned out to be no better than expected aren't included in his analysis.

  12. The more I read of Gladwell, and about Gladwell, the less impressed I am.

    His book "Outliers" boils down to this:
    To be really REALLY successful at your chosen career, you need to be a) Very talented b) Hard working and c) Lucky.

    Wow!

    His book is a procrustean bed into which all must fit.

  13. In my opinion, the same mistakes continue to be perpetuated: equating success and playing time and believing that the incentive to win somehow makes teams' playing time decisions perfectly efficient.

    How can I name QB's who are successful NFL QB's if they never got the chance to actually play in the NFL ? The point of naming Brady and Warner is that these are guys who were not high draft picks, and who did not immediately get playing time. If NFL teams weren't in the business of "socially promoting" their high picks why did it take an injury to the starter to finally give Brady a chance to play ? A team that acted rationally and efficiently to maximize wins would have identified that Brady, despite his status as a low draft pick, was in fact much better than Drew Bledsoe. That team would have given Brady the starting QB job to begin the season, because who lets a Pro Bowl caliber QB languish on the bench behind a mediocre starter ? Someone trying to keep their job by not making controversial decisions, that's who.

    If NFL teams act so rationally, why did it take Kurt Warner until age 28 to finally get a job in the NFL ? Warner and Brady are both arguably Hall of Fame QB's, both stepped in and won the Super Bowl in their first year as starters, and yet Brady only got the chance because of an injury to the mediocre starter playing in front of him, and it took Warner 5 years to even make an NFL roster. The point is that since teams had little invested in those players, they were reluctant to give them a chance. How many other Brady's and Warner's have there been over the years that didn't get a chance because the mediocre starter in front of them didn't get hurt ? Or who played for a coach who feared that sticking his neck out and starting a free agent or 7th round QB over a 1st round pick would cost him his job if the unsung QB failed ?

    You are correct that there obviously is a correlation between playing time and QB quality. There is a reason why it's Hall of Fame QB's that tend to have thrown the most career TD's, the Hall of Famers played longer. But it seems to me you have the causation backwards, Hall of Famers tend to have the longest careers because they are the best players, they aren't the best players just because they played a long time. The all-time career TD list contains two such mediocre QB's whose durability apparently had some value to NFL teams: Dave Krieg and Vinny Testaverde.

    If you run the numbers on rate stats, the type used by the NFL to determine the best QB in each particular season, then round drafted contains essentially no information on who is the best QB. And if we are talking about QB success, then who is the best QB seems like a relevant question. You cannot ignore the playing time bias that favors high round draft picks, and naively assume that NFL teams act rationally. They are run by human beings, and are primarily motivated to keep their jobs. As I've said before there is a lot of overlap between winning and keeping their jobs, but PR and keeping the fan base happy is also a big part of keeping your job as an NFL coach. Making "controversial" personnel decisions puts a target on a coach's back and few people are willing to take that risk. The examples of Brady and Warner underscore this point.

  14. 'What Berri is doing, in effect, by using his “per-play” measure is comparing quarterbacks taken at the top of the draft (most of whom get a lot of plays in the NFL) to those taken lower in the draft who turned out to be surprisingly better than expected, and thus get a lot of plays. He’s essentially leaving out of his analysis all those lower drafted quarterbacks who turned out to be as mediocre as expected and thus didn’t get many plays. In other words, his methodology is pre-rigged to produce the conclusion that Malcolm likes.'

    Steve, do you not see the assumption you are making, that all QB's who don't get playing time didn't deserve it ? Where is the evidence for that assumption ? In fact there is plenty of evidence against it (Brady, Warner, Romo etc.).
    Do you expect me to believe in 2001 that Brady only turned into a Pro Bowl, Hall of Fame, Super Bowl winning QB AFTER Bledsoe's injury ? Or is there a possibility that he was the same great QB prior to the injury, but didn't get a chance because his coaches were so risk averse they preferred the mediocre, former 1st round pick who led them to a 5-11 record the year before ?

    By the way, removing QB's who didn't play isn't even necessary. Using every drafted QB, regardless of playing time, there is no correlation between round drafted and TD Pct or Yds per Passing Attempt.

  15. "However, I'm pretty sure one of the biggest outcomes to describe QB success is getting a lot of NFL playing time."

    If that were true, there would be very little turnover among starting QB's in the NFL except for injury and retirement. The fact is that every team has a starting QB. Success is dependent on having a good starting QB. The correlation between QB starts and wins is zero by definition, I am quite certain the correlation between Passer Rating, or TD Pct, or Yds per Attempt and team wins is significant. Playing time is not really indicative of QB success or quality as playing without a QB is really not an option for an NFL team.

  16. "Instead, you should cite some QBs who would be successful if they got opportunities but haven't. That is the right test of whether this bias exists."

    So, you are asking me to prove my point by refuting it ? If I believe QB success cannot be predicted by round drafted, which is based on previous play in college and the opinion of the leading experts in the field(NFL scouts), how on earth am I supposed to be able to name guys who would have succeeded but didn't get a chance, and furthermore even if I could, why would you believe my assertion since it is not verifiable in any way ?

    The test for the bias is exactly what Berri did, which I was able to confirm in a few minutes with Excel, and which you could confirm too, if you so desired. Look at actual measures of the quality of QB play (passer rating, TD Pct, Yds per Attempt), not the amount of QB play, and you will see that low round picks play as well as high round picks. Given that, the fact that high round picks actually play a lot more is proof the bias exists.

  17. Dear tbwhite:

    Do you understand the differences between saying that the correlation between draft spot and success in the NFL is

    A) zero, B) greater than zero but less than one, or C) one?

    Malcolm Gladwell doesn't.

  18. As for how teams figure out that some 7th round draft picks aren't wildly better than expected, well, that's what they have summer training camps and daily practice for. Players compete against NFL players in practice. It's not perfect, but, then, that's why we have the concept of correlation.

  19. Dear Steve:

    Yes, I think I do understand the concept of correlation. For example, the team that runs the most 2nd half kneel down plays has won 100% of their games. I believe that would mean a correlation of 1 between most 2nd half kneel downs and wins.

    Do you understand why that correlation of 1 is meaningless ?

  20. Let's look at some real performance numbers, these are from the pro football reference site linked to above:

    Round  Career Attempts  Yds/Att  TD Pct
    1              136,382    6.879     4.0
    2               44,861    6.877     4.2
    3               28,212    6.768     3.7
    4               27,391    6.645     3.7
    5               10,984    6.757     3.7
    6               32,859    6.716     3.9
    7                8,235    6.403     3.4

    Do you still want to tell me that 1st round picks throw more career TD's because they actually play better, and not simply because they play more ? On a per season basis these differences are trivial(about 250 yards passing and 3 TD's per season difference between a 1st rounder and a 7th rounder), and certainly don't justify 1st rounders getting 17 times more playing time than 7th rounders.

  21. The question is what to do with the many players who don't get the opportunity to play for a statistically meaningful time (or at all for that matter), and to answer it I suppose we'd need measures during practice games/drills. The play time bias obviously inflates count stats. But the same argument can be made for rates: A first round disappointment may play the whole season, driving down TD pct for the group. Meanwhile, round 7 might be 90% legitimately bad players, 5% good players not getting a shot and 5% good players playing (and who'll be sat if they stop over-performing). That 5% is competitive with the top players, but the 95% could decimate those rates if they all played.

    Either you build in an assumption that the players who make it are reflective of the whole round and see no correlation, assume the players who don't make it, don't make it because they're bad and see a strong correlation, or you split the difference somewhere and get a weaker correlation. Personally, I suspect that lower draft picks who never start as QB are legitimately worse players, but the data just isn't there to answer the question definitively.

  22. This graph shows a classic "dust bunny" distribution (as depicted on the cover of McCune & Grace. 2002. "Analysis of Ecological Communities") and is just crying out for a multivariate analysis.

    Since there is more to QB success than just order of draft – things like won/loss records, coaching quality, market, quality of receivers and offensive line, etc, a multivariate analysis would be appropriate to tease out the influence of draft order.

  23. Round  Career Attempts  Yds/Att  TD Pct
      1              136,382    6.879     4.0
      2               44,861    6.877     4.2
      3               28,212    6.768     3.7
      4               27,391    6.645     3.7
      5               10,984    6.757     3.7
      6               32,859    6.716     3.9
      7                8,235    6.403     3.4

    I'm just curious how a table like this is constructed. Are the same observation units (individual QBs) contributing data year after year? Then the relative contribution of individual QBs, as well as the reference group of eligible QBs, change dramatically as you move down the table. Consider the N of QBs chosen in round X, and the N that actually contributes data to that row of the table. By the time you get to round 7, what fraction of 7th round QBs produced the 8,235 attempts? Those statistics represent the performance of the best of the 7th rounders. One can't assume that the non-performers were equally talented because they came from the same draft order.

    I understand that the total production numbers are positively biased by opportunity and that is of some concern. But the table above achieves an enormous flattening effect by changing the selection rules row by row. In university admissions, you couldn't justify letting in low credential applicants by only reporting the GPA of the 20% who eventually graduated. Somewhere you have to account for the 80% who dropped out. That's what the original post was trying to say.
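The selection argument running through this thread (a per-round table built only from the quarterbacks who actually played) can be illustrated with a small simulation. This is a sketch under made-up assumptions, not an analysis of the real data: each simulated quarterback has a "true" touchdown rate per attempt, the average of that rate falls by round, rounds 1-2 always get to play, and later picks play only if their true rate clears a cutoff. Every number here (the 0.040 center, the 0.003 drop per round, the 0.008 spread, the cutoff) and the function name `simulate_round` are hypothetical.

```python
import random
from statistics import mean


def simulate_round(rng, rnd, n=2000, cutoff=0.040):
    """Return (mean true rate of all draftees, mean true rate of those who play).

    All parameters are hypothetical and chosen only for illustration.
    """
    center = 0.040 - 0.003 * (rnd - 1)  # assumed: average talent falls by round
    abilities = [rng.gauss(center, 0.008) for _ in range(n)]
    # Assumed playing rule: rounds 1-2 always play; later picks must
    # clear the cutoff to get on the field.
    played = [a for a in abilities if rnd <= 2 or a >= cutoff]
    return mean(abilities), mean(played)


rng = random.Random(0)
results = {rnd: simulate_round(rng, rnd) for rnd in range(1, 8)}

for rnd, (all_mean, played_mean) in results.items():
    print(f"round {rnd}: all draftees {all_mean:.4f}, "
          f"those who played {played_mean:.4f}")

# Round 1 minus round 7: over everyone drafted vs. over those who played.
gap_all = results[1][0] - results[7][0]
gap_played = results[1][1] - results[7][1]
print(f"gap over all draftees: {gap_all:.4f}")
print(f"gap over those who played: {gap_played:.4f}")
```

Under these assumptions the gap between rounds is built into the full draft class but shrinks or disappears among the players who reach the field. That does not show that the real table is an artifact. It only shows that a table of rates among those who played cannot, by itself, distinguish "late picks are just as good" from "only the best late picks get to play."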
