Should statistics have a Nobel prize?

Xiao-Li says yes:

The most compelling reason for having highly visible awards in any field is to enhance its ability to attract future talent. Virtually all the media and public attention our profession received in recent years has been on the utility of statistics in all walks of life. We are extremely happy for and proud of this recognition—it is long overdue. However, the media and public have given much more attention to the Fields Medal than to the COPSS Award, even though the former has hardly been about direct or even indirect impact on everyday life. Why this difference? . . . these awards arouse media and public interest by featuring how ingenious the awardees are and how difficult the problems they solved, much like how conquering Everest bestows admiration not because the admirers care or even know much about Everest itself but because it represents the ultimate physical feat. In this sense, the biggest winner of the Fields Medal is mathematics itself: enticing the brightest talent to seek the ultimate intellectual challenges. . . .

XL invites speculation on “NP-hard problems and (hence) NP-worthy figures in statistics.” My first reaction was No: I like honoring great work but I don’t like the personalization of the Nobel or the competitive aspect of it. Also, I’m guessing that such a prize would create more unhappiness than happiness (see discussion here and here).

Xian agrees: “I do not think we should start on a similar prize by raising money for that purpose. Better support young researchers and international projects. In addition, I fear it would appear as a negative compared with the existing math prizes.”

So I am not convinced by XL’s arguments. On the other hand, I accept that he is much wiser than I am regarding the “real world.” So I’m guessing that he is probably right.

If we do have it, let’s make it about the work, not about the people

If XL is behind something, it’s probably a good idea and it’s probably going to happen. Setting aside the nuts and bolts of actually creating such an award, if we are going to speculate, I’d be much less interested in speculating about who would or should get the prize (indeed, I am turned off by much of the hero-worship surrounding the economics and math prizes), and much more interested in speculating about what work would receive or deserve the prize.

XL talks about Nobel-worthy open problems. That’s fine but I think it might make sense to start by taking a cue from the existing Nobel prizes in chemistry, physics, and biology, and considering what work, already done by people who are currently alive, might deserve the prize. That is, if this prize were to be given out annually starting tomorrow, what research would or should get it?

Here are some important contributions that come to mind (in no particular order): bootstrap, lasso, generalized estimating equations (I actually hate that stuff but it’s undeniably had lots of influence), false discovery rates (ditto), multiple imputation, various methods in imaging.

Then there’s the big one (to me): hierarchical Bayes. It’s hard for me to single out a particular contribution here, but I’m thinking of the cluster of papers from about 1970 to 1980 where a bunch of researchers demonstrated that multilevel models can work in a general way in many different application areas.

What about computing? Arguably, separate awards could be given to four different packages: Stata, Sas, Bugs, and R. Sas is sort of horrible but it’s doubtless enabled lots and lots of statistics. And Bugs is getting replaced by Stan and various alternatives but it was an important research contribution and indeed is still in use. One might also add Glim but it’s associated with Nelder, who is no longer alive, so we can’t include it in this list.

What else? Various theoretical ideas, I suppose, although it’s hard for me to weigh the importance of these, as compared with methods and applications. For example, posterior predictive checks are a big deal and getting bigger, but none of the theoretical papers on the topic (including those of XL and myself) really do the idea justice. Another example would be the work of Berger and others on Lindley’s paradox: I think this work would have to be included because it is so influential and has affected practice as well, even though it did not directly lead to any model or method.

Then there’s probability theory. I’ll let others judge what is the most important work in that area.

What about graphics? The exploratory data analysis revolution is huge, and indeed it is continuing to spread, both on its own and within the world of model-based inference. It’s tough to pinpoint any particular work in this area as representing a key contribution. Even Tukey’s classic work published in the 1970s is more of an inspiration than a contribution—I mean, really: Stem-and-leaf plots? Rootograms? The January temperature in Yuma, Arizona?? This work changed the face of statistics but I’d feel uncomfortable giving an award for a set of methods that nobody ever actually uses, or should use. Anyway, it wouldn’t count since Tukey is no longer around, but you get the point.

What about statistical computing? I may stand too close to this subject to be a good judge, but I’m thinking that data augmentation, Hamiltonian Monte Carlo, variational inference, and expectation propagation are four great ideas that have made a difference. We could keep going down the list to include things like slice sampling, simulated tempering, bridge sampling, etc etc. One difficulty is that I don’t quite know where to stop. Should preference be given to ideas such as slice sampling that seem to stand alone as unique contributions? Or does it make more sense to give awards to general areas such as tempering and multigrid methods that have been hammered out by dozens of computational physicists working on lots of problems? Does Gibbs sampling get an award? From the standpoint of statistical computing it’s really nothing special compared to stuff that physicists were doing in the 1940s. On the other hand, the method’s very simplicity has contributed to its wide use among statisticians.
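To make the simplicity point concrete, here is a minimal sketch (in Python, with made-up numbers, not drawn from any particular paper) of a Gibbs sampler for a bivariate normal with correlation rho: alternate draws from the two conditional distributions, and that is the entire algorithm.

```python
import numpy as np

# Minimal Gibbs sampler for a standard bivariate normal with correlation rho.
# Each conditional is normal: x | y ~ N(rho * y, 1 - rho^2), and symmetrically for y | x.
rng = np.random.default_rng(0)
rho, n_iter = 0.8, 5000
x, y = 0.0, 0.0
draws = np.empty((n_iter, 2))
for t in range(n_iter):
    x = rng.normal(rho * y, np.sqrt(1 - rho**2))  # update x given y
    y = rng.normal(rho * x, np.sqrt(1 - rho**2))  # update y given x
    draws[t] = x, y

print(np.corrcoef(draws[1000:].T))  # off-diagonal should be close to rho after warmup
```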

There also must be some important statistical computing that has nothing to do with Bayes. I don’t know if hadoop or whatever deserves a Nobel prize, but you get the idea.

Hmm, what else? The central application areas of statistics are survey sampling and causal inference. Both have seen lots of progress in the past few decades. Again it can be difficult to pick out specific contributions, partly because the theory and applications are often in separate places. It’s not difficult to find examples of theoretical work (for example, by Rubin, Imbens, and Pearl) making fundamental contributions to causal inference, but I think applied work in the area deserves a prize too, perhaps the work of Greenland and Robins. It could be difficult to pick out a single applied paper but the body of work is important. Just as, in biology, the development of a useful lab technique can have important scientific implications, so, in statistics, a series of successful applications can lead the way to important methodological developments.

With survey sampling, it’s more difficult to isolate the contributions. The big thing here is Mister P (or so I think), and, again, progress has been slow, starting with work on small-area estimation in the 1970s and then continuing through the 1990s and today with a gradual integration of model-based and design-based approaches. Perhaps the weighting-based and poststratification-based approaches deserve a single shared prize. Experimental design is another hugely important topic in statistics but I don’t know if there have been any really important contributions in this area by researchers who are still alive. I think of experimental design as an old, classical topic (and a topic that’s important enough that it continues to be rediscovered by outsiders), but that might just be my own ignorance.
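To return to poststratification for a moment: the poststratification step itself is simple enough to sketch. Here is a toy Python example with hypothetical cell estimates and census counts; the hard part, getting stable cell estimates from a multilevel model, is omitted.

```python
import numpy as np

# Poststratification: reweight cell-level estimates by known population shares.
# Hypothetical numbers: four age-by-education cells, with estimated means per cell
# (e.g., from a fitted multilevel model, as in Mister P) and census counts per cell.
cell_estimates = np.array([0.62, 0.55, 0.48, 0.40])      # estimated mean in each cell
cell_pop_sizes = np.array([12000, 30000, 41000, 17000])  # population count in each cell

weights = cell_pop_sizes / cell_pop_sizes.sum()
population_estimate = float(np.sum(weights * cell_estimates))
print(population_estimate)  # population-level estimate after poststratification
```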

To go in an entirely different direction: I’m not sure how to consider work that’s tied to specific application areas. If Jun Liu or whoever develops some brilliant method to solve a problem in protein folding, and this method does not generalize at all, is it still prize-worthy? I’d say yes. As the saying goes, statistics is applied statistics. The only trouble with giving prizes within application areas is: (a) the two biggest application areas of statistics are, I’d guess, biology and economics, and these fields already have their own Nobel prizes; and (b) it can be difficult to judge the importance of applied work in an unfamiliar area. I speak as someone whose colleagues (in the early 1990s) ignorantly and rudely dismissed my work in social science. Things have changed, and social science is now a hot area in statistics (“big data” and all that) but the general point remains. I also don’t know how to rank the importance of specific methods. For example, proportional hazard models have had a huge impact in biostatistics. Is that enough to be worth an award all on its own?

Finally, I suspect there are some big ideas I’ve missed, either because I forgot them or just because there are big areas of statistics that I don’t know much about. My main point here is to take the topic that Xiao-Li launched, and to steer it away from discussion of personalities and toward a discussion of ideas and research contributions. Less “who,” more “what.”

Feel free to give your suggestions of Nobel-worthy statistical ideas in the comments.

P.S. Some thoughts from Christian Robert here.

23 thoughts on “Should statistics have a Nobel prize?”

  1. There could be a “prize for nobility in statistics” for anyone who refuses to publish work which has all the superficial gloss needed for publication but which they know in their heart is a waste of pixels and paper.

  2. As a non-professional stat guy I would look at the big bang things that have affected millions and what their underlying ideas are. They may be trivial, in which case I think there should be some measure of insight needed.
    Anyway

    What kind of modeling is underneath the Spam Filters? It is called Bayesian, but specifically, are there new techniques for text filtering in use or that came out of this? This is a huge win for everybody that does email.

    As a related topic, what are the statistical techniques used to allow the deciphering of DNA sequences with a known degree of confidence? And used to match things together? This has obviously revolutionized our knowledge of our history and the history of life.

    Imaging and clustering techniques have boomed, perhaps mainly because of more powerful computers, but were there some special advances there? Everybody has access to algorithms and visualization that were PhD stuff 20 years ago, and I feel like there must have been more than speedy hardware. If it is new speedy hardware then I guess the CPU makers should get a prize…

    Mark K.

    • Ironically, Bayesian spam filtering employs naive Bayes, which isn’t even a Bayesian technique. I wrote a blog post on the topic: Bayesian naive Bayes. There wasn’t any new natural language processing in the early language-based spam filters. Contemporary spam filters use all sorts of features other than language ranging from honeypots (fake accounts, mail to which is a 100% specificity test for spam) to link analysis (spam tends to have links of a certain kind). But I agree — spam filtering is a great development for all of us in the face of no-goodnik spammers, and a really good example of what you have to do to field a real, practical system for a gazillion users.
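      For readers who have not seen it, here is a toy sketch (invented two-message training sets, add-one smoothing; not any production filter’s actual code) showing how little machinery the early word-count naive Bayes filters needed. Real filters, as noted above, use many more features.

```python
import math
from collections import Counter

# Toy naive Bayes spam filter: per-class word counts with add-one (Laplace) smoothing.
spam_docs = ["win money now", "cheap money win"]
ham_docs = ["meeting at noon", "see you at the meeting"]

def word_counts(docs):
    counts = Counter(w for d in docs for w in d.split())
    return counts, sum(counts.values())

spam_counts, spam_total = word_counts(spam_docs)
ham_counts, ham_total = word_counts(ham_docs)
vocab = set(spam_counts) | set(ham_counts)
log_prior_spam = math.log(len(spam_docs) / (len(spam_docs) + len(ham_docs)))
log_prior_ham = math.log(len(ham_docs) / (len(spam_docs) + len(ham_docs)))

def score(message, counts, total, log_prior):
    # log P(class) + sum over words of log P(word | class), smoothed over the vocabulary
    return log_prior + sum(
        math.log((counts[w] + 1) / (total + len(vocab))) for w in message.split()
    )

msg = "win cheap money"
is_spam = score(msg, spam_counts, spam_total, log_prior_spam) > score(msg, ham_counts, ham_total, log_prior_ham)
print("spam" if is_spam else "ham")  # the toy message is classified as spam
```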

      In my opinion, the first and most profound advance in statistical language processing was Shannon’s introduction of n-gram language models way back in his seminal 1948 paper on information theory (info theory is an idea that’s worth prizing, although Shannon’s no longer with us). Mosteller and Wallace’s 1964 analysis of the Federalist Papers was also decades ahead of its time (Bayesian analysis of overdispersion done with very primitive computing tools like index cards and slide rules).

      How do people feel about Google’s PageRank algorithm, aka the billion dollar eigenvector? Certainly social network techniques were around with the same modeling, but Google figured out how to apply it to the web and how to really scale it. Of course, now Google search ranking is a lot more than PageRank — there’s even lots of nifty natural language processing.
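      As a rough illustration (toy four-page graph, standard damping factor of 0.85, nothing like Google’s actual implementation), PageRank is just the stationary vector of a damped link-following random walk, which power iteration finds quickly:

```python
import numpy as np

# Toy PageRank via power iteration on a four-page web graph.
links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}  # page -> pages it links to
N, d = 4, 0.85                               # number of pages, damping factor

# Column-stochastic transition matrix: M[j, i] = 1/outdegree(i) if page i links to page j.
M = np.zeros((N, N))
for i, outs in links.items():
    for j in outs:
        M[j, i] = 1.0 / len(outs)

r = np.full(N, 1.0 / N)          # start from a uniform rank vector
for _ in range(100):
    r = d * M @ r + (1 - d) / N  # follow links with prob d, teleport with prob 1 - d
print(r)  # page 2, linked by every other page, gets the highest rank
```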

      Ditto for Google translate — the ideas were developed in the 1950s, finally put into practical application at IBM in the late 80s and early 90s, then refined over decades to get where we are today.

      The shotgun sequencing techniques used in the human genome project are more heuristic than statistical. And that goes for today’s alignment algorithms for RNA measurement. There’s been some neat work in expression levels for micro-arrays (I’m thinking of hierarchical modeling tools like dChip), but most of the alignment algorithms I’ve seen for today’s short-read sequencers also seem more heuristic than statistical; they could be made statistical, as I argued in a blog post: Sequence Alignment with Conditional Random Fields, and I believe the field’s been heading in that direction (I don’t see much of it any more).

  3. I liked his 7 word challenge recounted here http://bulletin.imstat.org/2013/10/the-xl-files-ig-nobel-and-247/

    Mine would be “Intensifying the process of being less wrong.”
    (a clear summary [of statistics] that anyone can understand, in seven words.)

    But I am not convinced by XL’s arguments that such a prize would be consistent with that.

    In math, is it not the case that it’s much clearer what the hard unsolved problems are and what constitutes a solution (and being simply less wrong there does not count at all)?

  4. It’s surprisingly hard to get a new prize off the ground to be seen as a complement to the Nobel. The Fields Medal is well-known despite not much money, the MacArthur grants are famous (due to the word “genius” having become attached to them), and the American economics prize is fairly well known although I can’t think of its name. The pseudo-Nobel Econ prize is of course the most successful start-up.

    Yet, the Crafoord Prizes were specifically designed to be Nobel Prizes for non-Nobel fields like astronomy and biology (e.g., the King of Sweden hands the prize over to the winner), but I seldom hear about them except in the memoirs of winners, even though they are now over 30 years old and the annual prize money is not trivial ($600k).

    http://en.wikipedia.org/wiki/Crafoord_Prize

    I’d suggest a close study of what has worked and what hasn’t worked with other prizes before launching a new one.

      • The biology winners constitute a who’s who of big names in the English-speaking world. Here are some big names who have won the Crafoord Prize:

        Paul R. Ehrlich
        Edward O. Wilson
        W. D. Hamilton
        Robert May
        Ernst Mayr
        John Maynard Smith
        George C. Williams
        Robert Trivers

        Wilson’s autobiography features a photo of the King of Sweden bestowing the Crafoord Prize upon him. But, despite doing a lot of things that seem smart from a PR standpoint (e.g., rewarding English-speaking Darwinists who have been big names in America and Britain for a long time), Anglo-Americans have barely heard of it in its 1/3rd of a century of existence. A cautionary tale …

  5. There actually is no Nobel Prize in Biology, because Nobel didn’t endow one, and apparently he was very specific about what could/couldn’t be done with his money. Some important biological discoveries got prizes in Medicine (e.g. genetics) or Chemistry (protein/DNA structure), but they are kind of getting shoehorned in. So for that reason there will be no Stats prize.

    Although, ignoring practicalities, I think it’s a good idea.

    I do think there should be more IgNobel prizes for Statistics, certainly there are some deserving candidates.

  6. Don’t some of them (Hotelling) get econ prizes?

    I would actually agitate for a yearly math prize. Stats could then be part of that.

  7. > Feel free to give your suggestions of Nobel-worthy statistical ideas in the comments.

    Robust statistics. There are many contributors to the field. Peter Huber obviously comes to mind. Also Dave Donoho. Good work associated with the group at KU Leuven too.

  8. In causal inference, I agree with Andrew that “applied work in the area deserves a prize too.” Besides Greenland and Robins, I think at least two groups deserve special mention: Orley Ashenfelter, David Card, Alan Krueger, Joshua Angrist, David Lee, and others in economics; and Don Campbell’s group (Tom Cook, Will Shadish, Chip Reichardt, William Trochim and others) in psychology.

    I also agree with Andrew that it should be “about the work, not about the people”. The work I have in mind is the movement toward randomized experiments, natural experiments, instrumental variables, and regression discontinuity designs.

  9. Mark Zuckerberg and Yuri Milner recently announced a $3 million prize in mathematics.
    Of interest, they listed as possible topics: genetic engineering, quantum computing, or artificial intelligence. So, they seem to think of math writ large, including disciplines that academics might call computer science, statistics, or machine learning. The actual quote of the announcement is appended below:

    “Meanwhile co-founder Yuri Milner — the Russian venture capitalist who has invested in many tech companies — described the need for the new prize: “Einstein said, ‘Pure mathematics is the poetry of logical ideas.’ It is in this spirit that Mark [Zuckerberg] and myself are announcing a new Breakthrough Prize in Mathematics. The work that the prize recognizes could be the foundation for genetic engineering, quantum computing or artificial intelligence; but above all, for human knowledge itself.”

    link to whole story (not sure this blog allows links): http://www.mnn.com/green-tech/research-innovations/stories/mark-zuckerberg-and-other-tech-billionaires-create-3-million#

  10. Just got sent back here from a recent post on awards and saw this: “but I think applied work in the area deserves a prize too, perhaps the work of Greenland and Robins.”

    Um, Robins is mainly a theoretician whose work is motivated by and influences applied problems. Here’s a good overview of his work: https://arxiv.org/pdf/1503.02894.pdf. He developed the foundational theory of causal inference for time-varying treatments (which is a huge category). Also, he studied the theory of posterior predictive checks, which you might be interested in: https://www.jstor.org/stable/2669750?origin=crossref&seq=1#page_scan_tab_contents

    • Z:

      When I say something’s applied, that’s a compliment! I consider theoretical work to be applied if it has a clear enough motivation and connection to applications.

      • I definitely don’t generally take “applied” negatively, but in this context you distinguish it from “theoretical work (for example, by Rubin, Imbens, and Pearl) making fundamental contributions to causal inference”. I’m just saying Robins made fundamental theoretical contributions to causal inference at the same level those guys did. And it’s a little inaccurate (though not an insult, except maybe to Rubin and Imbens who also did a lot of applied work) to imply that he had more of an applied bent.

  11. An interesting proposal. The award should be about the work. For the greatest contribution to statistical theory, I would nominate Raymond Hubbard. I’m wowed by the breadth and depth of his analyses. I believe that another book would do it for him, one that provides more in-depth citations and examples. Otherwise, Andrew Gelman and Sander Greenland deserve even more recognition.

  12. Pingback: Stan receives its second Nobel prize. « Statistical Modeling, Causal Inference, and Social Science
