
Bin Yu and Karl Kumbier: “Artificial Intelligence and Statistics”

Yu and Kumbier write:

Artificial intelligence (AI) is intrinsically data-driven. It calls for the application of statistical concepts through human-machine collaboration during generation of data, development of algorithms, and evaluation of results. This paper discusses how such human-machine collaboration can be approached through the statistical concepts of population, question of interest, representativeness of training data, and scrutiny of results (PQRS).

All 4 of the ideas in PQRS come from classical statistics: Population is the basis of sampling inference, Question comes up in defining causal inference, Representativeness is also central to sampling, and Scrutiny relates to hypothesis testing and exploratory data analysis. So by mentioning PQRS, they’re really saying that AI can be improved using long-established statistical principles. Or, to put it another way, that long-established statistical principles can be made more useful through AI techniques.


  1. cat says:

    Hey wheres the cat picture dufus?

  2. Imaging guy says:

    I think multivariable analysis and machine learning/AI (advanced forms of multivariable analysis) are mathematical modelling and are distinct from statistics per se.

  3. Tom Dietterich says:

    As an AI (ML) researcher, I was initially concerned that this article didn’t include any AI/ML people as authors or reviewers. But I totally agree with the points being made. The ML community is slowly but surely relearning the lessons that statistics researchers have learned starting with Fisher (or maybe before?). We are beginning to ask “Where did these data come from?”, “How were they measured?”, “What questions are we asking of the data?”, “Are the data relevant to answering those questions?”, “Do we really believe this fitted model?”, and so on. I’m glad to see this trend unfolding.

    The article spends a fair amount of time talking about stability. The ML community knows a lot about stability, and there are many researchers applying stability ideas to improve machine learning algorithms. See, e.g., Hochreiter, S., & Schmidhuber, J. (1995). Simplifying Neural Nets By Discovering Flat Minima. In Advances in Neural Information Processing Systems (NIPS 1995) (pp. 529–536).
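The stability idea raised in this comment can be made concrete with a small sketch: perturb the data and check how much the fitted estimate moves. The sketch below uses bootstrap resampling of a least-squares slope as one illustrative perturbation scheme; this is an assumed example for exposition, not the specific method of the Yu–Kumbier paper or the Hochreiter–Schmidhuber paper.

```python
import random
import statistics

def fit_slope(xs, ys):
    """Ordinary least-squares slope (with intercept) for paired data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def bootstrap_slopes(xs, ys, n_boot=200, seed=0):
    """Refit the slope on bootstrap resamples of the data.

    The spread of the resulting estimates is a simple diagnostic of
    stability: a stable fit changes little under data perturbation.
    """
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        slopes.append(fit_slope([xs[i] for i in idx],
                                [ys[i] for i in idx]))
    return slopes

# Toy data: y is roughly 2x plus Gaussian noise.
rng = random.Random(42)
xs = [i / 10 for i in range(50)]
ys = [2 * x + rng.gauss(0, 0.5) for x in xs]

slopes = bootstrap_slopes(xs, ys)
print("mean slope:", statistics.mean(slopes))
print("slope spread (sd):", statistics.stdev(slopes))
```

A small standard deviation across resamples suggests the conclusion (here, the slope) is stable to this kind of perturbation; a large one is a warning that the result is fragile.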

  4. Anoneuoid says:

    Does your method predict what will happen in the future better than the status quo or not? If so, prove it as best you can. Getting at that is the entire difference. If you want interpretability you need a mathematical/computational model. None of this has anything to do with testing whether some kind of intervention/treatment has exactly zero effect.

  5. Bob Siegfried says:

    Or, to put it yet another way, still dependent on SPQR.

  6. Bin Yu says:

    Glad to see all the comments. By now, ML and statistics have a huge overlap. I actually consider myself both a statistician and a machine learner; I use whatever is appropriate to solve data problems in context. Re Tom’s good point about stability in ML: indeed, my 2013 stability paper referenced stability papers in ML and made connections to limit theorems like the CLT as stability results, among other things.

  7. Bin Yu says:

    ps: thanks for the paper pointer, Tom. I didn’t know about this one…
