Bin Yu and Karl Kumbier: “Artificial Intelligence and Statistics”

Yu and Kumbier write:

Artificial intelligence (AI) is intrinsically data-driven. It calls for the application of statistical concepts through human-machine collaboration during generation of data, development of algorithms, and evaluation of results. This paper discusses how such human-machine collaboration can be approached through the statistical concepts of population, question of interest, representativeness of training data, and scrutiny of results (PQRS).

All four of the ideas in PQRS come from classical statistics: Population is the basis of sampling inference, Question comes up in defining causal inference, Representativeness is also central to sampling, and Scrutiny relates to hypothesis testing and exploratory data analysis. So by invoking PQRS, they’re really saying that AI can be improved using long-established statistical principles. Or, to put it another way, that long-established statistical principles can be made more useful through AI techniques.
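To make the representativeness point concrete, here is a minimal simulation, my own illustration rather than anything from Yu and Kumbier’s paper, with all numbers invented: a line fit on a training sample drawn from a narrow slice of the population predicts the full population worse than the same line fit on a representative sample.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: y depends nonlinearly on x over the full range.
N = 100_000
x = rng.uniform(-3, 3, N)
y = np.sin(x) + 0.1 * rng.standard_normal(N)

def fit_line(xs, ys):
    """Ordinary least squares for y ~ a + b*x."""
    design = np.column_stack([np.ones_like(xs), xs])
    coef, *_ = np.linalg.lstsq(design, ys, rcond=None)
    return coef

def population_mse(coef):
    return np.mean((y - (coef[0] + coef[1] * x)) ** 2)

# Representative training sample: drawn uniformly from the population.
rep_idx = rng.choice(N, 500, replace=False)
rep_fit = fit_line(x[rep_idx], y[rep_idx])

# Non-representative sample: only sees 0 < x < 1, where sin(x) looks linear.
narrow = np.flatnonzero((x > 0) & (x < 1))
non_idx = rng.choice(narrow, 500, replace=False)
non_fit = fit_line(x[non_idx], y[non_idx])

print("population MSE, representative sample:    ", population_mse(rep_fit))
print("population MSE, non-representative sample:", population_mse(non_fit))
```

Both fits look fine on their own training data; only the population-level check, the P and R of PQRS, exposes the difference.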

8 thoughts on “Bin Yu and Karl Kumbier: ‘Artificial Intelligence and Statistics’”

    • Aw, poor cat; I guess you’ll just have to settle for the kitty song:

      Soft kitty, Warm kitty, Little ball of fur
      Happy kitty, Sleepy kitty, Purr Purr Purr

  1. I think multivariable analysis and machine learning/AI (advanced forms of multivariable analysis) are mathematical modelling, and are distinct from statistics per se.

  2. As an AI (ML) researcher, I was initially concerned that this article didn’t include any AI/ML people as authors or reviewers. But I totally agree with the points being made. The ML community is slowly but surely relearning the lessons that statistics researchers have learned starting with Fisher (or maybe before?). We are beginning to ask “Where did these data come from?”, “How were they measured?”, “What questions are we asking of the data?”, “Are the data relevant to answering those questions?”, “Do we really believe this fitted model?”, and so on. I’m glad to see this trend unfolding.

    The article spends a fair amount of time talking about stability. The ML community knows a lot about stability, and there are many researchers applying stability ideas to improve machine learning algorithms. See, e.g., Hochreiter, S., & Schmidhuber, J. (1995). Simplifying neural nets by discovering flat minima. In Advances in Neural Information Processing Systems 7 (pp. 529–536). (A toy weight-perturbation probe in that spirit is sketched after the comments.)

  3. Does your method predict what will happen in the future better than the status quo or not? If so, demonstrate it as best you can; getting at that is the entire difference. If you want interpretability, you need a mathematical/computational model. None of this has anything to do with testing whether some kind of intervention/treatment has exactly zero effect. (A small out-of-time comparison along these lines is sketched after the comments.)

  4. Glad to see all the comments. By now, ML and statistics have a huge overlap. I actually consider myself both a statistician and a machine learner; I use whatever is appropriate to solve data problems in context. Re Tom’s good point about stability in ML: indeed, my 2013 stability paper referenced stability papers in ML and made connections to limit theorems like the CLT as stability results, among other things. (A tiny bootstrap illustration of that CLT-as-stability view appears after the comments.)
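To make Tom’s stability point concrete, here is a toy version of the flat-minima idea from the Hochreiter and Schmidhuber paper he cites: after training, perturb the weights with small random noise and see how much the loss degrades; a solution whose loss barely moves sits in a flatter, more stable region. This is my own sketch, not their algorithm, and the data and parameters are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented binary classification data for the illustration.
X = rng.standard_normal((200, 5))
true_w = np.array([1.5, -2.0, 0.5, 0.0, 0.0])
y = (X @ true_w + 0.5 * rng.standard_normal(200) > 0).astype(float)

def loss(w):
    """Mean logistic loss of weights w on (X, y), with labels mapped to +/-1."""
    margins = (X @ w) * (2 * y - 1)
    return np.mean(np.log1p(np.exp(-margins)))

# Train a plain logistic regression by gradient descent.
w = np.zeros(5)
for _ in range(2000):
    p = 1 / (1 + np.exp(-(X @ w)))
    w -= 0.1 * X.T @ (p - y) / len(y)

# Crude flatness probe: average loss increase under random weight noise.
base = loss(w)
sigma = 0.1
bumps = [loss(w + sigma * rng.standard_normal(5)) - base for _ in range(100)]
print(f"loss at the fitted weights: {base:.4f}")
print(f"mean loss increase under sigma={sigma} noise: {np.mean(bumps):.4f}")
```

Hochreiter and Schmidhuber went further and searched for flat minima directly; the probe above only measures flatness after the fact.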
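Commenter 3’s criterion, predicting the future better than the status quo, amounts to an out-of-time comparison against a baseline. Here is a minimal sketch of that protocol on an invented AR(1) series, with a next-value-equals-current-value rule standing in for the status quo; all numbers are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(2)

# Invented time series: an AR(1) process with coefficient 0.8.
n = 500
series = np.zeros(n)
for t in range(1, n):
    series[t] = 0.8 * series[t - 1] + rng.standard_normal()

# Temporal split: fit on the past, evaluate only on the future.
train, test = series[:400], series[400:]

# Status-quo baseline: predict that tomorrow equals today (persistence).
persistence = test[:-1]

# Candidate method: AR(1) coefficient estimated on the training window.
phi = np.dot(train[:-1], train[1:]) / np.dot(train[:-1], train[:-1])
ar1 = phi * test[:-1]

actual = test[1:]
print("persistence baseline MSE:", np.mean((actual - persistence) ** 2))
print("fitted AR(1) MSE:        ", np.mean((actual - ar1) ** 2))
```

The point is the protocol, not the model: whatever the method, it gets scored on data from after the fitting window, next to the incumbent rule.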
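On Yu’s remark that limit theorems like the CLT can be read as stability results: the sample mean, recomputed under bootstrap perturbations of the data, fluctuates by almost exactly the s/sqrt(n) the CLT predicts. A tiny sketch with an invented sample:

```python
import numpy as np

rng = np.random.default_rng(3)

# Invented sample from a skewed population.
n = 1_000
sample = rng.exponential(scale=2.0, size=n)

# Perturb the data by resampling with replacement; recompute the estimator.
boot_means = np.array([
    rng.choice(sample, size=n, replace=True).mean() for _ in range(2000)
])

# Stability under perturbation, next to the CLT's prediction for it.
print("sd of bootstrap means:   ", boot_means.std())
print("CLT prediction s/sqrt(n):", sample.std(ddof=1) / np.sqrt(n))
```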
