Machine learning and statistics

Posted on December 3, 2008 4:16 PM by Andrew

Aleks pointed me to this article by Brendan O’Connor comparing the fields of machine learning and statistics. I don’t have much to add, except to say:

1. Healthy competition is fine, but I think it’s great to have separate fields of statistics and machine learning. It’s like having more space on the supermarket shelf to sell our common product. In general, I’d like to see more statistics, CS, applied math, etc., and less pure math. Pure math is fine but I think it occupies way too many resources and draws way too many students, given what comes out of it.

2. My impression is that computer scientists work on more complicated problems and bigger datasets, in general, than statisticians do. That’s fine–each of us has our niche–but I think our machine learning cousins deserve our respect for being able to make progress on hard problems.

4 thoughts on “Machine learning and statistics”

Brendan O'Connor on December 7, 2008 at 12:42 am said:

Hey, thanks for the comments. I just posted back there: I'd love to see positive arguments why statistics is an awesome discipline. At the very least, who else knows how to design a good experiment? I don't know, there's something I hope …

Andrew Gelman on December 7, 2008 at 7:08 am said:

Brendan,

First off, it's probably good to get a sense of the diversity of statistics as a discipline. For example, one of your commenters links to a paper by Leo Breiman, who did some great work but was also notoriously ideological about statistical methods; see here. Breiman once told me in all seriousness that he didn't approve of an exam question asking students to construct and evaluate a forecasting model because, at Berkeley, "we don't believe in models."

Statistics is a lot like computer science in that it's relatively easy to come up with methods and to make progress on applied problems. As a result, successful statisticians (and computer scientists) often credit their particular methods with virtues that would perhaps be better attributed to something more general: the fact that those methods are able to make use of available data. (I raise this issue in this discussion.)

Regarding your request for "positive arguments why statistics is an awesome discipline," I think examples are the best selling point. I'd start with my own four books (see links at the upper right of this blog); beyond that, I'm a big fan of Bill James's baseball abstracts from the 1980s. I also have a bunch of my own articles on the web. For experimental design (which I don't really know much about), you can take a look at the book by Box, Hunter, and Hunter.

Simon Blomberg on December 17, 2008 at 3:00 pm said:

From R's fortunes package:

To paraphrase provocatively, 'machine learning is statistics minus any checking of models and assumptions'.
— Brian D. Ripley (about the difference between machine learning and statistics) useR! 2004, Vienna (May 2004)

:-)

Season's Greetings!

Simon.

Andrew Gelman on December 18, 2008 at 12:23 pm said:

Simon,

In that case, maybe we should get rid of checking of models and assumptions more often. Then maybe we'd be able to solve some of the problems that the machine learning people can solve but we can't!
