Following up on our discussion the other day, Andrew Ng writes:
Looking at the “typical” ML syllabus, I think most classes do a great job teaching the core ideas, but there are two recent trends in ML that are usually not yet reflected.
First, unlike 10 years ago, a lot of our students are now taking ML not to do ML research, but to apply it in other research areas or in industry. I’d like to serve these students as well. While many ML classes do a nice job teaching the theory and core algorithms, I’ve seen very few that teach the “hands-on” tactics for how to actually build a high-performance ML system, or how to think about piecing together a complex ML architecture. For example, what sorts of diagnostics do you run to figure out why your algorithm isn’t giving reasonable accuracy? How much do you invest in collecting additional training data? How do you structure your org chart and metrics if you think there are 3 components that need to be built and plugged together? How do you trade off iterating quickly vs. more careful design, and when is each approach superior? I see many graduates of ML classes (including mine) make mistakes on such questions when they try to apply our various mathematically beautiful algorithms, and I believe our students would be served well if we spent more time teaching them the practicalities of applying ML, even at the expense of learning about fewer algorithms. Even if their goal is to do ICML/NIPS-worthy research, this will be useful to them IMHO.
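One concrete example of the kind of diagnostic being described is a learning curve: train on increasing subsets of data and compare training vs. validation error to decide whether collecting more data is worth the investment. The sketch below is illustrative, not from the original text; the synthetic data, model, and thresholds are all assumptions. If both errors are high and close together, the model is underfitting and more data is unlikely to help; if validation error sits well above training error, more data may.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    # Synthetic logistic-regression problem (assumed for illustration).
    X = rng.normal(size=(n, 5))
    w_true = np.array([1.5, -2.0, 0.5, 0.0, 1.0])
    p = 1 / (1 + np.exp(-(X @ w_true)))
    y = (rng.uniform(size=n) < p).astype(float)
    return X, y

def fit_logreg(X, y, lr=0.1, steps=500):
    # Plain gradient descent on the logistic loss.
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(X @ w)))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def error(w, X, y):
    # Fraction of misclassified examples at the 0.5 threshold.
    return float(np.mean(((X @ w) > 0) != (y > 0.5)))

X_val, y_val = make_data(2000)
curve = []
for n in [50, 200, 800, 3200]:
    X_tr, y_tr = make_data(n)
    w = fit_logreg(X_tr, y_tr)
    curve.append((n, error(w, X_tr, y_tr), error(w, X_val, y_val)))

for n, tr, va in curve:
    print(f"n={n:5d}  train_err={tr:.3f}  val_err={va:.3f}")
```

Reading the printed curve tells you which regime you are in before you spend weeks labeling more data.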
Second, I think most ML classes have been slow to appreciate the rise of Big Data. (Surprisingly, I find that even some classes titled “Big Data” are still slow to appreciate the rise of Big Data.) The sheer volume of data that we all have access to now is completely unprecedented, and a lot of Silicon Valley’s ML advances have been because of this. (For example, the single most commonly used learning algorithm in the Valley is probably logistic regression, only applied at massive scale.) This rise of data has led to a parallel literature on data warehousing, and tools like Hadoop/Hive/Storm/Kafka/AWS/… for exploiting this data. The way you think about obtaining and training on 1B examples is very different from the way you think about training on 10K examples, and it goes beyond the algorithmic questions like online vs. batch, into computer systems issues, hardware constraints, and questions like how to plan for polyglot persistence. All of us will have to make a decision as to whether we think of all this as part of ML or as outside ML, but I think the proximate reason for the rise of Deep Learning is that it has a compute/approximation/hardware/runtime tradeoff that happens to make it one class of algorithm (not the only one) that’s able to effectively exploit our ever-growing datasets. I think this will become even more true over time.
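The online vs. batch distinction mentioned above can be made concrete with a minimal sketch, assuming logistic regression trained by minibatch SGD (the data stream, step size, and dimensions here are all illustrative, not from the text). When the dataset is too large to hold in memory, each minibatch is generated or read, used for one update, and discarded, so memory use stays constant no matter how many examples flow past.

```python
import numpy as np

rng = np.random.default_rng(1)
w_true = np.array([2.0, -1.0, 0.5])  # hypothetical ground-truth weights

def stream(n_batches, batch_size=100):
    # Simulate a data stream: each minibatch is produced, consumed, discarded.
    for _ in range(n_batches):
        X = rng.normal(size=(batch_size, 3))
        p = 1 / (1 + np.exp(-(X @ w_true)))
        y = (rng.uniform(size=batch_size) < p).astype(float)
        yield X, y

def sgd_logreg(batches, lr=0.05):
    # Online learner: one gradient step per minibatch, single pass over data.
    w = np.zeros(3)
    for X, y in batches:
        p = 1 / (1 + np.exp(-(X @ w)))
        w -= lr * X.T @ (p - y) / len(y)
    return w

w = sgd_logreg(stream(500))
print("learned weights:", np.round(w, 2))
```

A batch learner would instead load all 50,000 examples and iterate over them repeatedly; at 1B examples that option disappears, which is exactly where the systems questions above take over from the algorithmic ones.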