Melanie Mitchell says, “As someone who has worked in A.I. for decades, I’ve witnessed the failure of similar predictions of imminent human-level A.I., and I’m certain these latest forecasts will fall short as well.”

Melanie Mitchell’s piece, Artificial Intelligence Hits the Barrier of Meaning (NY Times, behind a limited paywall), is spot-on regarding the hype surrounding the current A.I. boom. It’s soon to come out in book length from FSG, so I suspect I’ll hear about it again in the New Yorker.

Like Professor Mitchell, I started my Ph.D. at the tail end of the first A.I. revolution. Remember, the one based on rule-based expert systems? I went to Edinburgh to study linguistics and natural language processing because it was strong in A.I., computer science theory, linguistics, and cognitive science.

On which natural language tasks can computers outperform or match humans? Search is good, because computers are fast and it’s a task at which humans aren’t so hot. That includes things like speech-based call routing in heterogeneous call centers (something I worked on at Bell Labs).

Then there’s spell checking. That’s fantastic. It leverages simple statistics about word frequency and typos/brainos, and it’s way better than most humans at spelling. The same algorithms are used for speech recognition and for aligning RNA-seq reads to the genome. These all sprang out of Claude Shannon’s 1948 paper, “A Mathematical Theory of Communication”, which has over 100K citations. It introduced, among other things, n-gram language models at the character and word level (still used for speech recognition and classification today, with different estimators). As far as I know, that paper contained the first posterior predictive checks—generating examples from the trained language models and comparing them to real language. David MacKay’s information theory book (the only ML book I actually like) is a great introduction to this material, and even BDA3 added a spell-checking example. But it’s hardly A.I. in the big “I” sense of “A.I.”.
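
To make the “simple statistics” point concrete, here’s a minimal sketch of the noisy-channel idea behind this kind of spell checking, along the lines of Peter Norvig’s well-known toy corrector. It’s only a sketch: the corpus filename is a placeholder, and the error model is collapsed to “prefer in-vocabulary candidates within one edit,” scored by corpus frequency.

```python
# Minimal noisy-channel spell corrector: pick the candidate w maximizing
# P(w) * P(typo | w), with P(w) estimated from raw corpus counts and the
# error model collapsed to "edit distance 0 beats edit distance 1."
import re
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def train(text):
    """Word-frequency model (the 'language model') from plain text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def edits1(word):
    """All strings one delete, transpose, replace, or insert away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)

def correct(word, counts):
    """Most frequent in-vocabulary candidate; fall back to the input."""
    candidates = ({word} & counts.keys()) or (edits1(word) & counts.keys()) or {word}
    return max(candidates, key=lambda w: counts[w])

if __name__ == "__main__":
    counts = train(open("corpus.txt").read())  # placeholder: any large plain-text file
    print(correct("speling", counts))          # e.g. "spelling", given a decent corpus
```

Character- and word-level n-gram models slot into this same noisy-channel framework as richer estimates of the prior over words.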

Speech recognition has made tremendous strides (I worked on it at Bell Labs in the late 90s, then at SpeechWorks in the early 00s), but its performance is still so far short of human levels that the difference is more qualitative than quantitative, a point Mitchell makes in her essay. It would no more fool you into thinking it was human than an animatronic Disney character bolted to the floor. Unlike with games like chess or Go, it’s going to be hard to do better than people at language, though it’s certainly possible. But it would be hard to do it the same way they built, say, Deep Blue, the IBM chess-playing hardware that evaluated gazillions of board positions per turn with very clever heuristics to prune search. That didn’t play chess like a human. If better-than-human language were like that, humans wouldn’t understand it. IBM’s Watson (the natural-language Jeopardy-playing computer) was closer to behaving like humans with its chain of associative reasoning—to me, that’s the closest we’ve gotten to something I’d call “A.I.”. It’s a shame IBM has oversold it since then.
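
As a contrast with human play, here’s a bare-bones sketch of the kind of search a Deep Blue-style engine runs: depth-limited minimax with alpha-beta pruning. The children (move generator) and evaluate (board-scoring heuristic) functions are hypothetical stand-ins supplied by the caller, not anything from IBM’s actual system.

```python
# Depth-limited minimax with alpha-beta pruning. `children(state)` returns the
# list of positions reachable in one move; `evaluate(state)` is a heuristic
# score from the maximizing player's point of view. Both are caller-supplied.

def alphabeta(state, depth, alpha, beta, maximizing, children, evaluate):
    kids = children(state)
    if depth == 0 or not kids:
        return evaluate(state)          # heuristic score at the search horizon
    if maximizing:
        best = float("-inf")
        for child in kids:
            best = max(best, alphabeta(child, depth - 1, alpha, beta, False,
                                       children, evaluate))
            alpha = max(alpha, best)
            if alpha >= beta:           # the opponent will never allow this line,
                break                   # so prune the remaining siblings
        return best
    else:
        best = float("inf")
        for child in kids:
            best = min(best, alphabeta(child, depth - 1, alpha, beta, True,
                                       children, evaluate))
            beta = min(beta, best)
            if alpha >= beta:
                break
        return best

# Usage sketch (hypothetical move generator and evaluator):
# score = alphabeta(start, 8, float("-inf"), float("inf"), True,
#                   legal_moves, score_board)
```

The pruning is why such engines can search so deep: whole subtrees get discarded as soon as one line is provably worse than an alternative already in hand. None of it says anything about meaning, which is Mitchell’s point.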

Human-level general-purpose A.I. is going to be an incredibly tough nut to crack. I don’t see any reason it’s an insurmountable goal, but it’s not going to happen in a decade without a major breakthrough. Better classifiers just aren’t enough. People are very clever: insanely good at subtle chains of associative reasoning (though not so great at logic) and at learning from limited examples (Andrew’s sister Susan Gelman, a professor at Michigan, studies concept learning by example). We’re also very contextually aware and focused, which allows us to go deep, but can cause us to miss the forest for the trees.

37 thoughts on “Melanie Mitchell says, ‘As someone who has worked in A.I. for decades, I’ve witnessed the failure of similar predictions of imminent human-level A.I., and I’m certain these latest forecasts will fall short as well.’”

  1. Great piece. Not being an information scientist, but having a background in analytic philosophy, I have long suspected that the current hype about AI was just that — hype. I have also always wondered why AI researchers don’t model something simpler, like lower-primate communication. (Maybe they do.) After all, we are the product of millions of years of evolution, and communication using language seems like the last step. My cat and dog have a better understanding of the emotional meaning behind the words I use than my Alexa does, and an autistic person with normal intelligence can get confused by basic interactions even with full comprehension of the language being used. Some other system is operating underneath that must make my understanding of the language easier to process. Is anyone doing that? Trying to build an “AI” that will correctly understand the emotional context of a communication interaction?

    • You make a good point that linguistic meaning has probably evolved on top of emotional meaning, but asking AI to therefore start with emotional meaning feels to me like suggesting that a child learn the set-theoretic construction of the natural numbers before learning to count.

  2. Artificial Unintelligence: How Computers Misunderstand the World, by Meredith Broussard, MIT Press, 2018, is an excellent work along the same lines. However, while many of the applications and dangers of AI are over-hyped, I think others are underappreciated. AI works best the more predictable our behavior is – and the same is true of many technologies. I believe there is a tendency for humans to become more predictable the more we use these technologies. After all, asking Siri or Google gets old quickly if we can’t be understood. Similarly, search works better when we behave more the way the algorithms expect us to behave. I do worry that we become ever less human, bit by bit (pun intended).

    • What a coincidence! She’s speaking at the Columbia Data Science Institute in 30 minutes, and I have a conflicting appointment! The talk’s being livestreamed if anyone’s interested:

      DATA FOR GOOD: Meredith Broussard

      “Technical Complexity and Public Discourse: Why It’s So Hard to Write About Data Science”

      Friday, November 9, 2018
      1:00PM – 2:30PM
      PUPIN 428

      Live stream: https://columbiauniversity.zoom.us/s/384999682

      ABSTRACT: Data science is rarely the subject of in-depth reporting in mainstream publications. Part of the problem: it is extremely difficult to communicate complex technical ideas in plain language to non-experts. In this talk, Meredith Broussard will discuss strategies for communicating with the public with and about data science. Building on the ideas in her new book “Artificial Unintelligence: How Computers Misunderstand the World,” she will explain how data journalism can provide guidance to data scientists who seek a wider audience. She will also discuss the limits of computing, arguing against a kind of bias she calls “technochauvinism.” Understanding and articulating the limits of what we *can* do with technology leads to making better choices about what we *should* do with it to make the world better for everyone.

      BIO: Meredith Broussard is an assistant professor at the Arthur L. Carter Journalism Institute of New York University, an affiliate at the Moore Sloan Data Science Environment at the NYU Center for Data Science, a 2019 Reynolds Journalism Institute Fellow at the University of Missouri, and the author of “Artificial Unintelligence: How Computers Misunderstand the World.” Her research focuses on artificial intelligence in investigative reporting, with a particular interest in using data analysis for social good. Her newest project explores how future historians will read today’s news on tomorrow’s computers. A former features editor at the Philadelphia Inquirer, she has also worked as a software developer at AT&T Bell Labs and the MIT Media Lab. Her features and essays have appeared in The Atlantic, Harper’s, Slate, and other outlets. Follow her on Twitter @merbroussard or contact her via meredithbroussard.com.

  3. I actually strongly disagree with the article. My first argument is that we often don’t understand our own reasoning or thoughts very well when we make decisions. For example, when we look at human faces or hear sounds, we generally can’t explain why we think a face belongs to one person or another, or why a voice is saying one phrase or another. Driving a car is in the same category: in general we can’t explain all the minute movements of the steering wheel and pedals, as many things are done automatically without much thinking. Humans are also subject to various illusions — the blue/white dress, or yanny/laurel. I interpret those in the same fashion as the experiments tricking the neural networks (by adding random noise or changing the background color). Right now the neural nets are still much simpler than our brains, but the effect is the same.

    But it is true there is a long way to go and (in my non-expert opinion) there is a bit too much hype about AI.

    • I didn’t take Mitchell to be saying that we understand how our own reasoning or thoughts work. In cognitive psychology, the general term for the kind of things you’re mentioning is “tacit knowledge.”

      But what we do know is the kind of mistakes people make. And humans don’t make the kinds of mistakes Mitchell was describing the machines making. Studying mistake patterns is really interesting in psychology. Adriaan de Groot did some of my favorite experiments in cognitive psych when he had chess masters and amateurs look at a board quickly and then try to recreate it from memory. Herb Simon later came in and did some further work on the error patterns, showing that amateurs’ mistakes tended not to leave the board in the same strategic position. Here’s a neat blog post on the topic.

      I agree that tricking one of those home spy networks with audio encoding was a red herring in the article. But the rest of the points about the oddness of the machines’ mistakes are not. They’re clearly getting very confused because they can’t isolate the relevant information. Humans are very good at focusing on relevant details and separating the noise from the signal (there are cases where computers are better than humans at noisy signal extraction—that’s a kind of spell-check-like problem). As a result, the neural nets and machine translation programs aren’t even making the same kinds of mistakes as human beginners. They make odd mistakes that humans would never make.

      • “Humans are very good at focusing on relevant details and separating the noise from the signal”
        Far too simplistic. Did anyone see the gorilla? (http://www.theinvisiblegorilla.com/videos.html) Or any of the myriad examples from marketing where people’s attention is directed away from the important details to noise (that is important to the marketer, of course). So, at the very least, you should say humans are very good at focusing on certain types of relevant details (and not others). The question, then, is which types?

        • Exactly—we’re very good at focusing on the task at hand and ignoring irrelevant details. The gorilla task, if I recall, was counting something like basketball dribbles or passes. They didn’t ask the humans to watch to see if a gorilla came on the court. The humans stayed focused on the assigned task and missed extraneous details. That’s what I meant when I said human focus will cause us to miss the forest for the trees. Humans just aren’t very good at breadth—we don’t have enough memory or concentration bandwidth.

          The current crop of A.I. systems get confused and start making odd, inhuman mistakes when extraneous details get thrown in. Maybe better adversarial training will help sort this out, but the combinatorics are daunting. Where I think they’ll succeed is in getting better than humans at picking up extraneous pattern-matched details, like quickly concluding there are 123 butterflies and two moths in a photo, or sorting out a license plate from a very grainy photo and a database of possible license plates. I believe where we’ll see improvement in A.I. is precisely in the areas machines are good at, like considering a whole lot of options in a short span of time.

        • Perhaps we should think a bit ahead. Since humans will need to program the computers that build the algorithms, there will be value (meaning monetary value, not necessarily social value) in providing patterns of noise that can distract AI from meaningful signals. So, if an AI is better at picking out particular patterns that confuse humans, patterns can be created or distorted to fool it into picking the “wrong” patterns, since it will be a human that creates the AI to distinguish between right and wrong patterns. Isn’t it possible (likely?) that the more we perfect AI, the more vulnerable we make it to manipulation?

  4. Electronic devices can beat humans at chess and Jeopardy. They can stay active (=alive?) a lot longer than humans, even in hazardous places like Mars. Human intelligence is certainly a product of our natural roots and evolutionary processes. However, there may well be other kinds of intelligence, and a sufficiently complex device might develop a form of intelligence that we won’t recognize. “If a lion could talk, we wouldn’t understand him.”

    • Right. There are lots of things that computers are way better at than humans. The focus here is on general-purpose intelligence. So far, we don’t have anything like that in A.I.

      That’s what I meant by saying that Deep Blue (chess) doesn’t play chess like a human. The architecture of Watson (Jeopardy) was much closer, I think, to how humans reason.

  5. One of the things left out of the discussion is that computers, as they are currently built, are physical manifestations of logically consistent systems. At their base, they’re just logic gates. We know from Turing, Goedel, etc. that logically consistent systems have their limits. I wonder sometimes if building “human level AI” isn’t more a matter of hardware than software.

    • I don’t think humans can solve the problems that are beyond mechanical computing devices. That is, I think the halting problem (Turing), a computational formalization of the same concept as incompleteness (Gödel), is going to defeat humans, too. (There’s a sketch of the standard diagonalization argument at the end of this comment.)

      You’ll find a sympathetic philosopher in John Searle, who seems to (philosophy is hard) argue there’s something intrinsic in the meat. Just grab any introductory philosophy of mind textbook and you can get the whole run from Cartesian dualism to more modern functionalism (hey, they may even include pragmatism now—I haven’t taught this material in 20 years).
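
      Here is that sketch, written as Python that can be loaded but, by design, never run to completion; halts() is the hypothetical oracle the argument rules out, not a real function.

      ```python
      # Sketch of Turing's diagonalization: suppose a total, correct halts(prog, arg)
      # existed; the function below would then be contradictory, so it can't exist.

      def halts(prog, arg):
          """Hypothetical oracle: True iff prog(arg) eventually stops."""
          raise NotImplementedError("no total, correct halting decider exists")

      def troublemaker(prog):
          """Do the opposite of whatever the oracle predicts about prog run on itself."""
          if halts(prog, prog):
              while True:   # predicted to halt -> loop forever
                  pass
          else:
              return        # predicted to loop -> halt immediately

      # troublemaker(troublemaker) would halt iff it doesn't halt -- a contradiction,
      # so no general halts() can exist, for machines or (I'd argue) for us.
      ```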

      • Interesting! Based on your first line, is it safe to say that you think human thought/reasoning is no better than mechanical computing devices? (Note: I’m trying to trap you here, in a good-natured way, and I’m curious to see what your response is.)

        Personally, I’m still of the opinion that logic is a creation of the human mind and, thus, a subset of the possible ways we can think. I’ll check out John Searle, if I can wade through his writing (I do agree that philosophy is hard). Thanks for the reference!

  6. I think we’ll come up with a range of legitimate AIs that are to human cognition as airplanes (and our other flying machines) are to bird flight. Planes are not as nimble or adaptable as birds, but they are much faster and more powerful for a narrow range of important tasks, and what planes do is certainly legitimately considered flight. Similarly, I think we’ll be able to make AIs that legitimately perform “cognition” much more powerfully than humans, but in narrow yet still useful contexts. For example, we will surely not have an AI that replaces a human doctor any time soon, but perhaps we can make multiple AIs that can each reason about limited subsets of medical decision making. Just as we trust humans to decide not to fly airplanes in storms because they don’t work well in that context, there will be certain patient situations where humans will be trusted to decide not to use these limited scope AIs.

    • I think we already have all that in current-day A.I. with chess playing machines, search, etc. Speaking of planes, autopilots are pretty good these days.

      The point of the article was the failure of general-purpose A.I. and the question of what would constitute actual understanding (which veers off into the philosophical deep waters of epistemology, but I think it has a reasonable naive interpretation—the chess-playing computer can’t play Go, and vice versa).

      • Right, I agree that chess playing machines don’t truly reason or understand. I was saying it might be possible to construct something that might legitimately be called understanding within narrow domains, but I don’t have a great definition of understanding and it’s possible that very wide generalizability is an important criterion. (I’m pretty sure I have understanding but suck at both chess and go.)

  7. The examples in Mitchell’s article (uncertainty/overconfidence of out-of-sample predictions, non-robustness of NNs to input perturbations) remain mostly at the technical level. I bet there are new arxiv papers every single day trying to address these issues. A model never fails silently. It is the lack of appropriate model evaluation and model diagnostics that makes everything unreliable.

    • I think the point’s about general-purpose intelligence—that’s not just out-of-sample prediction. It’s new tasks.

      I completely agree there are armies of Ph.D. students, postdocs, faculty, and industrialists working on neural nets. It’s pretty much taken over machine learning (at least in natural language and speech rec).

      I don’t know what you mean by models never failing silently. Some software fails silently. We should talk more because I want to submit a grant proposal on trying to evaluate ML systems using statistics. That will definitely involve figuring out model evaluation and diagnostics that make sense. But it won’t tackle the problem of general-purpose, human-like A.I.

    • “I bet there are new arxiv papers every single day trying to address these issues.”

      Has anyone had success reading academic papers on machine learning? I have found very little use for them; it’s just the wrong format. A GitHub repo and a blog post are far superior. Usually any useful or interesting theoretical material in the paper can be summed up in a few sentences, and the rest isn’t really clear without accompanying code and a dataset.

  8. Love the article. It’s a reality check on what seems to be the wild world of AI and ML today.
    How much of the hype around machine learning and artificial intelligence is really justified?
    Are ML and AI turning into a sort of modern witchcraft, where most people do not understand how they work and hence think they can do human-like things?
    Why the obsession with representing everything that is AI or ML with humanoid lifeforms? Or mechanical arms? Or brains?
    Do people really think that AI can reach consciousness? How come people who can buy any knowledge (billionaires) think that AI could do without humans?
    Do you know that if you vary the data a bit, or introduce an anomaly to the AI model, it can stop working?

    Something that has to be abundantly clear by now is that most of the progress is being made on algorithms, statistical models, computer science, and sheer computational power. There are no organic or brain-like functions in the pipeline. And we all know that, because there are no neuroscientists working on practical AI and no organic matter performing the machine learning at all. It is all silicon chips.

    • Cool. I somehow missed the one from January. Great blog post with a very long comment thread (I hope that’s not where this one’s heading—I don’t have time for this!). Rather than trying to summarize here, I’ll just add my own recommendation to go read it.

      P.S. I want to join the “Toronto Semiotic Circle”! I miss thinking about philosophy and semantics.

    • I think the problems caused by AI are worth considering even if we’re nowhere near general-purpose AI. The top 10 most common jobs are things like retail salespeople, cashiers, fast-food workers, freight handlers, etc. Almost all of those jobs could easily be replaced with special-purpose “AI”. The social upheaval of rapidly putting, say, 1/3 of currently employed people out of work, in an economy built entirely around the idea of “get a job or you’re worthless,” could by itself be catastrophic.

      Not that I’m in favor of keeping people employed flipping burgers and unpacking cans onto shelves and swiping cans across laser scanners… More that I think we need very very different social policy, starting with ultra-simplified flat tax + UBI, decoupling healthcare from employment, and eliminating special purpose government programs in favor of bigger UBI and a small set of special purpose government programs for mentally ill people. Those things are all true even without massive special purpose AI automation, but they’re especially true with AI automation.

      • We’ve already replaced a lot of cashiers with ATMs (hardly A.I.) and self-serve scanners, call-center workers with voice-activated systems, factory workers with robots, etc.

        I see big dangers lurking from computer technology, but they don’t stem from human-level general-purpose A.I.

        The first thing I worry about is the surveillance state. (Cory Doctorow’s novel Little Brother is an interesting take on this, by the way.) I worked briefly on the Total Information Awareness project in the early 00s until it was defunded by Congress. Now that companies like Amazon are helping the DoD with this stuff and the technology’s gotten so much better, I’m really worried.

        The second thing I worry about is military applications. When I was working on a DARPA project on speech recognition and the scientific director was replaced by a lieutenant colonel, he came into Bell Labs and told us how voice activation would allow the “warfighter of the future” to point at a spot and speak into a headset, and bam, metal drops from the sky to kill things. I quit the project after that, but it’s not like the project was scuttled.

        Now I just work on stats and have tried to keep us the hell away from DARPA.

        • Sure, automation has replaced a bunch of workers, but automation that can do things like recognize objects and manipulate them in useful ways could replace a LOT more jobs. Self-checkout works for small baskets of goods; it’s a disaster for a four-person family’s groceries. But a robot that can grab objects off a conveyor, scan them, and pack them effectively is totally reasonable to expect soon; the visual recognition is a traditional AI topic. But it’s not human-level or general-purpose.

          Amazon is trying out its cashierless stores, which use a variety of techniques. Amazon warehouses already sound like they are largely automated, with robots grabbing items and bringing them for boxing. A boxing-and-shipping robot that uses visual recognition to choose how to pack and seal a box seems likely.

          Even deepfakes suggest we might see high-quality acting replaced by a mix of low-paid body doubles and computerized postprocessing.

          Military applications are certainly problematic.

          For all these things to cause social chaos requires only sufficient skill at the little specialized tasks where we currently use humans; it doesn’t require general-purpose AI.

  9. Mitchell’s piece presents good evidence that we have a problem. But the nature of that problem and the proposed solution are unclear. What do the words “meaning”, “common sense”, and “understanding” mean? How would additional study of human cognition answer those questions? Critics of the current machine learning wave in AI consistently propose these kinds of abstractions as the way forward. Gary Marcus says we need to capture “ideas”. Hey, that’s great. But what exactly is an “idea”?

    My view is that “common sense” and “understanding” can only be measured functionally. Google (or Watson) understands my question if it gives me the right answer. It has “common sense” if its answers are consistent with our common knowledge of the world.

    But “meaning” is a tough nut to crack. Since its beginning, AI has generally viewed “meaning” as an encoding in some kind of declarative model, either in the form of logical axioms, relational assertions, or their probabilistic counterparts. But attempts to build such models by hand have failed, and it is extraordinarily difficult to achieve decent inter-rater agreement on translations of natural language sentences into formal meaning representations. The current deep learning strategy is to shower the neural network with data and hope that it self-organizes into a useful meaning representation. This works to some extent: word embeddings in high-dimensional space have proved to be very powerful (see the toy sketch at the end of this comment). But their failures are also obvious. My guess is that we need some combination of structuring and self-organization, and I’m still placing my bet on a data-driven approach.

    It is surely useful to study human cognition, but I don’t see why we should expect machine “cognition” to be similar to human cognition. Machine theorem proving is very different from human theorem proving. Solvers for linear programs, algorithms for symbolic integration, etc., are all very non-human in their operation. Conversely, studying machine “cognition” may not help us understand the human mind or brain at all. So while I (and most ML researchers) share the diagnosis of Profs. Mitchell and Marcus, we don’t place much hope in their proposed therapies.
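
    To make the word-embedding point concrete, here is a toy distributional sketch (nothing like a production system): count co-occurrences in a small window over a made-up four-sentence corpus, reduce with SVD, and compare words by cosine similarity. The corpus, window size, and rank are arbitrary choices for illustration.

    ```python
    # Toy distributional word vectors: window co-occurrence counts + truncated SVD,
    # compared by cosine similarity. The corpus and dimensionality are made up.
    import numpy as np

    corpus = ["the cat sat on the mat", "the dog sat on the rug",
              "a cat chased a dog", "the dog chased the cat"]
    tokens = [sentence.split() for sentence in corpus]
    vocab = sorted({w for sent in tokens for w in sent})
    idx = {w: i for i, w in enumerate(vocab)}

    # Symmetric co-occurrence counts within a +/-2 word window.
    counts = np.zeros((len(vocab), len(vocab)))
    for sent in tokens:
        for i, w in enumerate(sent):
            for j in range(max(0, i - 2), min(len(sent), i + 3)):
                if j != i:
                    counts[idx[w], idx[sent[j]]] += 1.0

    # Low-rank "embedding" via truncated SVD (rank 3 picked arbitrarily).
    U, S, _ = np.linalg.svd(counts, full_matrices=False)
    embeddings = U[:, :3] * S[:3]

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    # Words that appear in similar contexts end up with more similar vectors.
    print(cosine(embeddings[idx["cat"]], embeddings[idx["dog"]]))
    print(cosine(embeddings[idx["cat"]], embeddings[idx["on"]]))
    ```

    Real systems (word2vec, GloVe, contextual encoders) are far more sophisticated, but the underlying bet is the same: meaning as position in a learned vector space.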

  10. Some devices like Alexa may be able to pass the Turing test, but are they really conscious? When you talk to Alexa, it offers a multitude of canned responses; it’s not really “thinking”.

    In truth, I still don’t understand why Alexa has become so popular. Maybe in part because it’s yet another fancy new gimmick like you used to see in Sharper Image catalogues. I can’t see any practical uses in my own life for Alexa.
