
“Everybody Lies” by Seth Stephens-Davidowitz

Seth Stephens-Davidowitz sent me his new book on learning from data. As is just about always the case for this sort of book, I’m a natural reviewer but I’m not really the intended audience. That’s why I gave Dan Ariely’s book to Juli Simon Thomas to review; I thought her perspective would be more relevant than mine for the potential reader. I took the new book by Stephens-Davidowitz and passed it along to someone else, a demanding reader who I thought might like it, and he did: he kept coming to me with new thought-provoking bits that he’d found in it. So that’s a pretty solid endorsement. I couldn’t convince him to write a review so you’ll have to take my word that he liked it.

The thing I found most appealing about the book was that, in addition to addressing interesting questions, Stephens-Davidowitz kept pulling in new sources of data. And he’s pretty good about avoiding Freakonomics/Gladwell/Ted-style hype. One reason for this might be that Stephens-Davidowitz does not set himself up as the hero of the book. If this book has a hero, it’s “the data.”

OK, don’t get me wrong: we shouldn’t worship “the data.” Crap data will give you crap conclusions (for example, search this blog for *North Carolina North Korea democracy* or *Human Development Index* or *implicit association test* or *fat arms and voting* or . . .). And even good data can require a lot of processing to be useful (search on *Xbox survey*). Measurement is important! And statistical adjustment is important! But we can learn a lot from data if we’re willing to do the work.

There were a few places where I felt Stephens-Davidowitz went beyond the data. That’s fine—I have no problem with speculation—the challenge is just to distinguish different levels of certainty so the reader won’t get lost.

I often take a skeptical perspective in my writings, especially here on the blog. Stephens-Davidowitz has much more of an open, even boosterist tone. Or, I should say, he’s skeptical about people’s stated motivations but positive on big-data research. I think this positive tone is just fine. I often express doubt but that’s not the only possible way to go. Indeed, as G. K. Chesterton could well have said, extreme skepticism is a form of credulity.

The only place where I felt that Stephens-Davidowitz went consistently too far was in his treatment of lying, perhaps because to the extent the book has a general substantive theme, it is in the title: the claim that “everybody lies”—and, by extension, we should trust what people do, not what they say.

I have mixed feelings about this perspective. On one hand, sure, talk is cheap, deeds are what count. On the other hand, I have reservations from two directions. From one direction (“talk”), as a survey researcher I have the impression that in many settings, people really do give sincere responses, and often the best way to find something out about someone is to just ask. In contrast, tricks like randomized response surveys, list experiments, and implicit racism tests can have all sorts of problems. I’m not such a fan of the psychology paradigm of trying to learn from people by tricking them into revealing their true selves. From the other direction (“deeds”), a lot of the behaviors that are easy to measure are, in a sense, as much talk as action. I’m thinking of Google searches, for example, and even certain purchases, which might have expressive value.

To get to specifics: at one point in his book, Stephens-Davidowitz writes, “it is certainly possible that lying played a role in the failure of the polls to predict Donald Trump’s 2016 victory.” I don’t think so. For a discussion of this issue, see the final paragraph of Section 5 of this paper with Julia Azari. Stephens-Davidowitz continues by talking about people lying, but in these examples I suspect that people are generally more sincere than he gives them credit for.

I’m not an absolutist on this point: indeed, near the very end of Everybody Lies is a charming example:

[Jordan] Ellenberg, a mathematician at the University of Wisconsin, was curious about how many people actually finish books. He thought of an ingenious way to test it using Big Data. Amazon reports how many people quote various lines in books. Ellenberg realized he could compare how frequently quotes were highlighted at the beginning of the book versus the end of the book. This would give a rough guide to readers’ propensity to make it to the end. By his measure, more than 90 percent of readers finished Donna Tartt’s novel The Goldfinch. In contrast, only about 7 percent made it through Nobel Prize economist Daniel Kahneman’s magnum opus, Thinking, Fast and Slow. [I hope this means they never got to the chapter on embodied cognition. — ed.] Fewer than 3 percent, this rough methodology estimated, made it to the end of economist Thomas Piketty’s much discussed and praised Capital in the 21st Century.

Kahneman is a psychologist, not an economist, but you get the idea.
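
One caveat on reading these percentages, which Jordan Ellenberg raises in the comments below: “mean proportion of the book finished” is not the same quantity as “proportion of readers who finished the book.” Here is a minimal sketch of that distinction. The two books and their reader data are made up purely for illustration, and the function names are hypothetical; this is not Ellenberg's index, which works from highlights rather than from known stopping points.

```python
from statistics import mean


def mean_proportion_read(fractions):
    """Average fraction of the book read, taken across readers."""
    return mean(fractions)


def proportion_finished(fractions, threshold=1.0):
    """Share of readers whose fraction read reaches the threshold."""
    return sum(f >= threshold for f in fractions) / len(fractions)


# Hypothetical data: the fraction of the book each of 10 readers got through.
book_a = [0.5] * 10             # every reader stops halfway
book_b = [1.0] * 5 + [0.0] * 5  # half the readers finish, half never start

for name, fractions in [("A", book_a), ("B", book_b)]:
    print(name, mean_proportion_read(fractions), proportion_finished(fractions))
```

In this toy setup the two books have the same mean proportion read (one half), yet nobody finishes book A while half the readers finish book B. So even a perfect estimate of the first quantity would not pin down the second.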

Anyway, I conclude with this example for three reasons:

1. It demonstrates the synergy between new data sources and research questions. Sometimes you start with the question and see what data are available to answer it; other times you start with the data and see what questions you can answer. I actually don’t know which came first in this example.

2. Reading the story, you can think of ways the data might be answering the question in a biased way—and then you can start thinking of ways to correct the bias. These sorts of studies are open-ended.

3. One thing that frustrated me with reviews of our general-interest book, Red State Blue State, was how many reviewers were clearly writing their reviews based on the first 15 pages of the book. Next time, I realize, I have to make those first 15 pages be more clear on what’s coming next, if for no other reason than to help out the busy reviewers who will go no further.

Stephens-Davidowitz has good stuff all the way to the end. As a reader, I appreciate that he crammed his book with interesting material—even while knowing that reviewers won’t notice it. That’s good craftsmanship.

I’ll be discussing Seth Stephens-Davidowitz’s book next Tuesday in Washington, D.C.

19 Comments

  1. Jonathan (another one) says:

    An excerpt (which I just happened to read yesterday!) is online if people want to sample the style. I have no idea whether or not this excerpt is typical, but I enjoyed reading it. http://nymag.com/scienceofus/2017/05/what-the-words-you-use-in-a-loan-application-reveal.html

  2. Actually 4/5; I wish there were an edit button!

  3. Jordan Ellenberg says:

    My only complaint — but it’s a real one — is that I tried to make it crystal clear that “Piketty index” was not meant to be an estimate for the proportion of readers who finished the book! At best it’s supposed to be a function of “mean proportion of the book finished” which is roughly monotone, but even that I’d question, and of course “mean proportion of the book finished” is not at all the same thing as “proportion of readers who finished the book.”

    To be fair to SSD, he is not the only person who glossed my article that way, so the fault may be in my lack of clarity.

    • Andrew says:

      Jordan:

      Yes, that sort of thing was what I was thinking about when I wrote, “Reading the story, you can think of ways the data might be answering the question in a biased way—and then you can start thinking of ways to correct the bias. These sorts of studies are open-ended.”

    • Whoops. Sorry. I will change in revision.
      Do you think they’re that different?
      And since it’s clearly a rough estimate anyway, for obvious reasons — the best material may be in the beginning of the book — do you think, once saying it is a “rough estimate” going from “mean proportion” to “proportion of readers” is a particularly large leap?

    • Hi Jordan, sorry. I will clarify in revision.
      However, clearly this is a rough methodology — the best quotes may be at the beginning of the book.
      Once you acknowledge it is a rough methodology, do you think the leap from mean proportion to proportion of readers is a large one? That seems a tiny leap relative to the leap from quotes highlighted to pages read.
      Anyway, I am a big fan of your writing.

      Seth

    • annlia says:

      In addition to the difference between “mean proportion of readers” and “mean proportion of book finished” I have an additional doubt about how the lack of quotes from towards the end of the book should be interpreted… Maybe readers went all the way, but could not be bothered quoting later on?

  4. Rajesh says:

    Is there a difference between data and _found data_, the focus of Stephens-Davidowitz’s book? I’ve heard this neologism get thrown around occasionally in various talks and discussions (on Electronic Health Records, for example) but I’ve never really encountered a solid formal definition of this notion.

  5. Jacob says:

    There’s definitely a difference between the idea that people lie and the fact that people are often unable to discern the true answer to a researcher’s question. I, like Andrew, doubt that lying has much to do with responses to the question of who a person plans to vote for. But you might imagine that the answers to “why did you choose _______ as your vote for POTUS?” would often be very wrong even in spite of an honest effort on the part of the respondent to give the right answer.

    • Martha (Smith) says:

      +1

      Additionally, answers to questions such as “why did you choose ____ as your vote for POTUS” can be complex; people may focus on just what first comes to mind at that time (and recall additional factors later), or may be constrained by choices given if the question is not “free-response”.

  6. Chris says:

    I read Weapons of Math Destruction last year. My opinion was that there were lots of novel and thought-provoking ideas immediately followed by absolutely unsupportable statements and extensions.

    This doesn’t seem like it’s in exactly the same vein, but may touch on similar enough issues that I’m intrigued by it. The excerpt posted by Jonathan (another one) seems like it may be a good, honest approach to discussing the consequences of working with making data-driven analysis ubiquitous.

  7. zbicyclist says:

    If you want to get the gist of a nonfiction book (aka elevator speech, central idea) that’s more likely to be found in the preface or introduction than in the weeds of the later chapters, where the central idea is dug into and proven in possibly excruciating detail. So if you want to explain the book to others, that’s where you’re most likely to find what you need.

    Life’s short. Ceteris paribus, it makes at least as much sense to read 25% of 4 books as 100% of one book.

    Isn’t that why journal articles have abstracts?

    • Martha (Smith) says:

      “If you want to get the gist of a nonfiction book (aka elevator speech, central idea) that’s more likely to be found in the preface or introduction than in the weeds of the later chapters, where the central idea is dug into and proven in possibly excruciating detail. So if you want to explain the book to others, that’s where you’re most likely to find what you need.”

      +1

    • Alex F says:

      Stop it! You’re biasing the misinterpretation of Jordan’s index as an estimate of the proportion of readers who finish a book!

  8. Harald Korneliussen says:

    Just a hunch, but I now suspect that Donna Tartt’s novel The Goldfinch has some very memorable line near the end.

  9. David Pittelli says:

    Isn’t it equally plausible that some authors put their good quotes in the beginning of the book, and some put them near the end?

  10. Steve Sailer says:

    “One thing that frustrated me with reviews of our general-interest book, Red State Blue State, was how many reviewers were clearly writing their reviews based on the first 15 pages of the book.”

    Paul Newman said: “The first 15 pages of your screenplay are what sells your script to the studio, but the last 15 pages are what sells the movie to the audience.”

    • Andrew says:

      Steve:

      Sure, you’d say that: in Red State Blue State, you’re mentioned on page 170, which just happens to be in the last chapter of the book, less than 15 pages from the end!

      I gotta update the book, though, as income-and-voting patterns have changed a lot since 2008.
