Last year, we heard about a “maths expert” and Oxford University prof who could predict divorces “with 94 per cent accuracy. . . His calculations were based on 15-minute conversations between couples.”

At the time, I expressed some skepticism because, amid all the news reports, I couldn’t find any description of exactly what they did. Also, as a statistician, I have some sense of the limitations of so-called “mathematical models” (or, worse, “computer models”).

Then today I ran across this article from Laurie Abraham shooting down this research in more detail, so I thought I’d share it with you.

First, she reviews the hype:

He and his colleagues at the University of Washington had videotaped newlywed couples discussing a contentious topic for 15 minutes to measure precisely how they fought over it: Did they criticize? Were they defensive? Did either spouse curl his or her lip in contempt? Then, three to six years later, Gottman’s team checked on the same couples’ marital status and announced that based on the coding of the tapes, they could predict with 83 percent accuracy which ones were divorced. . . .

“He’s gotten so good at thin-slicing marriages,” Malcolm Gladwell enthused in Blink, “that he says he can be at a restaurant and eavesdrop on the couple one table over and get a pretty good sense of whether they need to start thinking about hiring lawyers and dividing up custody of the children.”

In a 2007 survey asking psychotherapists to elect the 10 most influential members of their profession over the last quarter-century, Gottman was only one of four who made the cut who wasn’t deceased.

Then the good news:

Undeniably, Gottman has made enormous contributions to the study of marriage. . . . To back up the idea that it was the relationship that mattered, it was necessary to step into the flow, or muddle, of couples interaction–and Gottman embraced that task wholeheartedly. When he and a handful of other research teams began videotaping couples in conflict in the 1970s, the approach was revolutionary.

And now the bad news:

For the 1998 study, which focused on videotapes of 57 newlywed couples . . . He knew the marital status of his subjects at six years, and he fed that information into a computer along with the communication patterns turned up on the videos. Then he asked the computer, in effect: Create an equation that maximizes the ability of my chosen variables to distinguish among the divorced, happy, and unhappy. . . . What Gottman did wasn’t really a prediction of the future but a formula built after the couples’ outcomes were already known. . . . The next step, however–one absolutely required by the scientific method–is to apply your equation to a fresh sample to see whether it actually works. That is especially necessary with small data slices (such as 57 couples), because patterns that appear important are more likely to be mere flukes. But Gottman never did that.

Each paper he’s published heralding so-called predictions is based on a new equation created after the fact by a computer model. [emphasis added]

Hey–I think I’ve heard of that method! Whaddya know, a psychotherapist using a method guaranteed to appear successful in retrospect? Somewhere, Karl Popper is smiling ruefully.
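To see why an equation built after the outcomes are known will look good no matter what, here is a minimal simulation sketch. Everything in it is an assumption for illustration: the 40 coded behaviors are pure coin flips, the 30 percent divorce rate is made up, and the "model" is just the single behavior that best matches the known outcomes. None of this is Gottman's actual procedure; it only shows the general hazard Abraham describes.

```python
import random

def simulate_couples(rng, n_couples, n_features, divorce_rate=0.3):
    """Return (features, outcomes); the features are pure noise, unrelated to outcomes."""
    features = [[rng.random() < 0.5 for _ in range(n_features)]
                for _ in range(n_couples)]
    outcomes = [rng.random() < divorce_rate for _ in range(n_couples)]
    return features, outcomes

def rule_accuracy(features, outcomes, index, flip):
    """Accuracy of the rule 'predict divorce iff feature[index] != flip'."""
    hits = sum((row[index] != flip) == divorced
               for row, divorced in zip(features, outcomes))
    return hits / len(outcomes)

def best_rule_in_hindsight(features, outcomes):
    """Pick the (feature, direction) pair that best matches already-known outcomes."""
    n_features = len(features[0])
    candidates = [(i, flip) for i in range(n_features) for flip in (False, True)]
    return max(candidates,
               key=lambda c: rule_accuracy(features, outcomes, c[0], c[1]))

rng = random.Random(0)  # fixed seed so the sketch is repeatable
train_x, train_y = simulate_couples(rng, 57, 40)      # 57 couples, as in the 1998 study
fresh_x, fresh_y = simulate_couples(rng, 10_000, 40)  # hypothetical fresh sample

index, flip = best_rule_in_hindsight(train_x, train_y)
in_sample_accuracy = rule_accuracy(train_x, train_y, index, flip)
fresh_accuracy = rule_accuracy(fresh_x, fresh_y, index, flip)

print(f"accuracy on the 57 couples used to pick the rule: {in_sample_accuracy:.2f}")
print(f"accuracy of the same rule on fresh couples:       {fresh_accuracy:.2f}")
```

The rule chosen in hindsight always beats a coin flip on the couples it was chosen from, even though no behavior here carries any information. Only the fresh sample reveals that.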

Abraham follows up excellently with some numbers:

Then, suppose both the false-positive rate and the false-negative rate for Gottman’s equation are 20 percent (which is only an assumption, because, remember, Gottman doesn’t provide those figures; I chose it based on his assertion of 80 percent “accuracy”). False positives are couples whom the formula classifies as divorced who really aren’t, so with a 20 percent false-positive rate, Gottman would call 168 of the still-intact couples divorced (840 x 0.20). False negatives are couples who are divorced but whom the formula misses, so with a 20 percent false-negative rate, Gottman would put 32 couples in the married column who don’t belong there (160 x 0.20). In sum, Gottman would peg 296 couples as divorced–168 + (160-32), but only 128 of those actually would be, meaning his predictions would be right 43 percent, or less than half, of the time. Much less impressive.

These numbers might not be quite right–recall, these researchers don’t seem to have been coughing up any predictions–but they seem like a good start.
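Abraham's arithmetic is easy to check. The sketch below uses only the figures in the quoted passage (840 intact couples, 160 divorced, and her assumed 20 percent error rates); the function name is mine.

```python
def divorce_prediction_breakdown(n_intact, n_divorced,
                                 false_positive_rate, false_negative_rate):
    """Return (false positives, true positives, total flagged, share of flags that are right)."""
    false_positives = n_intact * false_positive_rate         # intact couples called divorced
    true_positives = n_divorced * (1 - false_negative_rate)  # divorced couples correctly flagged
    flagged = false_positives + true_positives
    return false_positives, true_positives, flagged, true_positives / flagged

fp, tp, flagged, share_right = divorce_prediction_breakdown(840, 160, 0.20, 0.20)
print(round(fp), round(tp), round(flagged), round(100 * share_right))
# prints: 168 128 296 43
```

So a method that is "80 percent accurate" in both directions is right only about 43 percent of the time when it predicts divorce, because intact couples greatly outnumber divorced ones.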

I eagerly await the Abraham vs. Gladwell showdown on Colbert. Could someone please tape that for me when it happens? You can record this on the same tape that already has the Bartels/Frank WWF bout, and the one where they challenge my namesake to see if he can read two full pages from his oh-so-well-reviewed opus of some years ago without the entire studio audience falling asleep. Oh, and if there’s room, you could throw in that clip of Johnny Carson and Zsa Zsa Gabor’s cat. . . .

P.S. I wanted to include this in my new “zombies” category on the blog, but we’re still having troubles with the server.

P.P.S. It’s funny that Slate put Abraham’s article in the category, “Double X: What women really think about news, politics, and culture.” I guess that makes sense: There’s Slate Magazine for what men think, then that little Slate/Double-X category for women. Sort of like the “women’s page” in old-time newspapers. Who says journalistic traditions are dead?

P.P.P.S. Yes, I know I live in a glass house; most of my collaborators are men. Still, there’s something funny about seeing a “What women really think” section in a modern-day web magazine.

What's a tape?

Of course it is much worse than her analysis. You have to look at the enormous number of variables that they use for a small sample size. Ad-hoc theories for such a number of variables (and small sample size) are *extremely* easy to construct.

The comments from your last posting on this topic are very informative. In particular, this article makes all those same points. The responsible maths guy here is James Murray (formerly of Oxford, now earning a pretty pension at UW) who has done a lot of excellent work on biology but has clearly never earned his statistics chops…

I'm shocked, shocked, that Gladwell was not well-enough informed to make a comment. OTOH, his accumulated science writings would only amount to a page or two otherwise…

The real question is why only 83%? If you're going to fit the model to in-sample data, why not just use a nonparametric model (or a parametric model with 57 variables) to fit perfectly?
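A quick sketch of that point, under made-up assumptions: 57 couples, five random "behavior scores" each, and outcomes assigned by coin flip at a 30 percent divorce rate. A one-nearest-neighbor classifier (a simple nonparametric model, chosen here only for illustration) reproduces its own training outcomes perfectly, since each couple is its own nearest neighbor.

```python
import random

def nearest_neighbor_label(point, points, labels, skip=None):
    """Label of the closest stored point (squared Euclidean distance), ignoring index `skip`."""
    best_label, best_dist = None, float("inf")
    for i, (other, label) in enumerate(zip(points, labels)):
        if i == skip:
            continue
        dist = sum((a - b) ** 2 for a, b in zip(point, other))
        if dist < best_dist:
            best_dist, best_label = dist, label
    return best_label

rng = random.Random(1)  # fixed seed so the sketch is repeatable
n_couples, n_features = 57, 5
points = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_couples)]
labels = [rng.random() < 0.3 for _ in range(n_couples)]  # outcomes unrelated to the scores

# "Predict" each couple using a model that already contains that couple.
in_sample = sum(nearest_neighbor_label(p, points, labels) == y
                for p, y in zip(points, labels)) / n_couples

# Leave-one-out: predict each couple from the other 56 only.
leave_one_out = sum(nearest_neighbor_label(p, points, labels, skip=i) == y
                    for i, (p, y) in enumerate(zip(points, labels))) / n_couples

print(f"in-sample accuracy:     {in_sample:.2f}")
print(f"leave-one-out accuracy: {leave_one_out:.2f}")
```

In-sample accuracy is 100 percent by construction, which says nothing about prediction. The leave-one-out figure is the honest one, and with noise inputs it has no reason to beat guessing.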

Afinetheorem: My guess is that the researchers weren't trying to cheat; they probably just didn't know better. Meanwhile, their models probably made a lot of sense, and so with the benefit of hindsight, the researchers probably felt that they really had predicted almost all these outcomes, in some sense.

Your "namesake"? You lost me there.

I agree, good work by Abraham. But I wish she had phoned Gottman to get his reaction. After complaining that Gottman didn't wait 6 years — the length of time it would take to test his prediction formula on a new sample — to publish his results, she publishes her book before taking 10 minutes to call Gottman.

Seth: I have no idea if Abraham phoned Gottman (or, for that matter, his renowned Oxford collaborator). My impression was that she was criticizing Gottman not for failing to wait 6 years but for either misunderstanding or misrepresenting the concept of prediction.

Seth, I tried to get an interview with Gottman last May, so it could be included in my book, but he said he was too busy to talk to me until October (at least those are the dates I recall; the precise ones are in a footnote in the book). I told him that that date was too late–my book would've already gone to press–and asked again for an interview. I didn't hear back. Laurie

Laurie, thanks for the additional info. It makes your case much stronger, in my opinion.

Gottman and co-researchers discovered what predicts whether a marriage ends in divorce by observing differences between marriages that last and marriages that end in divorce. Finding distinguishing characteristics was part of the research.

Wait… please excuse my ignorance, but asking a computer to "Create an equation that maximizes the ability of my chosen variables to" yield an outcome that is very close to another variable of interest (DV) sounds like what we call regression to me. If I ask it to maximize the equation's ability to discriminate between two states, we call it logistic regression. How often do scholars test their regression equation on another sample? In my experience, hardly ever. So how is what Gottman did different from someone, having run a regression, saying that his or her model predicts the DV, and reporting the r-squared as an index of how well the prediction works?

btw, anybody familiar with Losada's (1999) model on effective teams? Would his work be subject to the same criticism?

Alimba: What can I say? It just seems a bit tacky to claim "94 percent accuracy" based on a model that keeps changing. This is not to say that Gottman's work is no good, just that the numerical claims are fishy.