## More on those divorce prediction statistics, including a discussion of the innumeracy of (some) mathematicians

A few months ago, I blogged on John Gottman, a psychologist whose headline-grabbing research on marriages (he got himself featured in Blink with a claim that he could predict with 83 percent accuracy whether a couple would be divorced–after meeting with them for 15 minutes!) was recently debunked in a book by Laurie Abraham.

The question I raised was: how could someone who was evidently so intelligent and accomplished–Gottman, that is–get things so wrong? My brief conclusion was that once you have some success, I guess there’s not much of a motivation to change your ways. Also, I could well believe that, for all its flaws, Gottman’s work is better than much of the other research out there on marriages. There’s still the question of how this stuff gets published in scientific journals. I haven’t looked at Gottman’s articles in detail and so don’t really have thoughts on that one.

Anyway, I recently corresponded with a mathematician who had heard of Gottman’s research and wrote that he was surprised by what Abraham had found:

I [the mathematician] read one of his books a while ago — not about this predictive stuff — and found the level of mathematical sophistication quite high. I’m not quite sure what to think: to be honest, given what I know of Gottman, it’s really hard to imagine him making such a mistake! Anyway, it’s terrific math journalism that Abraham dug this up though.

I think she drops the ball, though, in the section you quote about false negatives and false positives. True, Gottman could mean a lot of things by “80% accuracy.” But if he’s trying to predict early divorce, an event with 16% prevalence, I think it’s safe to say he does NOT mean “My method assigns a prediction of divorce to 20% of couples that didn’t actually divorce” — presumably the TOTAL number of positives his method gives is somewhere around 16%. A much more natural interpretation of what he might mean by “80% accurate” is “the method gives the right answer in 80% of cases” — of course, it would be richer information to report the false positive and false negative rates separately.

I do think that Abraham was winging it with her numbers, but I can’t say I’ve heard anything positive about Gottman’s claims of 85% success etc. I haven’t looked into this in any detail and would be happy to be proved wrong on this, but right now it doesn’t look so good to me. If you know more on this, please keep me informed.

The mathematician then replied to me:

A psychologist colleague convinced me my initial reaction was too hard on Gottman. I looked at the actual papers a bit; it’s a bit hard for me to make out exactly what he did (but I think this is just because I don’t ever read psychometric stuff). But here’s my general sense of things, which I think places Gottman in a better light without anything Abraham says being factually wrong. (But again, the following is my very casual understanding of what happens in Gottman’s papers, and could be inaccurate.)

So in the first paper (where the 80% figure comes from) he videotapes the couples and measures lots of kinds of interactions. He takes some smallish set of measured variables x_1, …, x_n which represent one hypothesis about predictors of divorce, and then another set y_1, …, y_n which represent a competing hypothesis (the one he favors). He finds the optimal linear combination of x_1, …, x_n for predicting divorce, and similarly for y_1, …, y_n. (I guess this amounts to something like the usual problem where you have a bunch of red dots and a bunch of green dots in R^n and you want to find the best separating hyperplane.) And he finds that y_1, …, y_n do much better.

He doesn’t report rates of false positives and false negatives separately, but he does report the correlation between his linear combination of y_1, …, y_n and (I think) the Bernoulli variable of divorce/not divorce, and gets something highly positive (with x_1, …, x_n, it’s just slightly positive). So that’s not consistent with, say, his model always predicting no divorce, or predicting that a randomly chosen 16% of the couples will divorce, which would give correlations of 0. It IS consistent, I think, with numbers like Abraham’s, in which half of the couples he predicts will divorce actually stay together. But as I said, I don’t think this is an unimpressive result!
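To make that concrete (the numbers here are my own illustration, not anything from Gottman's papers): a method can show a solidly positive correlation with the divorce indicator even when half of its divorce predictions are wrong.

```python
# A hypothetical confusion matrix: 1000 couples, 16% divorce prevalence,
# and (as Abraham suggests) half of the couples predicted to divorce
# actually staying together.
import math

tp, fp = 80, 80    # predicted divorce: 160 couples, only half actually divorce
fn, tn = 80, 760   # predicted to stay married: 840 couples
n = tp + fp + fn + tn

accuracy = (tp + tn) / n
precision = tp / (tp + fp)
# phi coefficient = Pearson correlation between two binary variables
phi = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

print(f"accuracy  = {accuracy:.2f}")   # 0.84
print(f"precision = {precision:.2f}")  # 0.50
print(f"phi       = {phi:.2f}")        # 0.40
```

So a correlation of about 0.4 — "highly positive" by the standards of this literature — coexists with 50% precision and an accuracy exactly equal to the trivial 84% base rate.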

Anyway, in his later papers, it looks like he carries out the same kind of analysis on different data sets. I think what Abraham is complaining about is that he recomputes the coefficients of y_1, …, y_n in each paper. But I don’t think this really counts as remaking the model each time to fit the data; he has committed to this small set of variables, and replicates the finding that these variables explain more of the variation in some variable of interest than do competing sets of measurements. That’s legitimate, right? (That’s an authentic question; I’m a non-statistician.) Of course, it would be weird if the coefficients were completely different each time, but one presumes he checks that. I’m just saying that it doesn’t seem to be methodologically necessary that he commits himself after his first paper to a specific linear combination of y_1, …, y_n which is to be his divorce prediction variable for all time.

Two remarks:

1. A relevant question is: if you have a binary variable, and n other measurements, presumably you should expect to be able to find SOME linear combination of y_1, …, y_n that has a healthy positive correlation with the binary variable. But how positive a correlation does one expect to find, for a given n, under a null hypothesis that the y_i are actually independent from the binary variable? That question seems relevant. (And I get the sense from what Gottman writes that he knows the answer to this question, but I don’t. Part of the problem, I guess, is that his y_i are presumably correlated with each other, which I guess affects the answer.)

2. I guess my sense after looking at this stuff is that there’s really nothing wrong with Gottman’s published work, but that there is something wrong with Gladwell’s description of it, and to be honest, probably something wrong with Gottman’s own non-academic sales pitch for his own results. I think the criticism would be much muted if he said some form of “these variables explain x% of the variance” instead of using the loaded word “predict” — but my colleague tells me the word “predict” is standard usage for this, at least among psychologists. I think the correct statistical critique is just the very basic one that the claim as written is empty: that if you want to predict whether a couple will divorce or stay together with 80% accuracy, you can just predict every couple will stay together. But I think Abraham goes too far when she says that Gottman’s work is somehow in conflict with the scientific method.
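The question in remark 1 can be probed with a quick simulation (the sample sizes below — 100 couples, 10 measurements — are invented for illustration, not taken from Gottman's studies): regress a binary outcome on pure-noise measurements and see how large the in-sample correlation of the fitted combination gets under the null.

```python
import numpy as np

rng = np.random.default_rng(0)

def null_correlation(n_couples=100, n_vars=10, prevalence=0.16):
    """In-sample correlation between the best linear combination of
    pure-noise measurements and an independent binary outcome."""
    X = np.column_stack([np.ones(n_couples),
                         rng.standard_normal((n_couples, n_vars))])
    y = (rng.random(n_couples) < prevalence).astype(float)
    # least-squares fit = the optimal in-sample linear combination
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.corrcoef(X @ beta, y)[0, 1]

rs = [null_correlation() for _ in range(200)]
print(f"typical null correlation: {np.mean(rs):.2f}")
```

With these numbers the in-sample correlation under the null is typically around 0.3 — far from zero. Correlation among the y_i effectively lowers the dimension of the search, which would shrink this null value, consistent with the mathematician's hunch that the answer depends on it.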

To which I replied:

There’s a kind of innumeracy you sometimes see with mathematicians, where they don’t connect numbers directly to real-world outcomes.

To a statistician, it’s obvious that you can predict divorces to high accuracy over a 3-yr period by just saying that everyone will stay married, and those numbers such as 80% or 93% will raise suspicion. But to a certain kind of mathematician, these are numbers with no real-world context. Recall this quote from Gottman’s collaborator, James Murray:

“The forecast of who would get divorced in his study of 700 couples over 12 years was 100 per cent correct, he said. But ‘what reduced the accuracy of our predictions was those couples who we thought would stay married and unhappy actually ended up getting divorced’.”

To this guy, you can do better than 100%!

To put it another way: Gottman’s work could very well be useful even if he doesn’t predict divorces at all; he could be developing good techniques for marriage counselors. After all, most counselors don’t try to predict at all, and we don’t consider that to be a problem. But, at the very least, Gottman doesn’t seem to mind the praise he’s received for this work.

I can’t really say more without actually reading the scientific publications. . . .
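The base-rate point is worth a one-liner: with 16% of couples divorcing over the window in question, the no-information rule already scores 84%.

```python
# Trivial benchmark: predict "no divorce" for every couple.
divorce_rate = 0.16          # assumed base rate over the follow-up period
trivial_accuracy = 1 - divorce_rate
print(f"accuracy of predicting every couple stays married: {trivial_accuracy:.0%}")  # 84%
```

Any claimed accuracy figure therefore has to be compared against this benchmark, not against 50%.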

1. Mike Maltz says:

Gottman should have used survival analysis.

2. Sebastian says:

I think what Abraham is complaining about is that he recomputes the coefficients of y_1, … y_n in each paper. But I don't think this really counts as remaking the model each time to fit the data

I'd certainly think it does!
Let's say you had a model predicting that South American countries would do really poorly in the current soccer World Cup.
And now, as the group phase nears its end, you'd trumpet the predictive success of your model. You just got the coefficient "a little" wrong.

I'd guess that Gottman's coefficients keep the same sign, but allowing people to fix coefficients post-hoc makes "prediction" a little silly.

3. http://models.street says:

I haven't read the paper involved, but based on your conversation I feel vindicated in my personal belief that it takes all kinds to build mathematical models, and that "mathematicians" generally are the last people you should go to in order to get started.

I think for many it's surprising to think that a mathematician might be a bad choice to figure out a mathematical model. Most people, after all, don't really know what mathematicians do, so they naturally assume that if you have a "mathematical model" then you need a "mathematician". Even in the sciences or social sciences I think there's a tendency to believe that there is little difference between a statistician and, say, a mathematician who studies probability theory. This is a serious mistake.

As Andrew has said, statistics is the conglomeration of measurement, variation, and comparison. Statistical models of variation inevitably use probability theory, but measurement and comparison almost never enter into the world of mathematicians that I know, and sophisticated concepts in probability theory are rarely needed, and usually only by people who work on computational methods or something like that.

Even people who consider themselves applied mathematicians are often simply interested in proving the existence of a solution to some equation brought to their attention by a physicist or an engineer or something like that.

There are a few exceptions, people who are naturally more application oriented and interested in getting their hands on data, and in some cases entire schools of truly applied mathematics (such as the Oxford Centre for Industrial and Applied Mathematics and various other groups) where people actively seek out applications and then build models from scratch.

That a mathematician would think that recomputing the coefficients post-hoc doesn't count as "remaking the model" just shows how different the worlds of mathematicians and data-grubbing statisticians are.

To the mathematician the question of interest seems to be "show that y_i forms a subspace within which the optimal separation hyperplane lies" and to the statistician the question is something like "find a set of coefficients that optimally estimates the probability that a given couple not included in the training sample will divorce".

4. K? O'Rourke says:

Mike – agree, but also how poorly statistics is explained/grasped, and that although some mathematical reasoning is necessary, even research-level mathematical reasoning is not enough on its own.

One would think that cross-validation concepts would help with estimating the "probability that a given couple not included in the training sample will divorce".

But I also remember Brian Ripley admitting in a talk that it was a long while before he realized why cross-validation (or local model validation) on its own is simply not enough.

K?
p.s. Some of the most troublesome consults I have been involved in involved individuals who had done well in numerous undergraduate math courses.

5. Bernard Guerrero says:

"Gottman should have used survival analysis."

Logistic regression would be a more tractable choice, no?

"That a mathematician would think that recomputing the coefficients post-hoc doesn't count as "remaking the model" just shows how different the worlds of mathematicians and data-grubbing statisticians are."

Well, we're talking about the difference between re-running regressions with a given specification on new data-sets vs. re-specifying the model. If he's consistently keeping the same set of independent predictor variables, I don't think I'd call it "remaking" the model, either.

6. Jerzy says:

I haven't read Gottman's papers in detail either; but based on the above, it sounds like he's just modeling — there's no actual "prediction" happening in the every-day sense of the word.

If so, Andrew's mathematician friend's comment ("I don't think this really counts as remaking the model") makes me sad when it's applied to actual prediction. You simply can't fit the model to your data and then say you've "predicted" that data!

However… Who actually needs, as such, a magic equation that tells you *whether* you'll divorce, assuming no further intervention? The *real* benefit, assuming Gottman's individual studies have been done well, is that now there exists evidence-based guidance for noticing problem behaviors you should work on to hopefully *avoid* divorce.

I don't really care whether Gottman's 80% prediction claims are *true*. But I do think it'd be a better use of his time/effort to stop making them and to popularize his *actual* useful conclusions — and, better yet, to do new studies to test which treatments assigned to remedy problem behaviors actually reduce divorce rates relative to an untreated control group.

Also, regarding Daniel's comment about mathematicians vs math modeling: I agree, and that makes me glad I got a degree in Statistics instead of Applied Mathematics or Computer Science. It was worthwhile having teachers force me to grasp the statistical worldview of variation and risk. If I need to, I can pick up particular applied math techniques (diff eq's, advanced computational tools, etc) later on my own from a textbook or website MUCH more easily than I could "pick up" the statistician's worldview the same way.
On the other hand, I wonder if I missed out on something equally subtle and important since I *didn't* get an Applied Math degree?

7. K? O'Rourke says:

Seeing the link to Biometrika: One Hundred Years by Mike Titterington and David Cox reminded me of David's comment about Pearson after his work on this book – roughly, that his opinion of Pearson greatly improved, especially as to the breadth and depth of his work and thoughtfulness. Very hard to believe that he missed the degrees-of-freedom thing – but he did.

Sometimes really smart people get "blind-sided".

K?

8. Scot says:

Gottman has published, for the distressed couple, about a half dozen self-help books.

It's also interesting to note the nature of the predictors used, such as arousal level during arguments and recollections of couple history. How active a spousal partner's autonomic nervous system is during an argument is a proxy for a whole lot more than current metabolic functioning. Another bad sign is whether or not a spousal partner can recall events of the wedding day. Just as measuring skin conductance during an argument is a proxy, so is the inability to recall events of one's wedding day a proxy for many other events, affect, and cognition that have transpired. There's a deeper, personally historical meaning to these predictors that are collected cross-sectionally. Like tree rings, maybe.
Scot

9. RogerH says:

Here's another apparent example of overfitting and lack of model validation:

Test to predict menopause age a step nearer – BBC News. "On average, the difference between the predicted and actual age of menopause was only a third of a year, with a maximum margin of error of three to four years."

Source appears to be this conference press release

To quote:

"By taking blood samples from 266 women, aged 20-49, who had been enrolled in the much larger Tehran Lipid and Glucose Study, Dr Ramezani Tehrani and her colleagues were able to measure the concentrations of a hormone that is produced by cells in women’s ovaries – anti-Mullerian Hormone (AMH)… 63 women who reached menopause during the study… Dr Ramezani Tehrani was able to use the statistical model to identify AMH levels at different ages that would predict if women were likely to have an early menopause (before the age of 45). She found that, for instance, AMH levels of 4.1 ng/ml or less predicted early menopause in 20-year-olds, AMH levels of 3.3 ng/ml predicted it in 25-year-olds, and AMH levels of 2.4 ng/ml predicted it in 30-year-olds. In contrast, AMH levels of at least 4.5 ng/ml at the age of 20, 3.8 ng/ml at 25 and 2.9 ng/ml at 30 all predicted an age at menopause of over 50 years old."

Hmm, 266 subjects, 63 of whom experienced the event of interest, around 10 thresholds estimated. Any danger of overfitting do you think?

To be fair, the press release does end, "Considering that this is a small study that has looked at women over a period of time, larger studies starting with women in their twenties and following them for several years are needed to validate the accuracy of serum AMH concentration for the prediction of menopause in young women."

Still, the media coverage seems premature at best.
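The overfitting worry above can be sketched with a simulation (the threshold-picking rule and everything except the 266/63/10 counts are my own stand-ins, not the study's actual method): tune one threshold per age bin on pure noise, and in-sample accuracy beats what the same thresholds achieve on fresh data.

```python
import numpy as np

rng = np.random.default_rng(1)

def one_run(n=266, n_events=63, n_bins=10):
    """Tune one 'predictive' threshold per age bin on pure noise,
    then score the tuned thresholds on a fresh noise sample."""
    def simulate():
        amh = rng.standard_normal(n)            # hormone level: pure noise
        age_bin = rng.integers(0, n_bins, n)    # age group of each subject
        event = np.zeros(n, dtype=bool)         # early-menopause indicator
        event[rng.choice(n, n_events, replace=False)] = True
        return amh, age_bin, event

    amh, age_bin, event = simulate()
    thresholds = np.zeros(n_bins)
    for b in range(n_bins):
        m = age_bin == b
        if not np.any(m):
            continue
        # in-sample accuracy-maximizing threshold for this bin
        accs = [np.mean((amh[m] <= t) == event[m]) for t in amh[m]]
        thresholds[b] = amh[m][int(np.argmax(accs))]

    def score(amh, age_bin, event):
        pred = amh <= thresholds[age_bin]
        return float(np.mean(pred == event))

    return score(amh, age_bin, event), score(*simulate())

runs = [one_run() for _ in range(20)]
in_mean = float(np.mean([r[0] for r in runs]))
out_mean = float(np.mean([r[1] for r in runs]))
print(f"mean in-sample accuracy on pure noise:  {in_mean:.2f}")
print(f"mean out-of-sample accuracy:            {out_mean:.2f}")
```

The in-sample edge here is pure overfitting. If the reported third-of-a-year average error was computed on the same 63 events used to fit the thresholds, it is subject to exactly this kind of optimism.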