Statistical discrimination again

Mark Johnstone writes:

I’ve recently been investigating a new European Court of Justice ruling on insurance calculations (on behalf of MoneySuperMarket) and I found something related to statistics that caught my attention. . . . The ruling (which comes into effect in December 2012) states that insurers in Europe can no longer provide different premiums based on gender. Despite the fact that women are statistically safer drivers, unless it’s biologically proven there is a causal relationship between being female and being a safer driver, this is now seen as an act of discrimination (more on this from the Wall Street Journal).

However, where do you stop with this? What about age? What about other factors? And what does this mean for the application of statistics in general? Is it inherently unjust in this context?

One proposal has been to fit ‘black boxes’ into cars so more individual data can be collected, as opposed to relying heavily on aggregates.

For fans of data and statistics, the law poses some interesting challenges. And I’d love to see somebody digging into this further from a statistical point-of-view.

I don’t have much to add here, beyond the usual Bayesian point that, if we have enough data on individuals, this will be more important than average rates, and also the usual political point that good information might not get used if the rulemakers have particular sympathy for unsafe drivers.
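To make the Bayesian point concrete, here is a minimal sketch using a Gamma-Poisson model, which is one standard way to blend a group average with an individual record. It is an illustration under stated assumptions, not a description of how any insurer prices: the function names, the group rate of 0.10 claims per year, and the prior strength of 5 years are all hypothetical.

```python
# Gamma-Poisson (conjugate) sketch: blend a group's average claim rate with
# one driver's own record. All numbers used here are hypothetical.


def posterior_claim_rate(group_rate, prior_years, claims, years_observed):
    """Posterior mean claim rate per year for one driver.

    The prior is Gamma(shape=group_rate * prior_years, rate=prior_years),
    i.e. the group average treated as if it were `prior_years` of data.
    Observing `claims` over `years_observed` years of Poisson exposure
    gives a Gamma posterior; this returns its mean.
    """
    shape = group_rate * prior_years + claims
    rate = prior_years + years_observed
    return shape / rate


def weight_on_individual(prior_years, years_observed):
    """Share of the posterior mean that comes from the driver's own record."""
    return years_observed / (prior_years + years_observed)


if __name__ == "__main__":
    group_rate = 0.10  # hypothetical group average, claims per year
    prior_years = 5.0  # hypothetical prior strength, in years of data
    for years in (0, 1, 5, 20, 100):
        # A driver with a clean record over `years` of observed driving.
        estimate = posterior_claim_rate(group_rate, prior_years, 0, years)
        weight = weight_on_individual(prior_years, years)
        print(f"{years:>3} years observed: rate {estimate:.4f}, "
              f"weight on own record {weight:.2f}")
```

With no individual data the estimate is just the group rate. As the observed exposure grows, the weight on the driver's own record approaches 1 and the group average (whether defined by gender or anything else) matters less and less.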

28 thoughts on “Statistical discrimination again”

  1. I’m surprised Gelman didn’t raise an issue I recall he raised in an earlier post (but I can’t remember the exact post) where he questioned trying to show there is a causal relationship between things like gender and some property (because of the inability to intervene just on gender). But I may be misremembering it.

    • Mayo,

      Indeed, I don’t think the expression “it’s biologically proven there is a causal relationship between being female and being a safer driver” is clean enough to have any useful meaning here. But in this case the relevant issues seemed to be predictive rather than causal so I didn’t see any reason to get into a discussion about causality.

      • I think causality is the issue. From a statistical point-of-view insurance companies use GLMs to predict losses. The old rule was that you could ‘discriminate’ based on gender if you could show actuarial data and a model backing up the pricing with a parameter for a gender variable. So you couldn’t just jack premiums up for men or women because you thought you could get away with it.

        The new rule is basically that you can’t put gender in as an independent variable – for predictive models – unless you can show gender is a cause, rather than just proxying for lurking variables. So if gender does cause something, it’s okay to price on it (men are testosterone crazed so they drive fast); if it just contributes to a prediction, that’s illegal (fast drivers tend to be men). I suppose, given the difficulty of proving causation, this is an effective ban on the use of gender at all – except maybe for things like prostate cancer.

        Obviously, given an important variable has been banned, insurers’ models are now weaker, and they’re casting around for other variables they can throw in to reduce predictive error. That includes trying to measure things gender was proxying before and also trying to figure out other variables that’ll surreptitiously proxy gender (like using black boxes to record driving miles). I don’t know what the net effect of all that will be; though obviously with more uncertain predictions premiums will be higher as a larger margin is needed as a buffer.
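        The proxy idea can be sketched with a small made-up portfolio. In the example below, claim rates depend only on a mileage band, yet gender still "predicts" claims because it is correlated with mileage. Everything here is hypothetical: the counts, the two mileage bands, and the `claim_rates` helper. For a single categorical predictor, the fitted rates of a Poisson GLM are just the observed group rates, so plain grouping stands in for the model fit.

```python
from collections import defaultdict

# Hypothetical portfolio: (gender, mileage_band) -> (drivers, claims in one year).
# By construction the claim rate depends only on mileage: 0.05 low, 0.15 high.
PORTFOLIO = {
    ("M", "low"): (300, 15),
    ("M", "high"): (700, 105),
    ("F", "low"): (700, 35),
    ("F", "high"): (300, 45),
}


def claim_rates(portfolio, key_index=None):
    """Claims per driver, grouped by one position of the cell key.

    key_index=0 groups by gender, key_index=1 by mileage band, and
    key_index=None keeps every (gender, mileage_band) cell separate.
    """
    drivers = defaultdict(int)
    claims = defaultdict(int)
    for cell, (n_drivers, n_claims) in portfolio.items():
        group = cell if key_index is None else cell[key_index]
        drivers[group] += n_drivers
        claims[group] += n_claims
    return {group: claims[group] / drivers[group] for group in drivers}


if __name__ == "__main__":
    print("by gender: ", claim_rates(PORTFOLIO, 0))
    print("by mileage:", claim_rates(PORTFOLIO, 1))
    print("by cell:   ", claim_rates(PORTFOLIO))
```

        In this toy data, pricing on gender alone separates the groups (0.12 for men against 0.08 for women), but once mileage is known gender adds nothing: within each mileage band the two rates are identical. Real portfolios will not be this clean, which is part of why the net effect of the ban is hard to call.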

        • “men are testosterone crazed so they drive fast”

          So isn’t “gender” then a proxy for the “testosterone” lurking variable? After all, some men have low testosterone and some women have high testosterone. In fact, can you name any variable that insurance companies take into consideration that isn’t a proxy for some lurking variable?

        • I wonder how much effort insurance companies will expend improving the causal resolution of their models. These models are deployed in support of insurance companies’ overriding concern: to pay as little in claims as humanly possible. If a variable has a tenuous connection to a causal relationship but is effective as a predictor in reducing losses, the insurance companies clearly are going to use it, unless constrained by government as in the example above. I am guessing the industry’s interest in improving the causal resolution of their models would directly correlate with reducing claim payouts.
          How does this ruling define causation and what are the metrics? I wonder, given the difficult nature of even defining causation, perhaps it would be easier to employ data mining to get around the gender restriction. Lots of consumer data lying around….

  2. There is a simple answer to the question; we now know how to find statistical causation, instead of correlation. (Judea Pearl’s Causality seems like the standard new work.) Why isn’t this a sufficient form of proof to use? The real question is not whether gender is biologically proven to influence driving behavior, but whether it is logically proven – a much stronger standard. The problem with the debate is that people seem to equate logical proof to “correlation = causality,” which was true until we had a formal and clear system to show the difference.

  3. “Where do you stop with not discriminating?” is a good question. But then, so is “where do you stop with discriminating?”

  4. Having met innumerable men and women in my life, I can testify with complete certainty that the observed effect is causally related to gender. Sometimes common sense is based on stronger assumptions than any statistical test.

    It’s as though the Europeans are so embarrassed by that whole messy episode back in the ’40s, that they want to make up for it by eliminating discrimination against ….. males in car insurance pricing.

    • The idea is related to the idea of “data protection” (Datenschutz), which is a constitutional right in Germany, from whence it spread to the rest of the Union, cf. the debated “right to be forgotten”. Although this ties in with the experience under the totalitarian regimes of the ’30s, the idea is far older. In French jurisprudence, we find the idea of “le droit à l’oubli” – a criminal, who has served his term and been rehabilitated, can protest against the public disclosure of his crime.

      Legislating against discrimination is hard, but this isn’t (just) about discrimination, it is also how Germany (and subsequently the EU) conceives the fundamental human rights.

      • Insurance companies didn’t cause the commotion in the ’30s and ’40s: it was governments. Nor is it the “right to be forgotten” by insurance companies that will ensure freedom and human rights; it’s the “right to be forgotten” by the government that counts. And on that score the governments of Europe have failed miserably since they’ve taken it upon themselves to micromanage every single human interaction. The nonsense Gelman highlighted is one example, but there are many others:

        http://www.huffingtonpost.co.uk/2012/10/18/rowan-atkinson-launches-westminster-free-speech-campaign-insult-law_n_1977488.html

        • The cases highlighted by Atkinson have nothing to do with the EU framework. Indeed, the provisions of the Public Order Act 1986 are limited by the European Convention on Human Rights (in casu art. 11).

          I didn’t state that insurance companies caused any “commotion”! Indeed, I would agree that it is government, not business, that needs to be disinfected by sunlight. However, my point was merely that the issue at hand is not solely anti-discrimination legislation, but part of a greater frame-work, based on a certain conception of human rights.

          Why on Earth would I want to start an insurance company?! And of course that same company wouldn’t be competitive in a market of firms not constrained in this way – much as a factory using NOx-filtering wouldn’t be competitive in a market that doesn’t penalize negative externalities. Do you think that provides an argument against environmental legislation?

        • I never mentioned the EU.

          The case of car insurance doesn’t involve negative externalities since all costs are borne by the contracting parties. The only exception to this is the price society pays for the moral hazard generated by those insured. And on that point, charging men and women the same insurance rates would tend to increase the problem of moral hazard.

      • Also, I might add that Gelman is completely right on this. “Causality” is at best irrelevant and at worst a rabbit hole that leads to a lot of wasted time and further nonsense. “Prediction” is what’s relevant, and gender is very obviously a strong predictor of accidents (if you disagree you’re free to start an insurance company that charges the same price to males and females).

        The fact that the same governments who’ve taken it upon themselves to micromanage every human interaction lack the common sense needed to see this is not a good sign.

  5. 1) What is the distinction between biological and statistical proof? Biologists use statistical methodology to draw conclusions about biological questions all the time. It’s not clear that there is a difference between the two types of “proof” at all, and it’s certainly not the case that a crisp line can or should be drawn between them.

    2) One common justification for imprisoning criminals is that they are deemed more likely to commit crime in future. But this is surely not proven in any other than a statistical sense – is this a case of unfair discrimination against criminals?

  6. While I am sympathetic with the tone of the post, if the result was racial (say it was that Asians were shown statistically to be less safe drivers, or Hispanics, or if whites were shown to be more safe drivers), I think we would all agree that the insurance premiums shouldn’t be cheaper for whites on the basis of their race. Or am I off-base? I just think we can’t treat all potentially useful statistical results as equally relevant to regulation and commerce; maybe the line is in the wrong place but there *is* a line, no?

  7. SRY -> testes -> testosterone -> risky behavior -> more crashes seems pretty plausible to me, but since some of those arrows can be fairly weak (especially compared to the first one!), I don’t really have a problem with requiring the insurance companies to target the risky behavior itself as a basis for premium raises. How is it fair for prudent men to be lumped in with their reckless counterparts just because both have the same gonads?

  8. I’d think that Gelman would raise here an example he likes a lot, namely, that professors don’t use info about students’ past performance when grading them because it seems unfair.

  9. The big question is whether “fairness” (however defined) is the relevant issue here.

    There are other possible goals. For example:

    A) The insurance companies want to maximize profits

    B) I don’t want to be killed by a bad driver.

    It would seem like me and the insurance companies have a mutual interest in their setting rates based on their actuarial evidence of what kind of people tend to be bad drivers.

    For example, if the insurance companies’ records suggest that, say, 16-year-old boy drivers tend to kill twice as many people as 16-year-old girl drivers, and therefore set premiums higher for 16-year-old boys than for 16-year-old girls, and this discourages some boys from driving at 16, especially boys from poorer families, is this really so bad?

    Or might this be considered a net socially beneficial outcome?

    • Steve:

      I’m with ya on that, but I don’t know that lawmakers agree. See the last sentence of my post above. My guess is that the lawyers and judges who make the rules have a lot more personal experience and sympathy with dangerous drivers than with victims of dangerous drivers.

      • They should allow vehicular insurance companies to raise your rates if you are a politician and your surname begins with a K and ends with a Y:

        From 2009:

        “It wasn’t alcohol, said Rep. Patrick Kennedy, in the first of two statements issued to explain an accident in which he crashed his car into a security barricade near the U.S. Capitol.

        “I consumed no alcohol prior to the incident,” said Kennedy, commenting on reports that he appeared to be staggering when he emerged from his green Mustang convertible at about 2:45 a.m. Thursday – insisting that he was late for a vote in the House.

        That statement left many reporters with further questions about the incident and the way it was handled by police. Late Thursday night, the 38-year-old Rhode Island Democrat – son of Massachusetts Democratic Sen. Edward Kennedy – decided to release more details.

        Patrick Kennedy says several hours before the accident, he had taken two prescription drugs prescribed by the attending physician of the U.S. Congress: Phenergan, to treat gastroenteritis, and Ambien, a sleeping pill.

        “Following the last series of votes on Wednesday evening, I returned to my home on Capitol Hill and took the prescribed amount of Phenergan and Ambien,” said Kennedy. “Some time around 2:45 a.m., I drove the few blocks to the Capitol Complex believing I needed to vote. Apparently, I was disoriented from the medication.”

        http://www.cbsnews.com/2100-250_162-1590041.html

        • Or from this July:

          Kerry Kennedy, the ex-wife of New York Governor Andrew Cuomo, was arrested and charged with driving while impaired after she collided with a tractor trailer and left the scene of the accident, police said.
          Police later found Kennedy, who is the daughter of the late Robert F. Kennedy, parked off an interstate north of New York City and passed out behind the wheel of her Lexus on Friday.

          ABC News has learned that Kennedy told police she may have taken Ambien sometime Friday morning, but doesn’t remember for sure.

  10. Victims of bad drivers are, clearly, victims, but they are just random human beings who happened to be at the wrong place at the wrong time. Thus, they are not a Designated Victim Group, and lack political advocates, other than the occasional surviving loved ones.

    In the United States, males are also not a Designated Victim Group, so the topic of charging male drivers higher insurance rates seems like a jocular one. However, several of the commenters have noted that this principle might be applied to Designated Victim Groups, and we can’t have that, now can we?

    For example, younger Hispanic males have a long track record of a higher tendency toward drunk driving and lethal crashes (Google the late Angels pitcher Nick Adenhart for a classic example). It strikes me that it would, on the whole, be a good thing for insurance companies to make it relatively harder for young Hispanic males to drive, but Hispanics are a Designated Victim Group, so I suspect that this suggestion is right now triggering a Moral Gag Reflex in many readers, even ones as sophisticated as Bayesian statisticians.

  11. The interesting bit is that the new legislation concerns not only car insurance premiums, but also the pricing of annuities, where there is a significant difference in longevity between men and women, which I guess has a biological reason (although, as far as I understand, there is no clear mechanism to explain it).

  12. Hi, from a UK-based actuary. First point to note is that the ban is in fact complete: insurers may not charge differential prices based on gender, end of story, and this applies across all lines of business. There’s a debate about whether one can load rates for individuals (Do you have testicles? +1% for testicular cancer risk, etc).

    Although the ruling is widely unpopular, it raises interesting issues. After all, as other posters have mentioned, different races do have different susceptibility to diseases, and hence different mortality curves. Yet I think we’d all agree that charging white people less for insurance would be a non-starter.
