Kevin Lewis points us to this article by Ryan Enos, Anthony Fowler, and Christopher Havasy, who write:

This article examines the negative effect fallacy, a flawed statistical argument first utilized by the Warren Court in Elkins v. United States. The Court argued that empirical evidence could not determine whether the exclusionary rule prevents future illegal searches and seizures because “it is never easy to prove a negative,” inappropriately conflating the philosophical and arithmetic definitions of the word negative. Subsequently, the Court has repeated this mistake in other domains, including free speech, voting rights, and campaign finance. The fallacy has also proliferated into the federal circuit and district court levels. Narrowly, our investigation aims to eradicate the use of the negative effect fallacy in federal courts. More broadly, we highlight several challenges and concerns with the increasing use of statistical reasoning in court decisions. As courts continue to evaluate statistical and empirical questions, we recommend that they evaluate the evidence on its own merit rather than relying on convenient arguments embedded in precedent.

Damn. I thought I’d never say this, but Earl Warren knows about as much about statistics as the editors of Perspectives on Psychological Science and the Proceedings of the National Academy of Sciences.

I’d hope for better from the U.S. Supreme Court. Seriously.

Lots of psychology professors, including editors at top journals, don’t know statistics but feel the need to pretend that they do. But nobody expects a judge to know statistics. So one might think judges would feel free to contact the nation’s top statistical experts for advice as needed. No need to try to wing it, right?

As Mark Twain might have said, it’s not what you don’t know that kills you, it’s what you know for sure that ain’t true.

Welcome to my world. The “definitive” statement about statistics in the Supreme Court is Castaneda v. Partida (1977) (http://caselaw.findlaw.com/us-supreme-court/430/482.html#496), where footnote 17 says: “As a general rule for such large samples, if the difference between the expected value and the observed number is greater than two or three standard deviations, then the hypothesis that the jury drawing was random would be suspect to a social scientist.” I’ve always appreciated this quote, which suggests that 0.05 and 0.01 as p-values are all about the same to the Supreme Court, as they ought to be to most of us, right?
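For what it’s worth, the tail probabilities behind the footnote’s “two or three standard deviations” are easy to compute. A minimal sketch using only the Python standard library (the `two_sided_p` helper is my own, not anything from the opinion):

```python
from math import erf, sqrt

def two_sided_p(z):
    """Two-sided p-value for a z-score under a standard normal null."""
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))

print(round(two_sided_p(2), 4))  # → 0.0455
print(round(two_sided_p(3), 4))  # → 0.0027
```

So the footnote’s “two or three” range spans more than an order of magnitude in p-value, which is arguably the point: for this purpose the distinction doesn’t much matter.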

Having worked with many lawyers in management consulting, I can make the blanket statement that they are as innumerate as anybody on the planet. Why would SCOTUS be any different? A big part of forensic statistics is educational, in the sense of effectively communicating statistical literacy.

Thomas:

Sure, I have no problem with the idea that a bunch of middle-aged and elderly judges would be clueless about statistics, a field where awareness of key ideas has been changing rapidly in recent years. I do have a problem with these judges not recognizing that they could use expert help on the matter, considering that, given their positions, they’re in an excellent position to get high-quality expert advice whenever they want it.

Maybe the courts have been a little too reliant upon the musings of (the wrong) experts and their textbooks. Consider this quote from a recent US Supreme Court death penalty case:

“Once we know the SEM for a particular test and a particular test-taker, adding one SEM to and subtracting one SEM from the obtained score establishes an interval of scores known as the 66% confidence interval. See AAMR 10th ed. 57. That interval represents the range of scores within which “we are [66%] sure” that the “true” IQ falls. See Oxford Handbook of Child Psychological Assessment 291 (D. Saklofske, C. Reynolds, & V. Schwean eds. 2013).”
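For the record, the textbook figure here is about 68%, not 66%: under the usual normal model for measurement error, an observed score falls within one standard error of the true value roughly 68.3% of the time. A quick check in Python (assuming normality, which is the standard model behind the SEM):

```python
from math import erf, sqrt

def normal_coverage(k):
    """P(|Z| <= k) for a standard normal Z: the probability of landing
    within k standard deviations (or SEMs) of the center."""
    return erf(k / sqrt(2))

print(round(normal_coverage(1), 3))  # → 0.683
print(round(normal_coverage(2), 3))  # → 0.954
```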

Can you provide more examples? These are fascinating and frightening.

Sure. Over 400 opinions containing “statistically significant” or “confidence interval” have been handed down over the last 12 months so there’s lots to choose from (none of it good). For example, the Supreme Court of Nebraska, in a property tax case, held thusly earlier this year:

“The confidence interval measures how reliably the sold properties represent all of the other properties in the class or subclass.”

“A narrower range of confidence interval indicates a greater reliability of a statistical measure (e.g., the median).” – County of Douglas v. Nebraska Tax Equal & Rev. Comm.
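What a confidence interval actually licenses is a statement about the procedure under repeated sampling, not about how “reliably” one sample represents the rest of a class. A small simulation sketch (the true mean, sample size, and noise level are arbitrary, chosen for illustration):

```python
import random

random.seed(2)

# Under repeated sampling, about 95% of intervals constructed this way
# cover the true parameter.  That is a property of the procedure, not a
# statement about any single interval or sample.
true_mean = 10.0
n, n_sims, covered = 30, 10_000, 0
for _ in range(n_sims):
    sample = [random.gauss(true_mean, 2.0) for _ in range(n)]
    m = sum(sample) / n
    sd = (sum((x - m) ** 2 for x in sample) / (n - 1)) ** 0.5
    half = 1.96 * sd / n ** 0.5  # normal approximation to the t interval
    if m - half <= true_mean <= m + half:
        covered += 1
print(covered / n_sims)  # roughly 0.94-0.95 (a bit under 0.95: 1.96 ignores the t correction)
```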

Last month the U.S. 3rd Circuit Court of Appeals explained p-values like this:

“A ‘p-value’ indicates the likelihood that the difference between the observed and the expected value (based on the null hypothesis) of a parameter occurs purely by chance.” – In Re: Zoloft MDL
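That definition is the classic transposition. A p-value is computed assuming the null is true: it is the probability, under the null, of data at least as extreme as what was observed, not the probability that the observed difference is due to chance. A simulation sketch (the observed difference and sample sizes here are made up for illustration):

```python
import random

random.seed(1)

# Simulate the null: no real difference between two groups of 20, so any
# observed difference in means is pure sampling noise.  The p-value is the
# fraction of null worlds producing a difference at least as large as the
# one we observed.
observed_diff = 0.5  # hypothetical observed difference in group means
n_sims, extreme = 100_000, 0
for _ in range(n_sims):
    a = [random.gauss(0, 1) for _ in range(20)]
    b = [random.gauss(0, 1) for _ in range(20)]
    if abs(sum(a) / 20 - sum(b) / 20) >= observed_diff:
        extreme += 1
p_value = extreme / n_sims
print(round(p_value, 3))  # close to the analytic value of about 0.11
```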

Not surprising, but disturbing nonetheless. The ASA needs to follow up on its Statement on p-Values by organizing informative sessions for attorneys, judges (a special one for the Supreme Court?) and perhaps legislators as well. (Admittedly a tough job.)

Math on Trial: How Numbers Get Used and Abused is a good in-depth source of examples of innumeracy in the legal system.

Um, “SEM” means what?

I gather it’s not the Swedish Evangelical Mission, though.

Standard Error of the Mean, I think.

Standard Error of Measurement

I’m holding out hope for Structural Equation Model. Add one in, take one out, arrive at the truth.

In partial defense of lawyers — who are indeed innumerate, we joke about it amongst ourselves so often that the jokes are tired! — most of us consume highbrow journalism, in which no one ever says statistical reasoning is hard (or that it involves any philosophical issues), and the inverse probability fallacy of p values is repeated again and again and again as a simple fact. We only hear that the math itself is hard.

This is a good point. For too long statistics has gotten a pass, and it has even been taught by statisticians in a way that is harmful. I remember encouraging my sister to take stats around her sophomore year in college. She took it in summer school at a reasonably good California city college. This was before laptops were widespread, but everyone had a graphing calculator (maybe around 1999 or 2000). She got a good grade, but only because she was the only one in the class who refused to use a graphing calculator with pre-canned stats buttons and actually learned how to translate the symbols on the page into individual functions to evaluate (the cdf of the normal, the cdf of the t, the sample standard deviation, etc.). The whole class rested on hand calculations. I had to apologize to her. I assumed they would at least learn to graph data in some way, draw some regression lines, interpret them, make predictions, and so on. No. Just one kind of NHST test after another.

I am already trying to inoculate my daughter against the statistics instruction she will probably receive in college in a couple of years.

Have you been following the recent brouhaha in California over dropping Algebra II as a required Cal State course? The proponent, former Dean of Berkeley School of Law, wrote the following in a commentary piece:

“In fact, popular college math courses like Statistics do not require intermediate algebra. Studies show that the very same students, whose futures are threatened by algebra policies, can pass a rigorous college-level statistics course without knowing intermediate algebra.”

It makes you wonder just how rigorous these college-level stats courses are. It may also serve as further evidence that lots of highly regarded lawyers, whose eyes would cross three pages into Student’s famous paper, think that ignorance of mathematics is no impediment to properly interpreting statistical analyses.

Interesting that this was brought up by a Law school dean. I had of course heard about the dropping Algebra II idea, but hadn’t been following the subject carefully enough to get that insight.

I am not sure what’s in an Algebra II course these days, so I looked this up for California public high schools:

http://www.cde.ca.gov/ci/ma/cf/documents/mathfwalgebra2lmgjl.pdf

The third page discusses what is learned in Algebra II. Overall I think it’s useful stuff in the description (how well it’s taught and how well the insights reach the children… is going to be pretty variable), but the part about dividing polynomials with remainder I remember being taught and it was a nightmare of by-hand long division all over again. I have a sense that maybe it would be better to include some computer programming and let the computer do the calculations, so you could talk more about what the idea is behind dividing polynomials. Plus, then they’d get some experience in giving an algorithm for something in a formal language.
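As a sketch of what that could look like, here is the by-hand polynomial long-division algorithm in a few lines of Python (coefficient lists run from highest degree down; the example polynomials are my own):

```python
def poly_divmod(num, den):
    """Divide polynomial num by den, returning (quotient, remainder).
    This is exactly the by-hand long-division procedure: peel off the
    leading term, subtract, repeat until the remainder's degree drops
    below the divisor's."""
    num = list(num)
    quot = []
    while len(num) >= len(den):
        coef = num[0] / den[0]        # next quotient coefficient
        quot.append(coef)
        for i in range(len(den)):     # subtract coef * den from the front
            num[i] -= coef * den[i]
        num.pop(0)                    # drop the cancelled leading term
    return quot, num                  # remainder has degree < deg(den)

# (x^3 - 2x^2 + 3x - 4) / (x - 1)
q, r = poly_divmod([1, -2, 3, -4], [1, -1])
print(q, r)  # → [1.0, -1.0, 2.0] [-2.0], i.e. x^2 - x + 2, remainder -2
```

Writing the algorithm down in a formal language, and then letting the machine grind through the arithmetic, is precisely the experience the hand-calculation nightmare crowds out.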

Extraordinarily useful, given some of the discussions on this blog. Note this also in the description:

“They identify appropriate types of functions to model a situation, adjust parameters to improve the model, and compare models by analyzing appropriateness of fit and making judgments about the domain over which a model is a good fit. Students see how the visual displays and summary statistics learned in earlier grade levels relate to different types of data and to probability distributions. They identify different ways of collecting data—including sample surveys, experiments, and simulations—and the role of randomness and careful design in the conclusions that can be drawn.”

The claim that students can succeed in a rigorous statistics class without needing to know the sort of things taught in Algebra II referenced in the paragraph above would appear … suspect.

I’m uneasy about the suggestion to drop Algebra II but I don’t think this is a fair criticism. The claim is not that students can pass a rigorous statistics class without ever learning any of the Algebra II curriculum; the claim is that students can pass a rigorous statistics class without first passing Algebra II. It could be that the parts of Algebra II that trip students up are irrelevant to stats, and the parts that are relevant they can learn just as well (or better!) in the stats course itself.

crh: It also might well be the case that the people who pass Algebra II don’t ever actually really learn the stuff in Algebra II and that they can pass Statistics because the average statistics course is even more formulaic and can be learned by rote in the same way.

Taking a class, even passing a class, even getting a good grade in a class is not the same as knowing the stuff you were supposed to have known in a way that actually carries forward.

Kyle: A good point. So outreach to journalists (especially the “highbrow” ones) is also needed. Again, undoubtedly a tough job.

It’s far worse than you imagine. The Reference Manual on Scientific Evidence (for use by federal judges) discusses statistics at length, and while it largely gets the definitions and basics right, it promptly fails once it attempts to apply the lessons. For example, immediately after stating more or less correctly (so far as I, a lawyer, can tell) the definition of a p-value, it concludes: “if p is very small, something other than chance must be involved”.

Then, in discussing NHST, it explains the “logic” of it. It is, you see, proof by contradiction. If H0 is “X doesn’t cause Y” and NHST knocks over H0, then X must cause Y. Isn’t science cool?

You should read some of the opinions decided by a court’s interpretation of confidence intervals. You’d probably conclude that they’d do more justice if they interpreted tea leaves or chicken entrails instead.

To be fair, those seem like ho-hum, standard-issue (mis-)interpretations. If one decides to talk about NHST, p-values, confidence intervals, and the like, isn’t one then forced to give their “logic”? The fact that the logic is… controversial… is a somewhat different issue.

That is, this seems to me like judges getting the conventional stats training and not anything much worse than that. Now the question is whose fault is it that this rates as conventional stats training?

This supports what I often tell students: One of the most important mistakes in using statistics is expecting too much certainty. “If it involves statistical inference then it involves uncertainty.” And a lot of people are uncertainty-avoidant. One of many relevant quotes:

“Humans may crave absolute certainty; they may aspire to it; they may pretend … to have attained it. But the history of science … teaches that the most we can hope for is successive improvement in our understanding, learning from our mistakes, … but with the proviso that absolute certainty will always elude us.”

– Carl Sagan, The Demon-Haunted World: Science as a Candle in the Dark (1995), p. 28

I think this is a little uncharitable to the chapter on statistics, which can be read here:

https://www.nap.edu/read/13163/chapter/7#250

Of course there are objections to made to the NHST framework, but I think the chapter gets the logic of it correct. On p. 257, e.g., the authors state: “rejection of the null hypothesis does not leave the proffered alternative hypothesis as the only viable explanation for the data.”

The obvious question is: “So why not try to reject the “alternative hypothesis” instead, which is seemingly the one we are actually concerned with”? What does the null hypothesis have to do with anything?

Would we need a numerate congress to include numeracy in the vetting process? I’m not schooled enough in political science theory to understand the vetting equations. Can the latent feature of numeracy survive the over-fitting of the vetting process when the vettors lack the lab equipment to include numeracy as a feature? Do we assume that numeracy is latent in the survival bias of the previous career?

Ted:

I don’t ask that Supreme Court judges be numerate, any more than I would demand that of Princeton psychology professors. I only would hope that these decision makers and opinion leaders would recognize the bounds of their own expertise and bring in experts as needed.

Andrew, one assumes that in fact they do; the problem is that one person’s “expert” is another person’s PPNAS publication.

Daniel:

Sure, there are a lot of pseuds out there pretending to be experts. But my guess is that these people don’t even take the step of asking any experts. If they did, they’d at least have a chance to get things straightened out.

I think courts do consult experts in various ways. But there’s a point where the expert input stops and a decision is written. Draft decisions don’t get vetted—it’s the judge and his/her clerks on their own trying to make sense of it all. Same issue as with journalism.

The SCOTUS decision on whether genes can be patented is full of the same manner of nonsense. Lots of sentences are literally correct. But the decision is not logically consistent with itself in a way that shows they don’t know the biology and/or don’t care. They had several amicus briefs that carefully laid out the science at hand, and I personally know some of the clerks had conversations with geneticists.

But it’s not the least surprising. Logical inconsistencies are not unusual for SCOTUS decisions that weave together at least 5 strong-minded opinions into something like a consensus. When that decision process has to work over a complicated domain like stats, biology, human behavior, or economics, where almost everyone in the room will have had a few undergrad classes’ worth of background *at the absolute best*, what really could we expect?

Experts are often brought in: expert witnesses.

I don’t know that they make things any better; my expertise may not be particularly helpful in determining the truth when I’m paid by one side of the argument, regardless of the truth. Not too surprisingly, I think this is exactly the same pressure felt by statistical consultants on academic projects.

I was once asked by a Federal judge to assist her in sorting out the expert testimony of two competing econometric experts in a case. It was a great experience and I agree with you that it’s surprising it isn’t done more often.