
Gigerenzer on logical rationality vs. ecological rationality

I sent my post about the political implication of behavioral economics, embodied cognition, etc., to Gerd Gigerenzer, who commented as follows:

The “half-empty” versus “half-full” explanation of the differences between Kahneman and us misses the essential point: the difference is about the nature of the glass of rationality, not the level of the water. For Kahneman, rationality is logical rationality, defined as some content-free law of logic or probability; for us, it is ecological rationality, loosely speaking, the match between a heuristic and its environment. For ecological rationality, taking into account contextual cues (the environment) is the very essence of rationality, for Kahneman it is a deviation from a logical norm and thus, a deviation from rationality. In Kahneman’s philosophy, simple heuristics could never predict better than rational models; in our research we have shown systematic less-is-more effects.

Gigerenzer pointed to his paper with Henry Brighton, “Homo Heuristicus: Why Biased Minds Make Better Inferences,” and then he continued:

Please also note that Kahneman and his followers accept rational choice theory as the norm for behavior, and so does almost all of behavioral economics. They put the blame on people, not the model.

This makes sense; in particular, the less-is-more idea seems like a good framing.

That said, I think some of the power of Kahneman and Tversky’s cognitive illusions, as with the visual illusions with which we are all familiar, is that there is often a shock of recognition when our intuitive, “heuristic” response is revealed, upon deeper reflection, to be incorrect.

To put it in Gigerenzer’s framework, our environment is constantly changing, and we spend much of our time in an environment that is much different from the savanna where our ancestors spent so many thousands of years.

From this perspective, rational choice is not an absolute baseline of correctness, but it works well in many ways in our modern society, with its written records, liquid and storable money, and various other features to which rationality is well adapted.

Perhaps the most contextless email I’ve ever received

Date: February 3, 2015 at 12:55:59 PM EST
Subject: Sample Stats Question
From: ** <**>

I hope all is well and trust that you are having a great day so far. I hate to bother you but I have a stats question that I need help with: How can you tell which group has the best readers when they have the following information: Group A-130, 140, 170,170, 190, 200, 215, 225, 240, 250
Group B- 188, 189, 193, 193, 193, 194, 194, 195, 195, 196
Group A-mean (193), median (195), mode (170)
Group B- mean (193), median(193.5), mode (193)
This is for my own personal use and understanding of this subkject matter so anything you could say and redirect me would be greatly appreicated.
Any feedback that you could give me to help understand this better would be greatly appreciated.

Item-response and ideal point models

To continue from today’s class, here’s what we’ll be discussing next time:

- Estimating the direction and the magnitude of the discrimination parameters.

- How to tell when your data don’t fit the model.

- When does ideal-point modeling make a difference? Comparing ideal-point estimates to simple averages of survey responses.
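For readers outside the class: in, for example, a two-parameter logistic item-response model, the discrimination parameter controls both the direction and the sharpness of an item’s response curve. Here is a minimal sketch, with hypothetical parameter values (this is a generic 2PL illustration, not code from the class):

```python
import math

def inv_logit(x):
    return 1 / (1 + math.exp(-x))

# 2PL item-response function: a is the discrimination (direction + magnitude),
# b is the item's difficulty / cutpoint.
def p_yes(theta, a, b):
    return inv_logit(a * (theta - b))

theta = 1.0  # respondent's position (ability or ideal point)

# A high-discrimination item separates respondents sharply around b...
print(p_yes(theta, a=3.0, b=0.0))   # well above 0.5
# ...and a negative discrimination flips the direction of the item.
print(p_yes(theta, a=-3.0, b=0.0))  # well below 0.5
```

The sign of the discrimination is what orients the latent scale, which is why estimating its direction is a distinct issue from estimating its magnitude.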

P.S. Unlike the previous post, this time I really am referring to the class we had this morning.

A message I just sent to my class

I wanted to add some context to what we talked about in class today. Part of the message I was sending was that there are some stupid things that get published and you should be careful about that: don’t necessarily believe something just cos it’s statistically significant and published in a top journal.

And, sure, that’s true, I’ve seen lots of examples of bad studies that get tons of publicity. But that shouldn’t really be the #1 point you get from my class.

This is how I want you to think about today’s class:

Consider 3 different ways in which you will be using sample surveys:

1. Conducting your own survey;

2. Performing your own analysis of existing survey data;

3. Reading and interpreting a study that was performed by others.

The key statistical message of today’s lecture was that if the underlying comparison of interest in the population (what I was calling the “effect size,” but that is somewhat misleading, as we could be talking about purely descriptive comparisons with no direct causal interpretation) is small, and if measurements are poor (high bias, high variance, or both), then it can be essentially impossible to learn anything statistical from your data.

The point of the examples I discussed is not so much that they’re dumb, but that they are settings where the underlying difference or effect in the population is small, and where measurements are noisy, or biased, or both.

What does this imply for your own work? Consider the 3 scenarios listed above:

1. If you’re conducting your own survey: Be aware of what your goal is, what you’re trying to estimate. And put lots of effort into getting valid and reliable measurements. If you’re estimating a difference which in truth is tiny, or if your measurements are crap, you’re drawing dead (as they say in poker).

2. If you’re performing your own analysis of existing survey data: Same thing. Consider what you’re estimating and how well it’s being measured. Don’t fall into the trap of thinking that something that’s statistically significant is likely to accurately represent a truth in the general population.

3. If you’re reading and interpreting a study that was performed by others: Same thing. Even if the claim does not seem foolish, think about the size of the underlying comparison or effect and how accurately it’s being estimated.

To put it another way, one thing I’m pushing against is the attitude that statistical significance is a “win.” From that perspective, it’s ok to do a noisy study of a small effect if the cost is low, because you might get lucky and get that “p less than .05.” But that is a bad attitude, because if you’re really studying a small effect with a noisy measurement, anything that happens to be statistically significant could well be in the wrong direction and is certain to be an overestimate. In the long run, finding something statistically significant in this way is not a win at all, it’s a loss in that it can waste your time and other researchers’ time.
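The claim that a statistically significant result from a noisy study of a small effect is certain to be an overestimate, and may well have the wrong sign, can be checked with a quick simulation (a Python sketch with made-up numbers: a true effect of 0.1 measured with standard error 1):

```python
import random

random.seed(1)
true_effect = 0.1  # small true effect (hypothetical)
se = 1.0           # noisy measurement: standard error much larger than the effect

n_sims = 100_000
estimates = (random.gauss(true_effect, se) for _ in range(n_sims))
# Keep only the "wins": estimates significant at p < .05
significant = [est for est in estimates if abs(est) > 1.96 * se]

share_wrong_sign = sum(e < 0 for e in significant) / len(significant)
avg_magnitude = sum(abs(e) for e in significant) / len(significant)

print(f"significant in {len(significant) / n_sims:.1%} of simulations")
print(f"wrong sign among significant results: {share_wrong_sign:.0%}")
print(f"average |estimate| among significant results: {avg_magnitude:.2f} "
      f"(truth: {true_effect})")
```

By construction, every significant estimate exceeds 1.96 in magnitude, roughly twenty times the true effect, and a substantial minority point in the wrong direction. That is what “statistical significance is not a win” looks like numerically.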

This is all some serious stuff to think about in a methods class, but it’s important to think a bit about the endgame.

P.S. (in case this is confusing anyone who was in class today): I wrote the above message a couple months ago. Most of the posts on this blog are on delay.

“For better or for worse, academics are fascinated by academic rankings . . .”

I was asked to comment on a forthcoming article, “Statistical Modeling of Citation Exchange Among Statistics Journals,” by Cristiano Varin, Manuela Cattelan and David Firth.

Here’s what I wrote:

For better or for worse, academics are fascinated by academic rankings, perhaps because most of us reached our present positions through a series of tournaments, starting with course grades and standardized tests and moving through struggles for the limited resource of publication space in top journals, peer-reviewed grant funding, and finally, the unpredictable process of citation and reputation. As statisticians we are acutely aware of the failings of each step of the process, and we find ourselves torn between the desire to scrap the whole system, arXiv-style, or to reform it as suggested in the present paper. In this article, Varin, Cattelan, and Firth argue that quantitative assessment of scientific and scholarly publication is here to stay, so we might as well try to reduce the bias and variance of such assessments as much as possible.

As the above paragraph indicates, I have mixed feelings about this sort of effort and as a result I feel too paralyzed to offer any serious comments on the modeling. Instead I will offer some generic, but I hope still useful, graphics advice: Table 2 is essentially unreadable to me and is a (negative) demonstration of the principle that, just as we should not publish any sentences that we do not want to be read, we also should avoid publishing numbers that will not be of any use to a reader. Does anyone care, for example, that AoS has exactly 1663 citations? This sort of table cries out to be replaced by a graph (which it should be possible to construct taking up no more space than the original table; see Gelman, Pasarica, and Dodhia, 2002). Figure 1 violates a fundamental principle of graphics in that it wastes one of its axes by following what Wainer (2001) has called the Alabama first ordering. Figure 2 has most of its words upside down, which is a result of an unfortunate choice to present a vertical display as horizontal, thus requiring me to rotate my computer 90 degrees to read it. Table 4 represents one of the more important outputs of the research being discussed, but it too is hard to read, requiring me to try to track different acronyms across the page. It would be so natural to display these results as a plot with one line per journal.

I will stop at this point and conclude by recognizing that these comments are trivial compared to the importance of the subject, but as noted above I was too torn by this topic to offer anything more.

And here are X’s reactions.

Why do we communicate probability calculations so poorly, even when we know how to do it better?

Haynes Goddard writes:

I thought to do some reading in psychology on why Bayesian probability seems so counterintuitive, which makes it difficult for many to learn and apply. Indeed, that is the finding of considerable research in psychology. It turns out that it is counterintuitive because of the way it is presented, following no doubt the way the textbooks are written. The theorem is usually expressed first with probabilities instead of frequencies, or “natural numbers” — counts in the binomial case.

The literature is considerable, starting at least with a seminal piece by David Eddy (1982), “Probabilistic reasoning in clinical medicine: problems and opportunities,” in Judgment under Uncertainty: Heuristics and Biases, eds. D. Kahneman, P. Slovic and A. Tversky. Also much cited are Gigerenzer and Hoffrage (1995), “How to improve Bayesian reasoning without instruction: frequency formats,” Psychological Review, and Cosmides and Tooby (1996), “Are humans good intuitive statisticians after all? Rethinking some conclusions from the literature on judgment under uncertainty,” Cognition.

This literature has amply demonstrated that people actually can readily and accurately reason in Bayesian terms if the data are presented in frequency form, but have difficulty if the data are given as percentages or probabilities. Cosmides and Tooby argue that this is so for evolutionary reasons, and their argument seems compelling.

So taking a look at my several texts (not a random sample of course), including Andrew’s well written text, I wanted to know how many authors introduce the widely used Bayesian example of determining the posterior probability of breast cancer after a positive mammography in numerical frequency terms or counts first, then shifting to probabilities. None do, although some do provide an example in frequency terms later.

Assuming that my little convenience sample is somewhat representative, this raises the question of why the psychologists’ recommendations have not been adopted.

This is a missed opportunity, as the psychological findings indicate that the frequency approach makes Bayesian logic instantly clear, making it easier to comprehend the theorem in probability terms.

Since those little medical inference problems are very compelling, it would make the lives of a lot of students a lot easier and increase acceptance of the approach. One can only imagine how much sooner the sometimes acrimonious debates between frequentists and Bayesians would have diminished if not ended. So there is a clear lesson here for instructors and textbook writers.

Here is an uncommonly clear presentation of the breast cancer example, and there are numerous comments from beginning statistics students noting this clarity.

My response:

I agree, and in a recent introductory course I prepared, I did what you recommend and started right away with frequencies, Gigerenzer-style.

Why has it taken us so long to do this? I dunno, force of habit, I guess? I am actually pretty proud of chapter 1 of BDA (especially in the 3rd edition with its new spell-checking example, but even all the way back to the 1st edition in 1995) in that we treat probability as a quantity that can be measured empirically, and we avoid what I see as the flaw of seeking a single foundational justification for probability. Probability is a mathematical model with many different applications, including frequencies, prediction, betting, etc. There’s no reason to think of any one of these applications as uniquely fundamental.

But, yeah, I agree it would be better to start with the frequency calculations: instead of “1% probability,” talk about 10 cases out of 1000, etc.
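To make the frequency framing concrete, here is the standard mammography calculation done as counts rather than probabilities (a sketch with the usual illustrative rates — roughly 1% prevalence, 80% sensitivity, 10% false-positive rate — not numbers from any particular textbook):

```python
# Natural-frequency version of the mammography example:
# follow 1000 women through the test and just count.
population = 1000
with_cancer = population * 1 // 100           # 10 women have breast cancer
true_positives = with_cancer * 80 // 100      # 8 of them test positive
without_cancer = population - with_cancer     # 990 women do not have cancer
false_positives = without_cancer * 10 // 100  # 99 of them test positive anyway

# Bayes' theorem reduces to a ratio of counts:
p_cancer_given_positive = true_positives / (true_positives + false_positives)
print(f"{true_positives} of {true_positives + false_positives} positive tests "
      f"are cancers = {p_cancer_given_positive:.1%}")
```

Seen this way, the counterintuitive answer — that most positive tests are false positives — is immediate: 8 true positives are swamped by 99 false positives.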

P.S. It’s funny that Goddard cited a paper by Cosmides and Tooby, as they’re coauthors on that notorious fat-arms-and-political-attitudes paper, a recent gem in the garden-of-forking-paths, power=.06 genre. Nobody’s perfect, I guess. In particular, it’s certainly possible for people to do good research on the teaching and understanding of statistics, even while being confused about some key statistical principles themselves. And even the legendary Kahneman has been known, on occasion, to overstate the strength of statistical evidence.

On deck this week

Mon: Why do we communicate probability calculations so poorly, even when we know how to do it better?

Tues: “For better or for worse, academics are fascinated by academic rankings . . .”

Wed: A message I just sent to my class

Thurs: Perhaps the most contextless email I’ve ever received

Fri: Gigerenzer on logical rationality vs. ecological rationality

Sat: How do data and experiments fit into a scientific research program?

Sun: Go to PredictWise for forecast probabilities of events in the news

I wrote some of these so long ago, I don’t even remember what they’re about. So I’m looking forward to these with as much anticipation as you are!

“Another bad chart for you to criticize”

Perhaps in response to my lament, “People used to send me ugly graphs, now I get these things,” Stuart Buck sends me an email with the above comment and a link to this “Graphic of the day” produced by some uncredited designer at Thomson Reuters:

[Screenshot of the Thomson Reuters “Graphic of the day,” captured 9 Oct 2014]

From a statistical perspective, this graph is a disaster in that the circular presentation destroys the two-way structure (countries x topics) which has to be central to any understanding of these data. In addition, to the extent that you’d want to get something out of the graph, you’ll end up having to perform mental divisions of line widths.

At this point I’d usually say something like: On the plus side, this is a thought-provoking display (given its tentacle-like appearance, one might even call it “grabby”) that draws viewers’ attention to the subject matter. But I can’t really even say that, because the subject of the graph—nationalities of Nobel Prize winners—is one of the more overexposed topics out there, and really the last thing we need is one more display of these numbers. Probably the only thing we need less of is further analysis of the Titanic survivors data. (Sorry, Bruno: 5 papers on that is enough!)

Another stylized fact bites the dust

According to economist Henry Farber (link from Dan Goldstein):

In a seminal paper, Camerer, Babcock, Loewenstein, and Thaler (1997) find that the wage elasticity of daily hours of work of New York City (NYC) taxi drivers is negative and conclude that their labor supply behavior is consistent with target earning (having reference dependent preferences). I replicate and extend the CBLT analysis using data from all trips taken in all taxi cabs in NYC for the five years from 2009-2013. The overall pattern in my data is clear: drivers tend to respond positively to unanticipated as well as anticipated increases in earnings opportunities. This is consistent with the neoclassical optimizing model of labor supply and does not support the reference dependent preferences model.

I explore heterogeneity across drivers in their labor supply elasticities and consider whether new drivers differ from more experienced drivers in their behavior. I find substantial heterogeneity across drivers in their elasticities, but the estimated elasticities are generally positive and only rarely substantially negative. I also find that new drivers with smaller elasticities are more likely to exit the industry while drivers who remain learn quickly to be better optimizers (have positive labor supply elasticities that grow with experience).

It’s good to get that one out of the way.

A silly little error, of the sort that I make every day


Ummmm, running Stan, testing out a new method we have that applies EP-like ideas to perform inference with aggregate data—it’s really cool, I’ll post more on it once we’ve tried everything out and have a paper that’s in better shape—anyway, I’m starting with a normal example, a varying-intercept, varying-slope model where the intercepts have population mean 50 and sd 10, and the slopes have population mean -2 and sd 0.5 (for simplicity I’ve set up the model with intercepts and slopes independent), and the data variance is 5. Fit the model in Stan (along with other stuff, the real action here’s in the generated quantities block but that’s a story for another day), here’s what we get:

            mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
mu_a[1]    49.19    0.01 0.52 48.14 48.85 49.20 49.53 50.20  2000    1
mu_a[2]    -2.03    0.00 0.11 -2.23 -2.10 -2.03 -1.96 -1.82  1060    1
sigma_a[1]  2.64    0.02 0.50  1.70  2.31  2.62  2.96  3.73   927    1
sigma_a[2]  0.67    0.00 0.08  0.52  0.61  0.66  0.72  0.85   890    1
sigma_y     4.97    0.00 0.15  4.69  4.86  4.96  5.06  5.27  2000    1

We’re gonna clean up this output—all these decimal places are ridiculous, also I’m starting to think we shouldn’t be foregrounding the mean and sd as these can be unstable; median and IQR would be better, maybe—but that’s another story too.

Here’s the point. I looked at the above output and noticed that the sigma_a parameters are off: the sd of the intercept is too low (it’s around 2 and it should be 10) and the sd of the slopes is too high (it’s around 0.6 and it should be 0.5). The correct values aren’t even in the 95% intervals.

OK, it could just be this one bad simulation, so I re-ran the code a few times. Same results. Not exactly, but the parameter for the intercepts was consistently underestimated and the parameter for the slopes was consistently overestimated.

What up? OK, I do have a flat prior on all these hypers, so this must be what’s going on: there’s something about the data where intercepts and slopes trade off, and somehow the flat prior allows inferences to go deep into some zone of parameter space where this is possible.

Interesting, maybe ultimately not too surprising. We do know that flat priors cause problems, and here we are again.

What to do? I’d like something weakly informative, this prior shouldn’t boss the inferences around but it should keep them away from bad places.

Hmmm . . . I like that analogy: the weakly informative prior (or, more generally, model) as a permissive but safe parent who lets the kids run around in the neighborhood but sets up a large potential-energy barrier to keep them away from the freeway.

Anyway, to return to our story . . . I needed to figure out what was going on. So I decided to start with a strong prior focused on the true parameter values. I just hard-coded it into the Stan program, setting normal priors for mu_a[1] and mu_a[2]. But then I realized, no, that’s not right, the problem is with sigma_a[1] and sigma_a[2]. Maybe put in lognormals?

And then it hit me: in my R simulation, I’d used sd rather than variance. Here’s the offending code:

a <- mvrnorm(J, mu_a, diag(sigma_a))  # bug: mvrnorm's third argument is a covariance matrix, so the diagonal should hold variances, not sds

That should've been diag(sigma_a^2). Damn! Going from univariate to multivariate normal, the notation changed.

On the plus side, there was nothing wrong with my Stan code. Here's what happens after I fixed the testing code in R:

            mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
mu_a[1]    48.17    0.11 1.62 45.08 47.07 48.12 49.23 51.38   211 1.02
mu_a[2]    -2.03    0.00 0.10 -2.22 -2.09 -2.02 -1.97 -1.82  1017 1.00
sigma_a[1] 10.98    0.05 1.18  8.95 10.17 10.87 11.68 13.55   496 1.01
sigma_a[2]  0.57    0.00 0.09  0.42  0.51  0.56  0.63  0.75   826 1.00
sigma_y     5.06    0.00 0.15  4.78  4.95  5.05  5.16  5.35  2000 1.00

Fake-data checking. That's what it's all about.
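The whole episode can be sketched in a few lines (Python here rather than R, with made-up dimensions): draw groups from known hyperparameters, compute the group-level sds, and compare to the truth. Passing diag(sigma) instead of diag(sigma**2) as the covariance matrix reproduces exactly the kind of discrepancy above.

```python
import numpy as np

rng = np.random.default_rng(42)
J = 500                       # number of groups (made-up; large so sds are stable)
mu = np.array([50.0, -2.0])   # population means of intercepts and slopes
sigma = np.array([10.0, 0.5]) # population sds of intercepts and slopes

# Buggy version: diag(sigma) treats the sds themselves as variances.
buggy = rng.multivariate_normal(mu, np.diag(sigma), size=J)
# Correct version: the covariance matrix needs variances on the diagonal.
fixed = rng.multivariate_normal(mu, np.diag(sigma**2), size=J)

print("target sds:      ", sigma)
print("buggy sample sds:", buggy.std(axis=0))  # ~sqrt(10)=3.2 and ~sqrt(0.5)=0.71
print("fixed sample sds:", fixed.std(axis=0))  # ~10 and ~0.5
```

The buggy draw gives group sds of about sqrt(10) ≈ 3.2 for the intercepts (too low) and sqrt(0.5) ≈ 0.71 for the slopes (too high), which is just the pattern the Stan fit was faithfully reporting.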


And that's why I get so angry at bottom-feeders like Richard Tol, David Brooks, Mark Hauser, Karl Weick, and the like. Every damn day I'm out here working, making mistakes, and tracking them down. I'm not complaining; I like my job. I like it a lot. But it really is work, and it's hard work sometimes. So to encounter people who just don't seem to care, who just don't give a poop whether the things they say are right or wrong, ooohhhhh, that just burns me up.

There's nothing I hate more than those head-in-the-clouds bastards who feel deep in their bones that they're right. Whether it's an economist fudging his numbers, or a newspaper columnist lying about the price of a meal at Red Lobster, or a primatologist who won't share his videotapes, or a b-school professor who twists his stories to suit his audience---I just can't stand it, and what I really can't stand is that it doesn't even seem to matter to them when people point out their errors. Especially horrible when they're scientists or journalists, people who are paid to home in on the truth and have the public trust to do that.

A standard slam against profs like me is that we live in an ivory tower, and indeed my day-to-day life is far removed from the sort of Mametian reality, that give-and-take of fleshy wants and needs, that we associate with "real life." But, y'know, a true scholar cares about the details. Take care of the pennies and all that.