
Documenting a class-participation activity

Tian Zheng implemented my candies demo using Legos:


Also lots of details on the results. The point here is not exactly what happened (but, yes, the demo did work) but rather the idea that you can use photos and graphs to document what worked in class. We should be able to do this sort of scrapbooking all the time as teachers.

Next time: take some photos of the kids in class doing the activities, too (assuming that’s ok with them).

Economics/sociology phrase book

Mark Palko points me to this amusing document from Jeffrey Smith and Kermit Daniel, translating sociology jargon into economics and vice-versa. Lots of good jokes there.

Along these lines, I’ve always been bothered by economists’ phrase “willingness to pay” which, in practice, often means “ability to pay.”

And, of course, “earnings” which means “how much money you make.”

But, to be fair, statisticians have some of these too. For example, in psychometrics we use the term “ability” to refer to the very specific ability to get certain questions correct on a test.

Cognitive vs. behavioral in psychology, economics, and political science

I’ve been coming across these issues from several different directions lately, and I wanted to get the basic idea down without killing myself in the writing of it. So consider this a sketchy first draft.

The starting point is “behavioral economics,” also known as the “heuristics and biases” subfield of cognitive psychology. It’s associated with various studies of cognitive illusions, settings where people systematically mispredict uncertain events or make decisions. Within psychology, this work is generally accepted but with some controversy which could be summed up in the phrase, “Kahneman versus Gigerenzer,” but it’s my impression that in recent years there’s been a bit of a convergence: for Kahneman the glass is half-empty and for Gigerenzer the glass is half-full, but whether you’re talking about “heuristics and biases” or “fast and frugal decision making,” there’s been a focus on understanding how our brains use contextual cues to decide how to solve a problem.

In economics, this work is more disputed because it seems to be in head-on conflict with models of utility-maximizing rationality from the 1930s-50s associated with the theories of von Neumann and others on economic decision making. While some economists have embraced so-called “behavioral” ideas to explain imperfect markets, other economists are (a) skeptical about the relevance to real-world high-stakes behavior of laboratory findings on cognitive illusions and (b) wary of the political implications of social engineers who want to use cognitive biases to “nudge” people toward behavior they otherwise wouldn’t have done.

Within economics, I’d say that the behavioral/classical debate roughly follows left/right lines: on the left are the behaviorists who say that individuals and firms are irrational and thus we should not trust the judgment of the markets, instead we should regulate and protect people from their irrationality. On the right are the classicists who hold that people are rational when it comes to real economic decisions and thus any interference in the market, whether from governments or labor unions, will tend to make things worse.

The conservative position has some difficulties when dealing with customs and culture and roles and various non-governmental constraints on economic behavior: from one sort of conservative perspective these are unnecessary restrictions on the economy, silly traditions that rule-breaking entrepreneurs will shatter; from another conservative perspective, these traditions represent collective wisdom and we should be wary of reformers who try to start anew without recognizing that traditions are traditions for a good reason. But for now I will set all this aside and focus on the question of behavioral economics.

Step aside from economics for a moment, though, and things look a little different. Instead of thinking of “heuristics and biases” or “behavioral economics” in opposition to simplistic models of rationality (I’ve said it before and I’ll say it again: I see no reason why a long-discredited psychology model from the 1930s and 1940s should be taken as any sort of starting point for understanding human decision making; utility theory is, at best, one framework for such modeling), put this work in a more general context of disparagement of human decision making.

To put it another way, think about “behavioral economics” not so much as “economics” but as “behavioral.” From a psychology point of view, behaviorism is a nearly century-old theory that was in many ways superseded by cognitive psychology. And, in many ways, “behavioral economics” is a sort of counter-revolution: it’s full of tropes under which people are doing things for irrational reasons, in which actions speak louder than words etc.

The full story here is complicated but one reason I think these ideas are popular in neoclassical economics is that they are, in some sense, anti-democratic. If people’s votes are determined based on the time of the menstrual cycle or on the outcomes of college football games, then elections are pretty silly, no? Which is an implicit argument in favor of lower taxes and more power for business, as compared to government (or, for that matter, unions).

This becomes particularly clear when we look at work along these lines in political science. If, for example, subliminal smiley faces have big effects on political attitudes, then this should cause us to think twice about how seriously to take such attitudes, no? Or if men’s views on economic redistribution are in large part determined by physical strength, or if women’s vote preferences are in large part determined by what time of the month it is, or if both sexes’ choice to associate with co-partisans is in large part determined by how they smell, then this calls into question a traditional civics-class view of the will of the people.

Luckily (or, perhaps, depending on your view, unluckily), the evidence for the empirical claims in the above paragraphs ranges from weak to nonexistent.

But my point is that there is a wave of research, coming from different directions, but all basically saying that our political attitudes are shallow and easily manipulated and thus, implicitly, not to be trusted. I don’t find this evidence convincing and, beyond this, I’m troubled by the eagerness some people seem to show to grab on to such claims, with their ultimately anti-democratic implications.

Let’s be clear here, though: I do have a dog in this fight, as the saying goes. In 1993, Gary King and I published an influential paper claiming that wide swings in the polls, swings that had often been taken as evidence of the capriciousness of voters or of their easily-manipulated nature, could be reinterpreted as evidence in favor of voters moving to their “enlightened preferences.” And then, more recently, David Rothschild, Sharad Goel, Doug Rivers and I updated this argument by providing evidence that some poll swings can be mostly explained by differential nonresponse without any large attitude changes. I’ve published work (with Aaron Edlin and Noah Kaplan) arguing why voting can be rational. And I’ve worked with Jeff Lax and Justin Phillips on their series of papers on the responsiveness of state legislators to state-level opinion. In my research I’ve been strongly committed, in many different ways, to the model in which voter preferences and attitudes should be taken seriously. So it would be fair enough to read my resistance to voters-are-influenced-by-irrelevant-stimuli arguments in that context. I’m providing you with my perspective, but I recognize that other perspectives are out there.

What I’m getting at is that I see a common thread in a lot of the counterintuitive, tabloid, Psychological-Science-type work out there, and that thread is a dismissal of human rationality and even human agency in the political (and, to some extent, the economic) arena. Here I’m speaking of “rationality” not in the limited sense of utility maximization but in the more general sense of thoughtful, purposeful decision making.

In the “Psychological Science” world, voters’ attitudes are determined by upper-body strength and the time of the month, their attitudes on important issues are influenced by meaningless subliminal stimuli, their elections turn on the outcomes of late-October football games, and they flub any decisions involving uncertainty. Throw the words “Florida” and “bingo” at them and they walk slower, without even realizing why; they’re influenced by stereotype threat even without realizing it; and even their choice of clothing is not under their conscious control. Put it all together and you get a pre-cognitive conception of the citizen: not a man or woman who weighs the evidence, forms political views, and makes economic and political decisions, but a creature who is continually pushed to and fro by influences of which he or she is not even aware, an unstable product of hormones and the manipulations of political and social marketers, a sort of particle in the water being jostled by invisible Brownian forces.

Let me repeat that the evidence for many of these claims is weak; indeed, I have the feeling that a lot of people want to believe in these things, so they grab on to whatever “p less than .05” comparisons they find and take them as representative of the general population, as scientific truth. On the other hand, I perhaps am coming from the opposite direction.

What I’m getting at is that there’s a political theme here, and also a scientific theme: I see a lot (although not all!) of this “behavioral” work as being behaviorist in the sense of being faithful to a pre-cognitive, and pre-modern conception of psychology.

The cognitive-psychology perspective, as I see it, is that we are thinking beings, and to the extent that we are influenced in irrational ways (whether by hormones, or subliminal marketing, or whatever), we mediate these influences through our thought processes. One reason I found the work of Cengiz Erisen so interesting (even while I disagreed with Larry Bartels’s more dramatic claims for the importance of that work) is that Erisen was not just treating his subliminal stimulus as a black box but rather was investigating how our conscious reasoning process might mediate the effects of a non-rational stimulus. In that particular case, the stimulus had no consistent effect on attitudes but I like the general approach of the study.

Six quick tips to improve your regression modeling

It’s Appendix A of ARM:

A.1. Fit many models

Think of a series of models, starting with the too-simple and continuing through to the hopelessly messy. Generally it’s a good idea to start simple. Or start complex if you’d like, but prepare to quickly drop things out and move to the simpler model to help understand what’s going on. Working with simple models is not a research goal—in the problems we work on, we usually find complicated models more believable—but rather a technique to help understand the fitting process.

A corollary of this principle is the need to be able to fit models relatively quickly. Realistically, you don’t know what model you want to be fitting, so it’s rarely a good idea to run the computer overnight fitting a single model. At least, wait until you’ve developed some understanding by fitting many models.
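Here is a minimal sketch of what "fit many models" can look like in practice. It is not from the book: the data are simulated, the helper names `solve` and `fit_polynomial` are made up for illustration, and polynomial degree stands in for whatever sequence of models you actually care about. The point is only that each fit is cheap, so you can line up several and compare them.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_polynomial(xs, ys, degree):
    """Least-squares fit of y on 1, x, ..., x^degree; returns (coefs, residual sd)."""
    cols = [[x ** p for x in xs] for p in range(degree + 1)]
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * y for u, y in zip(ci, ys)) for ci in cols]
    coefs = solve(xtx, xty)
    resid = [y - sum(c * x ** p for p, c in enumerate(coefs)) for x, y in zip(xs, ys)]
    dof = len(xs) - (degree + 1)
    return coefs, (sum(r * r for r in resid) / dof) ** 0.5

# Simulated data (an assumption for illustration): a quadratic trend plus noise.
random.seed(1)
xs = [i / 10 for i in range(50)]
ys = [1.0 + 0.5 * x + 0.3 * x * x + random.gauss(0, 0.5) for x in xs]

# Start too simple (a constant) and work up; each fit takes a fraction of a second.
fits = {degree: fit_polynomial(xs, ys, degree) for degree in range(4)}
for degree, (coefs, sd) in fits.items():
    print(degree, [round(c, 2) for c in coefs], round(sd, 2))
```

Comparing the residual standard deviations across the sequence shows where added complexity stops buying anything, which is the kind of understanding you want before committing to one big overnight fit.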

A.2. Do a little work to make your computations faster and more reliable

This sounds like computational advice but is really about statistics: if you can fit models faster, you can fit more models and better understand both data and model. But getting the model to run faster often has some startup cost, either in data preparation or in model complexity.

Data subsetting . . .

Fake-data and predictive simulation . . .
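The details of this subsection are elided above, so what follows is only a generic sketch of fake-data simulation, not the book's own example. The idea: pick "true" parameter values, simulate data from them, fit the model, and check whether the fit recovers what you put in. The function names and all the numbers here are assumptions chosen for illustration.

```python
import random

def fit_line(xs, ys):
    """Closed-form least squares for y = a + b*x; returns (a, b, se of b)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    a = ybar - b * xbar
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    sigma = (sum(r * r for r in resid) / (n - 2)) ** 0.5
    return a, b, sigma / sxx ** 0.5

def coverage(true_a, true_b, true_sigma, n, n_sims, rng):
    """Share of simulated datasets whose +/- 2 se interval contains the true slope."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    hits = 0
    for _ in range(n_sims):
        ys = [true_a + true_b * x + rng.gauss(0, true_sigma) for x in xs]
        _, b, se_b = fit_line(xs, ys)
        hits += abs(b - true_b) < 2 * se_b
    return hits / n_sims

# Hypothetical "true" values; in real use, pick values plausible for your problem.
rng = random.Random(42)
cover = coverage(true_a=2.0, true_b=0.7, true_sigma=1.5, n=100, n_sims=1000, rng=rng)
print(f"share of intervals covering the true slope: {cover:.2f}")
```

If the coverage came out far from roughly 95%, that would point to a bug in the fitting code or a mismatch between the model and the simulation, and you would learn that before touching real data.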

A.3. Graphing the relevant and not the irrelevant

Graphing the fitted model

Graphing the data is fine (see Appendix B) but it is also useful to graph the estimated model itself (see lots of examples of regression lines and curves throughout this book). A table of regression coefficients does not give you the same sense as graphs of the model. This point should seem obvious but can be obscured in statistical textbooks that focus so strongly on plots for raw data and for regression diagnostics, forgetting the simple plots that help us understand a model.

Don’t graph the irrelevant

Are you sure you really want to make those quantile-quantile plots, influence diagrams, and all the other things that spew out of a statistical regression package? What are you going to do with all that? Just forget about it and focus on something more important. A quick rule: any graph you show, be prepared to explain.

A.4. Transformations

Consider transforming every variable in sight:
• Logarithms of all-positive variables (primarily because this leads to multiplicative models on the original scale, which often makes sense)
• Standardizing based on the scale or potential range of the data (so that coefficients can be more directly interpreted and scaled); an alternative is to present coefficients in scaled and unscaled forms
• Transforming before multilevel modeling (thus attempting to make coefficients more comparable, thus allowing more effective second-level regressions, which in turn improve partial pooling).

Plots of raw data and residuals can also be informative when considering transformations (as with the log transformation for arsenic levels in Section 5.6).

In addition to univariate transformations, consider interactions and predictors created by combining inputs (for example, adding several related survey responses to create a “total score”). The goal is to create models that could make sense (and can then be fit and compared to data) and that include all relevant information.
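To make the transformations above concrete, here is a small sketch using only the Python standard library. The data are made up, and the helper names (`log_transform`, `standardize`, `total_score`) are hypothetical. Standardizing by one standard deviation is just one convention; scaling by the potential range of the data is another.

```python
import math
import statistics

def log_transform(values):
    """Natural log of an all-positive variable."""
    if any(v <= 0 for v in values):
        raise ValueError("log transform needs all-positive values")
    return [math.log(v) for v in values]

def standardize(values):
    """Subtract the mean and divide by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def total_score(*responses):
    """Add several related survey items, respondent by respondent."""
    return [sum(items) for items in zip(*responses)]

# Made-up data for five respondents.
incomes = [12_000, 25_000, 40_000, 90_000, 250_000]
log_income = log_transform(incomes)   # differences are now ratios on the original scale
z_income = standardize(log_income)    # coefficients become "per 1 sd of log income"

q1 = [1, 3, 2, 5, 4]
q2 = [2, 3, 1, 4, 5]
q3 = [1, 2, 2, 5, 5]
score = total_score(q1, q2, q3)
print(score)  # [4, 8, 5, 14, 14]
```

The transformed columns would then go into the regression in place of, or alongside, the raw inputs.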

A.5. Consider all coefficients as potentially varying

Don’t get hung up on whether a coefficient “should” vary by group. Just allow it to vary in the model, and then, if the estimated scale of variation is small (as with the varying slopes for the radon model in Section 13.1), maybe you can ignore it if that would be more convenient.

Practical concerns sometimes limit the feasible complexity of a model—for example, we might fit a varying-intercept model first, then allow slopes to vary, then add group-level predictors, and so forth. Generally, however, it is only the difficulties of fitting and, especially, understanding the models that keep us from adding even more complexity, more varying coefficients, and more interactions.

A.6. Estimate causal inferences in a targeted way, not as a byproduct of a large regression

Don’t assume that a regression coefficient can be interpreted causally. If you are interested in causal inference, consider your treatment variable carefully and use the tools of Chapters 9, 10, and 23 to address the difficulties of comparing comparable units to estimate a treatment effect and its variation across the population. It can be tempting to set up a single large regression to answer several causal questions at once; however, in observational settings (including experiments in which certain conditions of interest are observational), this is not appropriate, as we discuss at the end of Chapter 9.

First day of class update

I got to class on time. The class went ok but I spent too much time talking, which is what happens when I don’t put a lot of effort ahead of time into making sure I don’t spend too much time talking.

My first-day-of-class activity was ok but I think I needed another activity for the students, something more statistical, to better set the tone of the course.

I think I should’ve given them a 10-minute work-in-pairs activity where I’d first give them some real-world problem and then ask them, in pairs, to design a study to address it. The problem could be anything: it could be to assess Ebola risks or answer questions about political ideology and personality, or even to design a plan for assessing the effectiveness of this course that they’re taking. Just something that would get them communicating, but also thinking about statistics in some detail. Not just talking generally about the cool problems they’re working on or are interested in, but some attempt to get into details.

We could do this next class, of course, but I already have things planned. So maybe this will have to wait until the next time I teach the course.

Just in case

Hi, R. Could you please prepare 50 handouts of the attached draft course plan (2-sided printing is fine) to hand out to students? I prefer to do this online but it sounds like there’s some difficulty with that, so we can do handouts on this first day of class.


My Amtrak is rescheduled and it is scheduled to arrive in Boston at 4:35. This should give me plenty of time to get to class on time, but Amtrak is sometimes delayed. So if class begins and I am not there yet, please start without me!

If I’m not there, please do the following:

- Get to the room 10 minutes early. Before class begins, chat with the students as they are coming in. You can talk about any topic, as long as it’s statistical: tell them about your qualifying exam, or discuss how to express uncertainty in weather forecasts, or talk about the Celtics (ha ha). No need to be lecturing here, just get them on track, thinking and talking about statistics. Also during that time, please get the projector set up so that, when I do arrive, I can plug in my laptop and be all ready to go.

- Once class begins (I don’t remember the convention at Harvard; will it start exactly at the scheduled time, or 5 minutes later?), start right away with a statistics story. I have stories of my own prepared, but if I’m not there, you can do one yourself. Prepare something; feel free to use the blackboard. It doesn’t have to be a long story; 5 or 10 minutes will be fine.

- Then write the following on the blackboard: “(a) Say something about yourself or your work in relation to statistics, (b) Why are you in this class?”

- Have the students divide into pairs. In pairs, they meet each other:
(3 min) A talks to B
(2 min) B asks a question to A, and A responds
(3 min) B talks to A
(2 min) A asks a question to B, and B responds
They are supposed to be talking to each other about their work in relation to statistics.

- If not all the students fit in the room, that’s not really a problem; you can have the overflow people in the lounge area, doing the same thing.

Once the students have done the intros in pairs, take a few volunteers (or, if there are no volunteers, pick some students and ask them to pick other students) to stand up and answer questions (a) and (b) above. Use these to lead the class into discussions that loop around to consider the relevance and different varieties of statistical communication.

Really, this can take all the class period. But I assume that at some point I’ll arrive—how delayed could Amtrak be, after all?? I just wanted to give you some contingency plan so that nobody has to worry if it’s 6:25 and I’m still not there.


See you

P.S. Here’s what happened.

About a zillion people pointed me to yesterday’s xkcd cartoon

I have the same problem with Bayes factors, for example this:

[Screenshot not reproduced]

and this:

[Screenshot not reproduced]

(which I copied from Wikipedia, except that, unlike you-know-who, I didn’t change the n’s to d’s and remove the superscripting).

Either way, I don’t buy the numbers, and I certainly don’t buy the words that go with them.

I do admit, though, to using the phrase “statistically significant.” It doesn’t mean so much, but, within statistics, everyone knows what it means, so it’s convenient jargon.

P.S. Kruschke had a similar reaction.

Crowdsourcing data analysis: Do soccer referees give more red cards to dark skin toned players?

Raphael Silberzahn, Eric Luis Uhlmann, Dan Martin, Pasquale Anselmi, Frederik Aust, Eli Christopher Awtrey, Štěpán Bahník, Feng Bai, Colin Bannard, Evelina Bonnier, Rickard Carlsson, Felix Cheung, Garret Christensen, Russ Clay, Maureen A. Craig, Anna Dalla Rosa, Lammertjan Dam, Mathew H. Evans, Ismael Flores Cervantes, Nathan Fong, Monica Gamez-Djokic, Andreas Glenz, Shauna Gordon-McKeon, Tim Heaton, Karin Hederos Eriksson, Moritz Heene, Alicia Hofelich Mohr, Fabia Högden, Kent Hui, Magnus Johannesson, Jonathan Kalodimos, Erikson Kaszubowski, Deanna Kennedy, Ryan Lei, Thomas Andrew Lindsay, Silvia Liverani, Christopher Madan, Daniel Molden, Eric Molleman, Richard D. Morey, Laetitia Mulder, Bernard A. Nijstad, Bryson Pope, Nolan Pope, Jason M. Prenoveau, Floor Rink, Egidio Robusto, Hadiya Roderique, Anna Sandberg, Elmar Schlueter, Felix S, Martin Sherman, S. Amy Sommer, Kristin Lee Sotak, Seth Spain, Christoph Spörlein, Tom Stafford, Luca Stefanutti, Susanne Täuber, Johannes Ullrich, Michelangelo Vianello, Eric-Jan Wagenmakers, Maciej Witkowiak, SangSuk Yoon, and Brian A. Nosek write:

Twenty-nine teams involving 61 analysts used the same data set to address the same research questions: whether soccer referees are more likely to give red cards to dark skin toned players than light skin toned players and whether this relation is moderated by measures of explicit and implicit bias in the referees’ country of origin. Analytic approaches varied widely across teams. For the main research question, estimated effect sizes ranged from 0.89 to 2.93 in odds ratio units, with a median of 1.31. Twenty teams (69%) found a significant positive effect and nine teams (31%) observed a nonsignificant relationship. The causal relationship however remains unclear. No team found a significant moderation between measures of bias of referees’ country of origin and red card sanctionings of dark skin toned players. Crowdsourcing data analysis highlights the contingency of results on choices of analytic strategy, and increases identification of bias and error in data and analysis. Crowdsourcing analytics represents a new way of doing science; a data set is made publicly available and scientists at first analyze separately and then work together to reach a conclusion while making subjectivity and ambiguity transparent.
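For readers who don't think in "odds ratio units," here is a rough sketch of what such a number is, using a bare 2x2 table. The counts are invented for illustration and are not from the study, and the teams' actual analyses were far more varied than this (that's the point of the paper). The interval uses the standard large-sample approximation on the log scale.

```python
import math

def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a: group 1 with the event, b: group 1 without,
    c: group 2 with the event, d: group 2 without.
    """
    return (a * d) / (b * c)

def odds_ratio_interval(a, b, c, d, z=1.96):
    """Approximate 95% interval, computed on the log-odds-ratio scale."""
    log_or = math.log(odds_ratio(a, b, c, d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)

# Invented counts: red cards among 1000 player-referee pairings in each group.
or_hat = odds_ratio(30, 970, 23, 977)
low, high = odds_ratio_interval(30, 970, 23, 977)
print(f"odds ratio {or_hat:.2f}, interval ({low:.2f}, {high:.2f})")
```

With these made-up counts the point estimate lands near 1.3 while the interval comfortably includes 1, which is one way an estimate of that size can show up as "significant" in one analysis and "nonsignificant" in another.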

“It is perhaps merely an accident of history that skeptics and subjectivists alike strain on the gnat of the prior distribution while swallowing the camel that is the likelihood”

I recently bumped into this 2013 paper by Christian Robert and myself, “‘Not Only Defended But Also Applied’: The Perceived Absurdity of Bayesian Inference,” which begins:

Younger readers of this journal may not be fully aware of the passionate battles over Bayesian inference among statisticians in the last half of the twentieth century. During this period, the missionary zeal of many Bayesians was matched, in the other direction, by a view among some theoreticians that Bayesian methods are absurd—not merely misguided but obviously wrong in principle. Such anti-Bayesianism could hardly be maintained in the present era, given the many recent practical successes of Bayesian methods. But by examining the historical background of these beliefs, we may gain some insight into the statistical debates of today. . . .

The whole article is just great. I love reading my old stuff!

Also we were lucky to get several thoughtful discussions:

“Bayesian Inference: The Rodney Dangerfield of Statistics?” — Steve Stigler

“Bayesian Ideas Reemerged in the 1950s” — Steve Fienberg

“Bayesian Statistics in the Twenty First Century” — Wes Johnson

“Bayesian Methods: Applied? Yes. Philosophical Defense? In Flux” — Deborah Mayo

And our rejoinder, “The Anti-Bayesian Moment and Its Passing.”

Good stuff.

“The Statistical Crisis in Science”: My talk this Thurs at the Harvard psychology department

Noon Thursday, January 29, 2015, in William James Hall 765 room 1:

The Statistical Crisis in Science

Andrew Gelman, Dept of Statistics and Dept of Political Science, Columbia University

Top journals in psychology routinely publish ridiculous, scientifically implausible claims, justified based on “p < 0.05.” And this in turn calls into question all sorts of more plausible, but not necessarily true, claims, that are supported by this same sort of evidence. To put it another way: we can all laugh at studies of ESP, or ovulation and voting, but what about MRI studies of political attitudes, or embodied cognition, or stereotype threat, or, for that matter, the latest potential cancer cure? If we can’t trust p-values, does experimental science involving human variation just have to start over? And what do we do in fields such as political science and economics, where preregistered replication can be difficult or impossible? Can Bayesian inference supply a solution? Maybe. These are not easy problems, but they’re important problems.

Here are the slides from the last time I gave this talk, and here are some relevant articles:

[2014] Beyond power calculations: Assessing Type S (sign) and Type M (magnitude) errors. Perspectives on Psychological Science 9, 641–651. (Andrew Gelman and John Carlin)

[2014] The connection between varying treatment effects and the crisis of unreplicable research: A Bayesian perspective. Journal of Management. (Andrew Gelman)

[2013] It’s too hard to publish criticisms and obtain data for replication. Chance 26 (3), 49–52. (Andrew Gelman)

[2012] P-values and statistical practice. Epidemiology. (Andrew Gelman)