Sometimes you’re so subtle they don’t get the joke

T. A. Frail writes:

When Andrew Gelman, a professor of statistics and political science at Columbia University, wrote that Who’s Bigger? ‘is a guaranteed argument-starter,’ he meant it as a compliment.

In all seriousness, I wish people would read what I wrote, not what they think I meant! My quote continued:

This book is a guaranteed argument-starter. I found something to argue with on nearly every page.

The anti-Woodstein

I received the following email:

Dear professor Andrew Gelman,

My name is **, a resident correspondent of **. I am writing to request for an interview via email. We met once at New York Foreign Press Center one week ago.

As you may know, President Obama will travel to China, Burma and Australia from November 10-16. In China from November 10-12, President Obama will attend the APEC Leaders Meeting and APEC CEO Summit. . . .

Could you please make a comment on what kind of impacts APEC will have . . .

Looking forward to your response.Thanks.

Best regards,


I replied: Hi, sorry, this is outside my area of expertise. You should ask my colleague Andrew Nathan in the poli sci dept at Columbia.

And then the correspondent wrote back:

Thank you very much for your reply. I will have a try to contact your colleague. I will be grateful if you can tell me your colleague’s email. thanks again. wish you have a nice weekend.

I didn’t reply to this one. I think a reporter should be able to find someone’s contact information! (Pro tip: try googling *andrew nathan columbia political science*.)

Try answering this question without heading to Wikipedia

Phil writes:

This is kind of fun (at least for me): You would probably guess, correctly, that membership in the US Chess Federation is lower than its peak. Guess the year of peak membership, and the decline (as a percentage) in the number of members from that peak.

My reply: I don’t know, but I’d guess that the fraction of members who are kids is much higher than in the past.

I’m sure that my anti-Polya attitude is completely unfair

Reading this post in which Mark Palko quotes from the classic “How to Solve It” by the legendary mathematician and math educator George Polya, I was reminded of my decades-long aversion to Polya, an attitude that might seem odd given that (a) Polya has an excellent reputation, and (b) I’ve never read more than a paragraph of anything he’s written.

Why this attitude? Because I did the training program for the high school math olympiad one summer, and Polya was, like, the god of the people who ran that program. The olympiad program had all sorts of attitudes I couldn’t stand—the one I remember most was that using calculus or Cartesian coordinates to solve geometry problems was considered a form of cheating (see the last two paragraphs of this post, also this), and so I ended up with a negative attitude toward Polya and everyone else associated with school mathematics competitions. Completely unfair, I’m sure. I bet if I were to read Polya now, I’d think he’s great.

Common sense and statistics

John Cook writes:

Some physicists say that you should always have an order-of-magnitude idea of what a result will be before you calculate it. This implies a belief that such estimates are usually possible, and that they provide a sanity check for calculations. And that’s true in physics, at least in mechanics. In probability, however, it is quite common for even an expert’s intuition to be way off.

I agree with Cook’s general message but I’d say it slightly differently. I’d say that even many experts often have no intuition at all when it comes to probability, which will lead them to miss huge conceptual errors in their calculations.

The example that comes to mind is the largely atrocious literature in political science and economics on the probability of a decisive vote. A paper was published in a political science journal giving the probability of a tied vote in a presidential election as something like 10^-90. Talk about innumeracy! The calculation, of course (I say “of course” because if you are a statistician you will likely know what is coming) was based on the binomial distribution with known p. For example, Obama got something like 52% of the vote, so if you take n=130 million and p=0.52 and figure out the probability of an exact tie, you can work out the formula etc etc.
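To see just how mindless that calculation is, here is a quick sketch of it, using the post's n = 130 million and p = 0.52; everything else is illustrative, and computed in log space because the probability underflows any ordinary float:

```python
import math

def log10_binom_pmf(k: int, n: int, p: float) -> float:
    """log10 of the binomial pmf, via log-gamma to avoid underflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return log_pmf / math.log(10)

# The naive "known p" model: 130 million voters, each voting for Obama
# with fixed probability 0.52; probability of an exact 50/50 tie.
n = 130_000_000
print(log10_binom_pmf(n // 2, n, 0.52))  # astronomically negative
```

With p fixed away from 0.5, the result is even more extreme than 10^-90; the point is that the "known p" assumption, not any feature of real elections, is what drives the answer.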

On empirical grounds, that 10^-90 number is ludicrous. You can easily get an order-of-magnitude estimate by looking at the empirical probability, based on recent elections, that the vote margin will be within 2 million votes (say) and then dividing by 2 million to get the probability of its being a tie or within one vote of a tie.

The funny thing—and I think this is the case for many of the bad numbers that get out there—is that this 10^-90 has no intuition behind it; it's just the product of a mindlessly applied formula (because everyone “knows” that you use the binomial distribution to calculate the probability of k heads in n coin flips). But it's bad intuition that allows people to accept that number without screaming. A serious political science journal wouldn't accept a claim that there were 10^90 people in some obscure country, or that some person was 10^90 feet tall. But intuitions about probabilities are weak, even among the sort of quantitatively-trained researchers who know about the binomial distribution.

P.S. The point of this post is not to bang on the people who made this particular mistake but rather to use this as an example to illustrate the widespread lack of intuition about orders of magnitude of probability, which is relevant to John Cook's point regarding statistical thinking and communication.

Another example is business-school prof Reid Hastie's apparent belief that “the probability that a massive flood will occur sometime in the next year and drown more than 1,000 Americans” is more than 20%. 20% sounds like a low number, low enough that Hastie didn't consider that such floods have been extremely rare in American history. (Even Katrina drowned only 387 people, according to this source which I found by googling Katrina drownings.) This is not to disparage the importance of preparing for floods; even if the probability is only 1%, it still makes sense to do what we can to mitigate the risks. My point here is just that probabilities are hard to think about. It's Gigerenzer's point.

To continue with the Gigerenzer idea, one way to get a grip on the probability of a tied election is to ask a question like: what is the probability that an election is decided by fewer than 100,000 votes in a decisive state? That's happened at least once. (In 2000, Bush won Florida by only a few hundred votes.) The probability of an exact tie is on the order of 10^(-5) times the probability of an election being decided by fewer than 100,000 votes.

See, that wasn’t so hard! Gigerenzering wins again.
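The Gigerenzer-style arithmetic above can be sketched in a few lines. The inputs here are illustrative assumptions, not estimates from data:

```python
# Empirical order-of-magnitude estimate of P(exact tie in the decisive state).
# Both inputs below are illustrative assumptions, not measured quantities.
p_close = 0.1        # say, a ~1-in-10 chance the decisive state is within 100,000 votes
n_margins = 100_000  # a tie is roughly one outcome among ~100,000 near-tie margins
p_tie = p_close / n_margins
print(p_tie)  # about 10**-6: tiny, but dozens of orders of magnitude above 10**-90
```

Even with generous uncertainty in the inputs, the answer lands somewhere around 10^-6 to 10^-8, nowhere near 10^-90.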

Trajectories of Achievement Within Race/Ethnicity: “Catching Up” in Achievement Across Time

Just in time for Christmas, here’s some good news for kids, from Pamela Davis-Kean and Justin Jager:

The achievement gap has long been the focus of educational research, policy, and intervention. The authors took a new approach to examining the achievement gap by examining achievement trajectories within each racial group. To identify these trajectories they used the Early Childhood Longitudinal Study–Kindergarten Cohort, which is a nationally representative sample of students in kindergarten through Grade 5. In the analyses, the authors found heterogeneity within each racial group in mathematics and reading achievement, suggesting that there are in fact achievement gaps within each race/ethnicity group. The authors also found that there are groups that catch up to the highest achieving groups by Grade 5, suggesting a positive impact of schooling on particular subgroups of children. The authors discuss the various trajectories that have been found in each racial group and the implications this has for future research on the achievement gap.

Using statistics to make the world a better place?


In a recent discussion involving our frustration with crap research, Daniel Lakeland wrote:

I [Lakeland] really do worry about a world in which social and institutional and similar effects keep us plugging away at a certain kind of cargo-cult science that produces lots of publishable papers and makes it easier to get funding for projects that don’t really promise to give us fundamental and predictive models that can drive real improvements in people’s lives.

It’s sort of a “it’s 2014, where’s my flying car?” attitude I know, but I’d be satisfied with a lot of things other than flying cars, such as:

1) real, effective solutions to antibiotic resistant organisms
2) cures for cystic fibrosis
3) reducing the effect of heart disease on people under age 75 by 30%
4) understanding major causes of “the obesity epidemic” in a real detailed way and finding effective ways to reverse it.
5) Being able to regenerate organic replacement joint components instead of titanium hip implants etc
6) Growing replacement kidneys
7) A significantly more effective and longer-lasting pertussis vaccine

Is the way we are doing science today going to provide any or all of these things in the next 30 years? What are some similar order of magnitude things that it has provided since 1980 using current “modern” methods and funding priorities, publication priorities, tenure systems, and so forth?

I want to consider this question, difficult as it is. It does seem that slow progress is being made in our conceptual understanding, at least in the sense that new paradigms are developing in different sciences. In sociology there's an increased interest in network effects, in medicine it seems that a lot of things are now being understood in terms of the microbiome, in large-scale biology there's a focus on the fetal environment, in microbiology we've been hearing a lot about the rapid evolution of viruses, etc. It seems to me, as an outsider in these fields, that the concepts I've mentioned have been around for a while, but it's only recently that they've been moved to the center of discussion. And this, in turn, seems to be the result of millions of research studies on many thousands of topics, which collectively ruled out earlier favored explanations. By revealing areas where the old paradigms failed, all this research made room for new, more difficult, paradigms to be taken seriously.

It’s posterior predictive checking, on a larger scale. A chicken is an egg’s way of constructing another egg, and empirical research is a scientific theory’s way of uncovering the theory’s flaws.

It’s harder to make such clear statements about statistics or engineering or computer science, as these are essentially tools in the service of science rather than being the object of study themselves.

And there are fields where new paradigms don’t seem so apparent. Consider three areas with which I’m somewhat familiar: political science, psychology, and economics. In political science, I see persistent difficulties in integrating different perspectives coming from the studies of public opinion, institutions, and political maneuvering. It really feels like we’re not seeing the whole elephant at once. And I include my own research as an example of this incomplete perspective. Psychology seems to be undergoing a reforming process, in which various unsuccessful paradigms such as embodied cognition are being rejected, with no clear unification of the cognitive and behavioral approaches. Similarly in economics, although there it seems worse in that various incomplete perspectives are taken by their proponents as being all-encompassing.

The point of that previous paragraph is not to pass judgment on these different social science fields; it’s just my impression that they are in various stages of reconstruction and reform. From my outsider’s perspective, biology and medicine seem to be in better shape, philosophically speaking, in that there seems to be more of a recognition that traditionally dominant paradigms miss a lot.

How does statistics fit into all this? Statistics can (potentially) do a lot:
– Guidance in data collection and the assessment of measurements. And recall that “data collection” is not just about how to collect a random sample or assign treatments in an experiment; it also includes considerations of what to measure and how to measure it. For example, if you are interested in measuring the most fecund days of a woman’s monthly cycle, it makes sense to check out the medical literature on the topic.
– Methods for calibrating variation by comparing to models of randomness. This is where I think that statistical significance and p-values fit in: not as a way to make scientific discoveries (“p less than .05 so we get published in the tabloids!”) but as a measuring stick when interpreting observed comparisons and variation.
– Tools for combining information. That to me is the most general way to think of “inference,” and it encompasses all sorts of things, from classical “iid” models to more complicated approaches including all sorts of time series and spatial analyses, multilevel models for partial pooling, regularization methods that allow users to ramp up the amount of data they can include in their models, and Bayesian inference for balancing uncertainties.
– Methods for checking fit, for revealing the aspects of data that are not well explained by our models. To me this includes all of exploratory data analysis, which is about learning the unexpected. Or, to step back slightly, exploratory data analysis is about putting us in a situation in which we are able to learn the unexpected. And, remember, “unexpected” = “not expected” = “something not predicted by our model,” where this “model” may be implicit.
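Not part of the original list, but as a concrete illustration of the "partial pooling" item above, here is the classic normal-normal shrinkage estimate, with made-up numbers; this is a textbook sketch, not the specific models discussed in the post:

```python
# Illustrative partial pooling: the precision-weighted normal-normal estimate,
#   theta_j = (y_j/sigma_j^2 + mu/tau^2) / (1/sigma_j^2 + 1/tau^2),
# which shrinks each group's estimate y_j toward the overall mean mu.
# All numbers below are made up for illustration.

def partial_pool(y_j: float, sigma_j: float, mu: float, tau: float) -> float:
    """Posterior mean for group j, pulling y_j toward mu by relative precision."""
    precision_data = 1 / sigma_j**2   # how informative the group's own data are
    precision_prior = 1 / tau**2      # how tightly groups cluster around mu
    return (y_j * precision_data + mu * precision_prior) / (precision_data + precision_prior)

# A noisy group estimate gets pulled strongly toward the overall mean...
print(partial_pool(y_j=10.0, sigma_j=5.0, mu=2.0, tau=1.0))
# ...while a precisely measured one barely moves.
print(partial_pool(y_j=10.0, sigma_j=0.5, mu=2.0, tau=1.0))
```

This is the sense in which multilevel models let users "ramp up" the data in their models: small groups borrow strength from the ensemble instead of being estimated in isolation.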

Again, statistics is in the service of science, and I see statistics as a way of organizing science rather than as a way of making scientific discovery.

Sometimes we can perform a statistical analysis that seems to take us all the way from data to model to scientific conclusion—for example, this cool age-period-cohort model with Yair—but, even there, I think most of the real scientific heavy lifting is coming from existing substantive theories; the statistics is more of a way of rearranging the data or, as Dan Kahan would put it, of adjudicating between competing hypotheses or underlying models of reality.

Research benefits of feminism

Unlike that famous bank teller, I’m not “active in the feminist movement,” but I’ve always considered myself a feminist, ever since I heard the term (I don’t know when that was, maybe when I was 10 or so?). It’s no big deal, it probably just comes from having 2 big sisters and growing up during the 1970s.

And most of the time this attitude is pretty much irrelevant to my professional life. It comes up every now and then when interpreting research claims (see here, for example) in which the male perspective is taken as the baseline. And when I teach I try to avoid overuse of stereotypically male-interest topics such as sports.

And my feminism has made me somewhat immune to simplistic gender-essentialist ideas such as those expressed in various papers that make use of schoolyard evolutionary biology [see definition below] that we've discussed over the years on this blog.

But it doesn't affect my approach for partial pooling in hierarchical models, or my approach to inference from non-random samples, or the ways in which I monitor convergence for Hamiltonian Monte Carlo, or my models for voting, etc etc etc. Most of my research, even in political science, is basically “orthogonal” to feminism. Even studies that could have some sort of feminist interpretation—for example, my analysis with Yair of differences in attitudes toward abortion, or our estimate of geographic variation in the gender gap—don't have any feminist content at all, at least not that I notice.

Recently, though, I had a research project where a feminist perspective made (a bit of) a difference. It was from my paper with Christian Hennig on going beyond objectivity and subjectivity in statistical thinking.

It came up near the beginning of the paper. We start off by discussing the usual dichotomy in statistics between objective and subjective approaches:

Statistical discourse on objectivity and subjectivity is at an impasse. Ideally these concepts would be part of a consideration of the role of different sorts of information and assumptions in statistical analysis, but instead they often seem to be used in restrictive and misleading ways.

One problem is that the terms “objective” and “subjective” are loaded with so many associations and are often used in a mixed descriptive/normative way. Scientists whose methods are branded as subjective have the awkward choice of either saying, No, we are really objective, or else embracing the subjective label and turning it into a principle. From the other direction, scientists who use methods labeled as objective often seem so intent on eliminating subjectivity from their analyses that they end up censoring themselves. This happens, for example, when researchers rely on p-values but refuse to recognize that their analyses are contingent on data (as discussed by Simmons, Nelson, and Simonsohn, 2011, and Gelman and Loken, 2014). More generally, misguided concerns about subjectivity can lead researchers to avoid incorporating relevant and available information into their analyses.

And then we say this:

A perhaps helpful analogy is to gender roles in social interactions. To get respect, women often need to choose between claiming stereotypically-male behaviors or affirming, or “taking back,” feminine roles. At the same time, men can find it difficult to step outside the restrictions implied by traditional masculinity. Rather than point and label, it can be better in such situations to identify the positive aspects of each sex role and then go from there. Similarly, good science contains both subjective and objective elements, and we think it would be best to understand how these perspectives can complement each other.

I suspect that, to many readers, that paragraph won't fit in at all. But to me it makes a lot of sense. Conventional labels, whether of objectivity and subjectivity or of masculine and feminine, can be a trap. The labels are not empty; they reflect real differences (being a feminist is all about understanding, not denying, the real differences that exist on average between the sexes, along with recognizing that averages are just that and don't represent all cases), but people can also get stuck in these boxes, or get stuck trying to rearrange these boxes. So, to me, a feminist attitude gave me a useful perspective on how to think about the important topic of objectivity and subjectivity in science and statistics. (And it's a topic with real applications; see for example this paper which discusses how we use model checking to incorporate both subjective and objective elements into a Bayesian analysis in toxicology.)

Just to be clear: I’m not claiming that feminism is purely a good thing for a researcher, or even that it’s purely good for my research. There may well be important work that I’m missing, or misunderstanding, because of my political biases. I think everyone must have such blind spots, but that doesn’t excuse me from the blind spots that I have.

At some level, in this post I'm making the unremarkable point that each of us has a political perspective which informs our research in positive and negative ways. The reason that this particular example of the feminist statistician is interesting is that it's my impression that feminism, like religion, is generally viewed as an anti-scientific stance. I think some of this attitude comes from some feminists themselves, who are skeptical of science in that it is a generally male-dominated institution that is in part used to continue male dominance of society; and it also comes from people such as Larry Summers, who might say that reality has an anti-feminist bias.

Feminism, like religion, can be competitive with science or it can be collaborative. See, for example, the blog of Echidne for a collaborative approach. To the extent that feminism represents a set of tenets that are opposed to reality, it could get in the way of scientific thinking, in the same way that religion would get in the way of scientific thinking if, for example, you tried to apply faith healing principles to do medical research. If you're serious about science, though, I think of feminism (or, I imagine, Christianity, for example) as a framework rather than a theory—that is, as a way of interpreting the world, not as a set of positive statements. This is in the same way that I earlier wrote that racism is a framework, not a theory. Not all frameworks are equal; my point here is just that, if we're used to thinking of feminism, or religion, as anti-scientific, it can be useful to consider ways in which these perspectives can help one's scientific work.

P.S. It would also be fair to say that I talk the talk but don’t walk the walk: a glance at my list of published papers or the stan-dev list reveals that most of my collaborators are male. I don’t know what to say about this—it could be interpreted as evidence that I’m not a real feminist because I’m not committed enough to equality between the sexes in my own professional life, or as evidence of the emptiness of feminism: like a Christian Scientist who talks tough but then goes to the doctor when he gets sick, I’m a feminist who, when given the choice of how to spend my hard-earned research dollars, generally hires men. I don’t think I’m under any obligation to explain myself at all on this one, but to the extent I do, I guess I’d say that there are more men than women working in computational statistics right now, that I hire the people who seem best for the job, and these people often happen to be male—a set of observations, or opinions, that can be interpreted in any number of ways.

P.P.S. As promised, here's my definition of “schoolyard evolutionary biology”: it's the idea that, because of evolution, all people are equivalent to all other people, except that all boys are different from all girls. It's the attitude I remember from the grade-school playground, in which any attribute of a person, whether it be how you walked or how you laughed or even how you held your arms when you were asked to look at your fingernails (really), was gender-typed. It's gender and race essentialism. And combining it with what Kahneman and Tversky called “the law of small numbers” (the attitude that any underlying pattern should reproduce in any small sample) has led to endless chasing of noise in data analyses. In short, if you believe this sort of essentialism, you can find it just about anywhere you look.

P.P.P.S. And, just to clarify further, of course there are lots of systematic differences between boys and girls, and between men and women, that are not directly sex-linked. To be a feminist is not to deny these differences; rather, placing these differences within a larger context is part of what feminism is about.

On deck this week

Mon: Research benefits of feminism

Tues: Using statistics to make the world a better place?

Wed: Trajectories of Achievement Within Race/Ethnicity: “Catching Up” in Achievement Across Time

Thurs: Common sense and statistics

Fri: I’m sure that my anti-Polya attitude is completely unfair

Sat: The anti-Woodstein

Sun: Sometimes you’re so subtle they don’t get the joke

It’s Too Hard to Publish Criticisms and Obtain Data for Replication

Peter Swan writes:

The problem you allude to in the above reference and in your other papers on ethics is a broad and serious one. My students and I have attempted to replicate a number of top articles in the major finance journals. Either they cannot be replicated due to missing data, or what might appear to be relatively minor improvements in methodology remove or sometimes reverse the findings. Almost invariably, the journal is reluctant to publish a comment. Thanks to the introduction of a new journal, Critical Finance Review, by Ivo Welch, which insists on the provision of data/code and encourages the original authors to respond, this poor outlook is improving in the finance discipline.

See for example: Gavin S. Smith and Peter L. Swan, Do concentrated institutional investors really reduce executive compensation whilst raising incentives?, CFR 3-1, 49-83.

and the response:

Jay C. Hartzell and Laura T. Starks, Institutional Investors and Executive Compensation Redux: A Comment on “Do Concentrated Institutional Investors Really Reduce Executive Compensation Whilst Raising Incentives”, CFR 3-1, 85-97.

The model of criticism and rebuttal is fine, but it’s disturbing that the people criticized never seem to back down and say they were wrong. I don’t think people should always admit they’re wrong, because sometimes they’re not. But everybody makes mistakes, while the rate of admission of mistakes seems suspiciously low!