
PPNAS: How does it happen? And happen? And happen? And happen?


In the comment thread to today’s post on journalists who take PPNAS papers at face value, Mark asked, in response to various flaws pointed out in one of these papers:

How can the authors (and the reviewers and the editor) not be aware of something so elementary?

My reply:

Regarding the authors, see here. Statistics is hard. Multiple regression is hard. Figuring out the appropriate denominator is hard. These errors aren’t so elementary.

Regarding the reviewers, see here. The problem with peer review is that the reviewers are peers of the authors and can easily be subject to the same biases and the same blind spots.

Regarding the editor: it doesn’t help that she has the exalted title of Member of the National Academy of Sciences. With a title like that, it’s natural that she thinks she knows what she’s doing. What could she possibly learn from a bunch of blog commenters, a bunch of people who are so ignorant that they don’t even believe in himmicanes, power pose, and ESP?

P.S. Let me clarify. I don’t expect or demand that PPNAS publish only wonderful papers, or only good papers, or only correct papers. Mistakes will happen. The publication process is not perfect. But I would like for them to feel bad about publishing bad papers. What really bothers me is that, when a bad paper is published, they just move on; there seems to be no accountability, no regret. They act as if the publication process retrospectively assigns validity to the work (except in very extreme circumstances of fraud, etc.). I’m bothered by the PPNAS publish-and-never-look-back attitude for the same reason I’m bothered that New York Times columnist David Brooks refuses to admit his errors.

Making mistakes is fine, it’s the cost of doing business. Not admitting your mistakes, that’s a problem.

Journalists are suckers for anything that looks like science. And selection bias makes it even worse. But I was unfair to NPR.


Journalists are suckers. Marks. Vics. Boobs. Rubes.

You get the picture.

Where are the classically street-trained reporters, the descendants of Ring Lardner and Joe Liebling, the hard-bitten journos who would laugh in the face of a press release?

Today, nowhere in evidence.

I’m speaking, of course, about the reaction in the press to the latest bit of “p less than .05” clickbait to appear in PPNAS. Here’s what I wrote yesterday regarding the article, “Physical and situational inequality on airplanes predicts air rage”:

NPR will love this paper. It directly targets their demographic of people who are rich enough to fly a lot but not rich enough to fly first class, and who think that inequality is the cause of the world’s ills.

This morning I was curious so I googled the name of the article’s first author and NPR. No hits on this study. But a lot from other news organizations:


Let’s go through and take a look:

Deborah Netburn in the L.A. Times presents the story completely uncritically. Zero concerns. From the authors’ lips to God’s ears.

Carina Storrs at CNN: 12 paragraphs of unskeptical parroting of the authors’ claims, followed by three paragraphs of very mild criticism (quoting psychologist Michael McCullough as saying that the study “is provocative, but it does not strike me as an open and shut case”), followed by two more paragraphs by the study’s author.

Gillian Mohney at ABC News: no skepticism at all, she buys into the whole study, hook, line, and sinker.

Bob Weber, CTV News: Again takes it at face value. A regression with p less than .05 in PPNAS is good enough for Bob Weber.

Unsigned, ABC Radio: A short five-paragraph story, the last paragraph of which is, “Although this study points to a link between air rage and first class cabins, it does not prove causation.”

Vanessa Lu, Toronto Star: Straight P.R., no chaser.

Peter Dockrill, Science Alert: A nearly entirely credulous story, marred only by a single paragraph buried in the middle of the story, quoting Michael McCullough from that CNN article.

And Sophie Ryan at the New Zealand Herald buys into the whole story. Again, if it’s published in PPNAS and it tells us something we want to hear, run with it.

LA Times, ABC News, Toronto Star, sure, fine, what can you expect? But the New Zealand Herald? I’m disappointed. You can do better. If NPR dodged this bullet, you can too.

Where were the savvy reporters?

Where were Felix Salmon, Ed Yong, Sharon Begley, Julie Rehmeyer, Susan Perry, etc., in all this? The quantitative and science reporters who know what they’re doing? They didn’t waste their time with this paper. They see the equivalent in Psychological Science each week, and they just tune it out.

You don’t see the most respected pop music critics reviewing the latest Nickelback show, right? OK, maybe at the Toronto Star. But nowhere else.

Where were Nate Silver’s 538 and the New York Times’s Upshot team? They didn’t waste their time with this. They like to analyze their own data. They know that data analysis is hard, and they don’t trust any numbers they haven’t crunched themselves.

We have a classic case of selection bias. The knowledgeable reporters don’t waste their time on this, leaving the suckers to write it up.

Comparison to himmicanes

Here’s a data point, though. This air rage study, like the power pose study, got nearly uniformly positive coverage, whereas the ovulation-and-clothing study and the himmicanes study were accompanied in their news reports with a bit of skepticism (not as much as was deserved, but some). Why?

I suspect a key factor is that the conclusions of this new paper told people what they want to hear: flying sucks, first-class passengers are assholes, social inequality is a bad thing, and it’s been proved by science!

Also, the ovulation-and-clothing and himmicanes studies had particularly obvious errors in their conceptualization and measurement, whereas the statistical flaws in the air rage study are more subtle and have to do with scaling of ratios and the interpretation of multiple regression coefficients.

A template for future news stories

OK, fine, you might say. But what’s a reporter to do? They can’t always call Andrew Gelman at Columbia University for a quote, and they typically won’t have the technical background to evaluate these papers by themselves.

But I do have a suggestion, a template for how reporters can handle PPNAS studies in the future, a template that respects the possibility that these papers can have value.

I’ll share that template in my next post.

P.S. BoingBoing fell for it too. Too bad. You can do better, BoingBoing!

P.P.S. Felix Salmon pointed out that the study was also promoted completely uncritically in Science magazine. Tabloids gonna tabloid.

Ahhhh, PPNAS!


To busy readers: Skip to the tl;dr summary at the end of this post.

A psychology researcher sent me an email with subject line, “There’s a hell of a paper coming out in PPNAS today.” He sent me a copy of the paper, “Physical and situational inequality on airplanes predicts air rage,” by Katherine DeCelles and Michael Norton, edited by Susan Fiske, and it did not disappoint. By which I mean it exhibited the mix of forking paths and open-ended storytelling characteristic of these sorts of PPNAS or Psychological Science papers on himmicanes, power pose, ovulation and clothing, and all the rest.

There’s so much to love (by which I mean, hate) here, I hardly know where to start.

– Coefficient estimate and standard errors such as “1.0031** (0.0014)” (yes, that’s statistically significantly different from the baseline value of 1.0000).

– Another coefficient of “11.8594” (dig that precision) with a standard error of “11.8367” which is still declared statistically significant at the 5% level. Whoops!

– The ridiculous hyper-precision of “Flights with first class present are ∼46.1% of the population of flights” (good thing they assured us that it wasn’t exactly 46.1%).

– The interpretation of zillions of regression coefficients, each one controlling for all the others. For example, “As predicted, front boarding of planes predicted 2.18-times greater odds of an economy cabin incident than middle boarding (P = 0.005; model 2), an effect equivalent to an additional 5-h and 58-min flight delay (0.7772 front boarding/0.1305 delay hours).” What does it all mean? Who cares!

– No raw data. Sorry, proprietary restrictions so nobody can reproduce this analysis! (Don’t get me wrong, I have no problem with researchers learning from proprietary information, I do it all the time. What the National Academy of Sciences is doing publishing this sort of thing, I have no idea. Or, yes, I do have an idea, but I don’t like it.)

– Story time: “We argue that exposure to both physical and situational inequality can result in antisocial behavior. . . . even temporary exposure to physical inequality—being literally placed in one’s “class” (economy class) for the duration of a flight—relates to antisocial behavior . . .”

– A charming reference in the abstract to testing of predictions, even though no predictions were supplied before the data were analyzed.

– Dovetailing!
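Two of the numerical complaints above are easy to check with a few lines of Python. The coefficients and standard errors are the ones quoted from the paper; everything else is just arithmetic (using the usual normal approximation):

```python
import math

def two_sided_p(z):
    """Two-sided normal p-value: p = erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2))

# "1.0031** (0.0014)": a tiny effect, but genuinely distinguishable
# from the baseline value of 1.0000
z1 = (1.0031 - 1.0000) / 0.0014
p1 = two_sided_p(z1)

# "11.8594 (11.8367)": the estimate is barely one standard error from
# zero, yet it was declared significant at the 5% level
z2 = 11.8594 / 11.8367
p2 = two_sided_p(z2)

print(f"coef 1: z = {z1:.2f}, p = {p1:.3f}")  # z about 2.21, p about 0.027
print(f"coef 2: z = {z2:.2f}, p = {p2:.3f}")  # z about 1.00, p about 0.32

# The front-boarding claim: exp(0.7772) gives the "2.18-times greater
# odds," and dividing by the delay coefficient 0.1305 gives the
# "equivalent" delay of roughly six hours
odds_ratio = math.exp(0.7772)
equiv_hours = 0.7772 / 0.1305
print(f"odds ratio = {odds_ratio:.2f}, equivalent delay = {equiv_hours:.2f} h")
```

The second coefficient is the “Whoops!”: a z-statistic of 1.0 is nowhere near significance at any conventional level.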

The data

The authors don’t share any of their data, but they do say that there were between 1500 and 4500 incidents in their database, out of between 1 and 5 million flights. So that’s about 1 incident per thousand flights.

They report a rate of incidents of 1.58 per thousand flights in economy seats on flights with first class, .14 per thousand flights in economy seats with no first class, and .31 per thousand flights in first class.

It seems like these numbers are per flight, not per passenger, and if so the comparison can’t be right: lots more people are in economy class than in first class, and flights with first-class seats tend to be bigger planes than flights with no first-class seats. This isn’t as bad as the himmicanes analysis but it displays a similar incoherence.
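To see how much the choice of denominator can matter, here’s a toy calculation in Python. The per-flight incident rates are the ones reported above; the cabin sizes are made up for illustration and are not from the paper:

```python
# Reported incident rates, per 1,000 flights
rate_econ_with_first = 1.58   # economy seats, flights with a first-class cabin
rate_econ_no_first   = 0.14   # economy seats, flights without first class
rate_first           = 0.31   # first-class seats

# Hypothetical cabin sizes (NOT from the paper): a big two-cabin plane
# vs. a small single-cabin plane
econ_seats_with_first = 180
econ_seats_no_first   = 50
first_seats           = 12

# Convert to rates per 1,000 *passengers* rather than per 1,000 flights
per_pax_econ_with_first = rate_econ_with_first / econ_seats_with_first
per_pax_econ_no_first   = rate_econ_no_first / econ_seats_no_first
per_pax_first           = rate_first / first_seats

print(f"economy w/ first: {per_pax_econ_with_first:.4f}")
print(f"economy, no first: {per_pax_econ_no_first:.4f}")
print(f"first class: {per_pax_first:.4f}")
```

With these made-up seat counts, first class jumps from the middle of the pack per flight to the highest rate per passenger. The ordering of the three groups simply isn’t identified by per-flight rates alone.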

There’s no reason we should take this sort of tea-leaf-reading exercise seriously. Or, to put it another way—and I’m talking to you, journalists—just pretend this was published in some obscure outlet such as the Journal of Airline Safety. Subtract the hype, subtract the claims of general relevance, just treat it as data (which we don’t get to see).

I should perhaps clarify that I can only assume these researchers were trying their best. They were playing by the rules. Not their fault that the rules were wrong. Statistics is hard, like basketball or knitting. As I wrote a few months ago, I think we have to accept statistical incompetence not as an aberration but as the norm. Doing poor statistical analysis doesn’t make Katherine DeCelles and Michael Norton bad people, any more than I’m a bad person just cos I can’t sink a layup.

tl;dr summary

NPR will love this paper. It directly targets their demographic of people who are rich enough to fly a lot but not rich enough to fly first class, and who think that inequality is the cause of the world’s ills.

P.S. I was unfair to NPR. See here.


Some of the discussion of yesterday’s post reminded me of a wonderful bit from Life on the Mississippi:

When I was a boy, there was but one permanent ambition among my comrades in our village on the west bank of the Mississippi River. That was, to be a steamboatman. We had transient ambitions of other sorts, but they were only transient. When a circus came and went, it left us all burning to become clowns; the first negro minstrel show that came to our section left us all suffering to try that kind of life; now and then we had a hope that if we lived and were good, God would permit us to be pirates. These ambitions faded out, each in its turn; but the ambition to be a steamboatman always remained. . . .

By and by one of our boys went away. He was not heard of for a long time. At last he turned up as apprentice engineer or ‘striker’ on a steamboat. This thing shook the bottom out of all my Sunday-school teachings. That boy had been notoriously worldly, and I just the reverse; yet he was exalted to this eminence, and I left in obscurity and misery. There was nothing generous about this fellow in his greatness. He would always manage to have a rusty bolt to scrub while his boat tarried at our town, and he would sit on the inside guard and scrub it, where we could all see him and envy him and loathe him. And whenever his boat was laid up he would come home and swell around the town in his blackest and greasiest clothes, so that nobody could help remembering that he was a steamboatman; and he used all sorts of steamboat technicalities in his talk, as if he were so used to them that he forgot common people could not understand them. He would speak of the ‘labboard’ side of a horse in an easy, natural way that would make one wish he was dead. And he was always talking about ‘St. Looy’ like an old citizen; he would refer casually to occasions when he ‘was coming down Fourth Street,’ or when he was ‘passing by the Planter’s House,’ or when there was a fire and he took a turn on the brakes of ‘the old Big Missouri;’ and then he would go on and lie about how many towns the size of ours were burned down there that day. Two or three of the boys had long been persons of consideration among us because they had been to St. Louis once and had a vague general knowledge of its wonders, but the day of their glory was over now. They lapsed into a humble silence, and learned to disappear when the ruthless ‘cub’-engineer approached. This fellow had money, too, and hair oil [emphasis added].

Twain continues:

Also an ignorant silver watch and a showy brass watch chain. He wore a leather belt and used no suspenders. If ever a youth was cordially admired and hated by his comrades, this one was. No girl could withstand his charms. He ‘cut out’ every boy in the village. When his boat blew up at last, it diffused a tranquil contentment among us such as we had not known for months. But when he came home the next week, alive, renowned, and appeared in church all battered up and bandaged, a shining hero, stared at and wondered over by everybody, it seemed to us that the partiality of Providence for an undeserving reptile had reached a point where it was open to criticism.

The reason why I happen to remember this is that my copy of Life on the Mississippi I bought used, and whoever’d read the book before had underlined “hair oil.” Then, thirty years later, I happen to get myself in a conversation about hair oil, and I have the perfect literary passage to point you to.

Mark Twain didn’t quite make it to the final round of our speaker competition (as you may recall, he lost out to Miguel de Cervantes), but when it comes to quotes about hair oil, he’s got everyone beat.

P.S. As for the second-most-quotable writer on hair oil, here’s a pretty good (if sloppy) page of Peter De Vries quotes. I only remember a few of them. It’s a long time since I read Peter De Vries, 20 years, maybe? Also no sources are given so I don’t know if they’re all real. Anyway, here are a few that give a sense of some of his moods:

Who of us is mature enough for offspring before the offspring themselves arrive? The value of marriage is not that adults produce children but that children produce adults.

I am not impressed by the Ivy League establishments. Of course they graduate the best – it’s all they’ll take, leaving to others the problem of educating the country. They will give you an education the way the banks will give you money – provided you can prove to their satisfaction that you don’t need it.

Do you believe in astrology? -I don’t even believe in astronomy.

Are you pro or anti-biotics?


Paul Alper points to this news article by Susan Perry:

Probiotics have been overhyped and rely on ‘shaky’ science, reporter finds

Although some of these studies’ results may be promising, they aren’t strong enough to support the long list of claims currently being made by the manufacturers of probiotic products. . . .

Perry links to a news article by Megan Scudellari, who writes:

Based on the smaller-scale studies done so far, there’s no indication that probiotics can treat obesity, autism, diabetes, or high cholesterol. Nor do they seem effective against the flu or common cold. . . .

I’m reminded of the Peter De Vries book in which a character was asked if he was pro or anti-macassar.

P.S. More on macassar here.

On deck this week

Mon: Are you pro or anti-biotics?

Tues: “Null hypothesis” = “A specific random number generator”

Wed: No guarantee

Thurs: The Puzzle of Paul Meehl: An intellectual history of research criticism in psychology

Fri: Redemption

Sat: Doing data science

Sun: Will transparency damage science?

No Retractions, Only Corrections: A manifesto.

Under the heading, “Why that Evolution paper should never have been retracted: A reviewer speaks out,” biologist Ben Ashby writes:

The problems of post-publication peer review have already been highlighted elsewhere, and it certainly isn’t rare for a paper to be retracted due to an honest mistake (although most retractions are due to misconduct). Moreover, one could argue that the mistakes in Kokko and Wong’s 2007 paper were sufficient to warrant a retraction as they significantly affected the conclusions. But by that logic, a large number of empirical studies should also be retracted due to incorrect statistical analyses or overreliance on fickle p-values, leading to irreproducible results.

OK, I have no problems so far, except to note that this is never gonna happen.

The part I don’t like is what comes next:

My concern is that the forced retraction of the original paper sends a bad message to the scientific community. Kokko [co-author of the original paper that got retracted, and also of the later paper that pointed out the error] has effectively been penalized for critiquing her own work, when in fact she should be applauded for her honesty.

No no no no no! Retraction is not a “penalty,” it is just a matter of correcting the scientific record.

Maybe there should be no such thing as retraction, or maybe we could ban the word “retraction” and simply offer “corrections.” That would be fine with me. The point is never to “expunge the record,” it’s about correcting the record so that later scholars don’t take a mistaken claim as being true, or proven.

But, to the extent there are retractions, or corrections, or whatever you want to call them: Sure, just do it. It’s not a penalty or a punishment. I published corrections for two of my papers because I found that they were in error. That’s what you do when you find a mistake.

I agree with Ashby when he writes:

These problems need to be highlighted at source (i.e. as corrections/erratum next to the original paper), with readers directed to the new paper for a fuller exploration of the corrected model.

I just don’t see how this differs from any other correction or retraction in the scientific literature. When there is an error, you want it to stay in the record, along with its correction.


Ashby writes:

In the case of Kokko and Wong (2007), does a retraction achieve anything that an Erratum or Technical Comment could not? In my opinion, no.

I agree and will go one step further. Does a retraction (in the sense of expunging the original material from the published record) ever achieve anything that an Erratum or Technical Comment could not? In my opinion, no.

P.S. Just to elaborate slightly: In general, I don’t imagine this correction procedure will be done by journals. There are just too many papers with serious errors. Even if only 10% of published papers have serious errors, that represents millions of possible corrections to be written, evaluated, and published. So I expect most of this will be done using external post-publication review.

But, when publications do want to correct the record, I think it makes sense to do so with corrections rather than retractions.

Controlling for variation in the weather in a regression analysis: Joe and Uri should learn about multilevel models and then they could give even better advice

Joe Simmons and Uri Simonsohn have an interesting post here. Unfortunately their blog doesn’t have a comment section so I’m commenting here.

They write this at the end of their post:

Another is to use daily dummies. This option can easily be worse. It can lower statistical power by throwing away data. First, one can only apply daily fixed effects to data with at least two observations per calendar date. Second, this approach ignores historical weather data that precedes the dependent variable. For example, if using sales data from 2013-2015 in the analyses, the daily fixed effects force us to ignore weather data from any prior year. Lastly, it ‘costs’ 365 degrees-of-freedom (don’t forget leap year), instead of 1.

With multilevel models you can control for variation at different levels without worrying about running out of degrees of freedom. That’s one of the big selling points of multilevel modeling. Give it a shot.

Counting degrees of freedom is so 20th century.
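For concreteness, here’s a minimal sketch of what partial pooling buys you, in pure numpy on simulated data. This is a simple empirical-Bayes shrinkage estimate, not a full multilevel fit in lme4 or Stan, and all the numbers are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate sales on 365 days: a true day-level effect plus unit-level noise
n_days, n_per_day = 365, 5
tau, sigma = 1.0, 3.0                       # day-level and unit-level sd
day_effect = rng.normal(0, tau, n_days)
y = day_effect[:, None] + rng.normal(0, sigma, (n_days, n_per_day))

# The "daily dummies" approach: one free parameter per day
raw_means = y.mean(axis=1)

# The multilevel alternative: partially pool the day means toward the
# grand mean, by an amount estimated from the data itself (using the
# known sigma here, for simplicity)
grand = raw_means.mean()
within_var = sigma**2 / n_per_day            # sampling variance of a day mean
between_var = max(raw_means.var(ddof=1) - within_var, 0)   # estimated tau^2
shrink = between_var / (between_var + within_var)
pooled_means = grand + shrink * (raw_means - grand)

# Partial pooling predicts the true day effects better than raw means
err_raw = np.mean((raw_means - day_effect) ** 2)
err_pooled = np.mean((pooled_means - day_effect) ** 2)
print(f"shrinkage {shrink:.2f}; MSE raw {err_raw:.2f} vs pooled {err_pooled:.2f}")
```

Instead of “spending” 365 degrees of freedom, the day effects are shrunk toward the grand mean by a data-determined amount, and the shrunken estimates beat the raw per-day means.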

P.S. Regular readers know I have a lot of respect for Simmons and Simonsohn. But, just for those of you who were not aware: these guys have done some great work. That’s one reason for this post: these are two thoughtful, careful researchers who are well-versed in quantitative psychology. But they don’t know about multilevel models. They should.

Some folks like to get away, take a holiday from the neighborhood

Saw a couple of plays, both excellent.

Fun Home. Compared to what I remembered of the book (which I also thought was excellent), the play seemed to be more about Bechdel’s family and less about Bechdel herself. But that worked for me. Bechdel’s story won’t be shared by everybody, but we all have families. The play really engaged me emotionally. I’ll say one thing, though. The music was fine but it didn’t live up to the hype. It was ok, it did the job in the context of the play, but it wasn’t memorable. It seemed more like cut-rate Sondheim. Actually, what hearing this music really did, is it made me want to see a Sondheim play from beginning to end. Next time there’s a well-reviewed revival of one, I’ll go see it.

Domesticated. This one was just hilarious. Now I want to see everything else Bruce Norris has written. I guess I could just read the scripts but maybe that’s not the same.

Gary Venter’s age-period-cohort decomposition of US male mortality trends

Following up on yesterday’s post on mortality trends, I wanted to share with you a research note by actuary Gary Venter, “A Quick Look at Cohort Effects in US Male Mortality.” Venter produces this graph:


And he writes:

Cohort effects in mortality tend to be difficult to explain. Often strings of coincidences are invoked – what age they were when smoking reduced, when heart disease treatments improved, etc. One possibility suggested by this pattern is a relationship to the size of the cohort. Being in a bigger age group, especially when society had to adjust to cope, could itself stress that population, producing higher mortality rates. Also the rate of change in cohort size could have related effects.

Looking at the birth rates supports this, but with a lag – see Figure 2. It may be that having a big population three to five years older could produce more stress – less employment and advancement opportunities, for example, while a smaller slightly older population could produce more opportunities. Here I should note that in the mortality data, year of birth is defined by subtracting age at death from year of death. Due to truncation etc. there is some imprecision in this calculation. For comparison, birth rates were thus averaged over the two years ending with each year shown.

There’s lots more in Venter’s report. I didn’t study it in detail, and I’m sure his analysis can be improved in various ways, but he’s trying his best to do a full age-period-cohort decomposition, so that’s cool. Also I assume it would make sense to look at women too and to break things up by ethnicity and region of the country (although that latter variable is tricky, as people can move).
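One technical wrinkle worth knowing about any “full age-period-cohort decomposition”: since cohort is defined as period minus age (exactly the subtraction Venter describes), the three linear terms are perfectly collinear, which is the classic APC identification problem. A tiny numpy illustration, with a made-up age-year grid:

```python
import numpy as np

# A small mortality-style grid: ages 0-4 crossed with years 2000-2004
ages = np.arange(5)
years = np.arange(2000, 2005)
A, P = np.meshgrid(ages, years, indexing="ij")
C = P - A                       # cohort (birth year) = period - age

# Stack an intercept plus linear age, period, and cohort terms
X = np.column_stack([np.ones(A.size), A.ravel(), P.ravel(), C.ravel()])

# Because C = P - A exactly, the four columns have rank 3, not 4:
# linear effects of age, period, and cohort can't all be identified
rank = np.linalg.matrix_rank(X)
print(rank)
```

Any APC model, Venter’s included, has to break this collinearity with some additional assumption (smoothness penalties, constraints on one of the terms, etc.).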

Lots of buzz regarding this postdoc position in London


Tom Churcher writes:

We are currently advertising for an infectious disease modeller to investigate the impact of insecticide resistance on malaria control in Africa. The position is for 3 years in the first instance and is funded by the Wellcome Trust. No previous malaria or mosi experience required. Please circulate to anyone who might be interested.

Infectious-disease modeller working on a malaria project at Imperial College London
Salary: £33,860 – £42,769
Closing date: 22nd May 2016

The post holder will work as part of a large Wellcome Trust funded collaborative team investigating the public health impact of insecticide resistance in Africa. The team includes entomologists, epidemiologists, engineers, computer scientists, health economists, social scientists and mathematical modellers from a range of UK and African Institutions. They will analyse data collected from the laboratory and field sites and develop and extend malaria models to incorporate current information on mosquito behaviour and susceptibility to insecticides. The post holder will have the opportunity to work closely with policy makers to ensure that the information learnt in this project is disseminated as widely as possible.

SM086-16KO closes 22nd May ’16

If you would like to discuss this position please contact Dr Tom Churcher (

And they’re using Stan!

Integrating graphs into your workflow


Discussion of statistical graphics typically focuses on individual graphs (for example here). But the real gain in your research comes from integrating graphs into your workflow. You want to be able to make the graphs you want, when you want them.

At the same time, the graphs have to be good enough that you can learn from them. So no Excel-style bar graphs. It doesn’t matter how good you are at making Excel-style bar graphs; if that’s what you’re making, you’re not using graphs effectively.

Not so good

For example consider this graph accompanying a news article by German Lopez:


This graph isn’t horrible—with care, you can pull the numbers off it—but it’s not set up to allow much discovery, either. This kind of graph is a little bit like a car without an engine: you can push it along and it will go where you want, but it won’t take you anywhere on its own.

So good

What I like to see is when a researcher uses graphics as a way to understand data in a way that they couldn’t learn by just looking at a few numbers. A good positive example here is Dan Kahan, for example here:


This is not such a wonderful graph—it’s not a graph I would make, and I think all those little bars are misleading in that they make it look like it’s data that are being plotted, not merely a fitted model.

But forget about the details. My point is that if you look at this and other posts by Kahan, you see he’s using graphs as a discovery tool—not as an end in themselves, but as a way to understand what he’s doing. And that’s what it’s all about.

“Kasparov To Face Caruana, Nakamura, So In Ultimate Blitz Challenge”

E. J. pointed me to this announcement:

For the first time since his retirement in 2005 Garry Kasparov will play chess against some of the best players on the planet.

The 13th world champion agreed to meet the top three finishers of the 2016 U.S. Championship in a blitz tournament. That turned out to be the top three seeds, Fabiano Caruana, Hikaru Nakamura and Wesley So. . . . on Thursday 28 April and Friday 29 April.

Hey, that’s tomorrow!

On both days you can watch the games live on the US Chess Champs website starting from . . . 1:30 pm [eastern time].

At this site it says the games will start at 1pm (and the tournament is in St. Louis, so that’s 2pm eastern).

I want to watch this! Funny it didn’t get more publicity, given that blitz is the most watchable form of chess and Kasparov is the most famous chess player. I guess there aren’t that many chess fans out there, but still.

P.S. They’re playing 2 games at a time. That doesn’t make sense! The games are only 15 minutes long; why not alternate them so we’re not in the position of trying to watch 2 games at once? Also too bad they’re not playing in the evening when it’s more convenient to watch.

64 Shades of Gray: The subtle effect of chessboard images on foreign policy polarization


Brian Nosek pointed me to this 2013 paper by Theodora Zarkadi and Simone Schnall, “‘Black and White’ thinking: Visual contrast polarizes moral judgment,” which begins:

Recent research has emphasized the role of intuitive processes in morality by documenting the link between affect and moral judgment. The present research tested whether incidental visual cues without any affective connotation can similarly shape moral judgment by priming a certain mindset. In two experiments we showed that exposure to an incidental black and white visual contrast leads people to think in a “black and white” manner, as indicated by more extreme moral judgments. Participants who were primed with a black and white checkered background while considering a moral dilemma (Experiment 1) or a series of social issues (Experiment 2) gave ratings that were significantly further from the response scale’s mid-point, relative to participants in control conditions without such priming.

I don’t know whether to trust this claim, in light of the equally well-documented finding, “Blue and Seeing Blue: Sadness May Impair Color Perception.” Couldn’t the Zarkadi and Schnall result be explained by an interaction between sadness and moral attitudes? It could go like this: Sadder people have difficulty with color perception so they are less sensitive to the different backgrounds in the images in question. Or maybe it goes the other way: sadder people have difficulty with color perception so they are more sensitive to black-and-white patterns.

I’m also worried about possible interactions with day of the month for female participants, given the equally well-documented findings correlating cycle time with political attitudes and—uh oh!—color preferences. Again, these factors could easily interact with perceptions of colors and also moral judgment.

What a fun game! Anyone can play.

Hey—here’s another one. I have difficulty interpreting this published finding in light of the equally well-documented finding that college students have ESP. Given Zarkadi and Schnall’s expectations as stated in their paper, isn’t it possible that the participants in their study simply read their minds? That would seem to be the most parsimonious explanation of the observed effect.

Another possibility is the equally well-documented himmicanes and hurricanes effect—I could well imagine something similar with black-and-white or color patterns.

But I’ve saved the best explanation for last.

We can most easily understand the effect discovered by Zarkadi and Schnall in the context of the well-known smiley-face effect. If a cartoon smiley face flashed for a fraction of a second can create huge changes in attitudes, it stands to reason that a chessboard pattern can have large effects too. The game of chess, after all, was invented in Persia, and so it makes sense that being primed by a chessboard will make participants think of Iran, which in turn will polarize their thinking, with liberals and conservatives scurrying to their opposite corners. In contrast, a blank pattern or a colored grid will not trigger these chess associations.

Aha, you might say: chess may well have originated in Persia but now it’s associated with Russia. But that just bolsters my point! An association with Russia will again remind younger voters of scary Putin and bring up Cold War memories for the oldsters in the house: either way, polarization here we come.

In a world in which merely being primed with elderly-related words such as “Florida” and “bingo” causes college students to walk more slowly (remember, Daniel Kahneman told us “You have no choice but to accept that the major conclusions of these studies are true”), it is no surprise that being primed with a chessboard can polarize us.

I can already anticipate the response to the preregistered replication that fails: There is an interaction with the weather. Or with relationship status. Or with parents’ socioeconomic status. Or, there was a crucial aspect of the treatment that was buried in the 17th paragraph of the published paper but turns out to be absolutely necessary for this phenomenon to appear.

Or . . . hey, I have a good one: The recent nuclear accord with Iran and rapprochement with Russia over ISIS has reduced tension with those two chess-related countries, so this would explain a lack of replication in a future experiment.

P.S. The title of this post refers to the famous “50 shades of gray” paper by Nosek, Spies, and Motyl, who discovered an exciting and statistically significant finding connecting political moderation with perception of shades of gray. As Nosek et al. put it at the time:

The results were stunning. Moderates perceived the shades of gray more accurately than extremists on the left and right (p = .01). Our conclusion: political extremists perceive the world in black-and-white, figuratively and literally. . . . Enthused about the result, we identified Psychological Science as our fall-back journal after we toured the Science, Nature, and PNAS rejection mills. . . .

Just to be safe, though, they decided to do their own preregistered replication:

We ran 1,300 participants, giving us .995 power to detect an effect of the original effect size at alpha = .05.

And, then, the punch line:

The effect vanished (p = .59).
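Power claims like the one Nosek et al. quote can be checked with a standard normal-approximation calculation via the Fisher z transform. They don’t state the original effect size in this excerpt, so the r = 0.13 below is purely illustrative; the function itself is just a generic sketch of a two-sided correlation-test power computation.

```python
from math import sqrt, erf, atanh

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def corr_test_power(r, n, z_crit=1.959963984540054):
    """Approximate power of a two-sided 5% test of zero correlation,
    for true correlation r and sample size n, via the Fisher z transform."""
    z = atanh(r)            # Fisher z of the true effect size
    se = 1 / sqrt(n - 3)    # standard error of the sample Fisher z
    # probability the estimate lands beyond either critical value
    return norm_cdf(z / se - z_crit) + norm_cdf(-z / se - z_crit)

# r = 0.13 is an invented placeholder effect size
print(corr_test_power(0.13, 1300))
```

With n = 1300 this gives power around .997 for r = 0.13, which is the kind of arithmetic behind statements like “.995 power to detect an effect of the original effect size at alpha = .05.”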

P.P.S. I’m sometimes told it’s not politically savvy to mock. Maybe so. I strongly believe in the division of labor. We each contribute according to our abilities. I can fit statistical models, do statistical theory, and mock. Others can run experiments, come up with psychological theories, and form coalitions. It’s all good. I think mockery is what a lot of this himmicane stuff deserves, but I have a lot of respect for the Uri Simonsohns of the world who keep a straight face and engage personally with the authors of work that they criticize. Indeed, that careful and polite strategy can really work, and I’m glad they are going that route, even if I don’t have the patience to do it that way myself.

If Yogi Berra could see this one, he’d spin in his grave: Regression modeling using a convenience sample

Kelvin Leshabari writes:

We are currently planning to publish some few manuscripts on the outcome of treatment of some selected cancers occurring in children. The current dataset was derived from the natural admission process of those children with cancer found at a selected tertiary cancer centre. To the best of our understanding, our data are based on convenience sampling as we collected what was available. We are also quite sure that those patients were far from being representative of all other children with cancer in our area. We do know that convenience sampling is a non-probability sampling method. We plan to run several analyses that may include fitting proportional hazard models.

At present, I have a conflict with other co-investigators since I do believe that our dataset has violated some assumptions (e.g. proportionality assumption, randomness, independence, etc etc) and hence it may be statistically inappropriate to apply some statistical techniques (e.g. Proportional Hazard models, Kaplan-Meier curves). However, they provided me with a handful of publications that analysed data (based on convenience sampling!) even with obvious signs of violation of key assumptions of those statistical techniques.


1. Is it statistically valid to analyse data using methods that need randomness and independence assumptions in cases of non-probability sampling? (e.g. Is it logical to fit a regression model to a dataset that was derived from convenience sampling?)

2. Can sample size (when infinitely large!) be a rescue for application of proportional hazard models (including fitting Kaplan-Meier curves!) in cases of non-probability sampling?

3. Can I fit proportional hazard models in a dataset whose nature is derived from convenience sampling?

My reply: If you do regression modeling using a convenience sample, you’re implicitly assuming the probability of a person being included in your sample depends only on the variables included in your regression model. To put it another way, you’re assuming equal-probability sampling within the poststratification cells determined by the predictors in your regression. So you can typically say that, the more predictors you have, the more reasonable it is to fit a regression model to a convenience sample. If you have some sense of the biases in the convenience sample (which sorts of people are more or less likely to be in the sample), you should try to include predictors in your model that will capture this. And if necessary add a term to your log-likelihood to model any selection processes, but it’s rare that researchers ever get to this step.
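The reply above can be illustrated with a toy simulation: if inclusion in the convenience sample depends only on a covariate that’s in the model, then estimating cell means (a regression on that covariate) and poststratifying to known population cell shares removes the selection bias, while the raw sample mean stays biased. All the numbers below (cell shares, means, inclusion probabilities) are invented for illustration.

```python
import random
random.seed(1)

# Hypothetical population: outcome depends on age group, and younger
# people are heavily over-represented in the convenience sample.
pop_share = {"young": 0.3, "middle": 0.4, "old": 0.3}   # population cell shares
true_mean = {"young": 1.0, "middle": 2.0, "old": 3.0}   # E[y | cell]
p_include = {"young": 0.8, "middle": 0.3, "old": 0.1}   # selection probabilities

sample = []
for _ in range(100_000):
    cell = random.choices(list(pop_share), weights=list(pop_share.values()))[0]
    if random.random() < p_include[cell]:
        sample.append((cell, random.gauss(true_mean[cell], 1.0)))

# Raw sample mean is biased toward the over-sampled young cell ...
raw = sum(y for _, y in sample) / len(sample)

# ... but the cell means are unbiased, so poststratifying them
# to the population shares recovers the population mean (2.0).
totals = {c: 0.0 for c in pop_share}
counts = {c: 0 for c in pop_share}
for c, y in sample:
    totals[c] += y
    counts[c] += 1
post = sum(pop_share[c] * totals[c] / counts[c] for c in pop_share)

print(raw, post)  # raw is pulled well below 2.0; post is near 2.0
```

The true population mean here is 0.3(1) + 0.4(2) + 0.3(3) = 2.0; the raw sample mean lands near 1.46 because inclusion is tilted toward the low-outcome cell.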

I owe it all to my Neanderthal genes


Yesterday I posted a methods-focused item at the Monkey Cage, a follow-up of a post from a couple years ago arguing against some dramatic claims by economists Ashraf and Galor regarding the wealth of nations.

No big deal, just some standard-issue skepticism. But for some reason this one caught fire—maybe somebody important linked to it, maybe it was my click-friendly title, “Why is Africa so poor while Europe and North America are so wealthy?” It got 1323 comments! Usually my posts at the sister blog get about 5 comments and I have no idea if anyone reads them at all. Not that getting comments is my goal, this was just unexpected.

I also got some emails which I’ll share with you without comment.


I disagree with your reasoning of genetic “diversities”…….I believe there is a simpler explanation, that being the Jewish, Oriental & Caucasian races are more intelligent & motivated than other races…….What more examples do you need beyond the last 125 years ? Evidence is EVERYWHERE you look !

Reason #2…… These populations have embraced Capitalism to a larger or lesser degree, with a degree of Socialism thrown in……. I didn’t make it this way…… if you don’t like what I’ve posted here, talk to God about it…….. I only make observations………


This is pertaining to the article you wrote entitled: Why Is Africa So Poor While Europe and North America are So Wealthy?

You seem to imply that Africa would be better off if more whites and Asians went there. That is exactly the problem in the first place. Who do you think is funding the civil wars all over Africa? Multinational companies headed mostly by whites and Asians. Who do you think is supplying weapons to rebels occupying these resource rich areas in Africa? Mostly whites and Asians.

The fact is that America and Europe would not be wealthy if it weren’t for Africa. Didn’t white Americans make Africans pick cotton for them? All the minerals that are in the everyday items that you use, where do you think they come from? What about gold and diamonds? It’s young African children that are digging the minerals that ignorant whites like you are profiting from.

Instead of writing about Africa — which you know nothing about — you should talk about why there are so many white males sleeping on the streets of America and begging for spare change. I’ve never seen an African do that.


I saw your tired article written in the Washington Post. Let me give you a simple answer and a few references for you to read.

White people by nature are thieves and criminals and steal, people, labor and resources. Africa has the best people, the strongest labor and the best resources. Since white people are greedy, lazy and weak of course they steal and make war against those who have the the most. After you destroy the people then you write the history and narrative as you desire to tell it. Like you are doing in your weak newspaper article. It’s time for black people to start to check this foolishness you are peddling. Let’s not go into genetics because it is a proven fact from Margaret Meade, Louis Leakey from Mendel’s law that white people are weak, genetic recessive and inferior to black people in ALL aspects. I will suggest a few books to you that I know you and your students are NOT reading at Columbia and they are: How Europe Underdeveloped Africa by Walter Rodney, Destruction of Black Civilization: Great Issues of a Race from 4500 B.C. to 2000 A.D. by Chancellor Williams and lastly Stolen Legacy by Dr. George G.M. James. Speak facts and not white nationalism and the people will learn the truth. Sit down with that foolishness somewhere.

John -3:11


Why is Europe and USA so rich and Africa so poor ? Its really very simple, Africa is laboring under centuries of self inflicted CURSES of witchcraft upon its people groups.

Having been to Kenya and Tanzania I speak of these matters from on site observations. People in most cases do not want to overturn these curses but continue using them.

If you disagree, help yourself.


Your recent article in the Washington Post titled “Why is Africa so poor while Europe and North America are so wealthy?” reminded me of some important research on that topic. Most refer to the effects of ethnic diversity on growth and public goods provision.

Along with others, you have criticized the Ashraf-Galor study of genetic diversity and economic development. I believe that the correlations observed in that paper are the shadow cast by underlying ethnic diversity effects.

I recommend to you the following peer-reviewed studies of this and adjacent topics. They are prominent authors and their papers provide a wealth of associated studies. I believe that their conclusions are almost universally accepted. PDF copies can be found on

Africa’s Growth Tragedy: Policies and Ethnic Divisions, Easterly+Levine 1997

Ethnic diversity and economic development, Montalvo 2005

Ethnic Diversity and Economic Performance, Alesina+Ferrara 2003

Ethnic Divisions, Trust, and the Size of the Informal Sector, Lassen 2003

Public Goods and Ethnic Divisions, Alesina+Baqir+Easterly 1999

Economic versus Cultural Differences: Forms of Ethnic Diversity and Public Goods Provision, Baldwin+Huber 2010

Ethnic diversity, social sanctions, and public goods in Kenya, Miguel+Gugerty 2005


As a well-traveled journalist in the U.S., having lived for years in each of in five distinct regions of this country, with time spent in all 50 states, I’ve observed differences in regional societal advancement has much to do with individual levels of curiosity and risk-acceptance reflected by those who stayed where born and those who ventured far geographically.

What accounts for those variations, often found among siblings, I don’t know, but when you look at poor, conservative, suspicious and religious rural Southern families you find they’ve stayed-put while more open-minded, inquisitive kinfolk migrated to, say, the Pacific Coast and have been or became far more open to new ideas, to other cultures, to innovation and adventure.

Then think of the lack of advancement among Africans who for generations remained in Africa, satisfied enough with what’s there, to the progress made by those who migrated, beginning eons ago, northward and then across the upper half of the globe. You see a similar pattern differing, I strongly suspect, mainly in scale.

What genetics has to do with all that is beyond my ken, but it doesn’t appear to be a big factor given the variations within families.


Thank you for the well written, highly entertaining article.

The question, “Why is Africa so poor while Europe and North America are so wealthy?”, is very similar to “Why is the culture of Phocoena so impoverished?”

The answer is that economists tend to look at apples, oranges, and tomatoes and see agricultural commodities, the labor, trade and wealth that might be represented, rather than fruit.

The economy of Bolivia is much more wealthy than economists can imagine. The economy of more ‘primitive’ cultures, globally, is being destroyed at rapid pace by the parasitic cultures of North America and Europe, to the ultimate detriment of all. The economy of sub-Saharan Africa is the product of centuries of colonization which has given it the genetic diversity and doomed the dark continent to the same ugly fate which awaits North America, Europe, Asia and Australia. And i would humbly posit that the next great civilization on this planet will arise from either South America or Africa, although my bet would be on Bolivia.

The Phocoenan economy, which was once one of absolute abundance, is in peril. All the commodities necessary to life, food, water, air and space, were at one time freely available to all. What threats existed were to individuals not to the entire culture. Today they face the threat of extinction due to the ignorance and arrogance of a very few people. And yet, to date, not a single economist has considered the wealth of this noble culture.

Economists have determined that the culture of the ancient civilizations of South and Central America failed due to a change of weather. The agricultural portion of the economy could no longer sustain the people of all portions of the economy so they died out or moved, changing from an ag-based economy to one less advanced, leaving behind their structures and art to silently testify to their former glory.

In a hundred years all that shall remain of the economy of North America will be some faces carved into stone, which will look almost nothing like the inhabitants, and some lines across the continent that once were roads. Those people who remain will be hunter/gatherers. Agriculture will survive in South America and Africa, where the soil has not been so contaminated as it is elsewhere and where the people are yet well acquainted with their environment.

The economy of Bolivia will survive the coming cataclysm. The economies of Europe, Asia, North America and Australia will not.

“Why?”, you might ask.

Because all of the “more advanced” economies rely upon technology, infrastructure and trade. The economy of Bolivia does not need America, or any other location. It was isolated and will be self-sustaining when a series of some relatively minor problems bring about riot, panic, anarchy and implosion of the more advanced economies. The poor will eat the rich and not be satisfied. The strong will eat the weak until only the vultures remain. It will only be one very bad season, that will degenerate into a very very bad year for economists and all those who do not actually produce anything but rely on the strength of capital and the largess of the state to sustain them in their leisure.

Such is my opinion and my amusement with economists and statisticians.


I am excited (and inspired!) by your article today in the Washington Post.

I think I know the answer to why both Africa and Bolivia are poor, and why they will continue to remain poorer than their neighbors. My two-decades work on the physics of evolution (and civilization) predicts this.

I have a new book The Physics of Life that summarizes the phenomenon and its prediction. . . .


I read your op-ed piece on the Africa hypothesis for relative poverty. I agree with your analysis and critique. For my undergrad thesis I wrote a historical comparison/contrast between frontier development in Brazil versus the United States. It was clear that a number of extraction and mercantilism practices rendered the Brazilian economy lame for 200 years. Gold alone accounted for a huge disparity. Brazil had more than we did – it all went to Europe. Then agricultural practices, yeoman farmers like Jefferson envisions versus plantations and slavery on a scale that would have made the anti-bellum South blush. It had nothing to do with genetics.

But longer term I have been interested in both statistics and genetic disease. Malaria was the first clue. In Vietnam the military noted that some soldiers were more resistant to malaria. General Stilwell and General Mountbatten in WW2 Asia had to cope with huge casualties from diseases. Latent sickle cell carriers are more resistant. That is not a trait North Europeans have. Later it was supposed that 6-8,000 years ago in the Mediterranean a mutation led to cystic fibrosis, which made latent carriers more resistant to rampant respiratory diseases. So it is a modern genetic “disease” in the sense that maybe one out of 50,000 persons suffer severe symptoms and early death. But all the other latent carriers are more adaptable to new environmental threats.

Recently I have seen sheepish suggestions that autism or Aspbergers is another possible genetic mutation that enhanced survival. It is not diagnosed often in Africa or southern populations with high ancient genetic diversity. That may be because of poverty and poor medical care. But it is also seen that some of the proposed genes that cause autism do not appear to occur at all in African or south Asian populations. As a possible survival mechanism, it may be that in Pleistocene diasporas those genes allowed for two additional traits…. Rapid memory recall of new environments (think a milder form of Rainman), and also a milder form of restless and desire to move from one location to another. There are sedentary populations in Africa and southern regions of Asia where the same people have lived for millenia. Not so in Europe, northern Asia, the Americas. No substantial proof yet, but it fits in the model for other genetic “diseases.”

Then economic success is relative. In the Pleistocene, African populations were sedentary in most places, and they were comfortable in the terms of the day. They did not have to move as all the means to “make a living” were available. Other populations had to move to eat. As no other primate species has this urge it must come from some new mutation. And to “make a living” in new environments technology is necessary. Not so in safe regions of Africa. So a combination of a safe environment and mutations, like lactose adaptation, or skin melanin changes, all contribute to how a small ethny or clan would adapt and “make a living.”

Today we may consider African backward and “poor.” But given a choice in 40,000 BC I think I’d prefer to live in comfortable and well-fed Africa as to say the edge of a great desert, or a ice covered tundra steppe. Technology of course has changed that formula in the intervening millenia. But a perfect more modern example is the difference between say California First Nations, an old population going back to the original migrants around 14k year ago. They had in the Central Valley no recorded conflicts, minimal organization, with very small ethnies and over 200 languages. They remained in situ for ages. They ate well, had to work little. Later Na Dene migrants such as Athabaskans like Navajo, Apache, or Nahuatl moved a lot. The Aztecs liked these tribes as easy captives and slaves and sacrifices. One group had to work little due to a comfortable locations, another had to work hard in deserts, mountains, icy forests and plains. . . .


The one thing people always leave out of why Africa may be so poor is the possibility that they lack long term neanderthal DNA The neanderthal DNA made its way back to Africa around 3,000 years ago whereas it has been a part of Eurasia for tens of thousands of years.

Neanderthal’s created the first art, have the oldest burial/sprit world graves and had a larger brain and larger visual cortex.

The bonobo chimpanzee, our closet relative, shares 99% of our DNA and humans that are not from Africa have a percentage of neanderthal DNA that is equivalent to the differences in chimpanzees to humans and look how different they are.

Perhaps a story on the possibility of neanderthal DNA making the difference would be a good follow up article.

So there you have it. If you don’t like what I’ve posted here, talk to God about it. I only make observations.

On deck this week

Mon: I owe it all to my Neanderthal genes

Tues: If Yogi Berra could see this one, he’d spin in his grave: Regression modeling using a convenience sample

Wed: 64 Shades of Gray: The subtle effect of chessboard images on foreign policy polarization

Thurs: Integrating graphs into your workflow

Fri: Gary Venter’s age-period-cohort decomposition of US male mortality trends

Sat: Some folks like to get away, take a holiday from the neighborhood

Sun: Controlling for variation in the weather in a regression analysis: Joe and Uri should learn about multilevel models and then they could give even better advice

Risk aversion is a two-way street


“Risk aversion” comes up a lot in microeconomics, but I think that it’s too broad a concept to do much for us. In many many cases, it seems to me that, when there is a decision option, either behavior X or behavior not-X can be thought of as risk averse, depending on the framing. Thus, when people talk about risk-aversion, they’re not really saying much of anything at all. I link to the famous vase/faces illusion to reinforce this point that, for any given decision, what is risk averse and what is risk seeking can flip back and forth, but once you have one framing in mind, it can be hard to keep the other one in mind.

I’ve written about this several times on the blog but maybe not in the past year or so.

The topic arose today when I read a post by Tyler Cowen on a book by Greg Ip called “Foolproof: Why Safety Can Be Dangerous and How Danger Makes Us Safe,” whose theme is described as follows:

How the very things we create to protect ourselves, like money market funds or anti-lock brakes, end up being the biggest threats to our safety and wellbeing.

I’m sympathetic to this general argument (see for example, sections 5 and 6 of this paper from 1998, the sections on Utility of money and risk aversion and What is the value of a life) and I’m guessing that I’d like Ip’s book.

But one of the examples, excellent in itself, illustrates the disturbing two-way nature of risk aversion. Here’s the example:

By Spellberg’s reckoning, the odds of an adverse reaction to an antibiotic, such as an allergic reaction, are about 1 in 10, whereas the odds that someone will suffer because antibiotics were wrongly withheld are about 1 in 10,000. Nonetheless, most physicians do not want to run the risk of letting a patient suffer when an antibiotic could help . . . His research in Nepal produced the depressing finding that antibiotic resistance was highest in communities with the most doctors.

When I started reading this example, given Cowen’s description of the book, I thought he was going to say that people are too worried about allergic reactions, that allergic reactions are super-rare and people should be less risk-averse, less worried about absolute safety, and instead they should do the rational thing and just take the damn drug.

Actually, though, the message in this example turned out to be the opposite: we are told that taking the drug (or, more specifically, prescribing the drug, as the focus seems to be on the doctor’s decision) is the overly-safety-concerned option that is actually a bad idea.

Either way, the message is clear: evaluate costs and benefits and make an informed decision. So in that sense the framing in terms of risk aversion is irrelevant. (Cowen does not actually use the term “risk aversion” in his discussion of Ip’s book, but that’s the only way I can think of interpreting the above discussion.)
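To make the cost-benefit logic concrete: with the quoted probabilities, a tiny expected-loss calculation shows the decision hinges entirely on the ratio of harms, not on either probability alone. The harm costs below are invented placeholders in arbitrary "badness" units.

```python
# Probabilities are from the quoted Spellberg passage; costs are assumptions.
p_adverse = 1 / 10        # chance of an adverse reaction if prescribed
p_withheld = 1 / 10_000   # chance of harm if the antibiotic is wrongly withheld
cost_adverse = 1.0        # most reactions are mild (invented)
cost_withheld = 500.0     # an untreated infection can be severe (invented)

loss_prescribe = p_adverse * cost_adverse    # expected loss from prescribing
loss_withhold = p_withheld * cost_withheld   # expected loss from withholding

# Withholding wins unless the harm of withholding is more than
# p_adverse / p_withheld = 1000 times worse than an adverse reaction.
break_even = p_adverse / p_withheld
print(loss_prescribe, loss_withhold, break_even)
```

Under these made-up costs, withholding has the lower expected loss (0.05 vs. 0.10); flip the cost ratio past 1000:1 and prescribing wins. That is the whole decision analysis, and “risk aversion” never enters.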

My point is not to claim this researcher is wrong about the antibiotics—I have no idea about that—but rather that risk aversion is such a flexible framework that it can be applied in just about any direction. Using an antibiotic is risk averse because you’re taking the drug just in case. Not using an antibiotic is risk averse because you’re scared of the possible adverse reactions. Vaccinating your kid is risk averse because you’re worried about measles. Not vaccinating your kid is risk averse because you’ve heard it might cause autism. Increasing your savings in the stock market is risk averse because you’re insuring yourself against inflation, or keeping your savings in cash is risk averse because the stock market might crash. Driving is risk averse because you’re scared of flying; flying is risk averse because you’re scared of driving.

This doesn’t work for me.

P.S. This stuff is controversial: the commenters to the above-linked post pretty much uniformly disagree with Cowen on the antibiotics question.

What is the “true prior distribution”? A hard-nosed answer.

The traditional answer is that the prior distribution represents your state of knowledge, that there is no “true” prior. Or, conversely, that the true prior is an expression of your beliefs, so that different statisticians can have different true priors. Or even that any prior is true by definition, in representing a subjective state of mind.

I say No to all that.

I say there is a true prior, and this prior has a frequentist interpretation.

1. The easy case: the prior for an exchangeable set of parameters in a hierarchical model

Let’s start with the easy case: you have a parameter that is replicated many times, the 8 schools or the 3000 counties or whatever. Here, the true prior is the actual population distribution of the underlying parameter, under the “urn” model in which the parameters are drawn from a common distribution. Sure, it’s still a model, but it’s often a reasonable model, in the same sense that a classical (non-hierarchical) regression has a true error distribution.
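A toy simulation of the “urn” model, with invented numbers: 3000 county effects drawn from a common N(mu, tau^2) population, one noisy measurement each. Treating that population distribution as the prior, the posterior-mean shrinkage estimate beats the raw estimate on average, which is the operational sense in which the population distribution is the true prior.

```python
import random
random.seed(0)

# "Urn" model: the population distribution N(mu, tau^2) IS the true prior.
mu, tau, sigma = 2.0, 1.0, 2.0  # population mean/sd, measurement sd (invented)
theta = [random.gauss(mu, tau) for _ in range(3000)]    # county effects
y = [random.gauss(t, sigma) for t in theta]             # one noisy estimate each

# Posterior mean under the true prior: shrink each estimate toward mu
# by the usual precision-weighted factor tau^2 / (tau^2 + sigma^2).
shrink = tau**2 / (tau**2 + sigma**2)
theta_hat = [mu + shrink * (yi - mu) for yi in y]

mse_raw = sum((yi - t) ** 2 for yi, t in zip(y, theta)) / len(theta)
mse_shrunk = sum((th - t) ** 2 for th, t in zip(theta_hat, theta)) / len(theta)
print(mse_raw, mse_shrunk)  # shrinkage toward the true prior mean wins
```

With these numbers the raw estimates have mean squared error near sigma^2 = 4, while the shrinkage estimates come in near the posterior variance of 0.8.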

2. The hard case: the prior for a single parameter in a model (or for the hyperparameters in a hierarchical model)

OK, now for the more difficult problem in which there is a unitary parameter. Or parameter vector, it doesn’t matter, the point is that there’s only one of it, it’s not part of a hierarchical model and there’s no “urn” that it was drawn from.

In this case, we can understand the true prior by thinking of the set of all problems to which your model might be fit. This is a frequentist interpretation and is based on the idea that statistics is the science of defaults. The true prior is the distribution of underlying parameter values, considering all possible problems for which your particular model (including this prior) will be fit.

Here we are thinking of the statistician as a sort of Turing machine that has assumptions built in, takes data, and performs inference. The only decision this statistician makes is which model to fit to which data (or, for any particular model, which data to fit it to).

We’ll never know what the true prior is in this world, but the point is that it exists, and we can think of any prior that we do use as an approximation to this true distribution of parameter values for the class of problems to which this model will be fit.

3. The hardest case: the prior for a single parameter in a model that is only being used once

And now we come to the most challenging setting: a model that is only used once. For example, we’re doing an experiment to measure the speed of light in a vacuum. The prior for the speed of light is the prior for the speed of light; there is no larger set of problems for which this is a single example.

My short answer is: for a model that is only used once, there is no true prior.

But I also have a long answer which is that in many cases we can use a judicious transformation to embed this problem into a larger class of exchangeable inference problems. For example, we consider all the settings where we’re trying to estimate some physical constant from experiment and prior information from the literature. We summarize the literature by a N(mu_0, sigma_0) prior. In this case we can think of the inputs to the inference as being mu_0, sigma_0, and the experimental data, in which case the repeated parameter is the prediction error. And, indeed, that is typically how we think of such measurement problems.

For another example, what’s our prior probability that Hillary Clinton will be elected president in November? We can put together what information we have, fit a model, and get a predictive probability. Or even just use the published betting odds, but in either case we are thinking of this election as one of a set of examples for which we would be making such predictions.

What does this do for us?

OK, fine, you might say. But so what? What is gained by thinking of a “true prior” instead of considering each user’s prior as a subjective choice?

I see two benefits. First, the link to frequentist statistics. I see value in the principle of understanding statistical methods through their average properties, and I think the approach described above is the way to bring Bayesian methods into the fold. It’s unreasonable in general to expect a procedure to give the right answer conditional on the true unknown value of the parameter, but it does seem reasonable to try to get the right answer when averaging over the problems to which the model will be fit.
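That averaging claim is easy to check in a toy normal-normal setup (my construction, not from the post): when the working prior matches the distribution of parameter values across the ensemble of problems, Bayesian 95% posterior intervals achieve 95% frequentist coverage averaged over that ensemble.

```python
import random
random.seed(2)

# Ensemble of problems: each problem's parameter is drawn from the
# "true prior" N(0, 1); each problem yields one observation y ~ N(theta, 1).
n_problems = 20_000
cover = 0
for _ in range(n_problems):
    theta = random.gauss(0, 1)
    y = random.gauss(theta, 1)
    # Posterior under the matching prior: N(y/2, 1/2).
    m, s = y / 2, 0.5 ** 0.5
    cover += m - 1.96 * s <= theta <= m + 1.96 * s

print(cover / n_problems)  # close to 0.95, averaged over the ensemble
```

No single problem’s interval is guaranteed anything conditional on its true theta; the 95% is a property of the procedure averaged over the class of problems it’s applied to, which is exactly the point.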

Second, I like the connection to hierarchical models, because in many settings we can think about a parameter of interest as being part of a batch, as in the examples we’ve been talking about recently, of modeling all the forking paths at once. In which case the true prior is the distribution of all these underlying effects.

Stochastic natural-gradient EP

Yee Whye Teh sends along this paper with Leonard Hasenclever, Thibaut Lienart, Sebastian Vollmer, Stefan Webb, Balaji Lakshminarayanan, and Charles Blundell. I haven’t read it in detail but they note similarities to our “expectation propagation as a way of life” paper. But their work is much more advanced than ours.