I was asked by a reporter to comment on a paper by Satoshi Kanazawa, “Beautiful parents have more daughters,” which is scheduled to appear in the Journal of Theoretical Biology.
As I have already discussed, Kanazawa’s earlier papers (“Engineers have more sons, nurses have more daughters,” “Violent men have more sons,” and so on) had a serious methodological problem in that they controlled for an intermediate outcome (total number of children). But the new paper fixes this problem by looking only at first children (see the footnote on page 7).
Unfortunately, the new paper still has some problems. Physical attractiveness (as judged by the survey interviewers) is measured on a five-point scale, from “very unattractive” to “very attractive.” The main result (from the bottom of page 8) is that 44% of the children of surveyed parents in category 5 (“very attractive”) are boys, as compared to 52% of children born to parents from the other four attractiveness categories. With a sample size of about 3000, this difference is statistically significant (2.44 standard errors away from zero). I can’t confirm this calculation because the paper doesn’t give the actual counts, but I’ll assume it was done correctly.
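As a rough check on what those numbers imply, here is a minimal sketch in base R. It uses only the figures quoted above (44%, 52%, and 2.44 standard errors); the variable names are mine, and since the paper does not give the counts, this backs out the standard error rather than recomputing it from data.

```r
# Figures quoted in the post: 44% boys in category 5, 52% in categories 1-4,
# and a difference said to be 2.44 standard errors from zero.
p_very_attractive <- 0.44
p_others <- 0.52
z_reported <- 2.44

# Standard error of the difference implied by the reported z-score
diff_in_proportions <- p_very_attractive - p_others
implied_se <- abs(diff_in_proportions) / z_reported

# Two-sided p-value under a normal approximation
p_two_sided <- 2 * pnorm(-z_reported)

print(round(c(implied_se = implied_se, p_two_sided = p_two_sided), 4))
```

The implied standard error of roughly 3 percentage points is what makes an 8-point difference look significant when taken on its own.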
Choice of comparisons
Not to be picky, but it seems somewhat arbitrary to pick out category 5 and compare it to 1-4. Why not compare 4 and 5 (“attractive” or “very attractive”) to 1-3? Even more natural (from my perspective) would be to run a regression of proportion boys on attractiveness. Using the data in Figure 1 of the paper:
> library(arm)  # provides display()
> attractiveness <- c(1, 2, 3, 4, 5)
> percent.boys <- c(50, 56, 50, 53, 44)
> display(lm(percent.boys ~ attractiveness))
lm(formula = percent.boys ~ attractiveness)
               coef.est coef.se
(Intercept)    55.10     4.56
attractiveness -1.50     1.37
---
n = 5, k = 2
residual sd = 4.35, R-Squared = 0.28
So, having a boy child is negatively correlated with attractiveness, but this is not statistically significant: the estimated slope of -1.50 is only about 1.1 standard errors from zero. (Weighting by the approximate number of parents in each category, from Figure 2, does not change this result.) It would not be surprising to see a correlation of this magnitude, even if the sex of the child were purely random.
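For readers without the arm package, the same fit can be sketched in base R. This uses the five percentages quoted above and is unweighted, since the category counts from Figure 2 are not reproduced here; the variable names are my own.

```r
# Percent boys by attractiveness category, as read from Figure 1 of the paper
attractiveness <- c(1, 2, 3, 4, 5)
percent_boys <- c(50, 56, 50, 53, 44)

# Unweighted least-squares fit
fit <- lm(percent_boys ~ attractiveness)
coefs <- summary(fit)$coefficients

slope <- coefs["attractiveness", "Estimate"]
slope_se <- coefs["attractiveness", "Std. Error"]
t_ratio <- slope / slope_se                       # slope in units of standard errors
p_value <- coefs["attractiveness", "Pr(>|t|)"]    # two-sided, 3 degrees of freedom

print(round(c(slope = slope, se = slope_se, t = t_ratio, p = p_value), 2))
```

With only five data points the fit has three residual degrees of freedom, so the slope would need to be much larger relative to its standard error to count as significant.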
But what about the comparison of category 5 with categories 1-4? Well, again, this is one of many comparisons that could have been made. I see no reason from the theory of sex ratios (admittedly, an area on which I am no expert) to pick out this particular comparison. Given the many comparisons that could be done, it is not such a surprise that one of them is statistically significant at the 5% level.
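To illustrate the multiple-comparisons point, here is a small simulation sketch. The category sizes are HYPOTHETICAL (the post does not give the counts; I only made them sum to 3000), the set of four "top categories versus the rest" splits is my own choice of plausible comparisons, and the sex of each child is generated as a fair coin flip, so any "significant" result is a false positive.

```r
set.seed(1)

# HYPOTHETICAL category sizes, chosen only so that the total is 3000
n_cat <- c(50, 200, 1200, 1300, 250)
p_boy <- 0.5      # null model: sex of child unrelated to attractiveness
n_sims <- 2000

# z-statistic comparing the proportion of boys in categories `hi` with the rest
z_split <- function(boys, n, hi) {
  p1 <- sum(boys[hi]) / sum(n[hi])
  p2 <- sum(boys[-hi]) / sum(n[-hi])
  se <- sqrt(p1 * (1 - p1) / sum(n[hi]) + p2 * (1 - p2) / sum(n[-hi]))
  (p1 - p2) / se
}

# Four comparisons one might have chosen: 5 vs 1-4, 4-5 vs 1-3, 3-5 vs 1-2, 2-5 vs 1
splits <- list(5, 4:5, 3:5, 2:5)

# For each simulated data set, is at least one comparison "significant" at 5%?
any_significant <- replicate(n_sims, {
  boys <- rbinom(5, size = n_cat, prob = p_boy)
  z <- sapply(splits, function(hi) z_split(boys, n_cat, hi))
  any(abs(z) > 1.96)
})

false_positive_rate <- mean(any_significant)
print(false_positive_rate)
```

Under these assumptions the chance of finding at least one significant comparison is noticeably higher than the nominal 5%, and it would grow further if more candidate comparisons were allowed.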
I have little to say about the difficulties of measuring attractiveness except that, according to the paper, interviewers in the survey seem to have assessed the attractiveness of each participant three times over a period of several years. I would recommend using the average of these three judgments as a combined attractiveness measure. The general advice is that if there is an effect, it should show up more clearly if the x-variable is measured more precisely. I don’t see a good reason to use just one of the three measures.
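The advice about averaging can be illustrated with a toy simulation. Everything here is hypothetical: a latent "true" score, an outcome with a real slope of 0.5, and three ratings that each add independent noise. It is a sketch of the general measurement-error point, not a model of the survey data.

```r
set.seed(2)
n <- 5000

# HYPOTHETICAL setup: latent predictor and an outcome that truly depends on it
true_x <- rnorm(n)
y <- 0.5 * true_x + rnorm(n)

# Three noisy ratings of the same latent predictor (n x 3 matrix)
ratings <- replicate(3, true_x + rnorm(n))

# Slope estimated from a single rating versus the average of all three
slope_single <- unname(coef(lm(y ~ ratings[, 1]))[2])
slope_average <- unname(coef(lm(y ~ rowMeans(ratings)))[2])

print(round(c(single = slope_single, average = slope_average), 2))
```

Both estimates are pulled toward zero relative to the true slope of 0.5, but the one based on the averaged rating is pulled less, which is why the combined measure should show a real effect more clearly.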
Reporting of results
The difference reported in this study was 44% compared to 52%: you could say that the most attractive parents in the study were 8 percentage points more likely than the others to have girls. Or you could say that they were .08/.52=15% less likely to have boys. But on page 9 of the paper, it says, “very attractive respondents are about 26% less likely to have a son as the first child.” This crept up to 36% in this news article, which was cited by Stephen Dubner on the Freakonomics blog.
Where did the 26% come from? Kanazawa appears to have run a logistic regression of sex of child on an indicator for whether the parent was judged to be very attractive. The logistic regression coefficient was -0.31. Since the probabilities are near 0.5, the right way to interpret the coefficient is to divide it by 4: -0.31/4=-0.08, thus an effect of 8 percentage points (which is what we saw above). For some reason, Kanazawa exponentiated the coefficient: exp(-0.31)=0.74, then took 0.74-1=-0.26 to get a result of 26%. The exponentiated coefficient is an odds ratio, so the 26% describes a change in the odds of having a son, not in the probability. That calculation is inappropriate (unless there is something I’m misunderstanding here). But, of course, once it slipped past the author and the journal’s reviewers, it would be hard for a reporter to pick up on it.
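The different ways of summarizing the same comparison can be laid side by side in a few lines of base R. This sketch uses only the numbers quoted above (44%, 52%, and the coefficient -0.31); the variable names are mine. Note that with the rounded coefficient, exp() comes out slightly below the 0.74 quoted above, presumably a rounding difference.

```r
p_others <- 0.52           # proportion of boys, categories 1-4
p_very_attractive <- 0.44  # proportion of boys, category 5
b <- -0.31                 # logistic regression coefficient quoted from the paper

# Summaries on the probability scale
diff_in_probability <- p_very_attractive - p_others   # about -0.08
relative_change <- diff_in_probability / p_others     # about -0.15

# "Divide by 4" rule: approximate change in probability near p = 0.5
divide_by_4 <- b / 4

# Exponentiating gives an odds ratio, which is a statement about odds
odds_ratio <- exp(b)
change_in_odds <- odds_ratio - 1

# Probability of a boy for category 5 implied by the coefficient
implied_p <- plogis(qlogis(p_others) + b)

print(round(c(diff = diff_in_probability, relative = relative_change,
              divide_by_4 = divide_by_4, odds_ratio = odds_ratio,
              change_in_odds = change_in_odds, implied_p = implied_p), 3))
```

The coefficient is consistent with the raw percentages (the implied probability is close to 0.44), so the only disagreement is over which scale the effect is reported on.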
Coauthors have an incentive to catch mistakes
I’m disappointed that Kanazawa couldn’t find a statistician in the Interdisciplinary Institute of Management where he works who could have checked his numbers (and also advised him against the bar graph display in his Figure 1, as well as advised him about multiple hypothesis testing). Just to be clear on this: we all make mistakes, and I’m not trying to pick on Kanazawa. I think we can all do better by checking our results with others. Maybe the peer reviewers for the Journal of Theoretical Biology should’ve caught these mistakes, but in my experience there’s no substitute for adding someone on as a coauthor, who then has a real incentive to catch mistakes.
Kanazawa is looking at some interesting things, and it’s certainly possible that the effects he’s finding are real (in the sense of generalizing to the larger population). But the results could also be reasonably explained by chance. I think a proper reporting of Kanazawa’s findings would be that they are interesting, and compatible with his biological theories, but not statistically confirmed.
My point in discussing this article is not to be a party pooper or to set myself up as some sort of statistical policeman or to discourage innovative work. Having had this example brought to my attention, I was curious enough to follow it up, and then I wanted to share my newfound understanding with others. Also, this is a great example of multiple hypothesis testing for a statistics class.