I’m sorry to see that the Journal of Theoretical Biology published an article with the fallacy of controlling for an intermediate outcome. Or maybe I’m happy to see it because it’s a great example for my classes. The paper is called “Engineers have more sons, nurses have more daughters” (by Satoshi Kanazawa and Griet Vandermassen) and the title surprised me, because in my acquaintance with such data, I’ve seen very little evidence of sex ratios at birth varying much at all. The paper presents a regression in which:
- the units are families
- the outcome is the number of sons in the family (they do another regression where the outcome is the number of daughters)
- the predictors of interest are indicators for whether the parent’s occupation is “systematizing” (e.g., engineering) or “empathizing” (e.g., nursing)
- the regression also controls for parent’s education, income, age, age at first marriage, ethnicity, marital status, and number of daughters (for the model predicting the # of sons) or number of sons (for the model predicting # of daughters).
The coefficients of “systematizers” and “empathizers” are statistically significant and large (i.e., “practically significant”) and the authors conclude that systematizers are much more likely to have boys and empathizers are much more likely to have girls, and then back this up with some serious biodeterministic theorizing (e.g., referring to occupations as “brain types”!).
But, but, but . . . their results cannot be interpreted in this way!
The problem is that their analysis controls for intermediate outcomes (a problem which I noted in another study of sons and daughters, although in that case the fallacy had only minor effects on the results). Controlling for total #kids of the other sex is a huge problem since different people may very well go for another kid or not after having one son, or one daughter, or whatever. In addition, income is an intermediate outcome (to the extent we think of occupation as a “treatment”) so it does not make sense to compare two people with different occupation-classes and the same income. Also, occupation itself is intermediate, in that one might, for example, decide to become a social worker after having a girl. Divorce is another intermediate outcome (as has in fact been studied).
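To see how a stopping rule alone can manufacture a "controlled" difference, here is a minimal simulation sketch. Everything in it is a made-up assumption for illustration, not anything from the paper: two hypothetical groups, `group_a` (keeps having kids until the first son) and `group_b` (until the first daughter), a cap of 3 kids, and every birth a 50/50 coin flip in both groups. "Controlling" is done the crude way, by comparing only families with exactly one daughter.

```python
import random

def family(stop_after, rng, max_kids=3):
    """Have kids until the first child of sex `stop_after`, or until max_kids.

    Every birth is a fair coin flip, whatever the group (assumption).
    """
    sons = daughters = 0
    while sons + daughters < max_kids:
        sex = "son" if rng.random() < 0.5 else "daughter"
        if sex == "son":
            sons += 1
        else:
            daughters += 1
        if sex == stop_after:
            break
    return sons, daughters

def mean(xs):
    return sum(xs) / len(xs)

rng = random.Random(1)
n = 50_000
group_a = [family("son", rng) for _ in range(n)]       # hypothetical group
group_b = [family("daughter", rng) for _ in range(n)]  # hypothetical group

# Simple comparison: average number of sons per family in each group.
raw_gap = mean([s for s, _ in group_a]) - mean([s for s, _ in group_b])

# "Controlled" comparison: same thing, but only among families with 1 daughter.
adjusted_gap = (mean([s for s, d in group_a if d == 1])
                - mean([s for s, d in group_b if d == 1]))

print(f"raw gap in sons:                  {raw_gap:+.3f}")
print(f"gap holding daughters fixed at 1: {adjusted_gap:+.3f}")
```

In this toy setup the raw gap should hover near zero while the gap among one-daughter families is clearly positive, even though neither group has any tendency toward boys or girls. The "effect" comes entirely from conditioning on something that happens after the treatment.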
The funny thing is that the authors almost . . . almost . . . saw the problem, when they wrote:
when we control for all the variables included in our equations, those who have more biological daughters have fewer biological sons, and those who have more biological sons have fewer biological daughters. This seems to suggest that parents specialize in producing children of one sex or the other, some producing mostly or exclusive boys, and others producing mostly or exclusively girls.
Well, yeah. No biological explanation needed here: just think (for simplicity) of what the data would look like if every couple had 1 kid, or 2 kids. The actual data are a mixture of different numbers of kids, and there’s no reason ahead of time to expect these regression coefficients to be zero, even if (as is roughly the case) the sex of babies born is completely random.
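The 1-kid-or-2-kids thought experiment is easy to run as a quick simulation. This is a sketch under assumptions that are mine, not the paper's: each birth is an independent fair coin flip, family size is chosen at random from a given list, and the "regression" is a plain one-predictor least-squares slope of sons on daughters (the helper names `simulate_families` and `ols_slope` are made up for this example).

```python
import random

def simulate_families(sizes, n_families, seed=0):
    """Return (sons, daughters) lists; each child is a boy with probability 0.5."""
    rng = random.Random(seed)
    sons, daughters = [], []
    for _ in range(n_families):
        n_kids = rng.choice(sizes)  # family size picked independently of sex
        boys = sum(rng.random() < 0.5 for _ in range(n_kids))
        sons.append(boys)
        daughters.append(n_kids - boys)
    return sons, daughters

def ols_slope(x, y):
    """Least-squares slope from regressing y on x (single predictor)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Every couple has exactly 2 kids: sons = 2 - daughters, so the slope is -1.
sons, daughters = simulate_families([2], 20_000)
slope_two_kids = ols_slope(daughters, sons)

# A mixture of 1-kid and 2-kid families: still a strongly negative slope.
sons, daughters = simulate_families([1, 2], 20_000)
slope_mixed = ols_slope(daughters, sons)

print(f"slope, all 2-kid families:    {slope_two_kids:.3f}")
print(f"slope, mix of 1 and 2 kids:   {slope_mixed:.3f}")
```

Nobody in this simulation "specializes" in boys or girls. The negative slope is just bookkeeping: when family size doesn't vary much, more daughters leaves less room for sons.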
I’m sorry to see that this slipped by peer review, but I’m really sorry to see that it got slashdotted.
P.S. As a bonus point, the paper presents regression results to four significant digits (e.g., 03498 +/- .1326).
P.P.S. Juicy sociobiological quote: “Thus, it pays parents in good conditions to bet on male rather than female offspring.” Perhaps they should have done a study in Las Vegas and checked out the actual betting odds…
P.P.P.S. OK, I’m sorry for making fun of the study. I’ve proven a false theorem myself (well, I guess “proved” isn’t the right word), glass houses and all that.
P.P.P.P.S. Just to bring this to a close: the authors make “the following novel prediction: individuals who have Type S brains should have more sons than daughters, while individuals who have Type E brains should have more daughters than sons” [italics in the original]. But they never check this (as far as I can see)! They just run regressions.
This is truly an example of the tyranny of statistical methodology: if only these biologists (well, actually one of them is at an Institute of Management and the other is at a Center for Gender Studies) had calculated some simple averages, they might have learned something, but through fancy regression they were led astray.
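For concreteness, the simple-averages check could be as little as this sketch. The function name `share_of_sons`, the group labels "S" and "E", and the records are all invented placeholders to show the shape of the calculation; they are not data from the paper.

```python
from collections import defaultdict

def share_of_sons(records):
    """Proportion of boys among all children, by group.

    `records` is an iterable of (group, sons, daughters) tuples, one per family.
    """
    totals = defaultdict(lambda: [0, 0])  # group -> [boys, all children]
    for group, sons, daughters in records:
        totals[group][0] += sons
        totals[group][1] += sons + daughters
    return {g: boys / kids for g, (boys, kids) in totals.items() if kids > 0}

# Invented toy records, purely to show the input format (not real data).
toy_records = [
    ("S", 1, 1), ("S", 2, 0), ("S", 0, 2),
    ("E", 1, 0), ("E", 0, 1), ("E", 2, 2),
]

shares = share_of_sons(toy_records)
print(shares)
```

Pooling all births within a group, rather than averaging per-family counts after adjusting for the other sex, speaks directly to the "more sons than daughters" prediction and involves no intermediate outcomes.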