Several studies have been performed in the last few years looking at the economic decisions of parents of sons, as compared to parents of daughters. For example, Tyler Cowen links to a report of a study by Andrew Oswald and Nattavudh Powdthavee that “provides evidence that daughters make people more left wing. Having sons, by contrast, makes them more right wing”:
Professor Oswald and Dr Powdthavee drew their data from the British Household Panel Survey, which has monitored 10,000 adults in 5,500 households each year since 1991 and is regarded as an accurate tracker of social and economic change. Among parents with two children who voted for the Left (Labour or Lib Dem), the mean number of daughters was higher than the mean number of sons. The same applied to parents with three or four children. Of those parents with three sons and no daughters, 67 per cent voted Left. In households with three daughters and no sons, the figure was 77 per cent.
I’ve seen some other studies recently with similar findings. A few years ago, a couple of economists found that having daughters, as compared to sons, was associated with the probability of divorce (I think it was), and more recently a study by Ebonya Washington found that members of Congress with daughters (as compared to sons) were more likely to have liberal voting records on women’s issues.
Controlling for the number of children: an intermediate outcome
A common feature of all these studies is that they control for the total number of children. This can be seen in the quote above, for example: they compare different sorts of families with 2 kids, then make a separate comparison of different sorts of families with 3 kids.
At first sight, controlling for the total number of children seems reasonable. There is a difficulty, however, in that the total number of kids is an intermediate outcome, and controlling for it (whether by subsetting the data based on #kids or using #kids as a control variable in a regression model) can bias the estimate of the causal effect of having a son (or daughter).
To see this, suppose (hypothetically) that politically conservative parents are more likely to want sons, so that if they have two daughters, they are (hypothetically) more likely to try for a third kid. In comparison, liberals are more likely to stop at two daughters. In this case, if you look at data on families with exactly two kids, both daughters, the conservatives will be underrepresented (many of them have moved on to the three-kid group), and the data could show a correlation of daughters with political liberalism, even if having the daughters has no effect at all!
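Here is a minimal simulation sketch of that hypothetical. Everything in it is made up for illustration: half of parents are liberal, every family has at least two kids, each kid is a girl with probability 0.5, and after two daughters conservatives try for a third with probability 0.6 versus 0.2 for liberals. By construction, the sex of the kids has no effect on politics.

```python
import random

def simulate(n_families=200_000, seed=0):
    """Simulate families in which child sex has NO effect on politics.

    All probabilities are hypothetical, chosen only to illustrate the
    selection effect of conditioning on the total number of kids.
    """
    rng = random.Random(seed)
    families = []
    for _ in range(n_families):
        liberal = rng.random() < 0.5                 # politics fixed up front
        kids = [rng.choice("GB") for _ in range(2)]  # everyone has two kids
        if kids == ["G", "G"]:
            # Hypothetical stopping rule: conservatives are more likely
            # to try for a third kid after two daughters.
            p_third = 0.2 if liberal else 0.6
            if rng.random() < p_third:
                kids.append(rng.choice("GB"))
        families.append((liberal, kids))
    return families

def liberal_share(families, keep):
    """Share of liberal parents among families selected by keep(kids)."""
    selected = [liberal for liberal, kids in families if keep(kids)]
    return sum(selected) / len(selected)

families = simulate()

# Conditioning on the total number of kids (exactly two):
share_two_daughters = liberal_share(families, lambda k: k == ["G", "G"])
share_two_sons = liberal_share(families, lambda k: k == ["B", "B"])

# Conditioning only on the sex of the first kid:
share_first_girl = liberal_share(families, lambda k: k[0] == "G")
share_first_boy = liberal_share(families, lambda k: k[0] == "B")

print(f"exactly 2 kids, both girls: {share_two_daughters:.3f} liberal")
print(f"exactly 2 kids, both boys:  {share_two_sons:.3f} liberal")
print(f"first kid a girl:           {share_first_girl:.3f} liberal")
print(f"first kid a boy:            {share_first_boy:.3f} liberal")
```

Under these assumed numbers, the two-daughter families should come out roughly two-thirds liberal and the two-son families about half, while the first-kid comparison should show essentially no difference, which is the right answer here since the simulated effect is zero.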
A solution is to apply the standard conservative (in the statistical sense!) approach to causal inference, which is to regress the outcome on your treatment variable (sex of kid), controlling only for things that happen before the kid is born. For example, one could compare parents whose first child is a girl to parents whose first child is a boy. One can also look at the second birth, comparing parents whose second child is a girl to those whose second child is a boy, controlling for the sex of the first child. And so on for the third child, etc.
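The birth-by-birth comparison can be sketched in a few lines. This is only an illustration, not anyone's actual analysis: the function name `birth_order_effects`, the record format (a 0/1 outcome paired with a string of kids' sexes in birth order), and the toy data are all hypothetical, and "controlling for" earlier kids is done by simple stratification rather than by a regression.

```python
from collections import defaultdict

def birth_order_effects(records, k):
    """Girl-minus-boy difference in mean outcome for the k-th birth.

    records: iterable of (outcome, kids), where kids is a string such as
             "GB" giving the sexes of the children in birth order.
    k:       which birth to treat as the treatment (1 = first child).

    Comparisons are made within strata defined by the sexes of the
    earlier children, i.e. only things known before the k-th birth.
    Returns {history: mean(outcome | girl) - mean(outcome | boy)}.
    """
    groups = defaultdict(lambda: {"G": [], "B": []})
    for outcome, kids in records:
        if len(kids) >= k:               # family had a k-th child at all
            history = kids[:k - 1]       # sexes of the earlier children
            groups[history][kids[k - 1]].append(outcome)

    effects = {}
    for history, by_sex in groups.items():
        if by_sex["G"] and by_sex["B"]:  # need both arms to compare
            mean_girl = sum(by_sex["G"]) / len(by_sex["G"])
            mean_boy = sum(by_sex["B"]) / len(by_sex["B"])
            effects[history] = mean_girl - mean_boy
    return effects

# Toy records, purely hypothetical: (voted Left = 1, kids in birth order).
records = [
    (1, "G"), (0, "G"), (1, "B"), (0, "B"),
    (1, "GG"), (0, "GB"), (1, "BG"), (1, "BB"),
    (0, "GGB"), (1, "BBG"),
]

first_kid = birth_order_effects(records, k=1)   # one comparison
second_kid = birth_order_effects(records, k=2)  # one per first-kid sex
print(first_kid)
print(second_kid)
```

Note that a family enters the k-th comparison if it has at least k kids, not exactly k. Whether to have a k-th kid is decided before that kid's sex is known, so this conditioning does not depend on the treatment.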
The modeling could get interesting here, since there is a sort of pyramid of coefficients (one for the first-kid model, two for the second-kid model (controlling for first kid), and so forth). It might be reasonable to expect coefficients to gradually decline (I assume the effect of the first kid would be the biggest), and one could estimate that with some sort of hierarchical model.
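To make the pyramid concrete, here is one possible way to write it down. The notation is mine and is only a sketch: y_i is the outcome for parent i, g_{ik} is 1 if parent i's k-th kid is a girl, and the geometric decline with rate \rho is just one assumed form among many for "gradually declining" effects.

```latex
\begin{align*}
y_i &= \alpha_1 + \beta_1 g_{i1} + \epsilon_{i1}
  && \text{(first-kid model)} \\
y_i &= \alpha_2 + \beta_2 g_{i2} + \gamma_{21} g_{i1} + \epsilon_{i2}
  && \text{(second-kid model, families with at least 2 kids)} \\
y_i &= \alpha_3 + \beta_3 g_{i3} + \gamma_{31} g_{i1} + \gamma_{32} g_{i2} + \epsilon_{i3}
  && \text{(third-kid model, families with at least 3 kids)} \\
\beta_k &\sim \mathrm{N}\!\left(\mu \rho^{\,k-1},\ \tau^2\right), \qquad 0 < \rho < 1
  && \text{(assumed hierarchical prior on the treatment effects)}
\end{align*}
```

Here \beta_k is the effect of the k-th kid being a girl, the \gamma's are the coefficients for the earlier kids that serve as controls, and \mu, \rho, and \tau would be estimated from the data.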
I’m not saying that all these researchers are wrong; merely that, by controlling for an intermediate outcome, they’re subject to a potential bias. I think they could redo their analyses without much effort to check for this bias and address the concern. I hope they do so (and inform me of their results).
It’s an interesting example because we all know not to control for intermediate outcomes, but the total # of kids somehow doesn’t look like an intermediate outcome at first.
See here for more discussion of the U.K. voting example.