## Who has babies when?

Sheril Kirshenbaum links to this graph from economists Kasey Buckles and Daniel Hungerman showing differences in who conceives babies in the fall (older, better-educated people) and the spring (younger, less well-educated people):

Pretty stunning. And a nice graph. The repeating pattern over the years is super-clear. I’d also like to see a version that just shows the averages for the 12 months, so I could see the pattern in more detail. Also I’d like to subtract 40 weeks so it shows the data by (approximate) month/date of conception.
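The 40-week shift is easy to sketch. This is a rough illustration and not the authors' method: it assumes the same fixed 40-week offset for every birth, and the mid-month birth dates are hypothetical values chosen only for the example.

```python
from datetime import date, timedelta

# Assumption: every pregnancy lasts exactly 40 weeks (a crude approximation).
GESTATION = timedelta(weeks=40)

def approx_conception(birth: date) -> date:
    """Shift a birth date back by a fixed 40 weeks."""
    return birth - GESTATION

# Hypothetical mid-month birth dates, chosen only for illustration.
january_birth = date(2000, 1, 15)
may_birth = date(2000, 5, 15)

print(approx_conception(january_birth))  # 1999-04-10
print(approx_conception(may_birth))      # 1999-08-09
```

Under this fixed offset, January births map back to roughly April and May births to roughly August.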

P.S. This news article by Justin Lahart is excellent. But I did notice one funny thing (to a statistician):

The two economists examined birth-certificate data from the Centers for Disease Control and Prevention for 52 million children born between 1989 and 2001 . . . 13.2% of January births were to teen mothers, compared with 12% in May–a small but statistically significant difference, they say.

Well, yeah, with n=52,000,000, I’d think that a 1 percentage point difference would be statistically significant! More seriously, with that many cases, it sounds like the next step (if the researchers haven’t already done this) is to break things down by subgroups of the population. I wonder what data are available from the birth certificate records. To start with, there’s geographic information.
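To get a feel for the scale, here is a back-of-the-envelope two-proportion z statistic. It is only a sketch under loud assumptions: the article does not give monthly counts, so the code assumes the 52 million births are spread evenly over the 12 months, and it treats births as independent draws.

```python
import math

def two_proportion_z(p1: float, n1: float, p2: float, n2: float) -> float:
    """z statistic for the difference of two independent proportions (pooled SE)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Assumption: births are spread evenly over the 12 months, so each month
# has about 52 million / 12 births. The real monthly counts are not given.
n_month = 52_000_000 / 12

# 13.2% of January births vs. 12% of May births were to teen mothers.
z = two_proportion_z(0.132, n_month, 0.120, n_month)
print(f"z = {z:.1f}")
```

Under these assumptions z comes out at roughly 53, far beyond any conventional threshold.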

1. Kevin Wright says:

To my eye these graphs look too nice. Not enough noise in the data. Makes me wonder if I'm looking at data or looking at the fitted model.

2. marcel says:

"Also I'd like to subtract 40 weeks so it shows the data by (approximate) month/date of conception."

Aren't teenagers (and low-income mothers) more likely (in a frequentist sense) to have premature births? If so, using the same 40 weeks for all mothers would distort the data.

3. Aaron K. says:

"I'd like to subtract 40 weeks so it shows the data by (approximate) month/date of conception."

I don't think this quite works, because I have it in the back of my mind that premature births are more common in low-income families.

4. xi'an says:

I find those graphs more than intriguing because of (a) the absolute correlation between the three curves (change the scales, use the complement percentage for the second graph and they are the very same!) and (b) the strong trend over a five year period, which does not seem believable. Going from 14% in '98 to 12% in '01 for teenage pregnancies?! All this sounds suspicious…

5. Carlos Scheidegger says:

Now we need Australian data to see if it is phase shifted by 6 months!

6. Kaiser says:

Good example to show that sampling error/significance is meaningless when you have data on the entire population.

7. William Ockham says:

Am I the only one interested in the secondary bump that occurs in what looks like August?

8. wei says:

I thought the second bump was in October, which is quite common among people I know. And I believe the reason is that holidays give more time for conception.

9. Andrew Gelman says:

Kevin: I know what you're saying, but look at the graph on the left. That doesn't look like the product of a model.

Marcel, Aaron: Sure, but, given that we have birth data, I think the 40-week correction would be a start. The data also seem to have age of the mother, so you could subtract different numbers of weeks for different ages.

Christian: I agree that the trends are pretty strong; I just don't know enough about this area to know if they're a surprise at all.

Kaiser: I don't quite agree with you. If you had a much smaller data set (for example, just a single age cohort in a single county), then sampling variation would come into play. Even with the entire population, there's still randomness in people's decisions and also of course randomness in whether people get pregnant.

10. Kaiser says:

Andrew: I think we can agree in this way… if they wanted to generalize from the 1989-2001 cohorts to children of any cohort, then a margin of error does have some meaning but for this purpose, using all 52 million children is way overkill and as you pointed out, makes margin of error or testing meaningless. I'd also be worried about sampling bias… the margin of error will capture people's randomness but what about the clear trends shown in the graphs?

On the other hand, if they think they are generalizing from their "sample" to the entire 1989-2001 cohorts, then no such generalizing is occurring. They have the entire population.

11. Kaiser says:

On a related topic: sampling variation, strictly speaking, is the variation between different samples drawn from the same underlying population. In real life, the population is changing over time. We make inference based on the sample from the current population; the margin of error tells us the variability across samples drawn from this population. Yet we won't draw any further samples from this population. We draw the next one from next month's population. Oftentimes, we then assume that the margin of error computed last time will deal with the variability from "sample" to "sample", which here includes variability from "population" to "population". How valid is this? Is there literature on this?

12. alex says:

I wonder about the base rates. I think that might be important because the populations driving the ratios aren't going to be stable – they'll vary because of seasonality and rounding effects.

Take the % births to teens graph. The numerator of the % is going to be driven by the population of sexually active teens below 19 1/4. We know there's an autumnal high in births, so this population will vary seasonally as the cutoff between 19 and 20 moves through the year. It'll be higher in Jan than May. I don't think that seasonal variation will have to be very big to produce the 1% difference between highs and lows we're seeing.

Related to this concern, I'm also not too sure about the article; the author doesn't seem to be keeping things like p(Young Mother|Birth)>p(Old Mother|Birth) and p(Birth|Young Mother)>p(Birth|Old Mother) straight.

13. Megan Pledger says:

I'd really like to see the time series separated out for each group e.g. young people/ old people.

All these results might be showing is that older, more educated people get married when they decide the time is right to have kids, follow tradition and get married in May/June, and then have a child 10 months later.

Teens might be giving birth uniformly across the year for all we know.

Although, as a non-American, it would be interesting to know if there is a life-changing event in March for teens, i.e. are they choosing parenthood/not choosing contraception because some other part of their future has closed down, e.g. they find out that they have to do summer school/not move up a grade/college acceptance letters come out.

14. C. Zorn says:

Actually, the 40-week offset may not be so bad after all. Lower-SES mothers do tend to have higher rates of preterm births, but so do older women, who are often also more educated, higher SES, etc.

Ah, my kingdom for a half-dozen covariates…

15. Lord says:

My first thought, the scientific basis for astrology?

16. ZBicyclist says:

Once again, simple math defeats me. If we go 9 months back from May, don't we get to the previous August? August isn't fall.

17. Robin Goodfellow says:

Keep in mind that these graphs do not show isolated data but mixed data. The values are "percentage of women giving birth each month who are X". This means that much of the variation in the 1st graph can be explained by the second, because more teenagers giving birth in January will reduce the percentage of married women giving birth even if married women are responsible for exactly the same number of births. More specifically, the graphs do not present orthogonal data: each graph includes some of the same data as the other graphs.

Also note how misleading the graphs are: they are normalized to the total number of births per month. Such a presentation can yield very skewed views of birth rates. For example, suppose 90 married women and 10 unmarried teenagers gave birth in May, while 900 married women and 50 unmarried teenagers gave birth in January. That would produce a seasonal swing in percentages like the ones in these graphs, because the percentage of births to married women would go up from May to January (from 90% to about 95%) and the percentage of births to teens would go down (from 10% to about 5%). But interpreting those data to say that teens were more likely to give birth in May would be a gross mischaracterization of the facts, since five times as many teens gave birth in January.
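The arithmetic in this hypothetical example can be checked directly. The sketch below uses only the made-up counts from the comment (not real birth data) and prints each group's count and share by month.

```python
# Hypothetical counts taken from the example in the comment above.
births = {
    "May": {"married": 90, "teen": 10},
    "January": {"married": 900, "teen": 50},
}

def shares(counts: dict) -> dict:
    """Convert raw counts into each group's fraction of that month's births."""
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}

for month, counts in births.items():
    pct = shares(counts)
    print(month, counts, {g: f"{100 * p:.1f}%" for g, p in pct.items()})
```

With these numbers the teen share is 10% in May and about 5.3% in January, while the teen count is five times higher in January, so shares and counts move in opposite directions.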

Given these graphs, a perfectly valid interpretation may be that older women generally give birth at a more or less equal rate across all months, while women under 25 (which includes teenagers and is more likely to include unmarried and less educated women) give birth slightly more often (on the order of 10%) in January than in May. That scenario alone is sufficient to describe these 3 graphs.
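A quick check of the "on the order of 10%" figure is possible under stated assumptions. The sketch borrows the teen shares quoted from the news article (12% in May, 13.2% in January) as stand-ins, although the comment is about women under 25, and it assumes births to everyone else are identical in the two months. If only the one group's births change, the ratio of its January to May births equals the ratio of the odds of its shares.

```python
def odds(p: float) -> float:
    """Convert a share p into odds p / (1 - p)."""
    return p / (1 - p)

# Shares quoted in the news article for teen mothers (used here as stand-ins).
may_share = 0.120
january_share = 0.132

# Assumption: births to all other women are the same in January and May,
# so the group's births scale by the ratio of the two odds.
increase = odds(january_share) / odds(may_share) - 1
print(f"{100 * increase:.1f}%")
```

Under those assumptions the required increase is about 11.5%, in line with the rough figure in the comment.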

In short, I think this line of analysis is highly dubious and of little value.

18. William Ockham says:

Robin,

The number of births per month is easily available (all years going back to 1997 are here)*. There is a very regular pattern, with January being one of the months with the fewest births and May being a very typical month (the May annualized birth rate is generally almost the same as that year's birth rate). The variation is not very large. I looked at 1997 through 2001. May has about 3% more births than January. For example, in 1999 there were 319,182 births in January and 328,526 births in May.
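The 1999 figures quoted here can be turned into a percentage in a couple of lines. Nothing is assumed beyond the two counts in the comment; since January and May both have 31 days, the per-day comparison gives the same ratio.

```python
# 1999 monthly birth counts as quoted in the comment above.
january_1999 = 319_182
may_1999 = 328_526

excess = may_1999 / january_1999 - 1
print(f"{100 * excess:.1f}%")  # 2.9%
```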

The only demographic breakdown they do by month is white/black (Hispanics are not broken out). There is a real divergence there. There are more births to black mothers in January than May.

I have no theories or explanations for any of this. I just wanted to point out that you could test your theory if you want.

*Just scan down the page for "Births: Final Data for" and the year you are interested in. Once you load the pdf, the breakdown by month is always table 15.

19. Megan Pledger says:

One of the interesting things is that people who are born in January are overly represented in professional hockey.
(http://www.goaliestore.com/board/hockey-talk/80588-why-most-pro-hockey-players-born-january.html)

It's usually attributed to their physical advantages over their younger peers as age grade players but perhaps their parental characteristics give them some advantage too.

20. Robin Goodfellow says:

@William,

I'm aware the data exist elsewhere; it was more that I was criticizing the way the study presented the data (or lack thereof). My main point is that three graphs are displayed but they present essentially the same data, merely different aspects of it, and they hide equally important data.

In fact, the graphs above are wholly consistent with the hypothesis that teenagers give birth at an equal rate per month while older women give birth more often in May. There are many, many other contradictory interpretations of the data which fit these presentations, which is a certain sign that the presentations are faulty.

This presentation of the data is unhelpful for analysis and easily abused. There are already many people in the comments here who have misinterpreted the data due to the way it has been presented.

21. Jerzy says:

High school prom is usually around February-March-April. Kids are pressured by social awareness (almost an expectation) that this is a night when "everybody's doing it," so that even kids who didn't expect to have sex yet might give in to it after all — and presto, there you have your surge in uneducated, unmarried, teen mothers giving birth 9 months later around January.

22. Daniel says:

I have been following this whole article for a while too. I am rather fascinated by the whole concept, and I have been trying to determine what would cause the (relative) increase in low-income births in January.

I think Jerzy has an interesting point, but I would like to hear other people's opinions as well.

Why would low-income couples be more likely than higher-income couples to conceive around the month of May?