From Chris Mulligan:
The data come from the Centers for Disease Control and cover the years 1969-1988. Chris also gives instructions for how to download the data and plot them in R from scratch (in 30 lines of R code)!
And now, the background
A few months ago I heard about a study reporting that, during a recent eleven-year period, more babies were born on Valentine’s Day and fewer on Halloween compared to neighboring days:
What I’d really like to see is a graph with all 366 days of the year. It would be easy enough to make. That way we could put the Valentine’s and Halloween data in the context of other possible patterns. While they’re at it, they could also graph births by day of the week and show Thanksgiving, Easter, and other holidays that don’t have fixed dates. It’s so frustrating when people only show part of the story.
I was pointed to some tables:
and a graph from Matt Stiles:
The heatmap is cute but I wanted to see the whole year’s pattern, not broken down by month, and I wanted a graph that showed quantitative patterns. Chris Mulligan’s graph (see top of this blog) was much better from my perspective.
Other comments on Chris’s graph
- As Chris noted, Valentine’s Day and Halloween do show up but just barely.
- You can also see dips around the Labor Day and Thanksgiving weekends, which are spread out a bit because the dates vary for these day-of-week holidays.
- I’d consider rescaling the y-axis so that the red line = 100; that would make it easier for me to get a grip on the scale of the variation.
- I don’t get anything out of the lowess line but it was a clever way for Chris to pull out some extreme dates automatically. (It was my idea to multiply the 29 Feb counts by 4.)
- The data could be cleaned even further. Here’s how I’d start: go back to the data for all the years and fit a regression with day-of-week indicators (Monday, Tuesday, etc), then take the residuals from that regression and pipe them back into Chris’s program to make a cleaned-up graph. It’s well known that births are less frequent on the weekends, and unless your data happen to cover an exact 28-year period (the span over which each calendar date lands on each day of the week equally often), you’ll get imbalance, which I’m guessing is driving a lot of the zigzagging in the graph above.
- The next step would be to go back to some of the questions raised in recent years by economists who have noticed different patterns of birthdays (and thus of conceptions) as a function of age and education levels of parents. I don’t know what’s been done on this since 2009.
- It might be cute to display the graph in a circle, to connect 31 Dec – 1 Jan, but I don’t recommend it, as this would come close to destroying our ability to see the annual pattern in the data.
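Here is a minimal sketch of the day-of-week cleanup suggested in the list above. It is written in Python rather than R, and it is not Chris's program: the function names and the input format (a dict mapping each calendar date to a birth count) are assumptions made for illustration. Two simplifications are worth flagging. First, a regression on a full set of day-of-week indicators and nothing else has fitted values equal to the weekday means, so the residual is just the count minus the mean for that weekday. Second, the rescaling assumes that the red line in the graph is the overall average, which is set to 100.

```python
from collections import defaultdict
from datetime import date, timedelta


def weekend_counts(first_year, last_year):
    """For each (month, day), count how often it falls on a Saturday or Sunday."""
    counts = defaultdict(int)
    d = date(first_year, 1, 1)
    while d.year <= last_year:
        if d.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
            counts[(d.month, d.day)] += 1
        d += timedelta(days=1)
    return counts


def day_of_week_residuals(births):
    """births: dict mapping datetime.date -> count (hypothetical input format).

    Returns a dict date -> residual, where the residual is the count minus
    the mean count for that day of the week. This matches the residuals of
    a regression on day-of-week indicators alone.
    """
    totals = defaultdict(float)
    n = defaultdict(int)
    for d, count in births.items():
        totals[d.weekday()] += count
        n[d.weekday()] += 1
    means = {wd: totals[wd] / n[wd] for wd in totals}
    return {d: count - means[d.weekday()] for d, count in births.items()}


def calendar_day_index(births):
    """Average the residuals by (month, day), rescaled so the overall mean is 100."""
    overall = sum(births.values()) / len(births)
    resid = day_of_week_residuals(births)
    sums = defaultdict(float)
    n = defaultdict(int)
    for d, r in resid.items():
        sums[(d.month, d.day)] += r
        n[(d.month, d.day)] += 1
    return {md: 100.0 * (overall + sums[md] / n[md]) / overall for md in sums}
```

Calling `weekend_counts(1969, 1988)` shows the imbalance directly: over these 20 years, different calendar dates fall on a weekend a different number of times, whereas over a 28-year span such as 1969 to 1996 every date other than 29 Feb falls on a weekend exactly 8 times. Because `calendar_day_index` averages over the years in which each date occurs, 29 Feb needs no multiplication by 4 here.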
The moral of the story
- The direct time-series graph showed patterns clearly, allowing us to make qualitative and quantitative comparisons much better than were possible using the cute heat map or the tables.
- High-resolution graphics can make a difference, even for a problem as simple as displaying a sequence of 366 numbers.