On the cover of BDA3 is a Bayesian decomposition of the time series of birthdays in the U.S. over a 20-year period. We modeled the data as a sum of Gaussian processes and fit it using GPstuff.
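To make "a sum of Gaussian processes" concrete, here is a toy sketch in Python. It is not the BDA3 model and it does not use GPstuff. The data are synthetic, there are only two components (a slow trend and a yearly periodic term), and the kernel forms, hyperparameter values, and function names (`sq_exp`, `periodic`, `decompose`) are assumptions chosen for illustration. The idea it shows is general, though: if the signal is a sum of independent Gaussian processes, the covariance of the sum is the sum of the kernels, and the posterior mean of each component is that component's kernel matrix times one shared solve.

```python
import numpy as np

def sq_exp(t1, t2, scale, length):
    """Squared-exponential kernel: smooth, slowly varying functions."""
    d = t1[:, None] - t2[None, :]
    return scale**2 * np.exp(-0.5 * (d / length) ** 2)

def periodic(t1, t2, scale, length, period):
    """Periodic kernel: functions that repeat exactly every `period`."""
    d = t1[:, None] - t2[None, :]
    return scale**2 * np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / length**2)

def decompose(t, y, kernels, noise_sd):
    """Posterior mean of each additive component, E[f_j | y] = K_j (K + s^2 I)^{-1} y."""
    K_parts = [k(t, t) for k in kernels]
    K_y = sum(K_parts) + noise_sd**2 * np.eye(len(t))
    alpha = np.linalg.solve(K_y, y)  # one solve shared by all components
    return [K @ alpha for K in K_parts]

# Synthetic stand-in for a daily series: slow trend + yearly cycle + noise.
rng = np.random.default_rng(0)
t = np.arange(0.0, 730.0, 2.0)  # every second day over two years
y = 0.002 * t + np.sin(2 * np.pi * t / 365.25) + 0.1 * rng.standard_normal(t.size)
y = y - y.mean()

# Hyperparameters are fixed by hand here; in a real analysis they would be estimated.
kernels = [
    lambda a, b: sq_exp(a, b, scale=1.0, length=200.0),
    lambda a, b: periodic(a, b, scale=1.0, length=1.0, period=365.25),
]
trend, seasonal = decompose(t, y, kernels, noise_sd=0.1)
```

The linear solve against the full n-by-n covariance matrix is the expensive step, which is why matrix calculations dominate the cost when n is in the thousands.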
We still can’t just pop these models into Stan—the matrix calculations are too slow—but we’re working on it, and I am confident that we’ll be able to fit the models in Stan some day. In the meantime I have some thoughts about how to improve the model and we plan to continue working on this.
I prefer our Bayesian analysis for various reasons, but Davies does demonstrate that hypothesis testing, applied carefully, can be used to attack this sort of estimation problem.
The birthday data used in BDA3 come from the National Vital Statistics System natality data, as provided by Google BigQuery and exported to CSV by Robert Kern.
More recent data exported by FiveThirtyEight are available here:
The file US_births_1994-2003_CDC_NCHS.csv contains U.S. births data for the years 1994 to 2003, as provided by the Centers for Disease Control and Prevention’s National Center for Health Statistics.
US_births_2000-2014_SSA.csv contains U.S. births data for the years 2000 to 2014, as provided by the Social Security Administration.
The NCHS and SSA counts differ somewhat in the overlapping years, as we discussed here.
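One quick way to see how the two sources differ is to total births by year in each file and compare the years that appear in both. The sketch below uses only the Python standard library. It assumes each CSV has a header row with `year` and `births` columns (one row per day); check the header of your copy before relying on that. The function names `yearly_totals` and `compare_overlap` are made up for this example.

```python
import csv
from collections import defaultdict

def yearly_totals(lines):
    """Sum the `births` column by `year`.

    `lines` is any iterable of CSV text lines, e.g. an open file.
    Assumes a header row containing `year` and `births` columns.
    """
    totals = defaultdict(int)
    for row in csv.DictReader(lines):
        totals[int(row["year"])] += int(row["births"])
    return dict(totals)

def compare_overlap(nchs, ssa):
    """For years present in both sources, return
    {year: (nchs_total, ssa_total, relative_difference)},
    where relative_difference = (ssa - nchs) / nchs."""
    out = {}
    for year in sorted(set(nchs) & set(ssa)):
        a, b = nchs[year], ssa[year]
        out[year] = (a, b, (b - a) / a)
    return out
```

To run it on the two files, open each with `open(path, newline="")`, pass the file objects to `yearly_totals`, and hand the two resulting dictionaries to `compare_overlap`.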