I’m scheduling these posts a few months ahead of time, and I realize this is the perfect date for an update on the birthday model. Can we fit in Stan yet?

As of this writing, I don’t know. But Aki and Seth assure me that we’re close . . .

P.S. Happy 13th birthday, Craig!

What about the birthday model makes it impossible to fit in Stan?

I too would be interested to hear what the challenges are for the B-day model in Stan. Even better, if you can fit it now, would be to see the Stan code and a description of what tuning etc. was needed to fit it.

Brad, Daniel:

If you write the birthday model directly, you’ll have a 20,000 x 20,000 covariance matrix that needs to be recomputed at each step, as it depends on unknown hyperparameters.
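
To make the cost concrete, here is a minimal NumPy sketch (a toy squared-exponential GP, not the actual birthday model; all names and hyperparameter values here are made up). Every new hyperparameter value forces rebuilding the full n x n matrix and refactorizing it, which is O(n^2) memory and O(n^3) time per evaluation; at n = 20,000 that is the bottleneck.

```python
import numpy as np

def sq_exp_cov(x, alpha, rho):
    # Squared-exponential covariance: alpha^2 * exp(-d^2 / (2 * rho^2))
    d = x[:, None] - x[None, :]
    return alpha**2 * np.exp(-0.5 * (d / rho) ** 2)

def gp_log_marginal(y, x, alpha, rho, sigma):
    # The whole n x n matrix is rebuilt and refactorized for each new
    # (alpha, rho, sigma): O(n^2) memory, O(n^3) time per evaluation.
    n = len(x)
    K = sq_exp_cov(x, alpha, rho) + sigma**2 * np.eye(n)
    L = np.linalg.cholesky(K)
    a = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ a
            - np.sum(np.log(np.diag(L)))
            - 0.5 * n * np.log(2.0 * np.pi))

# Toy scale: 100 days instead of 20,000.
rng = np.random.default_rng(0)
x = np.arange(100.0)
y = np.sin(2.0 * np.pi * x / 7.0) + 0.1 * rng.standard_normal(100)
ll = gp_log_marginal(y, x, alpha=1.0, rho=3.0, sigma=0.1)
```

In HMC every leapfrog step needs this (and its gradient with respect to the hyperparameters), which is where the autodiff expression tree blows up.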

How was the birthday model fit for the GP chapter in BDA3?

Noah:

Aki fit the model using GPStuff.

So how does GPstuff deal with the huge covariance matrix?

Expectation propagation, I think?

http://research.cs.aalto.fi/pml/software/gpstuff/demo_births.shtml

And they give the URLs for the extracted data at that link. Thanks, Corey.

Is this because you have ~50 years of data (20,000 days)?

Is the model seriously compromised by asserting a periodic structure of 366 days so that you have only a 366×366 covariance matrix and 50 observations at each day?

Or am I missing something else?

Daniel:

Sure, there are ways of attacking the problem, making various shortcuts, etc. But I’d really like to be able to set up the model directly in Stan and not have to think about it.

Sure, I see that. But I’m trying to figure out the difficulty, and it sounds like it’s the 20,000 days. Is the data online for someone to attack in Stan? I am up for a half day of attacking the model at the 366-day periodic level and seeing how much it differs, as a way to understand the limitations of GPs better.

– The data used in BDA3 has 7305 days

– It’s publicly available and has been linked a couple of times on this blog, too (and the data source is also mentioned in BDA3)

– Data source: National Vital Statistics System natality data, as provided by Google BigQuery and exported to CSV by Chris Mulligan (sum data http://chmullig.com/wp-content/uploads/2012/06/births.csv) and Robert Kern (whole time series http://www.mechanicalkern.com/static/birthdates-1968-1988.csv)

– It’s not impossible to fit in Stan, but it would require more computer memory and wall-clock time than most of us have (ok, it’s impossible for most of us)

– Bob has told me that currently the problem is that with a 7305×7305 covariance matrix which depends on 18 parameters, the expression tree is very large, requiring a lot of memory and time to construct (I got an out-of-memory error with much smaller data sets)

– GPstuff does not have autodiff, but has hand-written gradient computations, which take less memory and time

– Bob has said that Stan can be made much faster by implementing built-in covariance functions, since the autodiff expression tree can then be made much smaller

– I think Daniel Lee is working on built-in covariance functions

– I’m certain we’ll post as soon as we have a nicely working Birthday GP in Stan

I just grabbed the data. If I can get around to it in the next few days, I’ll attack it as 20 vectors of observations using a 365×365 covariance, after transforming the data into births per fertile female, which should remove one of the major sources of nonstationarity. I’ll blog it for comparison with the BDA3 cover.
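
For what it’s worth, the windowing step can be sketched like this (`window_by_year` is a hypothetical helper, not part of any existing package, and it assumes leap days have already been dropped so each year is exactly 365 entries):

```python
import numpy as np

def window_by_year(births, n_years=20, days_per_year=365):
    # Hypothetical reshaping: assumes leap days were dropped beforehand,
    # so each row is one 365-day "sample" from the shared GP.
    b = np.asarray(births, dtype=float)[: n_years * days_per_year]
    return b.reshape(n_years, days_per_year)

# Fake daily series standing in for the real births data.
Y = window_by_year(np.arange(20 * 365))
```

Each row then gets modeled as a draw from the same 365-point GP, so only a 365×365 covariance matrix is needed.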

You can see my 365 model (without “per fertile female”) here

http://andrewgelman.com/2012/06/14/cool-ass-signal-processing-using-gaussian-processes/

and you can follow the links to see previous analysis by others.

Thanks. It’s been a while now; I’d forgotten how much had already been done. Can you explain the “periodic with decay” covariance function for me? What is the mathematical form?

For periodic with decay, see GPML book Eq (5.16) http://www.gaussianprocess.org/gpml/chapters/

Looks like a very interesting book (ABC based pictorial intro to Bayes in 2006!)

(I am biased though as I used to sit in on Hinton’s Neural Networks lab in the mid-1990s)

I’ve browsed this book before, in fact I think I even archived the whole thing on my home directory to browse offline. It’s pretty slick.

Choosing covariance functions is tricky: it’s not trivial to “design” them other than by certain combinations of known functions (sums, products, etc.), and non-experts wind up working from a “bestiary” of functions. At least that’s my experience. It seems like the periodic with decay is basically the product of a quadratic exponential kernel and a periodic kernel, which makes sense.
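
That product structure is easy to write down. A small NumPy sketch of a quasi-periodic kernel in the spirit of GPML Eq (5.16) (the parameter names here are my own, not GPML’s):

```python
import numpy as np

def quasi_periodic_cov(x1, x2, sigma, ell_decay, ell_per, period):
    # Product of a squared-exponential "decay" envelope and a periodic
    # kernel: correlations repeat with the given period but fade out
    # over the decay length scale.
    d = x1[:, None] - x2[None, :]
    decay = np.exp(-0.5 * (d / ell_decay) ** 2)
    periodic = np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / ell_per**2)
    return sigma**2 * decay * periodic

# A weekly pattern that slowly drifts over a month-ish decay length.
t = np.arange(30.0)
K = quasi_periodic_cov(t, t, sigma=1.0, ell_decay=30.0, ell_per=1.0, period=7.0)
```

With `ell_decay` much larger than `period` this behaves like a pure periodic kernel; as `ell_decay` shrinks, the periodicity is allowed to change from cycle to cycle.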

Aki: if you model a 20-year time series as 20 windowed samples from a Gaussian process with a common covariance function, there’s really no need to use periodic kernels at the 365-day level, right? I mean, a periodic kernel at the week scale makes sense, but the seasonal scale could easily be simply a quadratic exponential with an O(90)-day timescale, right? Especially with something like births, where there’s nothing like Newton’s laws forcing it to be continuous (and in fact New Year’s Day is clearly a discontinuity), putting too much periodic structure at the year level seems like it could be problematic, unless you’re modeling the whole time series as a single sample from a very long time-series GP, where the pseudo-periodicity would be apparent due to the lack of windowing.

I see the links to the data – what is the model being proposed here?

Bozo:

The fitted model is shown on the cover of BDA3 and the model is described within.

The nested reply was maxed out.

Daniel> Aki: if you model a 20 year timeseries as 20 windowed samples

Daniel> from a gaussian process with common covariance function,

Daniel> there’s really no need to use periodic kernels at the 365 day level

Daniel> right? I mean periodic kernel at week scale makes sense, but the

Daniel> seasonal scale could easily be simply a quadratic exponential with an

Daniel> O(90) day timescale right? Especially with something like births where

Daniel> there’s nothing like newton’s laws forcing it to be continuous (and in

Daniel> fact, new-years day is clearly a discontinuity) putting too much

Daniel> periodic structure at the year level seems like it could be

Daniel> problematic unless you’re modeling the whole timeseries as a single

Daniel> sample from a very long timeseries GP where the pseudo-periodicity

Daniel> would be apparent due to the lack of windowing.

I don’t completely understand your question, but the quasi-periodic component with a periodicity of one year is supported by the data. If the exponentiated quadratic were sufficient without a seasonal effect, then the estimated parameters would have shown that.

I guess my main point is that if you window a pseudo-periodic time series at its pseudo-period, then you might not see it as pseudo-periodic within the window…

Now that I think about it, provided you’re explaining some of the discontinuities by other short-timescale variation, I guess the slow, seasonal trend really should be periodic.