23 thoughts on “Fitting the birthday model in Stan

  1. – The data used in BDA3 has 7305 days
    – It’s publicly available and couple times linked in this blog, too (and the data source is mentioned also in BDA3)
    – Data source: National Vital Statistics System natality data, as provided by Google BigQuery and exported to cvs by Chris Mulligan (sum data http://chmullig.com/wp-content/uploads/2012/06/births.csv) and Robert Kern (whole time series http://www.mechanicalkern.com/static/birthdates-1968-1988.csv)
    – It’s not impossible to fit in Stan, but would require more memory in the computer and wall clock time than most of us have (ok, it’s impossible for most of us)
    – Bob has told me that currently the problem is that with 7305×7305 covariance matrix which depends on 18 parameters the expression tree is very large requiring lot of memory and time to construct (I got out of memory error with much smaller data sets)
    – GPstuff does not have autodiff, but has hand written gradient computations, which take less memory and time
    – Bob has told that Stan can be made much faster by implementing built-in covariance functions as then the autodiff expression tree can be made much smaller
    – I think Daniel Lee is working on built-in covariance functions
    – I’m certain we’ll post as soon as we have a nicely working Birthday GP in Stan

    • I just grabbed the data, if I can get around to it in the next few days, I’ll attack it as 20 vectors of observations using 365×365 covariance after transformation of the data into births per fertile female, which should remove one of the major sources of nonstationarity. Will blog it for comparison with the BDA3 cover.

        • Thanks, It’s been a while now, I’d forgotten how much had already been done. Can you explain for me the “periodic with decay” covariance function? what is the mathematical form?

        • Looks like a very interesting book (ABC based pictorial intro to Bayes in 2006!)

          (I am biased though as I used to sit in on Hinton’s Neural Networks lab in the mid-1990s)

        • I’ve browsed this book before, in fact I think I even archived the whole thing on my home directory to browse offline. It’s pretty slick.

          Choosing covariance functions is tricky, it’s not trivial to “design” them other than by certain combinations of known functions (sums, products, etc), and non-experts wind up working with a “bestiary” of functions. At least that’s my experience. It seems like the periodic with decay is basically the product of a quadratic exponential kernel and a periodic kernel, which makes sense.

          Aki: if you model a 20 year timeseries as 20 windowed samples from a gaussian process with common covariance function, there’s really no need to use periodic kernels at the 365 day level right? I mean periodic kernel at week scale makes sense, but the seasonal scale could easily be simply a quadratic exponential with an O(90) day timescale right? Especially with something like births where there’s nothing like newton’s laws forcing it to be continuous (and in fact, new-years day is clearly a discontinuity) putting too much periodic structure at the year level seems like it could be problematic unless you’re modeling the whole timeseries as a single sample from a very long timeseries GP where the pseudo-periodicity would be apparent due to the lack of windowing.

  2. The nested reply was maxed out.

    Daniel> Aki: if you model a 20 year timeseries as 20 windowed samples
    Daniel> from a gaussian process with common covariance function,
    Daniel> there’s really no need to use periodic kernels at the 365 day level
    Daniel> right? I mean periodic kernel at week scale makes sense, but the
    Daniel> seasonal scale could easily be simply a quadratic exponential with an
    Daniel> O(90) day timescale right? Especially with something like births where
    Daniel> there’s nothing like newton’s laws forcing it to be continuous (and in
    Daniel> fact, new-years day is clearly a discontinuity) putting too much
    Daniel> periodic structure at the year level seems like it could be
    Daniel> problematic unless you’re modeling the whole timeseries as a single
    Daniel> sample from a very long timeseries GP where the pseudo-periodicity
    Daniel> would be apparent due to the lack of windowing.

    I don’t completely understand your question, but the quasi-periodic component with periodicity of one year is supported by the data. If the epxonentiated quadratic would be sufficient without seasonal effect then the estimated parameters would have shown that.

    • I guess my main point is that if you window a pseudo-periodic timeseries at its pseudo-period then you might not see it as pseudo-periodic within the window…

      now that I think about it, provided you’re explaining some of the discontinuities by other short-time-scale variation, I guess the slow / seasonal trend really should be periodic

  3. I tried reading through this but was left unclear if:

    A) Anyone ever succeeded in implementing either the full model or a reduced model in Stan
    B) Anyone posted STAN code of their attempts

    Could anyone clear that up for me?

Leave a Reply to Aki Vehtari Cancel reply

Your email address will not be published. Required fields are marked *