He’s looking for Bayesian time-series examples

Maurits Van Wagenberg writes:

Coming from the traditional side, I started to use Bayes, quickly limiting it to models with fewer variables, notwithstanding the lure. I am not in academics but have for many years researched design processes of complex objects, such as the engineering of complex process plants. These processes have a lead time of 12 to 18 months.

The aim was to check on the development of variables that could indicate derailment of the process. I felt comfortable using the posterior to update a new prior a week later, especially two months into the process.

This winter, I was asked to look into a new group of design projects where my concept failed. My previous body of knowledge was limited, as were the empirical data at hand.

I started to look at your approach (in your Bayesian Data Analysis, 3rd edition, and presentations, including your French presentations).

My question is: could I find more time-series related examples?

Any suggestions? Our forthcoming Bayesian Econometrics in Stan book should have a few such examples, although this person’s application area seems a bit different. I know that some of you out there work on engineering problems so maybe you have some thoughts for him.

22 thoughts on “He’s looking for Bayesian time-series examples”

  1. I’m in the process of writing a series on my blog about Bayesian time-series analysis – the intended audience is programmer types looking to do more data analysis. The first post is up:

    https://www.chrisstucchio.com/blog/2016/has_your_conversion_rate_changed.html

    The second post, about analyzing stock prices with MCMC, is about half done; hopefully it will go up next week.

    I also have a related one-off post that isn’t strictly part of the series.

    https://www.chrisstucchio.com/blog/2016/bayesian_calibration_of_mobile_phone_compass.html

    Maybe this is useful? It’s hard to tell exactly what he’s looking for.

  2. Can think of a few examples:

    – The Bayesian VAR literature – see Litterman’s work in the ’80s, including the Minnesota prior specification for p-order lags.

    – Bayesian ARIMAs – including discussion of prior specification and how to enforce properties like stationarity (truncated normals, etc.)

    – The changepoint detection literature

    – Papers on causal impact (eg http://projecteuclid.org/euclid.aoas/1430226092)

    Also, as a subfield, financial time series is full of different types of Bayesian models, as simulation methods are often the most direct way to attack non-linear models such as stochastic volatility models and non-linear state-space models. One of the main concerns in this field (historically at least) is how to sample effectively from the latent states, which are usually highly dependent (so naive simulation methods are ineffective). Particle filters (and more recently PMCMC) arose as a way to tackle this problem, but offline solutions also exist (e.g., H. Rue’s argument is that time series shouldn’t always call for a filtering-type solution).
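    A stochastic volatility model of the kind mentioned above is quick to simulate, and doing so makes the sampling difficulty visible: the latent log-volatility is a near-unit-root AR(1), so neighboring states are strongly correlated. Below is a minimal generative sketch in plain Python (the parameter values mu, phi, and sigma are illustrative, not estimates from any data):

```python
import math
import random

def simulate_sv(T=500, mu=-1.0, phi=0.95, sigma=0.2, seed=0):
    """Simulate a toy stochastic volatility model: the log-volatility
    h[t] follows a stationary AR(1), and the observed return is
    y[t] = exp(h[t] / 2) * eps[t] with standard normal noise eps[t]."""
    rng = random.Random(seed)
    # Start h[0] at a draw from the AR(1)'s stationary distribution.
    h = [mu + sigma / math.sqrt(1 - phi ** 2) * rng.gauss(0, 1)]
    for _ in range(1, T):
        h.append(mu + phi * (h[-1] - mu) + sigma * rng.gauss(0, 1))
    # Returns are conditionally normal with time-varying scale exp(h/2).
    y = [math.exp(ht / 2) * rng.gauss(0, 1) for ht in h]
    return y, h

y, h = simulate_sv()
```

    With phi close to 1, the simulated log-volatilities are highly autocorrelated, which is exactly why naive one-state-at-a-time samplers mix poorly on models like this.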

    • These are all great examples. The concern I have with some of the early stuff is that it was conceived during a time when computation was hugely limiting (on the sorts of specifications that could be estimated). It’s still really important to understand these models, but now we have Stan! There is also the issue that many applications of the Litterman/Minnesota prior employ the data in specifying the prior, which is a bit naughty.

      I highly recommend this paper by Koop and Korobilis, which walks through many multivariate Bayesian models. A very gentle introduction, if a little dated (inverse-Wishart priors, etc.).
      https://ideas.repec.org/p/rim/rimwps/47_09.html

  3. I worked on designing complex chemical process plants. Though never on predicting project failures.

    Would be curious to see more of your model details if you can post any. I never knew of any good models for this; it used to be all heuristics and intuition. Sounds quite challenging.

  4. Here’s a writeup of a simple time series model for survey data that I implemented in Stan:

    http://kevinsvanhorn.com/2014/05/17/time-series-modeling-for-survey-data/

    It doesn’t give the Stan code, but the translation is straightforward.

    I’ve done a number of other simple time series models in Stan, specifically, dynamic linear models, that I could write up if there’s any interest:

    * Local level with trend.
    * Local level with trend and seasonality.
    * Time-varying tobit model (local level with trend).
    * Univariate linear regression where the intercept varies over time.
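    As a rough illustration of the first model on that list, here is the generative process for a local level with trend in plain Python (a sketch with arbitrary variance parameters, not the Stan code mentioned above):

```python
import random

def simulate_local_level_trend(T=100, sigma_level=0.5, sigma_trend=0.05,
                               sigma_obs=1.0, seed=1):
    """Generative process for a local-level-with-trend model:
        trend[t] = trend[t-1] + trend noise
        level[t] = level[t-1] + trend[t-1] + level noise
        y[t]     = level[t] + observation noise
    """
    rng = random.Random(seed)
    level, trend = 0.0, 0.1
    y = []
    for _ in range(T):
        level += trend + rng.gauss(0, sigma_level)
        trend += rng.gauss(0, sigma_trend)
        y.append(level + rng.gauss(0, sigma_obs))
    return y

series = simulate_local_level_trend()
```

    In a Stan implementation, the level and trend would typically be latent parameters, with priors on the three standard deviations.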

  5. This brings to mind something I’ve been thinking about. I have some data on a large number of entities that can have any of a discrete set of states, where an entity’s state can change from one time step to the next.

    The simplest model is to treat the data as a large number of independent Markov chains: infer a transition matrix T, where T[i,j] is the probability that an entity in state i is in state j on the next time step.

    p ~ Dirichlet(alpha);
    count[i,j,t] ~ Multinomial(n[i, t], p);

    where n[i,t] is the number of entities in state i at time t, and count[i,j,t] is the number of these that are in state j on the next time step.

    This turns out not to fit the data well; a better fit is an overdispersed multinomial:

    alpha <- alpha0 * p0;
    p[t] ~ Dirichlet(alpha);
    count[i,j,t] ~ Multinomial(n[i, t], p[t]);

    where p0 is a simplex (probability vector) and alpha0 is a positive scalar controlling the degree of overdispersion. I’ve implemented this model in Stan.

    Going further, if you plot the empirical probabilities, you see some clear trends and seasonality, so it might make sense to make p0 a time-varying latent variable:

    p0[t+1] ~ Dirichlet(alpha_evolve * p0[t]);

    where parameter alpha_evolve controls the rate of time evolution. This is a lot like a dynamic linear model, except that the measurement model is an overdispersed multinomial and the state evolution is via a Dirichlet distribution instead of a multivariate normal distribution.
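    A generative sketch of the overdispersed step described above, in plain Python (the values of p0 and alpha0 are made up; a smaller alpha0 means more overdispersion around p0):

```python
import random

def dirichlet(rng, alpha):
    """Draw from a Dirichlet distribution via normalized Gamma variates."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [x / s for x in g]

def overdispersed_transition_counts(n, p0, alpha0, rng):
    """One time step for one origin state: for n entities currently in
    that state, draw this step's transition probabilities
    p ~ Dirichlet(alpha0 * p0), then the destination counts."""
    p = dirichlet(rng, [alpha0 * q for q in p0])
    counts = [0] * len(p)
    for _ in range(n):  # multinomial draw as n categorical draws
        u, acc = rng.random(), 0.0
        for j, pj in enumerate(p):
            acc += pj
            if u <= acc:
                counts[j] += 1
                break
        else:
            counts[-1] += 1  # guard against floating-point round-off
    return counts

rng = random.Random(7)
counts = overdispersed_transition_counts(n=1000, p0=[0.7, 0.2, 0.1],
                                         alpha0=50.0, rng=rng)
```

    Repeating this across time steps gives counts that vary more than a plain multinomial with fixed p would, which is the overdispersion the fitted model captures.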

  6. I’m not sure if I understood the email properly, but to my mind the research problem is:

    – Maurits has many high-frequency (longitudinal) variables that may indicate the derailment of a process.
    – He observes the derailment of a process fairly infrequently (?) Let’s say it’s the last observation for each individual process.
    – He wants to make a real-time estimate of the probability of the process being derailed.

    If this is the case, then the low-to-high-frequency interpolation literature is probably a good start. I put together a little example in Stan using a univariate state-space model for this sort of problem. The changes I’d make to model my interpretation of Maurits’s problem would be to (a) turn the measurement model into a logit, and (b) implement it longitudinally, with the last observation for each process recording success or derailment (0 or 1). (I didn’t know it when I wrote the following, but the term “Nowcasting” is actually trademarked by Now-Cast, who do some very good work.)

    https://github.com/khakieconomics/nowcasting_in_stan
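    To make changes (a) and (b) concrete, here is a generative sketch (plain Python, with invented parameters; this is not the linked Stan model): a latent project-health state follows an AR(1), and only the final period produces a 0/1 derailment outcome through a logit measurement.

```python
import math
import random

def simulate_derailment(T=18, phi=0.9, sigma=0.3, beta=1.5, seed=3):
    """Hypothetical setup: a latent 'project health' state x[t] follows
    an AR(1); at the final period the outcome (derailed = 1) is drawn
    from a logit of the last state. All parameters are invented."""
    rng = random.Random(seed)
    x = [rng.gauss(0, sigma)]
    for _ in range(1, T):
        x.append(phi * x[-1] + rng.gauss(0, sigma))
    p = 1.0 / (1.0 + math.exp(-beta * x[-1]))  # logit measurement model
    derailed = int(rng.random() < p)
    return x, p, derailed

path, p_derail, derailed = simulate_derailment()
```

    Fitting the inverse problem, i.e. inferring the latent path and the derailment probability in real time from partial observations, is what the state-space machinery is for.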

  7. Gary Koop has some nice information on Bayesian VARs and such:

    http://personal.strath.ac.uk/gary.koop/bayes_matlab_code_by_koop_and_korobilis.html

    Edward Greenberg in his book “Introduction to Bayesian Econometrics” derives some time series models as well:

    http://www.amazon.com/Introduction-Bayesian-Econometrics-Edward-Greenberg-ebook/dp/B00A8ICIBS/ref=sr_1_1?ie=UTF8&qid=1460084556&sr=8-1&keywords=edward+greenberg+bayesian

    I second the idea that the Bayesian Econometrics book should come before sleeping and eating :)
