Sean Taylor, a research scientist at Facebook and Stan user, writes:

I wanted to tell you about an open source forecasting package we just released called Prophet: I thought the readers of your blog might be interested in both the package and the fact that we built it on top of Stan.

Under the hood, Prophet uses Stan for optimization (and sampling if the user desires) in order to fit a non-linear additive model and generate uncertainty intervals. The big win for us was that 1) Stan does a great job of letting us separate the optimization from the model code and 2) we could share the same core procedure between the Python and R implementations. One of the neat things we do is automatically detect changepoints in the time series by specifying a sequence of potential parameter changes and shrinking the shifts using a Laplace prior. We also let the user adjust the flexibility of the model by tuning the precision of the priors, which we think is intuitive for most users.
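The changepoint idea can be sketched in a few lines. Here is a minimal illustration (my own code and naming, not Prophet's actual implementation): each candidate changepoint contributes a hinge term whose slope shift delta would, in Prophet, get a Laplace prior that shrinks most shifts toward zero.

```python
# Sketch of Prophet-style changepoints (illustrative only).
# The trend is piecewise linear: base slope k plus a slope shift delta_j
# that turns on after each candidate changepoint s_j. In Prophet the
# delta_j get a Laplace(0, tau) prior, which shrinks most shifts to zero.

def piecewise_linear_trend(t, k, m, changepoints, deltas):
    """Trend value at time t: offset m, base slope k, slope shifts deltas."""
    slope = k + sum(d for s, d in zip(changepoints, deltas) if t >= s)
    # Adjust the offset so the trend stays continuous at each changepoint.
    offset = m - sum(s * d for s, d in zip(changepoints, deltas) if t >= s)
    return slope * t + offset
```

With a Laplace prior on the deltas, most of the candidate changepoints end up contributing nothing, so the "flexibility" knob Sean mentions is just the scale of that prior.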

Prophet is used internally in many applications at Facebook. There are more details in our blog post.

From the linked webpage:

At its core, the Prophet procedure is an additive regression model with four main components:

- A piecewise linear or logistic growth curve trend. Prophet automatically detects changes in trends by selecting changepoints from the data.
- A yearly seasonal component modeled using Fourier series.
- A weekly seasonal component using dummy variables.
- A user-provided list of important holidays.
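The Fourier-series seasonal component boils down to a small matrix of sine/cosine features at harmonics of the period (365.25 days for yearly seasonality), with one regression coefficient per column. A minimal sketch (function name and defaults are mine, not Prophet's API):

```python
import math

def fourier_features(t, period=365.25, order=3):
    """Sin/cos features for smooth seasonality with the given period.
    Prophet builds an equivalent feature matrix and fits one
    coefficient per column; this is just the per-time-point row."""
    feats = []
    for n in range(1, order + 1):
        feats.append(math.sin(2 * math.pi * n * t / period))
        feats.append(math.cos(2 * math.pi * n * t / period))
    return feats
```

Higher `order` allows sharper seasonal shapes at the cost of more coefficients to estimate.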
As an example, here is a characteristic forecast: log-scale page views of Peyton Manning’s Wikipedia page, which we downloaded using the wikipediatrend package. Since Peyton Manning is an American football player, you can see that yearly seasonality plays an important role, while weekly periodicity is also clearly present. Finally, you can see that certain events (like playoff games he appears in) may also be modeled.

Prophet will provide a components plot which graphically describes the model it has fit:

This plot more clearly shows the yearly seasonality associated with browsing to Peyton Manning’s page (football season and the playoffs), as well as the weekly seasonality: more visits on the day of and after games (Sundays and Mondays). You can also notice the downward adjustment to the trend component since he has retired recently.

Hey, that reminds me of the birthday problem!

The Prophet webpage continues:

The important idea in Prophet is that by doing a better job of fitting the trend component very flexibly, we more accurately model seasonality and the result is a more accurate forecast. We prefer to use a very flexible regression model (somewhat like curve-fitting) instead of a traditional time series model for this task because it gives us more modeling flexibility, makes it easier to fit the model, and handles missing data or outliers more gracefully.

By default, Prophet will provide uncertainty intervals for the trend component by simulating future trend changes to your time series. If you wish to model uncertainty about future seasonality or holiday effects, you can run a few hundred HMC iterations (which takes a few minutes) and your forecasts will include seasonal uncertainty estimates.
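That trend-uncertainty idea can be sketched as follows (my own illustrative code, not Prophet's internals): simulate future slope changes, occurring at some assumed rate with Laplace-distributed shifts, and read forecast intervals off the spread of the simulated paths.

```python
import random

def simulate_trend_paths(horizon, slope, cp_rate, delta_scale,
                         n_paths=1000, seed=1):
    """Simulate future piecewise-linear trends: at each future step a new
    changepoint occurs with probability cp_rate, and its slope shift is
    drawn from Laplace(0, delta_scale). Illustrative sketch only."""
    random.seed(seed)

    def laplace(scale):
        # Difference of two iid exponentials is Laplace(0, scale).
        return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

    paths = []
    for _ in range(n_paths):
        k, level, path = slope, 0.0, []
        for _step in range(horizon):
            if random.random() < cp_rate:
                k += laplace(delta_scale)
            level += k
            path.append(level)
        paths.append(path)
    return paths
```

An 80% interval at horizon h is then the 10th–90th percentile of the simulated values at step h, and it widens with the horizon because more slope changes can accumulate.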

And the punchline:

We fit the Prophet model using Stan, and have implemented the core of the Prophet procedure in Stan’s probabilistic programming language. Stan performs the MAP optimization for parameters extremely quickly (<1 second), gives us the option to estimate parameter uncertainty using the Hamiltonian Monte Carlo algorithm, and allows us to re-use the fitting procedure across multiple interface languages. Currently we provide implementations of Prophet in both Python and R. They have exactly the same features and by providing both implementations we hope to make our forecasting approach more broadly useful in the data science communities.

I like how it can run from both Python and R, which works well with Stan’s multiple interfaces.

Prophet also fits into our big picture which is that Stan can be inserted within applications that use statistics. Statistical modeling and data analysis is not typically a goal in itself; it is a means to an end—or to many ends.

Finally, when Sean sent me this link, I promised that in my post I’d include a challenging time series that I think will stump Prophet but could perhaps be a good test example for going further with your time series modeling.

The example is the famous annual Canadian lynx series, which is available in R and is notoriously ill-fit by conventional ARMA-type time series models. The challenge is to fit the model to the first 80 years of data and then predict the following 34 years, and the issue is that the lynx series goes up and down due to its internal dynamics.

Several years ago Cavan Reilly and Angelique Zeringue fit a simple Bayesian predator-prey model to these 80 data points and it worked really well, outpredicting classical time-series models that had many more parameters.

I doubt Prophet would do well on this dataset because the curve oscillates but without a fixed period. But I thought this could be a good test example to motivate improvements in the model.

*Full disclosure: Facebook has given me financial support for my research.*

Seems like Andrew’s conjecture is correct. RMSE for “vanilla” Prophet, trained on the first 80 years of the lynx dataset and evaluated on the remaining 34, is 2194, whereas the simple Bayesian predator-prey model achieved 1480.

R code below:

# fit Prophet to the first 80 years of the lynx series
m <- prophet(data.frame(ds=1:80, y=lynx[1:80]))

# forecast the remaining 34 years
future <- make_future_dataframe(m, periods=34)
forecast <- predict(m, future)

# RMSE on the 34 held-out years
sqrt(mean((lynx[-(1:80)] - forecast[-(1:80), "yhat"])^2))

Still a pretty impressive package!

Felix:

Yes, the point of that example was not meant to “debunk” Prophet but rather to suggest directions for improvement. The great thing about a method that runs and gives answers is that we can try it on lots of problems and use these results to learn how to do better.

Yes, absolutely. That’s how I read your post, and certainly my analysis above was not meant to “debunk” anything. I was genuinely impressed by a package that allowed me, with 2 (!) lines of code, to fit a Stan time-series model that converges within milliseconds.

Wow. This has me hooked: “fit a Stan time-series model that converges within milliseconds.”

The speed comes from the fact that it is MAP?

It apparently has options for MAP (optimization) and full Bayes (sampling). Helps to have the models precompiled, too.

Yep, it comes from MAP, which is the default option. If sampling is requested, the speed is what one would expect from a model like this fitted in Stan: for the lynx data with the plain Prophet model I used in the example, about 1 minute per chain.

NNS offers an alternative to traditional ARMA models in R. I had never tested it on the lynx data until reading this post. Using the “vanilla” linear method:

> require(NNS);require(datasets)

> sqrt(mean((NNS.ARMA(lynx,h=34,training.set = 80,method='lin')-tail(lynx,34))^2))

[1] 1397.105

I think incorporating all of these insights will yield even more robust methods, echoing Andrew’s thoughts.

More examples here: https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/NNS%20ARMA.pdf

I’m not as strong on Fourier series as I’d like…

Section 2.2 in their paper discusses how they create the seasonal weights. Formula 5 looks a little different from what I had seen in the past on seasonality. It looks like it would give imaginary numbers unless mod(t, 365.25)=0. Would I need to take only the real component and ignore the imaginary?

Here’s the math; it’s a bit more involved than just taking the real part, but not much.

Define the complex Fourier coefficient as the dot product of exp(%i*omega*x) with a function f(x) over the domain 0 to 2*pi (where omega is an integer). Suppose f really is the function A*cos(omega*x) + B*sin(omega*x); then you can derive a relationship between the complex coefficient and the real A and B.

Here’s a link to the calculation using Maxima:

http://maxima-online.org/#?in=f%3A%20A*cos%28omega*x%29%20%2B%20B*sin%28omega*x%29%3Beqn%3A%20a%2Bb*%25i%20%3D%20integrate%28exp%28%25i*omega*x%29*f%2Cx%2C0%2C2*%25pi%29%3BAsoln%3Asolve%28%5Brealpart%28eqn%29%2Cimagpart%28eqn%29%5D%2C%5BA%2CB%5D%29%3B

I guess for the usual definitions you should take the dot product against exp(-%i*omega*x) (note the minus sign). There’s also the question of normalization. The conventions for normalization, sign, and so forth vary from field to field (numerical Fourier routines often drop certain constants for speed, so you may need to normalize by a constant at the end), but the concept is basically as above: the dot product of the function with a complex exponential.
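To make that concrete, here is a numerical check of the relation (my own code, not from the Maxima link): with f(x) = A*cos(omega*x) + B*sin(omega*x) and the minus-sign convention, the integral of exp(-i*omega*x)*f(x) over 0 to 2*pi equals pi*(A - i*B), so A and B pop out of the real and imaginary parts.

```python
import cmath
import math

def fourier_coefficient(f, omega, n=20000):
    """Riemann-sum approximation of the integral of
    exp(-i*omega*x) * f(x) over x in [0, 2*pi]."""
    dx = 2 * math.pi / n
    return sum(cmath.exp(-1j * omega * (k * dx)) * f(k * dx)
               for k in range(n)) * dx

A, B, omega = 1.5, -0.75, 3
f = lambda x: A * math.cos(omega * x) + B * math.sin(omega * x)
c = fourier_coefficient(f, omega)
# c is approximately pi*(A - i*B), so the real A, B are recovered as:
A_hat = c.real / math.pi
B_hat = -c.imag / math.pi
```

With the plus-sign convention in the comment above, the imaginary part flips sign, which is exactly the "stuff about normalization and sign" that varies between fields.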

Hey John, it’s likely an error. I got Euler’s formula wrong. You can see how it’s implemented (correctly) in code using sin/cos here: https://github.com/facebookincubator/prophet/blob/master/python/fbprophet/forecaster.py#L189

The sin/cos approach to seasonality is what I was more familiar with.

I’ve been playing with this package for the last few days. It seems to work extremely well for quite a few of the series I’ve thrown at it (which have strong periodic seasonality). I like how you can include a capacity for the logistic trend. Fantastic work.

The caution I’d make is that if the true time series has a random walk component, then a model like the one in Prophet (which is basically y_t = f(t) + e_t) will produce prediction intervals that are far too narrow at long horizons.
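A quick illustration of why (my own simulation, nothing to do with Prophet's internals): for a random walk the forecast variance grows linearly in the horizon h, so honest intervals must widen like sqrt(h), while a y_t = f(t) + e_t model keeps them roughly constant.

```python
import random

def rw_variance_at_horizon(h, sigma=1.0, n_paths=4000, seed=0):
    """Empirical forecast variance of a Gaussian random walk at horizon h.
    Theory: Var(y_{t+h} - y_t) = h * sigma^2, so intervals should widen
    like sqrt(h); a fixed-noise curve-fitting model keeps them constant."""
    random.seed(seed)
    finals = []
    for _ in range(n_paths):
        y = 0.0
        for _step in range(h):
            y += random.gauss(0.0, sigma)
        finals.append(y)
    mean = sum(finals) / n_paths
    return sum((v - mean) ** 2 for v in finals) / n_paths
```

At horizon 25 with sigma = 1 the true forecast variance is 25, i.e. 25 times what a constant-noise model would report.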

Thanks for trying it, Jim! Yes, you’re absolutely right that we don’t do well with random walk components and are very likely to overfit them.

How does this compare to using lmer or even lm to just fit an explicit fourier + logistic curve?

When this periodic seasonality stuff came up last time it was about dummy variables, and the DataColada solution was to encourage people to use daily dummy variables, which I really didn’t like. So I just whipped up this thing using straight-up lm:

http://models.street-artists.org/2016/05/02/philly-vs-bangkok-thinking-about-seasonality-in-terms-of-fourier-series/

I could imagine wanting something a little better using lmer, but the advantage here is that the whole thing runs in milliseconds. That’s of course supposed to be an advantage of Prophet too. But the advantage of using Stan is typically that you can put in priors and specify a bunch of basis functions by hand, and that’s not what Prophet is all about, right?
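For what it's worth, the lm-with-an-explicit-Fourier-basis fit is easy to reproduce by hand: on an evenly spaced grid covering a whole number of periods the sine/cosine columns are orthogonal, so ordinary least squares reduces to scaled dot products. A Python sketch (names and interface are mine):

```python
import math

def fit_fourier(y, order=1):
    """Least-squares fit of y (one full period, evenly spaced) to
    mean + sum_k a_k*cos(2*pi*k*t/N) + b_k*sin(2*pi*k*t/N).
    Because the columns are orthogonal on this grid, OLS reduces to
    scaled dot products, matching what lm returns for this design."""
    n = len(y)
    mean = sum(y) / n
    coefs = {}
    for k in range(1, order + 1):
        cos_k = [math.cos(2 * math.pi * k * t / n) for t in range(n)]
        sin_k = [math.sin(2 * math.pi * k * t / n) for t in range(n)]
        coefs[k] = (2 / n * sum(a * b for a, b in zip(y, cos_k)),
                    2 / n * sum(a * b for a, b in zip(y, sin_k)))
    return mean, coefs
```

That orthogonality is also why the fit runs in milliseconds: there is no matrix inversion to speak of.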

I think for any given problem you can probably do a better job writing down your own model. People have correctly noted that Prophet is not so different from a GAM with a piecewise constant or piecewise logistic component. We do have some tricks for detecting changepoints that work well and help with extrapolation. But mostly we’re targeting use cases where people don’t have time or skills necessary to expend modeling effort.

>>>cases where people don’t have time or skills necessary to expend modeling effort<<<

And that's a huge audience right there.

Thanks for making it open source. I was wondering if you have thought about having general regressors in the model. Holidays are one type of regressor which is already supported. But in many practical cases you might have other time series that might explain some movements of the time series to be forecasted e.g. sales and price. Seems like it should fit into the framework effortlessly. Would be great to know your thoughts on that.

I gave prophet a quick spin yesterday, and there’s lots to like: 4-5 lines of code, a few seconds, and you get a good result. I can foresee using this.

I like the flexibility in model specification, including adding holidays and carrying capacity. Sometimes, though, there are continuous predictor variables. To take Phil’s example from not long ago, say you’re modeling building energy consumption, and you have good reason to believe it’s affected by outdoor air temperature. I don’t see that in prophet; are there any plans to consider something like that?

Would tgp (the treed Gaussian processes package) be a reasonable package to compare to this? I used it a bit a couple of years ago. In at least one case, I got good and relatively fast results; in other cases, it seemed to take a long time, and I didn’t figure out if that was my model or inherent to the computation.

How about time series with time-dependent variance (GARCH etc.)? Many financial time series are like that. Probably not so good?