Sean Taylor, a research scientist at Facebook and Stan user, writes:
I wanted to tell you about an open source forecasting package we just released called Prophet: I thought the readers of your blog might be interested in both the package and the fact that we built it on top of Stan.
Under the hood, Prophet uses Stan for optimization (and sampling if the user desires) in order to fit a non-linear additive model and generate uncertainty intervals. The big win for us was that 1) Stan does a great job at letting us separate optimization from the model code and 2) we could share the same core procedure between Python and R implementations. One of the neat things we do is automatically detect changepoints in the time series by specifying a sequence of potential parameter changes and shrinking the shifts using a Laplace prior. We also let the user adjust the flexibility of the model by tuning the precision of the priors, which we think is intuitive for most users.
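The changepoint idea can be sketched in a few lines of Python. This is a minimal illustration, not Prophet's actual code: it assumes a piecewise linear trend whose slope shifts by `deltas[j]` at each candidate changepoint, with an intercept correction that keeps the curve continuous, and an independent Laplace prior on each shift. The function names and the `scale` parameter are hypothetical.

```python
import math


def piecewise_linear_trend(t, k, m, changepoints, deltas):
    """Trend value at time t for base slope k and base intercept m.

    Assumption for illustration: the slope changes by deltas[j] at
    changepoints[j], and the intercept is adjusted so the curve stays
    continuous at every changepoint.
    """
    slope = k
    offset = m
    for s, d in zip(changepoints, deltas):
        if t >= s:
            slope += d
            offset -= s * d  # continuity correction at changepoint s
    return slope * t + offset


def laplace_log_prior(deltas, scale):
    """Log density of independent Laplace(0, scale) priors on the slope shifts."""
    return sum(-math.log(2.0 * scale) - abs(d) / scale for d in deltas)
```

A smaller `scale` penalizes nonzero shifts more heavily, so most candidate changepoints end up with shifts near zero and the fitted trend becomes less flexible.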
Prophet is used internally in many applications at Facebook. There are more details in our blog post.
From the linked webpage:
At its core, the Prophet procedure is an additive regression model with four main components:
- A piecewise linear or logistic growth curve trend. Prophet automatically detects changes in trends by selecting changepoints from the data.
- A yearly seasonal component modeled using Fourier series.
- A weekly seasonal component using dummy variables.
- A user-provided list of important holidays.
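The seasonal pieces of such an additive model can be sketched as follows. This is only an illustration under stated assumptions: the period of 365.25 days, the Fourier order of 3, and all function names are hypothetical choices, and the coefficients would in practice be estimated from data rather than supplied by hand.

```python
import math
from datetime import date


def fourier_features(day_index, period=365.25, order=3):
    """Sine and cosine terms of a truncated Fourier series for a yearly cycle."""
    feats = []
    for n in range(1, order + 1):
        angle = 2.0 * math.pi * n * day_index / period
        feats.append(math.sin(angle))
        feats.append(math.cos(angle))
    return feats


def weekday_dummies(d):
    """One indicator per day of the week (Monday=0 ... Sunday=6)."""
    dummies = [0] * 7
    dummies[d.weekday()] = 1
    return dummies


def additive_prediction(trend, yearly_feats, yearly_coefs,
                        weekly_feats, weekly_coefs, holiday_effect=0.0):
    """Sum of the four components: trend + yearly + weekly + holiday."""
    yearly = sum(f * c for f, c in zip(yearly_feats, yearly_coefs))
    weekly = sum(f * c for f, c in zip(weekly_feats, weekly_coefs))
    return trend + yearly + weekly + holiday_effect
```

Because every component enters as a plain sum, each one can be plotted and inspected separately.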
As an example, here is a characteristic forecast: log-scale page views of Peyton Manning’s Wikipedia page that we downloaded using the wikipediatrend package. Since Peyton Manning is an American football player, you can see that yearly seasonality plays an important role, while weekly periodicity is also clearly present. Finally, you can see that certain events (like playoff games he appears in) may also be modeled.
Prophet will provide a components plot which graphically describes the model it has fit:
This plot more clearly shows the yearly seasonality associated with browsing to Peyton Manning’s page (football season and the playoffs), as well as the weekly seasonality: more visits on the day of and after games (Sundays and Mondays). You can also notice the downward adjustment to the trend component since he has retired recently.
Hey, that reminds me of the birthday problem!
The Prophet webpage continues:
The important idea in Prophet is that by doing a better job of fitting the trend component very flexibly, we more accurately model seasonality and the result is a more accurate forecast. We prefer to use a very flexible regression model (somewhat like curve-fitting) instead of a traditional time series model for this task because it gives us more modeling flexibility, makes it easier to fit the model, and handles missing data or outliers more gracefully.
By default, Prophet will provide uncertainty intervals for the trend component by simulating future trend changes to your time series. If you wish to model uncertainty about future seasonality or holiday effects, you can run a few hundred HMC iterations (which takes a few minutes) and your forecasts will include seasonal uncertainty estimates.
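One way to picture "simulating future trend changes" is the sketch below. It is not Prophet's implementation. It assumes that future slope changes arrive with the same frequency as the historical ones and are drawn from a Laplace distribution whose scale is the mean absolute historical shift. All names and defaults are hypothetical.

```python
import random


def simulate_future_trends(last_level, last_slope, deltas, n_history,
                           horizon, n_sims=500, seed=0):
    """Simulate future trend paths, one value per future time step.

    Assumptions (illustrative only): a slope change occurs at each future
    step with probability len(deltas) / n_history, and its size is
    Laplace(0, scale) with scale = mean(|deltas|).
    """
    rng = random.Random(seed)
    p_change = len(deltas) / n_history
    scale = sum(abs(d) for d in deltas) / len(deltas) if deltas else 0.0
    paths = []
    for _ in range(n_sims):
        slope, level, path = last_slope, last_level, []
        for _ in range(horizon):
            if scale > 0 and rng.random() < p_change:
                # The difference of two exponentials is a Laplace draw.
                slope += rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
            level += slope
            path.append(level)
        paths.append(path)
    return paths


def trend_interval(paths, lower=0.1, upper=0.9):
    """Empirical (lower, upper) quantile band at each future step."""
    bands = []
    for step_values in zip(*paths):
        s = sorted(step_values)
        bands.append((s[int(lower * (len(s) - 1))], s[int(upper * (len(s) - 1))]))
    return bands
```

With no historical slope changes the simulated paths all coincide with a straight-line extrapolation, so the band has zero width. The more the trend shifted in the past, the wider the band becomes.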
And the punchline:
We fit the Prophet model using Stan, and have implemented the core of the Prophet procedure in Stan’s probabilistic programming language. Stan performs the MAP optimization for parameters extremely quickly (<1 second), gives us the option to estimate parameter uncertainty using the Hamiltonian Monte Carlo algorithm, and allows us to re-use the fitting procedure across multiple interface languages. Currently we provide implementations of Prophet in both Python and R. They have exactly the same features and by providing both implementations we hope to make our forecasting approach more broadly useful in the data science communities.
I like how it can run from both Python and R, which works well with Stan’s multiple interfaces.
Prophet also fits into our big picture, which is that Stan can be inserted within applications that use statistics. Statistical modeling and data analysis are not typically goals in themselves; they are a means to an end, or to many ends.
Finally, when Sean sent me this link, I promised that in my post I’d include a challenging time series that I think will stump Prophet but could perhaps be a good test example for going further with your time series modeling.
The example is the famous annual Canadian lynx series, which is available in R and is notoriously ill-fit by conventional ARMA-type time series models. The challenge is to fit the model to the first 80 years of data and then predict the following 34 years, and the issue is that the lynx series goes up and down due to its internal dynamics.
Several years ago Cavan Reilly and Angelique Zeringue fit a simple Bayesian predator-prey model to these 80 data points and it worked really well, outpredicting classical time-series models that had many more parameters.
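To illustrate what "internal dynamics" means here, the sketch below simulates the classic Lotka-Volterra predator-prey equations with a simple Euler step. It is not the Reilly and Zeringue model, and the parameter values are arbitrary choices for illustration. It only shows how two interacting populations can cycle without any calendar-driven seasonality.

```python
def lotka_volterra(prey0, pred0, alpha, beta, delta, gamma,
                   dt=0.001, steps=30000):
    """Euler simulation of the classic Lotka-Volterra equations.

    d(prey)/dt = prey * (alpha - beta * pred)
    d(pred)/dt = pred * (delta * prey - gamma)
    """
    prey, pred = prey0, pred0
    prey_path, pred_path = [prey], [pred]
    for _ in range(steps):
        d_prey = prey * (alpha - beta * pred)
        d_pred = pred * (delta * prey - gamma)
        prey += dt * d_prey
        pred += dt * d_pred
        prey_path.append(prey)
        pred_path.append(pred)
    return prey_path, pred_path


def count_peaks(series):
    """Number of interior local maxima in a sequence."""
    return sum(
        1
        for i in range(1, len(series) - 1)
        if series[i - 1] < series[i] >= series[i + 1]
    )


# Arbitrary illustrative parameters, not estimates from the lynx data.
prey, pred = lotka_volterra(10.0, 5.0, alpha=1.0, beta=0.1, delta=0.075, gamma=1.5)
```

In this system the length of a cycle depends on the state of the populations rather than on a fixed period, which is the feature that a model built around fixed seasonal periods has no direct way to represent.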
I doubt Prophet would do well on this dataset because the curve oscillates but without a fixed period. But I thought this could be a good test example to motivate improvements in the model.
Full disclosure: Facebook has given me financial support for my research.