More forecasting competitions

Anthony Goldbloom from Kaggle writes:

We’ve recently put up some interesting new competitions. Last week, Jeff Sonas, the creator of the Chessmetrics rating system, launched a competition to find a chess rating algorithm that performs better than the official Elo system. Already nine teams have created systems that make more accurate predictions than Elo. It’s not a surprise that Elo has been outdone – the system was invented half a century ago before we could easily crunch large amounts of historical data. However, it is a big surprise that Elo has been outperformed so quickly given that it is the product of many years’ work (at least it was a surprise to me).

Rob Hyndman from Monash University has put up the first part of a tourism forecasting competition. This part requires participants to forecast the results of 518 different time series. Rob is the editor of the International Journal of Forecasting and has promised to invite the winner to contribute a discussion paper to the journal describing their methodology and giving their results (provided the winner achieves a certain level of predictive accuracy).

Finally, the HIV competition recently ended. Chris Raimondi describes his winning method on the Kaggle blog.

Cool stuff. On the chess example, I’m not at all surprised that Elo has been outperformed. Published alternatives have been out there for years. We even have a chess example in Bayesian Data Analysis (in the first edition, from 1995), and that in turn is based on earlier work by Glickman.
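
For readers who haven't seen it, the usual textbook version of Elo is just a logistic expected-score curve in the rating difference plus an additive update after each game. A minimal sketch (the 400-point scale and K = 32 are conventional choices, not necessarily what the competition's benchmark uses):

    # Minimal sketch of textbook Elo -- constants are conventional, not the
    # competition benchmark's exact settings.
    def elo_expected(r_a, r_b):
        """Expected score (win = 1, draw = 0.5, loss = 0) for A against B."""
        return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

    def elo_update(r_a, r_b, score_a, k=32.0):
        """Updated ratings after a game in which A scored score_a."""
        e_a = elo_expected(r_a, r_b)
        return r_a + k * (score_a - e_a), r_b - k * (score_a - e_a)

    # Example: a 1500-rated player draws against a 1700-rated player and gains points.
    new_a, new_b = elo_update(1500.0, 1700.0, 0.5)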

I like this competition idea and would like to propose some of our own, through the Applied Statistics Center. I’m thinking that if done right, this could be the basis of a Quantitative Methods in Social Sciences M.A. thesis. In any case, it would be a great way for students to get involved.

10 thoughts on “More forecasting competitions”

  1. This is a neat little competition with easy-to-parse data and simple submission formats: 8,631 top players in 65,053 games, with an indication of side and outcome. The only predictor associated with a game is a month identifier, ranging from 1 to 100 (presumably in order). They allow fractional predictions in the submissions.

    The prediction they're after is each player's expected wins in 5 later months, which they calculate in the obvious way by summing the per-game predictions.

    The evaluation metric is root mean squared error. Bring on the committees. If you don't understand why, see my post on why committees won the Netflix prize; a rough sketch of the scoring, and of why averaging helps, is at the end of this comment.

    It'd be fun to play with hierarchical item-response models like Andrew and Jennifer describe in their regression book; a toy version of that idea is sketched below as well.
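
    Here's that rough sketch; the record layout and the "sum the per-game predictions within each player-month" aggregation are my guesses at the setup, not the official scoring code.

        # Rough sketch of the scoring as I read it -- not the official evaluation code.
        # A submission gives white's fractional score (win = 1, draw = 0.5, loss = 0)
        # for each game in the 5 test months.
        from collections import defaultdict
        from math import sqrt

        def expected_points(games, predictions):
            """Aggregate per-game predicted scores into (player, month) totals.
            games: list of (white_id, black_id, month); predictions: parallel floats."""
            totals = defaultdict(float)
            for (white, black, month), p in zip(games, predictions):
                totals[(white, month)] += p          # white's predicted score
                totals[(black, month)] += 1.0 - p    # black gets the complement
            return totals

        def rmse(games, predictions, actual_points):
            """Root mean squared error over the (player, month) totals."""
            pred = expected_points(games, predictions)
            keys = actual_points.keys()
            return sqrt(sum((pred[k] - actual_points[k]) ** 2 for k in keys) / len(keys))

        def committee(pred_a, pred_b):
            # By convexity of squared error, the average's squared error is never
            # worse than the average of the two members' squared errors.
            return [(a + b) / 2.0 for a, b in zip(pred_a, pred_b)]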
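
    And a toy, decidedly non-hierarchical stand-in for that idea: a logistic model on ability differences with a normal prior for partial pooling, fit by crude gradient ascent. The 0-based player ids, the draws-as-half-a-success shortcut, and the step sizes are illustrative assumptions, not anything from the competition or from the book.

        # Toy item-response-style model:
        #   P(white scores a point) = logistic(ability_white - ability_black + adv),
        # with a normal(0, sigma) prior on abilities (partial pooling) and a shared
        # white-advantage term.  MAP fit by gradient ascent; step sizes are illustrative.
        import numpy as np

        def fit_abilities(games, scores, n_players, sigma=1.0, lr=0.001, iters=5000):
            """games: (n_games, 2) int array of 0-based (white_id, black_id);
            scores: white's result per game (1 = win, 0.5 = draw, 0 = loss).
            Treating a draw as half a Bernoulli success is a shortcut, not a draw model."""
            white, black = games[:, 0], games[:, 1]
            ability = np.zeros(n_players)
            adv = 0.0
            for _ in range(iters):
                p = 1.0 / (1.0 + np.exp(-(ability[white] - ability[black] + adv)))
                resid = scores - p
                grad = (np.bincount(white, weights=resid, minlength=n_players)
                        - np.bincount(black, weights=resid, minlength=n_players)
                        - ability / sigma ** 2)        # prior pulls abilities toward zero
                ability += lr * grad
                adv += lr * resid.sum() / len(scores)  # smaller step for the shared term
            return ability, adv

        def predict(ability, adv, white_id, black_id):
            """Predicted fractional score for white against black."""
            return 1.0 / (1.0 + np.exp(-(ability[white_id] - ability[black_id] + adv)))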

  2. Bob: Perhaps one could start with the generalized Gauss-Markov theorem, which would be more explicit about the assumptions (unbiased predictions) and about exactly how to tune the weights.

    I believe Jamie Robins generalized this to the case where the magnitude of bias, but not its direction, is known for individual predictors, using ideas from John Tukey and David Cox.

    Just a thought – tuning the weights can be important (a minimal sketch is below).

    K?
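
    (The minimal sketch promised above: minimum-variance weights for combining unbiased predictors, which is one concrete version of "tuning the weights." The error covariance here is purely illustrative; in practice it would have to be estimated, say from held-out months.)

        # Minimum-variance weights for combining unbiased predictions:
        #   w = Sigma^{-1} 1 / (1' Sigma^{-1} 1), so the weights sum to 1 and the
        # combination stays unbiased.  The covariance below is illustrative only.
        import numpy as np

        def combination_weights(error_cov):
            """Weights (summing to 1) that minimize the variance of the combination."""
            ones = np.ones(error_cov.shape[0])
            raw = np.linalg.solve(error_cov, ones)
            return raw / raw.sum()

        # Example: three predictors with error s.d. 0.9, 1.0, 1.3 and correlation 0.3.
        sd = np.array([0.9, 1.0, 1.3])
        corr = np.full((3, 3), 0.3) + 0.7 * np.eye(3)
        weights = combination_weights(np.outer(sd, sd) * corr)
        # More weight goes to the lower-variance predictors; with equal variances
        # the answer reduces to a simple average.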

  3. Hyndman has been associated with earlier forecasting competitions with Makridakis (M1, M2, M3).

    The major conclusion — that relatively simple models perform best, if you have relatively simple data (univariate time series) — is a great one for practitioners and I have appreciated the published empirical support.

    A secondary benefit is that they provide standardized comparison data sets for evaluating new forecasting schemes. We know how existing methods perform on hundreds of data sets, so new methods can be compared on a simple and honest basis.

  4. @zbicyclist: Team Shashta is already beating the best method in the Hyndman paper (team Theta Benchmark):
    http://kaggle.com/tourism1?viewtype=leaderboard
    The Hyndman paper applied all the standard univariate methods – so it will be interesting to learn more about Team Shashta's approach.

    Your second point is well made. One great aspect of competitions (from a research perspective) is that all models are trained and judged based on the same dataset and the same evaluation method – so they are a great way to benchmark different techniques.

    What's more, since competitors don't have access to the answers on the "test" dataset, there is no way for them to tamper with the results by over-fitting their models. (A miniature version of this setup is sketched below.)
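
    With toy numbers standing in for the real series and an arbitrary error measure standing in for the official one, that miniature version looks like this:

        # Sketch of the competition protocol: every method sees the same training
        # period, forecasts the same held-back horizon, and is scored by one fixed
        # error measure.  The series, methods, and metric are toy stand-ins.
        import numpy as np

        def score(forecast, actual):
            return np.mean(np.abs(forecast - actual))       # one fixed metric for everyone

        series = np.array([112., 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118])
        train, test = series[:-4], series[-4:]              # last 4 points are hidden

        submissions = {
            "naive (repeat last value)": np.repeat(train[-1], 4),
            "mean of training period":   np.repeat(train.mean(), 4),
        }
        leaderboard = sorted((score(f, test), name) for name, f in submissions.items())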

  5. @Goldbloom: By the time I read your post, leecbaker had taken over the leaderboard! Some interesting things:

    1. Both leecbaker and Team Shasta had submitted multiple entries — between the two of them, they'd submitted over 50% of the entries. So there's some possibility of chance results.

    2. But wait! The standings shown are based on only 20% of the full validation set, with the final evaluation on the remainder, so this is controlled for.

    This is a great idea; I'm overloaded with work at the moment, but may enter just to make the person currently at the bottom feel good, similar to my reason for entering bike races.

  6. What an interesting way of harnessing the talent of freelancers to solve business problems! The best-known competition was of course the Netflix Prize, but increasingly businesses have been using competitions as a novel way of getting new and creative ideas to improve predictive analytics.

    Thanks for the link to the Kaggle website. I can see it becoming a popular destination, at least judging from my browser.

  7. @zbicyclist

    Things move quickly. When I emailed Andrew around a week ago, nine teams were beating the Elo benchmark (for the chess competition). Today 50 teams are beating the benchmark (and probably more by the time you read this).

    Good luck if you do make an entry.
