A question about race-based stratification

Can Candan writes:

I have scraped horse racing data from a website in Turkey and would like to try some models for predicting the finishing positions in future races. What models would you suggest?

One recent paper on the subject seems promising. It claims to modify the SMO (sequential minimal optimization) algorithm for support vector regression to work with race-based stratification, but it gives no details, and I don’t understand what to change in SMO.
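The paper's actual SMO modification isn't described here, so the sketch below does not reconstruct it. It shows one common way to make a learner respect race structure, borrowed from ranking SVMs: compare horses only within the same race. Each training example is the difference of two horses' feature vectors from the same race, labeled by which horse finished ahead. The feature matrix, `race_ids`, and the function name are hypothetical.

```python
import numpy as np


def within_race_pairs(X, finish_positions, race_ids):
    """Build pairwise difference examples only between horses in the same race.

    Returns (D, y): each row of D is x_a - x_b for two horses a and b that ran in
    the same race, and y is +1 if a finished ahead of b, otherwise -1.
    Horses from different races are never compared.
    """
    X = np.asarray(X, dtype=float)
    finish_positions = np.asarray(finish_positions)
    race_ids = np.asarray(race_ids)
    diffs, labels = [], []
    for race in np.unique(race_ids):
        idx = np.flatnonzero(race_ids == race)  # runners in this race
        for i, a in enumerate(idx):
            for b in idx[i + 1:]:
                if finish_positions[a] == finish_positions[b]:
                    continue  # skip dead heats: no ordering information
                diffs.append(X[a] - X[b])
                labels.append(1 if finish_positions[a] < finish_positions[b] else -1)
    return np.array(diffs), np.array(labels)
```

Training a linear classifier with no intercept on `(D, y)`, for example scikit-learn's `LinearSVC(fit_intercept=False)`, gives a weight vector `w`. Scoring each horse by `w · x` then ranks the runners within a race. The number of pairs grows quadratically with field size.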

A second paper builds on the first and improves on it by tuning a least-squares SVR with an NDCG-based (normalized discounted cumulative gain) criterion.
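NDCG scores a predicted ordering by giving more credit for placing the actual top finishers near the top. The minimal sketch below computes NDCG@k for one race. It assumes that a horse finishing in position p out of n runners has relevance n − p. That is a hypothetical choice; the papers may define relevance differently.

```python
import math
from typing import Sequence


def dcg(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain of the first k relevance values, in ranked order."""
    return sum((2 ** rel - 1) / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def race_ndcg(predicted_scores: Sequence[float], finish_positions: Sequence[int], k: int = 3) -> float:
    """NDCG@k for one race; a higher predicted score means a better predicted finish.

    Assumed relevance: n - position, so the winner gets n - 1 and last place gets 0.
    """
    n = len(finish_positions)
    relevance = [n - p for p in finish_positions]
    # Rank horses by predicted score, best first.
    order = sorted(range(n), key=lambda i: predicted_scores[i], reverse=True)
    predicted = dcg([relevance[i] for i in order], k)
    ideal = dcg(sorted(relevance, reverse=True), k)
    return predicted / ideal if ideal > 0 else 0.0


def mean_ndcg(races, k: int = 3) -> float:
    """Average NDCG@k over races given as (predicted_scores, finish_positions) pairs."""
    return sum(race_ndcg(scores, positions, k) for scores, positions in races) / len(races)
```

For tuning, one would compute `mean_ndcg` on validation races for each candidate hyperparameter setting, such as the SVR's regularization and kernel parameters, and keep the best setting. NDCG rewards getting the leading horses right, which matters more for win and place betting than getting the back of the field right.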

There is also a conditional logistic regression approach, which I tried to implement, but I couldn’t reproduce the claimed improvement over the public odds of winning. Maybe I’m doing something wrong.
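In its standard form, conditional logistic regression for races gives horse i in race r a win probability of exp(x_i·β) / Σ_j exp(x_j·β), where the sum runs only over the runners in race r. The sketch below fits β by plain gradient ascent and compares the result against the public odds using held-out log-likelihood. It is not necessarily the variant the asker implemented. The features, the learning rate, and the use of decimal odds are assumptions.

```python
import numpy as np


def race_win_probs(beta: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Conditional-logit win probabilities for the runners of one race.

    X has one row per runner and one column per (hypothetical) feature.
    """
    utility = X @ beta
    utility = utility - utility.max()  # subtract the max for numerical stability
    weights = np.exp(utility)
    return weights / weights.sum()


def fit_conditional_logit(races, n_features, learning_rate=0.1, n_iter=500):
    """Fit beta by gradient ascent on the conditional-logit log-likelihood.

    races is a list of (X, winner_index) pairs, one per race.
    """
    beta = np.zeros(n_features)
    for _ in range(n_iter):
        grad = np.zeros(n_features)
        for X, winner in races:
            p = race_win_probs(beta, X)
            # Gradient of log p[winner]: the winner's features minus the
            # probability-weighted average features of that race's runners.
            grad += X[winner] - p @ X
        beta += learning_rate * grad / len(races)
    return beta


def implied_probs(decimal_odds) -> np.ndarray:
    """Public win probabilities from (assumed) decimal odds, renormalized to sum to 1.

    Renormalizing crudely removes the track take.
    """
    raw = 1.0 / np.asarray(decimal_odds, dtype=float)
    return raw / raw.sum()


def mean_log_lik(prob_per_race, winners) -> float:
    """Average log-probability assigned to the actual winners (higher is better)."""
    return float(np.mean([np.log(p[w]) for p, w in zip(prob_per_race, winners)]))
```

To check for an "improvement over the public odds", fit on earlier races, then on later held-out races compare `mean_log_lik` of the model's probabilities with `mean_log_lik` of `implied_probs`. Because the betting public is hard to beat, it often helps to include the public's log-probability as one of the features. In R, the same model can be fitted with `survival::clogit`, using a formula such as `won ~ x1 + x2 + strata(race_id)`.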

I’m quite comfortable with R. Any books, pointers, or code snippets would be greatly appreciated.

My reply: Sorry, this one is too far away from my areas of expertise!

9 thoughts on “A question about race-based stratification”

  1. When I was in high school, I spent hours in libraries going through microfiche (joy) of old newspapers, pulling out racing forms to do something similar for a science fair project (gambling is a fun class activity). I can’t answer the question directly, because my model-fu is middling at best, but you’ll want a model that can handle multiple time series (one per horse), which some models can’t. Conventional wisdom is that a horse’s behavior changes quickly with age, time between races, and rest, and even with its position during a race (some horses like to catch other horses but hate being in front, some start too strong, etc.). That is in addition to the usual factors: track condition, race distance, jockey, starting odds, and what have you.

    I’m in Kentucky, right next to Churchill Downs. Drop me a line if you’d be up for discussing it; it’s an interesting topic (my contact info is on my website).

  2. I am not sure why one should expect “improvement over the public odds of winning”. I would guess that a formal statistical algorithm can beat subjective predictions when the relevant data are low-dimensional. But all those punters, taken together, may be a lot better at identifying and synthesizing the huge variety of events that might bear on the result of a particular horse race. In other words, the formal analysis is better at getting the answer right, but the punters are better at getting the question right.

    • Andrew and I once saw an entertaining talk at the UC Berkeley stats department. The speaker had a degree in statistics and had moved to Vegas to become a bookie, or to advise bookies, or something like that. I mostly remember his story about a Super Bowl (it might have been Dallas vs. Pittsburgh). Most of the early money came in on one of the teams, so the bookies moved the spread for later bettors, and then most of the late money came in on the other team. The final margin of victory landed exactly between the early and late point spreads, so the bookies lost money to both groups of bettors.

      Anyway, he asserted at the time that the Vegas spread was the gold standard. Neither he nor anyone else, he said, had found a statistical model based on available data (historical game performances, players’ salaries and ages, etc.) that predicted the outcomes of individual games as well. But that was 20 years ago. Much more information is now readily available, and modeling tools have advanced too. It’s worth a try.

  3. A couple of thoughts. First, Shane Reese at Brigham Young has published papers on hierarchical and permutation modeling of NASCAR driver rankings. Here’s a link: https://madison.byu.edu/racing/racing.html It’s not horse racing, but it could provide a window into the problem. Second, and more speculatively, a prediction market like the late Intrade, or one of the sports-focused prediction markets, could be a source of prospective predictions of horse race outcomes. Sports betting and prediction-market sites include BETDAQ, Smarkets, VegasInsider, OddsChecker, ScoresAndOdds, etc.
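Comment 3 mentions permutation modeling of rankings. The Plackett-Luce model, also called the rank-ordered or exploded logit, is one standard permutation model for a full finishing order. It is not necessarily the model used in the NASCAR papers. Under it, the winner is drawn from all runners with softmax probabilities, second place is drawn from the remaining runners, and so on. It therefore extends the conditional logit from the winner alone to the whole order. The sketch below computes its log-likelihood and assumes each horse's strength score has already been computed, for example as a linear function of hypothetical features.

```python
import math
from typing import Sequence


def plackett_luce_log_lik(scores_in_finish_order: Sequence[float]) -> float:
    """Log-probability of an observed finishing order under a Plackett-Luce model.

    scores_in_finish_order[k] is the log-strength of the horse that finished k-th.
    Each place is a softmax choice among the horses not yet placed.
    """
    s = list(scores_in_finish_order)
    total = 0.0
    for k in range(len(s) - 1):  # the last horse is forced: log(1) = 0
        rest = s[k:]
        m = max(rest)
        # log-sum-exp over the horses still in contention, computed stably
        log_denominator = m + math.log(sum(math.exp(v - m) for v in rest))
        total += s[k] - log_denominator
    return total
```

Summing this over races and maximizing with respect to the feature weights fits the model. If only the first term is kept, it reduces to the conditional-logit likelihood for the winner.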
