Can Candan writes:
I have scraped horse racing data from a web site in Turkey and would like to try some models for predicting the finishing positions of future races. What models would you suggest for that?
There is one recent paper on the subject that seems promising. It claims to change the SMO algorithm of support vector regression to work with race-based stratification, but it gives no details, and I don’t understand what to modify in the SMO algorithm.
This builds on the one above and improves on it with NDCG-based model tuning of least-squares SVR.
There’s a conditional logistic regression approach which I tried to implement, but I couldn’t get the claimed improvement over the public odds of winning; maybe I’m doing something wrong here.
I’m quite comfortable with R. Any books, pointers, or code snippets are greatly appreciated.
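For readers following along, the conditional logistic regression idea mentioned above can be sketched in a few lines. This is only an illustration, not the method of any of the papers referred to: each horse gets a linear score, and the win probabilities within one race are the softmax of those scores, so they sum to 1 per race. It is written in plain Python rather than R, and the single feature (a made-up "speed rating"), the toy races, the learning rate, and all function names are assumptions for the example.

```python
import math

def win_probabilities(features, beta):
    """Softmax of linear scores over the horses in ONE race."""
    scores = [sum(b * x for b, x in zip(beta, row)) for row in features]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def neg_log_likelihood(races, beta):
    """races: list of (features, winner_index) pairs, one pair per race."""
    return -sum(math.log(win_probabilities(f, beta)[w]) for f, w in races)

def fit(races, n_features, lr=0.1, steps=500):
    """Plain gradient descent on the negative log-likelihood."""
    beta = [0.0] * n_features
    for _ in range(steps):
        grad = [0.0] * n_features
        for features, winner in races:
            probs = win_probabilities(features, beta)
            for k in range(n_features):
                # Gradient per race: expected feature value minus the winner's value.
                expected = sum(p * row[k] for p, row in zip(probs, features))
                grad[k] += expected - features[winner][k]
        beta = [b - lr * g / len(races) for b, g in zip(beta, grad)]
    return beta

# Toy data (invented): one feature per horse, three races, winner given by index.
races = [
    ([[1.0], [0.5], [0.2]], 0),
    ([[0.3], [0.9], [0.4]], 1),
    ([[0.6], [0.7], [0.1]], 0),  # an upset: the top-rated horse loses
]
beta = fit(races, n_features=1)
print(beta, neg_log_likelihood(races, beta))
```

The point of the within-race softmax is that a horse is only ever compared with the others in the same race, which is what distinguishes this from an ordinary logistic regression on a win/lose indicator.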
My reply: Sorry, this one is too far away from my areas of expertise!
When I was in high school, I spent hours in libraries going through microfiche (joy) of old newspapers to strip out racing forms to do something similar for a science fair project (gambling is a fun class activity). I can’t help answer the question directly, because my model-fu is middling at best, but you’ll want something that’s sensitive to multiple time series, which I know some models aren’t. Traditional logic is that a horse’s behavior changes rapidly with age, time between races, rest, and even position during a race (some horses like to catch other horses but hate being in front, some start too strong, etc.), in addition to the usual factors (track condition, length, jockey, starting odds, and what have you).
I’m in Kentucky, right next to Churchill Downs — drop me a line if you’d be up for discussing it; it’s an interesting topic (my contact info is on my website).
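One of the factors listed above, time between races, is easy to derive from scraped results. The sketch below shows one possible way to do it, assuming each start is a record with a horse name and a race date; the field names and the sample rows are invented for illustration.

```python
from datetime import date

def add_days_since_last_race(starts):
    """starts: list of dicts with 'horse' and 'date' keys (hypothetical schema).

    Returns new dicts, sorted by date, with 'days_since_last' added.
    The value is None for a horse's first recorded start.
    """
    last_seen = {}
    out = []
    for row in sorted(starts, key=lambda r: r["date"]):
        prev = last_seen.get(row["horse"])
        gap = (row["date"] - prev).days if prev is not None else None
        out.append({**row, "days_since_last": gap})
        last_seen[row["horse"]] = row["date"]
    return out

# Invented sample rows.
starts = [
    {"horse": "A", "date": date(2015, 5, 15)},
    {"horse": "B", "date": date(2015, 5, 8)},
    {"horse": "A", "date": date(2015, 5, 1)},
]
with_rest = add_days_since_last_race(starts)
for row in with_rest:
    print(row["horse"], row["date"], row["days_since_last"])
```

Because each row only looks backward at the same horse's earlier starts, the feature uses no information from the race being predicted or from later races.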
This is really interesting! A mix of time series and categorical variables. Can you summarize the approach you took or at least some of the things you looked at which yielded some juice?
I am not sure why one should expect “improvement over the public odds of winning”. I would guess a formal statistical algorithm might beat subjective predictions where the relevant data are of limited dimension, but all those punters, taken together, may be a lot better at identifying and synthesizing the huge variety of events that might have a bearing on the results of a particular horse race. In other words, the formal analysis is better at getting the answer right, but the punters are better at getting the question right.
Even to assume that the formal analysis gets the answer right seems too optimistic.
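To make "improvement over the public odds of winning" concrete, one common way to set up the comparison is to turn the odds for each race into implied win probabilities and score them against the model's probabilities on the same races. The sketch below assumes decimal odds (the payout per unit staked, including the stake), which may not match the format of the scraped data; the function names and the numbers are invented for illustration.

```python
import math

def implied_probabilities(decimal_odds):
    """Convert one race's decimal odds into probabilities that sum to 1."""
    raw = [1.0 / o for o in decimal_odds]
    total = sum(raw)  # typically above 1 because of the track take or margin
    return [r / total for r in raw]

def mean_log_loss(prob_lists, winners):
    """Average negative log probability assigned to the actual winners."""
    return -sum(math.log(p[w]) for p, w in zip(prob_lists, winners)) / len(winners)

# Invented example: two races, with the winner's index for each.
public_odds = [[2.0, 3.0, 4.0], [1.5, 4.0, 6.0]]
winners = [0, 1]
public_probs = [implied_probabilities(o) for o in public_odds]

# Hypothetical model output for the same two races.
model_probs = [[0.50, 0.30, 0.20], [0.55, 0.30, 0.15]]

print("public:", mean_log_loss(public_probs, winners))
print("model: ", mean_log_loss(model_probs, winners))
```

A lower log loss on races held out from fitting is the kind of evidence that would support a claim of beating the odds. With only a handful of races, as here, the comparison means nothing.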
Andrew and I once saw an entertaining talk at the UC Berkeley stats department. The speaker had a degree in statistics, and had moved to Vegas to become a bookie or to give advice to bookies or something like that. I mostly remember his story about a Super Bowl — might have been Dallas vs Pittsburgh — in which most of the early money was coming in on one of the teams, so the bookies changed the spread for later bettors, and then most of the late money came in the other way…and then the actual spread was exactly in between the early and late point spreads, so the bookies lost money to both groups of bettors.
Anyway, the guy asserted at the time that the Vegas spread was the gold standard, and that neither he nor anyone else had been able to find a statistical model based on available data (historical performances in games, salaries and ages of players, etc.) that performed as well when it came to predicting the outcome of individual games. However, that was 20 years ago now. A lot more information is readily available, and modeling tools have advanced also. It’s worth a try.
That’s a great story. (perhaps more viscerally satisfying because The House loses)
You might be interested to know that Vegas just recently legalized “sports betting investment funds”: http://espn.go.com/chalk/story/_/id/13006097/nevada-legalizes-sports-betting-investment-funds-espn-chalk
I’m sure this will turn out to be a great idea.
Are bookies “The House”? I thought of bookies as plucky underdogs, “rogue businessmen” if you will.
Right you are — just like how Dubner and Levitt (“rogue economist”) were the plucky underdogs of trying to publish a pop sci best seller. But hey, once Malcolm Gladwell came out with “The Tipping Point”, all bets were off.
A couple of thoughts. First, Shane Reese at Brigham Young has published papers based on hierarchical and permutation modeling of NASCAR driver rankings (https://madison.byu.edu/racing/racing.html). It’s not horse racing but could provide a window into the issue. Second, and this is more speculative, a prediction market like the late Intrade, or any of the sports-focused prediction markets, could be an avenue for prospective predictions of horse race outcomes. PM sports sites include BETDAQ, Smarkets, VegasInsider, OddsChecker, ScoresAndOdds, etc.