Baltimore Orioles Hackathon coming soon!


Kevin Tenenbaum writes:

I wanted to let you know about a hackathon that we will be hosting at Camden Yards on February 5th, 2016. This event is a great opportunity for your students to use their statistics, data science and computer science expertise to find novel solutions to problems that Major League Baseball teams deal with every day. Our hope is to excite as many students and researchers as possible about the potential applications of statistics and data science in baseball.

Regular readers will know I was a big Orioles fan back in the late 70s. My dad would take me once a year to Memorial Stadium. The first game we saw, I think Jim Palmer was pitching and he got knocked out in the 1st inning. The next year, he won 1-0 on a one-hitter (or maybe he lost 1-0 on a one-hitter, now I can’t remember). I also saw Rickey Henderson steal a base or two when the A’s came to play. That was just amazing. He got on base and everybody knew he’d try to steal. And he did. I was an O’s fan but everyone in that stadium wanted to see Rickey steal some bases.


  1. ParadeRain says:

    So the second most profitable team in the MLB (according to Forbes) and a $5 billion consulting firm do a cattle call for free high-skilled labor, and you don’t think this is just a little bit suspect?

    • Andrew says:


      As a skilled writer who writes for free, it’s hard for me to criticize people who analyze data for free! Actually, I analyze data for free too. I also do it for pay, and I imagine the people who participate in the hackathon might do some work for pay as well.

    • Alex says:

      Outside of maybe covering their bases (pun!) to make sure they didn’t miss anything obvious, I doubt that the Orioles plan to get any real, actionable results out of an 8 hour event. I would guess that it’s more of a publicity event that also puts them in touch with potential future employees.

  2. Rahul says:

    I always thought of baseball stats as more a curiosity and entertainment thing.

    Do the stats actually give actionable intelligence that help improve team outcomes? How significantly?

    “Novel solutions to problems MLB teams deal with every day” would never have conjured up the image of a comp. sci. guy or a statistician before I read this.

    Can someone outline what sort of “problems” these are? And how comp sci has helped solve them in the past?

  3. Chris says:

    Looks like the two games you mentioned were probably these:

    Good memory! I’ve been surprised to see how often the stuff I thought I remembered from MLB games of my youth is just completely mistaken.

  4. There are a bunch of problems with these kinds of events:

    1. They’re swarmed by amateurs who don’t have command of any basic toolsets to do this kind of analysis.

    2. If you go by yourself, you’ll get added to an existing “team” (though it looks like you can make your own team here).

    3. It’s a competition, meaning it’s been reduced to some kind of predictive measure on some simple task and thus loses all of the nuance of real data analysis.

    4. We have no idea what the data or task is ahead of time, so there’s no domain-area prep we can do.

    5. The data are either proprietary or meaningless, and often both.

    6. Whatever the Orioles are currently doing or thinking will be proprietary, so whatever happens, you won’t be able to tap into their existing knowledge or goals, which is rather important again if you want to do meaningful data analysis.

    Taken together, it’ll be a live Kaggle competition (as Mitzi says, Kaggle turns statistical problems into machine learning problems by rendering them as obfuscated vector classification problems). Only I’d have to go to Baltimore and I won’t be able to publish my solution.

    No thank you.

    As a full disclaimer, I was also a huge Orioles fan as a kid. I have a signed poster of Brooks Robinson in the office. But I was an even bigger Reds fan — my dad drove me from Detroit to Riverfront Stadium in Cincinnati, where I saw many of the NL greats of the early 70s, including Willie Mays, who waved to me during warmup when I ran as close as I could to the outfield and shouted his name!

    • Rahul says:


      I’m trying to understand your point #3:

      So, something like the Netflix prize framework, would that be an example of lost nuance due to a “predictive measure on some simple task”?

      Or I remember a Kaggle contest where one predicted hospitalization chance in the next year based on patient medical history.

      If these epitomize the simplistic contest types how are the “real” problems you’ve worked on different? I’d love to know more.

      • Exactly. Let’s use Netflix prize as an example of a too-narrow focus on (root mean) square error, which I blogged about the first time around. The entire competition improved their ability to predict the star rating of movies by around 1/10 of a star out of 5. Something that was barely, if at all, noticeable in their interface.
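To put that tenth-of-a-star figure in perspective, here is a minimal sketch of how small an RMSE improvement of that size looks (the ratings below are made up for illustration, not Netflix data):

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    return math.sqrt(
        sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
    )

# Hypothetical 5-star ratings: the "improved" model's predictions each
# move about a tenth of a star closer to the truth than the baseline's.
actual   = [4, 2, 5, 3, 1, 4]
baseline = [3.6, 2.5, 4.4, 3.5, 1.6, 3.5]
improved = [3.7, 2.4, 4.5, 3.4, 1.5, 3.6]

print(rmse(baseline, actual))  # baseline error
print(rmse(improved, actual))  # slightly smaller error
```

The gap between the two models is on the order of a tenth of a star on a five-star scale, which is essentially invisible in a row of star icons.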

        The key is that Netflix’s real problem wasn’t to reduce squared error in star ratings, but to provide users movie suggestions they would find useful. The two are not the same problem. The problem with Netflix’s (and Amazon’s and everyone else’s) recommendation systems (as well as earlier iterations of Google) is that the interface presented recommendations in order of how much they thought you’d like them. There was no focus on diversity of results. So if I watched one season of Buffy the Vampire Slayer, they think I’m going to need to be told there are five more seasons as my next recommendations.

        For the Google example, if I typed [Michael Jordan] or [scala] into earlier versions of Google, all I got was the basketball player and the programming language, with no heed to the machine learning researcher or prom-dress makers. Relevance ranking is only part of the story — good search also requires diversity, which Google finally got around to modeling. (Now I wish they’d go the other way and stop with all the “helpful” crap they throw at the top of every search; it almost makes me think they care more about selling ads than giving good search results. Actually, that was a trick statement; Google’s about selling ads, so that’s what they should optimize, though they also have to think about pissing off users like me, which would hurt them more if there were good alternatives that weren’t advertising driven.)
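The diversity point can be sketched with a toy greedy re-ranker in the spirit of maximal-marginal-relevance re-ranking (all item names, scores, and the `lam` trade-off parameter below are invented for illustration; this is not any actual Google algorithm):

```python
def diversify(items, relevance, similarity, k, lam=0.5):
    """Greedily pick k items, trading relevance against similarity
    to items already chosen, so one topic can't crowd out the rest."""
    chosen, pool = [], list(items)
    while pool and len(chosen) < k:
        def score(x):
            penalty = max((similarity(x, c) for c in chosen), default=0.0)
            return relevance[x] - lam * penalty
        best = max(pool, key=score)
        chosen.append(best)
        pool.remove(best)
    return chosen

# Hypothetical results for the query [Michael Jordan], tagged by topic.
relevance = {"bball_1": 0.9, "bball_2": 0.85, "ml_prof": 0.6, "dress": 0.5}
topic = {"bball_1": "nba", "bball_2": "nba", "ml_prof": "ml", "dress": "fashion"}
sim = lambda a, b: 1.0 if topic[a] == topic[b] else 0.0

top3 = diversify(relevance, relevance, sim, k=3)
print(top3)  # one result per topic instead of basketball, basketball, ...
```

Ranking by relevance alone would return both basketball pages first; the penalty term lets the machine learning researcher and the prom-dress page surface.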

        As an example of real problems in hospitals, predicting the chance of a negative event is a good first step, but it only matters if there is some possible intervention that helps. Finding patients who are ripe candidates for intervention is a different problem than predicting when a patient’s going to die. Reminds me of a story I heard from Peter Szolovits (MIT computer scientist) at one of the i2b2 bakeoffs (run by Harvard to evaluate electronic health record natural language processing). He had a grad student trying to predict when patients would die in the ICU. The grad student came back excited saying he had nearly perfect prediction. It turns out they just turn the life support off, and that was easily noticeable by all the measures. But “learning” that people die when you turn their life support off isn’t so interesting.

        Another example from i2b2 is that the winning team of the obesity challenge went to the length of modeling the data annotation process, because it changed after the first iteration when the task organizers rethought the coding, but they left the earlier data in for both training and evaluation. This isn’t the kind of thing that’s useful; hospitals don’t care about predicting coding standards — they want something they can use to help patients. So what’s useful is fixing the coding throughout and using it to make real predictions! My favorite paper on this point and why machine learning evals are broken is Chris Manning’s take on the Penn Treebank; he told me the only way to get his point out was in an invited talk, because the paper had been rejected as not being on point for modern NLP (which is all about eking out fractions of a percent improvement on test corpora, leading to crazy overfitting on the test portion of the corpus, which is reused eval to eval). The point is that the competitions focus on reductive proxies for the real problem, not the problem at hand.

        Yet another example is all the NLP bakeoffs. They make you classify every item in a corpus and then count the fraction you got right. In the real world, that’s almost never the task of interest (at least in 20 years of working on industrial NLP, it was never a task I ever saw a customer care about). Instead, what people cared about was high precision (positive predictive value, in epidemiology terms) solutions in some cases (web sites that didn’t want to look stupid by making bad recommendations) and high recall in others (postdocs or defense analysts who wanted to make sure they found every instance of the thing they’re looking for and are willing to wade through lots of false positives to get them). Either way, classification loss (or even log loss) isn’t really the criterion of interest for “real” applications.
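The precision/recall trade-off in that last paragraph is easy to make concrete; a toy sketch with hypothetical item IDs:

```python
def precision_recall(predicted, relevant):
    """Precision (positive predictive value) and recall of a set of
    predicted positives against the set of truly relevant items."""
    predicted, relevant = set(predicted), set(relevant)
    true_pos = predicted & relevant
    precision = len(true_pos) / len(predicted) if predicted else 0.0
    recall = len(true_pos) / len(relevant) if relevant else 0.0
    return precision, recall

relevant = set(range(1, 11))  # the ten items that actually matter

# Cautious system: few answers, almost all correct -- what a site
# that can't afford to look stupid wants (high precision, low recall).
p_hi, r_lo = precision_recall({1, 2, 3}, relevant)

# Exhaustive system: many answers, misses nothing -- what an analyst
# wading through false positives wants (high recall, low precision).
p_lo, r_hi = precision_recall(set(range(1, 31)), relevant)
```

Neither system is "more accurate" in a useful sense; which one you want depends entirely on the cost of a false positive versus a miss, which is exactly what a single classification-loss leaderboard throws away.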

        • Rahul says:

          Thanks for elaborating on these issues Bob!

          So, do you think public contests like Netflix or Kaggle can improve by using better metrics of performance, etc.?

          e.g. Had you been at Netflix and wanted to utilize public talent at forecasting/prediction, do you have ideas as to how you might have framed the contest?

  5. Mary Schweitzer says:

    Ah, the 1970s was a very good decade to be an Orioles fan. They won more seasons than they lost, and they were almost always in the race on Labor Day weekend (my late Baltimorean husband’s criterion for a good season). The owner left the front office alone – and what a front office it was! Many of the players came up through the Orioles’ own system and thus had already learned the “Orioles way.” And they had Earl Weaver. An economist once set out to model the relationship between salary and performance in baseball, and came to the conclusion that Earl Weaver was the most underpaid man in baseball, because his influence on the Orioles’ ability to win – and therefore put fans in the seats – was more important than any other factor. Even in the losing years, they were always above .500 at home when Earl the Pearl managed. So you could expect a good time. And a good time we had.

    I don’t remember where that study was published – can’t even give you a date, though I remember that Earl was still managing, so I suspect it was the late 1970s. But your students may like it.

    If you were a graduate student at Hopkins (Homewood campus) in those days, you were so close to the stadium you could hear the crowd. We all used to go in a group on Monday nights when students could get great upper deck reserved seats at half price. We used to tease incoming students with the certainty that an Orioles question was sure to come up on prelims.

    (And Bob – I have a Brooks Robinson autographed BAT in my office – plus four seats from Memorial Stadium in my garage, still waiting to be mounted so they’ll stand up straight.)

    • Chris G says:

      > Ah, the 1970s was a very good decade to be an Orioles fan… Many of the players came up through the Orioles’ own system and thus had already learned the “Orioles way.” And they had Earl Weaver.

      I was a MA resident so the Red Sox were the prime focus but, by virtue of their being in the AL East, I followed the Orioles a bit then too. They had some excellent teams. Unlike our yahoo Zimmer, Weaver knew how to manage – liked him a lot. They had an excellent farm system and some fearsome rotations. In addition to Palmer, Flanagan and McGregor were no slouches. With regard to the picture above, whatever possessed them to trade DeCinces for Dan Ford? (Oh, wait, the Google is my friend… They traded him to make way for Ripken. Not unreasonable but Ford seems like a pretty low return.)

      • Mary Schweitzer says:

        I imagine Andrew doesn’t want me doing this but I can’t resist it – Dan Ford (who snorted everything except the base line, and we were worried he’d try to snort that, too) was hired by E.B. WILLIAMS, the nightmare owner, who was trying to be like his good friend Steinbrenner.

        Google’s only your friend when you can’t remember how to spell something or can’t remember a citation. Not for this.

        Remember how bad the Indians had been? Williams got rid of the Orioles’ fantastic front office, and most of them landed in Cleveland and … suddenly Cleveland got good. Williams made some big trades, and the Orioles got REALLY bad.

        The only reason people bought seats after that was the promise of a new stadium (and having seniority rights for tickets in the new stadium, which turned out to be phony because they gave them to Washington folks first, figuring they already had Baltimore fans hooked – I know long-term Baltimore fans who swore they would never attend a game in Camden Yards as a result – took us 10 years to get seats comparable to what we had in Memorial Stadium) – and after the new stadium, the All-Star game, and after the All-Star game, Ripken’s streak. For a while it was “see Ripken while he’s still playing.” And the Final Tour. And then … ker-plunk. Ticket values plummeted. It was like the last days of the Colts – we joked that if you put two Orioles tickets on somebody’s windshield when you went in the grocery store, when you came back there’d be two more …

        Do you know who they also traded away? A pitcher named Curt Schilling. The front office mattered. Ignore the front office – get rid of them even though they’re good – and you’re just another yahoo playing fantasy baseball, except it’s with a real team. (The rumor for a while was that Jacobs DID let his sons play fantasy baseball with the Orioles in his early years as owner.)

        So pay attention to who’s making the decisions, pay attention to the Front Office.

        But … there was also a pint-sized Yankees fan who stole an easy out in the playoffs – now how do you model THAT?

  6. Mary Schweitzer says:

    PS – Earl Weaver is best remembered today in the image of him kicking dirt on home plate while arguing with the umpire. Your readers may not know that there was another side to him – Earl was using deep stats before anybody else – he had his index cards for every player – both Orioles and opponents – with the current record on how they did not just against lefties and righties, but against specific pitchers; where they were likely to hit the ball – all kinds of information.
