There’s a fantastic sci-fi painting of a gigantic robot walking through a shallow sea towing a couple of ships with its two lower arms, and trying to catch a spaceship flying between its two upper arms. A perfect illustration of a k-armed bandit encountering one of Holland’s GA hyperspace planes! Can’t remember the artist :-( .

]]>What is now jargon may one day grow into mighty technical terminology. When an outsider picks up a technical book in some field, it all looks like jargon.

]]>My forthcoming post has no new content; it’s just written in my style rather than Bob’s.

]]>I like “arbitrary symbol slash arbitrary symbol,” as it conveys what A/B testing is: the comparison of two arbitrary alternatives. But, sure, I really like a descriptive term such as “exploration/exploitation tradeoff.” Somewhere in another dimension are terms such as “bias” which are usefully descriptive but also confusing in that their precise meaning does not always line up with what one might expect.

]]>For what it’s worth, I sympathize with the feeling of being bothered by non-descriptive terms. I myself think “A/B testing” is one of the most detestable and content-free terms I’ve ever come across. It’s literally “arbitrary symbol slash arbitrary symbol”. We might as well call it Jabberwocky/Supercalifragilisticexpialidocious testing (and dear Lord, may nobody follow up on this suggestion!)

]]>I’m not saying the term “bandit” is useless. I just think it points in the wrong direction. Sometimes jargon is delightfully appropriate, other times jargon adds no insight and I think can be a barrier to outsiders. I would prefer a meaningless term such as “Jax” to “bandit,” as at least it wouldn’t have certain misleading associations. Even better, in my opinion, would be a more descriptive term.

]]>Is there any chance you could give us a preview of what you wanted to add to this discussion from your perspective? I don’t know if I can wait until August!

]]>I’m not quite sure what name I’d prefer for this problem. I wouldn’t mind calling it “Bayesian A/B testing” or “Sequential design for the A/B testing problem.” That’s not quite right because it could be A/B/C/… testing. So we’ll have to think more about this one.

I agree that “multi-armed bandit” is standard jargon within some fields. I still find it unnecessarily obscure and non-descriptive. A term such as “reward” doesn’t bother me because the common English meaning seems close to the technical meaning.

]]>Do you think I should not put “bandit” in the title and instead use something like “sequential design” or “sequential decision process”? I could instead use the MDP jargon, but we don’t have the Markovian structure you get in a lot of control applications.

Are you OK with “reward” and the rest of the reinforcement learning terminology, or is there some term used in statistics you feel is less jargony?

]]>If the terminology was as widespread as “board,” I’d say, sure, no problem. But I don’t think that’s the case. “Bandit” is jargon: it’s understood within a narrow community, but outside of that community I think that sort of jargon can exclude rather than include people. I don’t mind a jargon term that has some internal consistency—for example, “regression to the mean”—but I don’t like a jargon term where, to understand where it’s coming from, you first have to hear the story of slot machines and then realize all the ways the analogy doesn’t work.

And, sure, I have similar problems with the term “Bayesian statistics.” In that case I felt (as of 1995) that it was better to go with the accepted term. But even there I kinda regret not switching to a more descriptive title such as “Data analysis using probability models.”

]]>How about just taking this as meaning drift, the way “board” came to mean something a game was played on and a group of executives, not just a plank of wood?

]]>Let me just emphasize that I hate the “bandit” terminology, as the whole point of the actual “bandit” is that it takes your money, whereas in this exploration-exploitation design-and-optimization problem, the goal is to choose an option that has positive expected value. I really do think it misses the point. Jokes are ok but not when the joke goes in the opposite direction of the intended meaning!

]]>The Bernoulli is very simple in that there are only two possible outcomes, a reward of 1 with probability theta and a reward of 0 with probability (1 – theta). The pun is in the Bernoulli parameter theta also being the expected return of Bernoulli(theta). You get the same pun with the Poisson or the location parameter of the normal.

The Thompson sampling approach I coded up in Stan requires a way to compare bandit parameters and determine which is best. For the binomial case, that’s just the max function. In general, it could be a nested Monte Carlo calculation if simulating from the bandits given a value for the parameters is possible. Determining which bandit is best for a given set of parameters lets us code up the indicator variable is_best, which is all we need to compute the posterior event probability of each bandit being the best—the Thompson sampling part of it works exactly the same way.

Being able to draw from the bandits given the parameters would be enough. Then you could build a nested Monte Carlo calculation of the best bandit right into the generated quantities. In the Bernoulli case, that’d involve simulating a bunch of Bernoulli draws from theta and comparing the totals. That’s not necessary, as theta is the expectation for the Bernoulli—that’s the appealing simplicity that can cause confusion because it’s conflating the parameter and the expectation.
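
To make the is_best idea concrete, here is a minimal Python sketch. It is not the Stan program mentioned above, just an illustration under stated assumptions: Bernoulli bandits with independent Beta(1, 1) priors, so each posterior is a Beta distribution that can be sampled directly. The function names prob_best and thompson_choose are made up for this example.

```python
import random


def prob_best(successes, trials, num_draws=4000, seed=0):
    """Estimate Pr[bandit k has the largest theta | data] for each bandit.

    Assumption (not from the thread): independent Beta(1, 1) priors, so the
    posterior for bandit k is Beta(1 + successes[k], 1 + failures[k]).
    """
    rng = random.Random(seed)
    num_bandits = len(successes)
    best_counts = [0] * num_bandits
    for _ in range(num_draws):
        # One joint posterior draw of theta for all bandits.
        theta = [
            rng.betavariate(1 + s, 1 + n - s)
            for s, n in zip(successes, trials)
        ]
        # For Bernoulli bandits theta is the expected reward, so "best"
        # is just the index of the max; this is the is_best indicator.
        best = max(range(num_bandits), key=theta.__getitem__)
        best_counts[best] += 1
    # Averaging the indicator over draws gives the event probabilities.
    return [count / num_draws for count in best_counts]


def thompson_choose(successes, trials, rng):
    """Pick the next bandit to play: one posterior draw, then take the max.

    This selects each bandit with probability equal to its posterior
    probability of being the best.
    """
    theta = [
        rng.betavariate(1 + s, 1 + n - s)
        for s, n in zip(successes, trials)
    ]
    return max(range(len(theta)), key=theta.__getitem__)
```

For example, prob_best([1, 9], [10, 10]) puts nearly all of the probability on the second bandit. For a bandit family where the parameter is not the expected reward, the max over theta would have to be replaced by a comparison of expected rewards, possibly estimated by the nested Monte Carlo simulation described above.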

]]>My first thought was “at least you didn’t name it R”…but then I recalled that the first Google search result for R is the CRAN website, so maybe that one was not so terrible.

I was then curious how the software package Stan would show up on a Google search for “Stan”. It does show up (not in the main links, but reference is made in the top right corner)…but I also learned that according to the Cambridge Dictionary (but not Webster’s, not sure about OED), stan means “an overzealous or obsessive fan of a particular celebrity” (a reference to the Eminem song of the same name).

You learn something every day!

]]>Indeed. If we can just get you and Christian Robert to write on the topic, we’ll have A/B/C/D testing.

]]>Hey! I wrote a post based on this email exchange too. My post is scheduled to appear in Aug. Readers will be able to compare our two writing styles.

]]>Off topic: I’ve been doing a little work on player tracking in basketball games lately and see you’ve done a lot in this area.

Do you happen to have any favourite references? In particular we’ve been struggling to robustly deal with player-player intersections etc. Thanks!

]]>