
Poker math showdown!

[Image: screenshot of a card game from a YouTube clip]

In comments, Rick Schoenberg wrote:

One thing I tried to say as politely as I could in [the book, "Probability with Texas Holdem Applications"] on p146 is that there’s a huge error in Chen and Ankenman’s “The Mathematics of Poker” which renders all the calculations and formulas in the whole last chapter wrong or meaningless or both. I’ve never received a single ounce of feedback about this though, probably because only like 2 people have ever read my whole book.

Jerrod Ankenman replied:

I haven’t read your book, but I’d be happy to know what you think is a “huge” error that invalidates “the whole last chapter” that no one has uncovered so far. (Also, the last chapter of our book contains no calculations—perhaps you meant the chapter preceding the error?). If you contacted one of us about it in the past, it’s possible that we overlooked your communication, although I do try to respond to criticism or possible errors when I can. I’m easy to reach; will work for a couple more months.

Hmmm, what’s on page 146 of Rick’s book? It comes up if you search inside the book on Amazon:

[Image: excerpt from page 146 of Rick's book]

So that’s the disputed point right there. Just go to the example on page 290 where the results are normally distributed with mean 1 and variance 1, check that R(1) ≈ 14%, then run the simulation and check that the probability of the bankroll starting at 1 and reaching 0 or less is approximately 4%.
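For reference, the 14% figure can be reproduced in a couple of lines. This sketch assumes that the R(b) being checked is the usual ruin formula for normally distributed results with mean mu and standard deviation sigma, R(b) = exp(-2*mu*b/sigma^2); both the formula and the helper name risk_of_ruin are assumptions here, not copied from page 290. With mu = 1, sigma = 1, and a bankroll of b = 1, it gives exp(-2), about 13.5%.

```r
# Ruin probability for bankroll b when each result is Normal(mu, sigma^2).
# Assumption: the continuous-time formula R(b) = exp(-2 * mu * b / sigma^2);
# risk_of_ruin is a hypothetical helper, not taken from either book.
risk_of_ruin <- function(b, mu, sigma) {
  exp(-2 * mu * b / sigma^2)
}

# The example in the text: mean 1, variance 1, starting bankroll 1
r1 <- risk_of_ruin(b = 1, mu = 1, sigma = 1)
print(round(r1, 3))  # [1] 0.135
```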

I went on to Amazon but couldn’t access page 290 of Chen and Ankenman’s book to check this. I did, however, program the simulation in R as I thought Rick was suggesting:

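# Simulate nsims bankrolls that start at 1 and add a Normal(mu, sigma) result each step;
# record the first step at which the bankroll drops below 0 (NA if never within T steps)
waiting <- function(mu, sigma, nsims, T){
  time_to_ruin <- rep(NA, nsims)
  for (i in 1:nsims){
    virtual_bankroll <- 1 + cumsum(rnorm(T, mu, sigma))
    if (any(virtual_bankroll < 0)) {
      time_to_ruin[i] <- min((1:T)[virtual_bankroll < 0])
    }
  }
  return(time_to_ruin)
}

a <- waiting(mu=1, sigma=1, nsims=10000, T=100)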

Which gave the following result:

> print(mean(!is.na(a)))
[1] 0.0409
> print(table(a))
  1   2   3   4   5   6   8   9 
218 107  53  13   9   7   1   1 

These results indicate that (i) the probability is indeed about 4%, and (ii) T=100 is easily enough to get the asymptotic value here, since none of the simulated ruins happened after step 9.

Actually, the first time I did this I kept getting a probability of ruin of 2% which didn't seem right--I couldn't believe Rick would've got this simple simulation wrong--but then I found the bug in my code: I'd written "cumsum(1+rnorm(T,mu,sigma))" instead of "1+cumsum(rnorm(T,mu,sigma))".
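Here is a small check of why the buggy line gave a lower ruin rate. The expression cumsum(1 + x) adds 1 to every increment rather than once at the start, so after t steps the bankroll is t - 1 higher than it should be, and each step has mean mu + 1 instead of mu. The seed and the five draws are arbitrary.

```r
set.seed(123)                    # arbitrary seed, for reproducibility only
x <- rnorm(5, mean = 1, sd = 1)  # five simulated results

buggy <- cumsum(1 + x)           # adds 1 to every increment
fixed <- 1 + cumsum(x)           # adds the starting bankroll of 1 once

# The gap grows by 1 per step: 0, 1, 2, 3, 4
print(round(buggy - fixed, 10))
```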

So maybe Chen and Ankenman really did make a mistake. Or maybe Rick is misinterpreting what they wrote. There's also the question of whether Chen and Ankenman's mathematical error (assuming they did make the mistake identified by Rick) actually renders all the calculations and formulas in their whole last chapter, or their second-to-last chapter, wrong or meaningless or both.

P.S. According to the caption at the YouTube site, they're playing rummy, not poker, in the above clip. But you get the idea.

P.P.S. I fixed a typo pointed out by Juho Kokkala in an earlier version of my code.

How Many Mic’s Do We Rip

Yakir Reshef writes:

1. Our technical comment on Kinney and Atwal’s paper on MIC and equitability has come out in PNAS along with their response. Similarly to Ben Murrell, who also wrote you a note when he published a technical comment on the same work, we feel that they “somewhat missed the point.” Specifically: one statistic can be more or less equitable than another, and our claim has been that MIC is more equitable than other existing methods in a wide variety of settings. Contrary to what Kinney and Atwal write in their response (“Falsifiability or bust”), this claim is indeed falsifiable; it’s just that they have not falsified it.

2. We’ve just posted a new theoretical paper that defines both equitability and MIC in the language of estimation theory and analyzes them in that paradigm. In brief, the paper contains a proof of a formal relationship between power against independence and equitability that shows that the latter can be seen as a generalization of the former; a closed-form expression for the population value of MIC and an analysis of its properties that lends insight into aspects of the definition of MIC that distinguish it from mutual information; and new estimators for this population MIC that perform better than the original statistic we introduced.

3. In addition to our paper, we’ve also written a short FAQ for those who are interested in a brief summary of where the conversation and the literature on MIC and equitability are at this point, and what is currently known about the properties of these two objects.

PS – at your suggestion, the theory paper now has some pictures!

We’ve posted on this several times before:

16 December 2011: Mr. Pearson, meet Mr. Mandelbrot: Detecting Novel Associations in Large Data Sets

26 Mar 2012: Further thoughts on nonparametric correlation measures

4 Feb 2013: Too many MC’s not enough MIC’s, or What principles should govern attempts to summarize bivariate associations in large multivariate datasets?

14 Mar 2014: The maximal information coefficient

1 May 2014: Heller, Heller, and Gorfine on univariate and multivariate information measures

7 May 2014: Once more on nonparametric measures of mutual information

I still haven’t formed a firm opinion on these things. Summarizing pairwise dependence in large datasets is a big elephant, and I guess it makes sense that different researchers who work in different application areas will have different perspectives on the problem.

Recently in the sister blog

Replication Wiki for economics

Jan Hoeffler of the University of Göttingen writes:

I have been working on a replication project funded by the Institute for New Economic Thinking during the last two years and read several of your blog posts that touched on the topic.

We developed a wiki website that serves as a database of empirical studies, the availability of replication material for them and of replication studies.

It can help with research as well as with teaching replication to students. We taught seminars at several faculties for which the information in this database was used. In the starting phase the focus was on some leading journals in economics, and we now cover more than 1800 empirical studies and 142 replications. Replication results can be published as replication working papers of the University of Göttingen’s Center for Statistics.

Teaching and providing access to information will raise awareness of the need for replications, provide a basis for research about the reasons why replications so often fail and how this can be changed, and educate future generations of economists about how to make research replicable.

The field is a fractal

In a blog comment, Winston Lin points to this quote from Bill Thurston:

There is a real joy in doing mathematics, in learning ways of thinking that explain and organize and simplify. One can feel this joy discovering new mathematics, rediscovering old mathematics, learning a way of thinking from a person or text, or finding a new way to explain or to view an old mathematical structure.

This inner motivation might lead us to think that we do mathematics solely for its own sake. That’s not true: the social setting is extremely important. We are inspired by other people, we seek appreciation by other people, and we like to help other people solve their mathematical problems. What we enjoy changes in response to other people. Social interaction occurs through face-to-face meetings. It also occurs through written and electronic correspondence, preprints, and journal articles. One effect of this highly social system of mathematics is the tendency of mathematicians to follow fads. For the purpose of producing new mathematical theorems this is probably not very efficient: we’d seem to be better off having mathematicians cover the intellectual field much more evenly. But most mathematicians don’t like to be lonely, and they have trouble staying excited about a subject, even if they are personally making progress, unless they have colleagues who share their excitement.

Fun quote but I disagree with the implications of the last bit. The trouble with the quote is the implication that there is a natural measure on “the intellectual field” so that it can be covered “evenly.” But I think the field is more of a fractal with different depths at different places, depending on how closely you look.

If we wanted to model this formally, we might say that researchers decide, based on what other people are doing, which parts of their fields are worth deeper study. It’s not just about being social; the point is that there’s no uniform distribution. To put it another way, following “fads,” in some sense, is a necessity, not a choice. This is not to say that whatever is currently being done is best; perhaps there should be more (or less) time-averaging, of the sort that we currently attain, for example, by appointing people to long-term job contracts (hence all the dead research that my colleagues at Berkeley were continuing to push, back in the 90s). I just want to emphasize that some measure needs to be constructed, somehow.

“A hard case for Mister P”

Kevin Van Horn sent me an email with the above title (ok, he wrote MRP, but it’s the same idea) and the following content:

I’m working on a problem that at first seemed like a clear case where multilevel modeling would be useful. As I’ve dug into it I’ve found that it doesn’t quite fit the usual pattern, because it seems to require a very difficult post-stratification.

Here is the problem. You have a question in a survey, and you need to estimate the proportion of positive responses to the question for a large number (100) of different subgroups of the total population at which the survey is aimed. The sample size for some of these subgroups can be rather small. If these were disjoint subgroups then this would be a standard multi-level modeling problem, but they are not disjoint: each subgroup is defined by one or two variables, but there are a total of over 30 variables used to define subgroups.

For example, if x[i], 1 <= i <= 30, are the variables used to define subgroups, subgroup i for i <= 30 might be defined as those individuals for which x[i] > 1, with the other subgroup definitions involving combinations of two or possibly three variables. Examples of these subgroup definitions include patterns such as
· x1 == 1 OR x2 == 1 OR x3 == 1
· (x1 == 1 OR x1 == 2) AND x3 < 4.

You could do a multilevel regression with post-stratification, but that post-stratification step looks very difficult. It seems that you would need to model the 30-dimensional joint distribution for the 30 variables describing subgroups.

Have you encountered this kind of problem before, or know of some relevant papers to read?
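To make the overlap concrete, here is a minimal sketch on simulated data. The variables x1, x2, x3, their levels, and the response rate are all made up; each example subgroup definition is stored as an R expression and evaluated against the data. The last line counts respondents who fall in both subgroups, which is what keeps this from being a standard disjoint-groups problem.

```r
set.seed(42)  # simulated respondents; variables, levels, and response rate are made up
n <- 1000
survey <- data.frame(
  x1 = sample(1:3, n, replace = TRUE),
  x2 = sample(1:2, n, replace = TRUE),
  x3 = sample(1:5, n, replace = TRUE),
  y  = rbinom(n, 1, 0.4)  # 1 = positive response to the survey question
)

# Each subgroup is an unevaluated logical condition on the defining variables
subgroup_defs <- list(
  any_is_one    = quote(x1 == 1 | x2 == 1 | x3 == 1),
  x1_low_x3_low = quote((x1 == 1 | x1 == 2) & x3 < 4)
)

# Membership matrix: one row per respondent, one logical column per subgroup
membership <- sapply(subgroup_defs, function(def) eval(def, survey))

# Sample size and raw proportion of positive responses in each subgroup
sizes <- colSums(membership)
props <- apply(membership, 2, function(in_group) mean(survey$y[in_group]))
print(rbind(n = sizes, prop = round(props, 3)))

# Respondents who belong to both subgroups: the subgroups are not disjoint
both <- sum(membership[, "any_is_one"] & membership[, "x1_low_x3_low"])
print(both)
```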

I replied:

In your example, I agree that it sounds like it would be difficult to compute things on 2^30 cells or however many groups you have in the population. Maybe some analytic approach would be possible? What are your 30 variables?

And then he responded:

The 30+ variables are a mixture of categorical and ordinal survey responses indicating things like the person’s role in their organization, decision-making influence, familiarity with various products and services, and recognition of various ad campaigns. So you might have subgroups such as “people who recognize any of our ad campaigns,” or “people who recognize ad campaign W,” or “people with purchasing influence for product space X,” or more tightly defined subgroups such as “people with job description Y who are familiar with product Z.”

Here’s some more context. I’m looking for ways of getting better information out of tracking studies. In marketing research a tracking study is a survey that is run repeatedly to track how awareness and opinions change over time, often in the context of one or more advertising campaigns that are running during the study period. These surveys contain audience definition questions, as well as questions about familiarity with products, awareness of particular ads, and attitudes towards various products.

It’s hard to get clients to really understand just how large sampling error can be, so there tends to be a lot of upset and hand-wringing when they see an unexplained fluctuation from one month to the next. Thus, there’s significant value in finding ways to (legitimately) stabilize estimates.

Where things get interesting is when the client wants to push the envelope by
a) running surveys more often, but with a smaller sample size, so that the total number surveyed per month remains the same, or
b) tracking results for many different overlapping subgroups.

I’m seeing some good results for handling (a) by treating the responses in each subgroup over time as a time series and applying a simple state-space model with a binomial error model; this is based on the assumption that the quantities being tracked don’t typically change radically from one week to the next. This kind of modeling is less useful in the early stages of the study, however, when you don’t yet have much information on the typical degree of variation from one time period to the next. Multilevel modeling for b) seems like a good candidate for the next improvement in estimation, and would help even in the early stages of the study, but as I mentioned, the post-stratification looks difficult.
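Van Horn doesn't spell out his state-space model, so here is a stand-in: a minimal sketch of a discounted beta-binomial filter, a common simple approximation for tracking a slowly drifting proportion with binomial sampling error. Each period the previous Beta(a, b) posterior is shrunk by a discount factor delta between 0 and 1 (a hypothetical tuning choice encoding how much the proportion may move between periods), then updated with that period's counts. The counts are made up.

```r
# Discounted beta-binomial filter: a stand-in, not the model described above.
# successes[t], trials[t]: positive responses and sample size in period t.
# delta in [0, 1]: hypothetical discount factor (smaller = faster-moving estimate).
beta_binomial_filter <- function(successes, trials, delta = 0.8, a0 = 1, b0 = 1) {
  est <- numeric(length(successes))
  a <- a0
  b <- b0
  for (t in seq_along(successes)) {
    # Discount last period's information, then add this period's counts
    a <- delta * a + successes[t]
    b <- delta * b + (trials[t] - successes[t])
    est[t] <- a / (a + b)  # posterior mean of the tracked proportion
  }
  est
}

# Made-up counts: six periods with 40 respondents each
successes <- c(12, 18, 9, 15, 14, 20)
trials    <- rep(40, 6)

raw_prop <- successes / trials
smoothed <- beta_binomial_filter(successes, trials, delta = 0.8)
print(round(cbind(raw = raw_prop, smoothed = smoothed), 3))
```

With delta = 0 the filter reproduces the raw proportions; with delta = 1 it pools all periods as if nothing changed. So delta plays the role of the assumed period-to-period variation, which is exactly the quantity that is hard to pin down early in a study.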

Now here’s me again:

I see what you’re saying about the poststrat being difficult. In this case, one starting point could be to make somewhat arbitrary (but reasonable) guesses for the sizes of the poststrat cells (for example, just use the proportion of respondents in the different categories in your sample) and then go from there. The point is that the poststrat would be giving you stability, even if it doesn’t quite match the population of interest.
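A minimal sketch of that suggestion, with made-up numbers. It assumes the multilevel regression has already produced a predicted proportion for each poststrat cell (here just a hypothetical vector pred), uses the sample counts as the cell sizes, and computes a subgroup estimate as the size-weighted average of the predictions over the cells in that subgroup. Because a subgroup is just a logical condition on the cells, overlapping subgroups cause no trouble.

```r
# Hypothetical poststrat cells: every combination of two defining variables
cells <- expand.grid(x1 = 1:3, x2 = 1:2)

# Stand-in for the multilevel model's predicted Pr(yes) in each cell (made up)
cells$pred <- c(0.30, 0.35, 0.40, 0.45, 0.50, 0.55)

# Cell sizes taken from the sample itself (made-up counts)
cells$n_sample <- c(50, 20, 5, 40, 10, 2)

# Poststratified estimate for a subgroup given as a logical vector over cells
poststrat_estimate <- function(cells, in_subgroup) {
  w <- cells$n_sample * in_subgroup  # zero weight for cells outside the subgroup
  sum(w * cells$pred) / sum(w)
}

# Two overlapping subgroups: "x1 == 1 OR x2 == 1" and "x1 == 1"
est_or <- poststrat_estimate(cells, with(cells, x1 == 1 | x2 == 1))
est_x1 <- poststrat_estimate(cells, with(cells, x1 == 1))
print(round(c(est_or, est_x1), 3))
```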

And Van Horn came back with:

You write, “one starting point could be to make somewhat arbitrary (but reasonable) guesses for the sizes of the poststrat cells.”

But there are millions of poststrat cells… Or are you thinking of doing some simple modeling of the distribution for the poststrat cells, e.g. treating the stratum-defining variables as independent?

That sounds like it could often be a workable approach.
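Here is what the independence simplification buys, in a sketch with three hypothetical binary variables and made-up marginal proportions. Each cell's share is the product of the marginal shares, and the share of an OR-defined subgroup can then be computed either by summing over cells or, without enumerating any cells, as one minus the probability that none of the conditions holds. The second route matters because even if all 30 variables were binary there would be 2^30, or about 1.07 billion, cells.

```r
# Made-up marginal proportions for three hypothetical binary variables
p <- c(x1 = 0.2, x2 = 0.1, x3 = 0.3)

# All 2^3 cells; under independence each cell's share is a product of marginals
cells <- expand.grid(x1 = 0:1, x2 = 0:1, x3 = 0:1)
cells$share <- apply(cells, 1, function(row) prod(ifelse(row == 1, p, 1 - p)))

# Subgroup "x1 == 1 OR x2 == 1 OR x3 == 1", summing over the cells it contains
by_cells <- sum(cells$share[with(cells, x1 == 1 | x2 == 1 | x3 == 1)])

# Same share without enumerating cells: 1 - Pr(none of the three equals 1)
closed_form <- 1 - prod(1 - p)
print(c(by_cells = by_cells, closed_form = closed_form))
```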

Just to stir the pot, though . . . One could argue that a good solution should have good asymptotic behavior, in the sense that, in the limit of a large subgroup sample size, the estimate for the proportion should tend to the empirical proportion. Certainly if one of the subgroups is large, in which case one would expect the empirical proportion to be a good estimate for that subgroup, and my multilevel-model-with-poststrat gives an estimate that differs significantly from the “obvious” answer, this is likely to raise questions about the validity of the approach. It seems to me that, to achieve this asymptotic behavior, I’d need to be able to model the distribution of poststrat cells at arbitrary levels of detail as the sample size increases. This line of thought has me looking into Bayesian nonparametric modeling.

Fun stuff.
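The asymptotic property asked for above is easiest to see in a simple partial-pooling estimator, sketched here as a stand-in rather than the multilevel model with poststrat itself: estimate = (y + k*p0) / (n + k), where p0 is a hypothetical pooled proportion and k a hypothetical prior sample size. The weight on the subgroup's own data is n/(n + k), which goes to 1 as n grows, so the estimate converges to the empirical proportion y/n.

```r
# Partial-pooling estimate for a subgroup with y positives out of n.
# p0 (pooled proportion) and k (prior sample size) are hypothetical values.
shrink_estimate <- function(y, n, p0 = 0.3, k = 50) {
  (y + k * p0) / (n + k)
}

# Subgroups of increasing size, all with empirical proportion 0.5
n <- c(10, 100, 1000, 100000)
y <- 0.5 * n
est <- shrink_estimate(y, n)

# The gap to the empirical proportion shrinks toward zero as n grows
print(round(cbind(n, empirical = y / n, estimate = est), 4))
```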

Stroopy names

Baby Name Wizard is all over this one.

And this all makes me wonder: is there a psychology researcher somewhere with a dog named Stroopy? Probably so.

P.S. I just made the mistake of googling “Stroopy.” Don’t do it. I was referring to this.

Some quick disorganized tips on classroom teaching

Below are a bunch of little things I typically mention at some point when I’m teaching my class on how to teach. But my new approach is to minimize lecturing, and certainly not to waste students’ time by standing in front of a group of them, telling them things they could’ve read at their own pace.

Anyway, here I am preparing my course on statistical computing and graphics and thinking of points to mention during the week on classroom teaching. My old approach would be to organize these points in outline format and then “cover” them in class. Instead, though, I’ll stick them here and then I can assign this to students to read ahead of time, freeing up class time for actual discussion.

Working in pairs:

This is the biggie, and there are lots of reasons to do it. When students are working in pairs, they seem less likely to drift off; also, with two students there is more of a chance that one of them is interested in the topic. Students learn from teaching each other, and they can work together toward solutions. It doesn’t always work for students to do homeworks in pairs or groups (I have a horrible suspicion that they’ll often just split up the task, with one student doing problem #1, another doing problem #2, and so forth), but having them work in pairs during class seems like a no-lose proposition.

The board:

Students don’t pay attention all the time nor do they have perfect memories; hence, use the blackboard as a storage device. For example, if you are doing a classroom activity (such as the candy weighing), outline the instructions on the board at the same time as you explain them to the class. For another example, when you’re putting lots of stuff on the board, organize it a bit: start at the top-left and continue across and down, and organize the board into columns with clear labels. In both cases, the idea is that if a student is lost, he or she can look up at the board and have a chance to see what’s up.

Another trick is to load up the board with relevant material before the class period begins, so that it’s all ready for you when you need it.

The projector:

It’s becoming standard to use Beamer (PowerPoint) slide presentations in classroom teaching as well as in research lectures. I think this is generally a good idea, and I have just a few suggestions:
- Minimize the number of words on the slides. If you know what you’re talking about, you can pretty much just jump from graph to graph.
- The trouble with this strategy is that, without seeing the words on the screen, it can be hard to remember what to say. This suggests that what we really need is a script (or, realistically, a set of notes) to go along with the slide show. Logistically this is a bit of a mess—it’s hard enough to keep a set of slides updated without having to keep the script aligned at the same time—and as a result I’ve tended to err on the side of keeping too many words on my slides (see here, for example). But maybe it’s time for me to bite the bullet and move to a slides-and-script format.

Another intriguing possibility is to go with the script and ditch the slides entirely. Indeed, you don’t even need a script; all you need are some notes or just an idea of what you want to be talking about. I discovered this gradually over the past few years when giving talks (see here for some examples). I got into the habit of giving a little introduction and riffing a bit before getting to the first slide. I started making these ad libs longer and longer, until at one point I gave a talk that started with 20 minutes of me talking off the cuff. It seemed to work well, and the next step was to give an entire talk with no slides at all. The audience was surprised at first but it went just fine. Most of the time I come prepared with a beamer file full of more slides than I’ll ever be able to use, but it’s reassuring to know that I don’t really need any of them.

Finally, assuming you do use slides in your classes, there’s the question of whether to make the slides available to the students. I’m always getting requests for the slides but I really don’t like it when students print them out. I fear that students are using the slides as a substitute for the textbook, and also that if the slides are available, students will think they don’t need to pay attention during class because they can always read the slides later.

It’s funny: Students are eager to sign up for a course to get that extra insight they’ll obtain from attending classes, beyond whatever they’d get by simply reading the textbook and going through the homework problems on their own. But once they’re in class, they have a tendency to drift off, and I need to pull all sorts of stunts to keep them focused.

The board and the projector, together:

Just cos your classroom has a projector, that don’t mean you should throw away your blackboard (or whiteboard, if you want to go that stinky route). Some examples:
- I think it works better to write out an equation or mathematical derivation in real time rather than to point at different segments of an already-displayed formula.
- It can help to mix things up a little. After a few minutes of staring at slides it can be refreshing to see some blackboard action.
- You can do some fun stuff by projecting onto the blackboard. For example, project x and y axes and some data onto the board, then have a pair of students come up and draw the regression line with chalk. Different students can draw their lines, then you click onto the next slide which projects the actual line.


Paper handouts can be a great way to increase the effective “working memory” for the class. Just remember not to overload a handout. Putting something on paper is not the same thing as having it be read. You should figure out ahead of time what you’re going to be using in class and then refer to it as it arises.

I like to give out roughly two-thirds as many handouts as there are people in the audience. This gives the handouts a certain scarcity value, and it also forces students to discuss in pairs, since they’re already sharing the handouts. I found that when I’d give a handout to every person in the room, many people would just stick the handout in their notebook. The advantage of not possessing something is that you’re more motivated to consume it right away.

Live computer demonstrations:

These can go well. It perhaps goes without saying that you should try the demo at home first and work out the bugs, then prepare all the code as a script which you can execute on-screen, one paragraph of code at a time. Give out the script as a handout and then the students can follow along and make notes. And you should decide ahead of time how fast you want to go. It can be fine to do a demo fast to show how things work in real life, or it can be fine to go slowly and explain each line of code. But before you start you should have an idea of which of these you want to do.

Multiple screens:

When doing computing, I like to have four windows open at once: the R text editor, the R console, an R graphics window (actually nowadays I’ll usually do this as a refreshable pdf or png window rather than bothering with the within-R graphics window), and a text editor for whatever article or document I’m writing.

But it doesn’t work to display 4 windows on a projected screen: there’s just not enough resolution, and, even if resolution were not a problem, the people in the back of the room won’t be able to read it all. So I’m reluctantly forced to go back and forth between windows. That’s one reason it can help to have some of the material in handout form.

What I’d really like is multiple screens in the classroom so I can project different windows onto different screens and show all of them at once. But I never seem to be in rooms with that technology.


That’s “just in time teaching”; see here for details. I do this with all my classes now.

Peer instruction:

This is something where students work together in pairs on hard problems. It’s an idea from physics teaching that seems great to me but I’ve never succeeded in implementing true “peer instruction” in my classes. I have them work in pairs, yes, but the problems I give them don’t look quite like the “Concept Tests” that are used in the physics examples I’ve seen. The problem, perhaps, is that intro physics is just taught at a higher level than intro statistics. In my intro statistics classes, it’s hard enough to get the students to learn about the basics, without worrying about getting them into more advanced concepts. So when I have students work in pairs, it’s typically on more standard problems.


In addition to these pair or small-group activities, I like the idea of quick drills that I shoot out to the whole class and students do, individually, right away. I want them to be able to handle basic skills such as sqrt(p*(1-p)/n) or log(a*x^(2/3)) instantly.
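As an illustration, here are answers to the two drills with arbitrary values (p = 0.5, n = 100, a = 3, x = 8): sqrt(0.5*0.5/100) = sqrt(0.0025) = 0.05, and log(a*x^(2/3)) = log(a) + (2/3)*log(x) = log(3) + log(4) = log(12). A few lines of R confirm both.

```r
# Drill 1: standard error of a sample proportion
p <- 0.5
n <- 100
se <- sqrt(p * (1 - p) / n)
print(se)

# Drill 2: log(a * x^(2/3)) equals log(a) + (2/3) * log(x)
a <- 3
x <- 8
lhs <- log(a * x^(2/3))
rhs <- log(a) + (2/3) * log(x)
print(c(lhs, rhs))
```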

Getting their attention:

You want your students to stay awake and interested, to enter the classroom full of anticipation and to leave each class period with a brainful of ideas to discuss. Like a good movie, your class should be a springboard for lots of talk.

But you don’t want to get attention for the wrong things. An extreme example is the Columbia physics professor who likes to talk about his marathon-fit body and at one point felt the need to strip to his underwear in front of his class. This got everyone talking—but not about physics. At a more humble level, I sometimes worry that I’ll do goofy things in class to get a laugh, but then the students remember the goofiness and not the points I was trying to convey. Most statistics instructors probably go too far in the other direction, with a deadpan demeanor that puts the students to sleep.

It’s ok to be “a bit of a character” to the extent that this motivates the students to pay attention to you. But, again, I generally recommend that you structure the course so that you talk less and the students talk more.

Walking around the classroom:

Or wheeling around, if that’s your persuasion. Whatever. My point here is that you want your students to spend a lot of the class time working on problems in pairs. While they’re doing this, you (and your teaching assistants, if this is a large so-called lecture class with hundreds of students) should be walking around the room.

Teaching tips in general:

As I explained in my book with Deb Nolan, I’m not a naturally good teacher and I struggle to get students to participate in class. Over the decades I’ve collected lots of tricks because I need all the help I can get. If you’re a naturally good teacher or if your classes already work, then maybe you can do without these ideas.


It’s not clear how much time should be spent preparing the course ahead of time. I think it’s definitely a good idea to write the final exam and all the homeworks before the class begins (even though I don’t always do this!) because then it gives you a clearer sense of where you’re heading. Beyond that, it depends. I’m often a disorganized teacher and I think it helps me a lot to organize the entire class before the semester begins.

Other instructors are more naturally organized and can do just fine with a one-page syllabus that says which chapters are covered which weeks. These high-quality instructors can then just go into each class, quickly get a sense of where the students are stuck, and adapt the class accordingly. For them, too much preparation might well backfire.

My problem is that I’m not so good at individualized instruction; even in a small class, it’s hard for me to keep track of where each student is getting stuck, and what the students’ interests and strengths are. I’d like to do better on this, but for now I’ve given up on trying to adapt my courses for individuals. Instead I’ve thrown a lot of effort into detailed planning of my courses, with the hope that these teaching materials will be useful for other instructors.

Students won’t (in general) reach your level of understanding:

You don’t teach students facts or even techniques; you teach them the skills needed to solve problems (including the skills needed to find the solution on their own). And there’s no point in presenting things they’re not supposed to learn; for example, if a mathematical derivation is important, put it on the exam with positive probability. And if students aren’t gonna get it anyway (my stock example here is the sampling distribution of the sample mean), just don’t cover it. That’s much better, I think, than wasting everyone’s time and diluting everyone’s trust level with a fake-o in-class derivation.

The road to a B:

You want a plan by which a student can get by and attain partial mastery of the material. See discussion here.


What, if anything, did the students actually learn during the semester?

You still might want to evaluate what your students are actually learning, but we don’t usually do this. I don’t even do it, even though I talk about it. Creating a pre-test and post-test is work! And it requires some hard decisions. Whereas not testing at all is easy. And even when educators try to do such evaluations, they’re often sloppy, with threats to validity you could drive a truck through. At the very least, this is all worth thinking about.

Relevance of this advice to settings outside college classrooms:

Teaching of advanced material happens all over, not just in university coursework, and much of the above advice holds more generally. The details will change with the goals—if you’re giving a talk on your latest research, you won’t want the audience to be spending most of the hour working in pairs on small practice problems—but the general principles apply.

Anyway, it was pretty goofy that I used to teach a course on teaching and stand up and say all these things. It makes a lot more sense to write it here and reserve class time for more productive purposes.

One more thing

I can also add to this post between now and the beginning of class. So if you have any ideas, please share them in the comments.

On deck this week

Mon: Some quick disorganized tips on classroom teaching

Tues: Stroopy names

Wed: “A hard case for Mister P”

Thurs: The field is a fractal

Fri: Replication Wiki for economics

Sat, Sun: As Chris Hedges would say: Stop me if you’ve heard this one before

My courses this fall at Columbia

Stat 6103, Bayesian Data Analysis, TuTh 1-2:30 in room 428 Pupin Hall:

We’ll be going through the book, section by section. Follow the link to see slides and lecture notes from when I taught this course a couple years ago. This course has a serious workload: each week we have three homework problems, one theoretical, one computational, and one applied.

Stat 6191, Statistical Communication and Graphics, TuTh 10-11:30 in room C05 Social Work Bldg:

This is an entirely new course that will be structured around student participation. I’m still working out the details but here’s the current plan of topics for the 13 weeks:

1. Introducing yourself and telling a story
2. Introducing the course
3. Presenting and improving graphs
4. Graphing data
5. Graphing models
6. Dynamic graphics
7. Programming
8. Writing
9. Giving a research presentation
10. Collaboration and consulting
11. Teaching a class
12-13. Student projects

Why am I teaching these courses?

The motivation for the Bayesian Data Analysis class is obvious. There’s a continuing demand for this course, and rightly so, as Bayesian methods are increasingly powerful for a wide range of applications. Now that our book is available, I see the BDA course as having three roles: (1) the lectures serve as a guide to the book, in which we talk through each section and point out tricky points and further research; (2) the regular schedule of homework assignments gives students a lot of practice applying and thinking about Bayesian methods; and (3) students get feedback from the instructor, teaching assistant, and others in the class.

The idea of the communication and graphics class is that statistics is all about communication to oneself and to others. I used to teach a class on teaching statistics but then I realized that classroom teaching is just one of many communication tasks, along with writing, graphics, programming, and various forms of informal contact. I think it’s important for this class to not be conducted through lectures, or guest lectures, or whatever, but rather as much as possible via student participation.