
D&D 5e: Probabilities for Advantage and Disadvantage

[Image: D&D 1e Zombie]

The new rules for D&D 5e (formerly known as D&D Next) are finally here, and they introduce a new game mechanic: advantage and disadvantage.

Basic d20 Rules
Usually, players roll a 20-sided die (d20) to resolve everything from attempts at diplomacy to hitting someone with a sword. Each thing a player tries to do has a difficulty, and rolling greater than or equal to the difficulty (with various modifiers for ability, training, and magic items) means the character was successful.

Advantage and Disadvantage
As of 5th Edition (5e) rolls can be made with advantage or disadvantage. The rules are:

  • Advantage: roll two d20 and take the max
  • Normal: roll one d20 and take the result
  • Disadvantage: roll two d20 and take the min

So what are the chances that you’ll roll equal to or above a given number with advantage, normally, or with disadvantage? Here’s a table.

roll  disadvantage  normal  advantage
  20         0.002   0.050      0.098
  19         0.010   0.100      0.191
  18         0.022   0.150      0.278
  17         0.039   0.200      0.359
  16         0.062   0.250      0.437
  15         0.089   0.300      0.510
  14         0.123   0.350      0.576
  13         0.160   0.400      0.639
  12         0.202   0.450      0.698
  11         0.249   0.500      0.751
  10         0.303   0.550      0.798
   9         0.361   0.600      0.840
   8         0.424   0.650      0.877
   7         0.492   0.700      0.910
   6         0.564   0.750      0.938
   5         0.640   0.800      0.960
   4         0.723   0.850      0.978
   3         0.811   0.900      0.990
   2         0.903   0.950      0.998
   1         1.000   1.000      1.000

The effect is huge. There’s less than a 9% chance of rolling 15 or higher with disadvantage, whereas there’s a 30% chance normally and a 51% chance with advantage.

Here’s a plot. (Apologies for the poor ggplot2 and png() defaults; I don’t understand ggplot2 configuration well enough to fix the titles, labels, axes, tick mark labels, boundaries, margins, colors, and so on without spending all night on the project.)

The vertical distances at a given horizontal position show you how much of a bonus you get for advantage or disadvantage.

[Update: there's an alternative plot on the Roles, Rules, and Rolls blog that displays the difference between advantage and a simple +3 bonus, as used in previous D&D editions.]

Analytic Solution
The probabilities involved are simple rank statistics for two uniform discrete variables.

You can compute these probabilities analytically, as I show in Part III of the page where I explain the math and stats behind my simple baseball simulation game, Little Professor Baseball.
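Concretely, if X1 and X2 are two independent uniform rolls on 1–20, the chance of meeting a threshold k under each rule works out to:

```latex
\Pr[\max(X_1, X_2) \ge k] = 1 - \Pr[\text{both} < k] = 1 - \left(\frac{k-1}{20}\right)^{2}
\qquad
\Pr[\min(X_1, X_2) \ge k] = \Pr[\text{both} \ge k] = \left(\frac{21-k}{20}\right)^{2}
```

For example, at k = 15 these give 1 − (14/20)² = 0.51 with advantage and (6/20)² = 0.09 with disadvantage, matching the simulated table above up to Monte Carlo noise.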

The basic game is a cross between All-Star Baseball and Strat-o-Matic. I wanted a rolling system where both players roll, one for the batter and one for the pitcher, and you use the card for the player with the highest roll to resolve the outcome. The winning roll is going to have a distribution like the advantage probabilities above (except that Little Professor Baseball uses rolls from 1–1000 rather than 1–20). As a bonus, earlier sections of the math page explain why Strat-o-Matic cards look so extreme on their own (unlike the All-Star Baseball spinners).

I developed the game after reading Jim Albert’s most excellent book, Curve Ball, and I included the cards for the 1970 Cincinnati Reds and Baltimore Orioles (which, for trivia, were mine and Andrew’s favorite teams as kids).

Simulation-Based Calculation
I computed the table with a simple Monte Carlo simulation, the R code for which is as follows.

oneroll <- function(D) {
  return(sample(1:D, 1));
}
advantage <- function() {
  return(max(oneroll(20), oneroll(20)));
}
disadvantage <- function() {
  return(min(oneroll(20), oneroll(20)));
}

NUM_SIMS <- 100000;

advs <- rep(0, NUM_SIMS);
for (n in 1:NUM_SIMS)
  advs[n] <- advantage();

disadvs <- rep(0, NUM_SIMS);
for (n in 1:NUM_SIMS)
  disadvs[n] <- disadvantage();

print("", quote=FALSE);
print("CCDF (Pr[result >= k])", quote=FALSE);
print(sprintf("%2s  %6s  %6s  %6s", "k", "disadv", "normal", "advant"),
      quote=FALSE);

# accumulate Pr[result >= k] going from k = 20 down to 1
cumulative_disadv <- 0;
cumulative_norm <- 0;
cumulative_adv <- 0;

for (k in 20:1) {
  cumulative_disadv <- cumulative_disadv + sum(disadvs == k) / NUM_SIMS;
  cumulative_norm <- cumulative_norm + 0.05;
  cumulative_adv <- cumulative_adv + sum(advs == k) / NUM_SIMS;
  print(sprintf("%2d  %6.3f  %6.3f  %6.3f", k,
                cumulative_disadv, cumulative_norm, cumulative_adv),
        quote=FALSE);
}

I'm sure Ben could've written that in two or three lines of R code.
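For the record, here is one way to vectorize the same simulation (this is my own sketch, not code from the original script; the variable names are invented):

```r
set.seed(1234)
NUM_SIMS <- 100000

# two independent d20 rolls per simulation
roll1 <- sample(1:20, NUM_SIMS, replace = TRUE)
roll2 <- sample(1:20, NUM_SIMS, replace = TRUE)

# CCDF Pr[result >= k] for k = 1..20 under each rule;
# pmax/pmin take elementwise max/min across the two roll vectors
ks <- 1:20
adv_ccdf    <- sapply(ks, function(k) mean(pmax(roll1, roll2) >= k))
normal_ccdf <- (21 - ks) / 20
dis_ccdf    <- sapply(ks, function(k) mean(pmin(roll1, roll2) >= k))
```

The loops over NUM_SIMS disappear entirely: sampling, the max/min, and the threshold comparisons all operate on whole vectors at once.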

Hey—this is a new kind of spam!

Ya think they’ll never come up with something new, and then this comes along:

Chicago alert: Mister P and Stan to be interviewed on WBEZ today (Fri) 3:15pm

Niala Boodhoo on the Afternoon Shift will be interviewing Yair and me about our age-period-cohort extravaganza, which became widely known after being featured in this cool interactive graph by Amanda Cox in the New York Times.

And here’s the interview.

The actual paper is called The Great Society, Reagan’s revolution, and generations of presidential voting and was somewhat inspired by this story.

Here’s the set of graphs that got things started:

[graph]

Here’s a bit more of what we found:

[graph]

And here’s some more:

[graph]

I love this paper, I love Yair’s graphs, and I love how we were able to fit this complicated model that addresses the age-period-cohort problem. Stan’s the greatest.

Open-source tools for running online field experiments

Dean Eckles points me to this cool new tool for experimentation:

I [Eckles] just wanted to share that in a collaboration between Facebook and Stanford, we have a new paper out about running online field experiments. One thing this paper does is describe some of the tools we use to design, deploy, and analyze experiments, including the 2012 US election voter turnout experiment. And now we have open sourced an implementation of these ideas.

We were inspired by Fisher’s quote — “To consult the statistician after an experiment is finished is often merely to ask him to conduct a post-mortem examination.”

The idea is that one way to consult a statistician in advance is to have their advice built into tools for running experiments — a similar idea to how you emphasize the importance of defaults in data visualization tools.

We have a shorter blog post about this work, a paper, “Designing and Deploying Online Field Experiments” (to appear in Proc. of WWW very soon), and the software and documentation, PlanOut.

We’d be very interested in your thoughts on any of this, as perhaps would your blog readers. I also think many might be interested in using the software to run experiments themselves — that’s our hope!

Looks good to me. It’s great to see this sort of stuff out there, not just in textbooks but really getting used.

P.S. Brian Keegan writes in:

I’m a post-doc in David Lazer’s computational social science group at Northeastern. I noticed that you were going to discuss open-source tools for running online experiments on Thursday, so I wanted to offer a shameless plug for a platform that we’re developing here in hopes it might fit into the themes of your post.

Volunteer Science is a web laboratory platform for conducting group and network experiments that’s built on open standards and open code. We’re very much interested in recruiting others to develop experiments for the platform as well as expanding the number of users who volunteer.

“P.S. Is anyone working on hierarchical survival models?”

Someone who wishes to remain anonymous writes:

I’m working on building a predictive model (not causal) of the onset of diabetes mellitus using electronic medical records from a semi-panel of HMO patients. The dependent variable is blood glucose level. The unit of analysis is the patient visit to a network doctor or hospitalization in a network hospital aggregated to the month-year level. The time frame is from the early 80s to the present. Since my focus is on the onset of the disease, my approach is agnostic and prospective. I would like to derive data-driven answers to questions of co-morbidity, patient health and wellness based on physical measures such as BMI or BP as well as physician and hospital quality as an inherent part of the model output.

To me, addressing these issues with data of this type would require multiple models for full coverage:

1) A survival model to capture censoring and time to disease onset

– Censoring can have multiple causes: diagnosed with diabetes type 1 or 2, lost to followup, death, etc

2) Multiple hierarchical bayesian models for massively categorical variables such as patient, diagnosis, doctor, hospital to capture the differing dependence structures

– Patient within zipcode, community, county, state to capture the social determinants of health

– Patient within a family network, e.g., children, siblings, parents, etc., to reflect familial history of disease

– Patient and diagnoses received — thousands of possible diagnoses which collapse into higher levels

– Patient within HMO doctor and hospital network

– Doctor within specialty — probably 70 or so specialties overall

– Doctor within zipcode, community, county, state

– Hospital within zipcode, community, county, state

3) As available, the impact of programs and interventions designed to promote wellness, mitigate or prevent disease…these could include recommendations regarding exercise, diet, etc.

4) Given the wide time frame, macro-economic indexes to capture the well-known impact of the business cycle on the determinants of medically-related activities

These are preliminary thoughts as I have not yet begun the process of testing the need for specifying all of these hierarchies since I am still in the initial stages of the analysis. Just getting this data lined up and talking together is a significant challenge in and of itself.

My question for you concerns the need for multiple models when the dependence structures overlap and are as messy as in the present case. I’m sure you’re going to advise against such a wide-ranging predictive design, enjoining me to greater research focus and specificity. My preference is to retain an expansive and exploratory stance and not to simplify the in-going hypotheses just for the sake of the modeling. Honestly, I think that there is already too much specificity in the literature which does little or nothing to uncover and identify the broad antecedents of this illness.

What do you think? Am I missing something? Suggestions?

P.S. Is anyone working on hierarchical survival models?

My reply: It does sound kind of appealing to just throw everything into the model and let Stan sort it out. On the other hand, it also seems like the “throw it all in at once” strategy is a recipe for confusion, and it could be hard to interpret the results. So let me give you the generic suggestion that, whatever model you start with, you check it out using fake-data simulation (that is, simulate fake data from the model, then fit the model and check that you can recover the parameters of interest and make good predictions). And I’d suggest starting simple and working up from there. Ultimately I think a more complex model is better and should be more believable, but you might have to work up to it, because of challenges of computation, identification, and understanding of the model.
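The fake-data check can be made concrete with a toy example. Here is a minimal sketch in R with an ordinary linear model standing in for the real survival model (all parameter names and values here are invented for illustration):

```r
set.seed(42)

# 1. Pick "true" parameter values
a_true <- 1.5
b_true <- -0.7
sigma_true <- 2.0

# 2. Simulate fake data from the assumed model
n <- 1000
x <- rnorm(n)
y <- rnorm(n, a_true + b_true * x, sigma_true)

# 3. Fit the same model to the fake data
fit <- lm(y ~ x)

# 4. Check that the fit recovers the known parameters
est <- coef(fit)
```

The same four steps apply with a hierarchical survival model fit in Stan in place of lm(): only the generative code in step 2 and the fitting call in step 3 change, and step 4 becomes a check that the posterior intervals cover the known parameter values.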

P.S. Matt Gribble adds:

I wanted to plug some exciting work by Michael Crowther extending generalized gamma regression to have random effects not only on the log-hazards scale (frailty models) but also on the log-relative median survival time scale. The paper’s in press at Statistics in Medicine, and I had nothing to do with it but can’t wait to cite/use it.

Not quite what I plugged (I was basing the plug about survival time random effects on slides I saw of his, not on the actual paper) but I think this ref is still cutting-edge stuff in the theme of hierarchical survival models.

Just wondering


It would be bad news if a student in the class of Laurence Tribe or Alan Dershowitz or Ian Ayres or Edward Wegman or Matthew Whitaker or Karl Weick or Frank Fischer were to hand in an assignment that is obviously plagiarized, copied from another source without attribution. Would the prof have the chutzpah to fail the student, or would he just give the student an A out of fear that the student would raise a ruckus if anything were done about it?

But it would be really bad news if everyone in the class were to do this. For example, suppose the students were to do the work for real—that is, individually write their own, non-plagiarized papers and put them in a file somewhere—and then make alternative, plagiarized papers to hand in. Perhaps, just to make sure the prof or the overworked teaching assistant doesn’t miss it, the students would even cite the wikipedia entries they’re copying from. Then they’d sit back and wait and see what happens. It would be important, though, that the students write actual papers on their own, because otherwise they’d be missing out on the chance to learn the material, which after all is the real purpose of taking the course.

In any case, this would be a bad situation. It’s not clear how the prof would have the moral authority to fail a student for an offense that he, the professor, had committed without suffering any penalty. But I wouldn’t recommend the students try it. They might just get expelled anyway for the combined violations of plagiarism and embarrassing the university.

It’s like smoking crack in Toronto. It’s still illegal even if the boss does it.

“Bayes Data Analysis – Author Needed”

The following item came in over the Bayes email list:


My name is Jo Fitzpatrick and I work as an Acquisition Editor for Packt Publishing. We recently commissioned a book on Bayesian Data Analysis and I’m currently searching for an author to write this book. You need to have good working knowledge of Bayes and a good level of written English. Please email for more details.


Joanne Fitzpatrick

Acquisition Editor
[ Packt Publishing ]

Hey, I think I’m qualified for this! Although maybe not the “good level of written English” bit, as I only speak American. . . . In any case, I am happy to see that the term “Bayesian Data Analysis” has become generic.

On deck this week

Mon: “Bayes Data Analysis – Author Needed”

Tues: Just wondering

Wed: “P.S. Is anyone working on hierarchical survival models?”

Thurs: Open-source tools for running online field experiments

Fri: Hey—this is a new kind of spam!

Sat, Sun: As Chris Hedges would say: That’s the news, and I am outta here!

Visualizing sampling error and dynamic graphics

Robert Grant writes:

What do you think of this visualisation from the NYT [in an article by Neil Irwin and Kevin Quealy but I'm not sure if they're the designers of the visualization]? I’m pretty impressed as a method of showing sampling error to a general audience!

I agree.

P.S. In related news, Antony Unwin writes:

A couple of weeks ago you had a discussion on graphics on your blog and it seemed to me that people had very different ideas about what the term “Interactive Graphics” means. For some it is about interacting with presentation graphics on the web, for others it is about using interactive graphics to do data analysis. You really need to see interactive graphics in action to get a feel for it.

I have made a ten minute film to give the flavour of interactive graphics for data analysis with data on last year’s Tour de France and using Martin Theus’s software Mondrian.

Grand Opening: The Stan Shop

I finally put together a shop so everyone can order Stan t-shirts and mugs:

The art’s by Michael Malecki. The t-shirts and mugs are printed on demand by Spreadshirt. I tried out a sample and the results are great and have held up to machine washing and drying.

There’s a markup of about $4 per item, which is going straight into the Stan slush fund. No promises that it will be spent wisely, but it will go to the developers.

There aren’t a lot of other products from Spreadshirt that we can put logos on — most of the items (hats, tote bags, etc.) are text-only. But if there are other t-shirts or sweatshirts people want, we could easily expand our product line — feel free to drop suggestions in the comment box.