Hype about conditional probability puzzles

Posted on May 27, 2010 9:03 AM by Andrew

Jason Kottke posts this puzzle from Gary Foshee that reportedly impressed people at a puzzle-designers’ convention:

I have two children. One is a boy born on a Tuesday. What is the probability I have two boys?

The first thing you think is “What has Tuesday got to do with it?” Well, it has everything to do with it.

I thought I should really figure this one out myself before reading any further, and I decided this was a good time to apply my general principle that it’s always best to solve such problems from scratch rather than trying to guess at the answer.

So I laid out all the 4 x 49 possibilities. The 4 is bb, bg, gb, gg, and the 49 are all possible pairs of days of the week. Then I ruled out all the possibilities that were inconsistent with the data: this leaves the following:

bb with all pairs of days that include a Tuesday. That’s 13 possibilities (Mon/Tues, Tues/Tues, Wed/Tues, …, Tues/Mon, …, Sun/Tues, remembering not to count Tues/Tues twice).
bg with all the Tues/x pairs: that’s 7 more possibilities.
gb with all the x/Tues pairs: that’s 7 more.

Next, the assumptions. I decided I’d keep it simple:
– Ignore multiple births.
– Pretend that boys and girls are equally likely. (Regular readers know that Pr (girl) = .485.)
– Pretend that births are equally likely on every day. (Actually, I seem to recall reading that they’re more common on Fridays and less so on weekends.)

Pr (bb | available information) is then 13/27.

But then I thought, Hey, he said, “One is a boy born on a Tuesday.” He didn’t say At least one. So I’ll toss out bb Tues/Tues, which leaves us with 26 possibilities and a conditional probability for bb of 12/26.
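
The counting above can be checked by brute force. What follows is a minimal sketch under the same simplifying assumptions (no multiple births, sexes and days equally likely); the names `families`, `tuesday_boys`, and `prob_two_boys` are invented for this illustration and are not part of the original argument.

```python
from fractions import Fraction
from itertools import product

DAYS = range(7)          # 0 = Monday, 1 = Tuesday, ...
TUESDAY = 1

# Each child is a (sex, day) pair; a family is an ordered pair of children.
children = list(product("bg", DAYS))
families = list(product(children, children))   # 14 * 14 = 196 equally likely cases

def tuesday_boys(family):
    """Number of children in the family who are boys born on a Tuesday."""
    return sum(child == ("b", TUESDAY) for child in family)

def is_two_boys(family):
    return all(sex == "b" for sex, _ in family)

def prob_two_boys(condition):
    """Pr(two boys | condition), by counting equally likely cases."""
    kept = [f for f in families if condition(f)]
    return Fraction(sum(is_two_boys(f) for f in kept), len(kept))

at_least_one = prob_two_boys(lambda f: tuesday_boys(f) >= 1)
exactly_one = prob_two_boys(lambda f: tuesday_boys(f) == 1)
print(at_least_one, exactly_one)   # 13/27 6/13
```

The second value prints in lowest terms: 6/13 is the same number as 12/26.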

Having solved the problem to my satisfaction (but with a bit of a worry that I was missing something important), I followed the link to a news article by Alex Bellos, who gives the answer as 13/27. So I guess when Foshee said “One,” he meant, “At least one.”

Perhaps the best comment on all of this, however, is from Todd Stark, who writes:

Doesn’t this illustrate limits to the value of probability? It seems like more than a curiosity to me. If specifying a logically irrelevant detail changes the probability calculation, doesn’t that tell us that probability thinking is a relatively useless tool in situations like this? It is implicit that everyone is born on a particular day; if specifying something we already knew changes the calculation, isn’t the calculation unreliable for decision making, for this class of situations?

Good question. I certainly believe that probability theory is the right mathematical tool for solving probability problems–as a statistician, I guess it’s no surprise that I feel this way–but, given how difficult it is to solve such problems in one’s head, it’s hard to see this as a useful model for regular decision making.

The interesting question, I think, is how often do these sorts of tricky conditional probability problems arise in real life. I don’t know the answer. (That is, I’m not trying to raise a rhetorical question and claim that these problems don’t arise in real life. What I’m saying is that I don’t know and would be interested in seeing how to think systematically about the question.)

P.S. Bellos’s article was fine, but I wish he’d remarked that these conditional probability examples are textbook problems in introductory probability courses.

P.P.S. I agree with the many commenters who point out that, really, the information to condition on is not “Foshee has two children. One is a boy born on a Tuesday,” but, rather, “Foshee says, ‘I have two children. One is a boy born on a Tuesday.'” So really you need a model for what Foshee might be saying.

That’s one reason I’m not a big fan of this sort of trick probability question: some of the most important parts of the problem are hidden, and the answer is typically explained in a way that avoids making clear the assumptions that are needed to get there.

83 thoughts on “Hype about conditional probability puzzles”

BrendanH on May 27, 2010 5:58 AM at 5:58 am said:

Let child A be the Tuesday boy.

Child B presents 14 possibilities, all equally likely.

Since child A is fixed, that is 14 equal probabilities for the combination, half of which are female.

Thus p(BB) = 7/14.

Why does this give a different result? I see no advantage in letting the ordering vary.
BrendanH on May 27, 2010 6:10 AM at 6:10 am said:

Second thoughts. The ordering matters, and this is similar to the Monty Hall problem. Instead of 3 doors we have 4 equally likely ordered outcomes, BB, BG, GB and GG, and we have "opened a door" by excluding GG.
James H on May 27, 2010 6:37 AM at 6:37 am said:

I am inclined to Brendan's point of view regarding the unimportance of the ordering. It is precisely this decision to allow the order to vary that results in the answers being different. Here's how:

In allowing the orders to vary Andrew has to discard the additional BB Tues/Tues pair. Consequently, the result is 13/27 rather than the 14/28 that results if the duplicate pair is included.

Andrew, is there a case to be made for why the order should be considered important?
Pat on May 27, 2010 6:50 AM at 6:50 am said:

Good question. I certainly believe that probability theory is the right mathematical tool for solving probability problems

I think you answered the wrong question here. I think the point was that if probability theory requires you to take into account clearly extraneous information, then there's something wrong.

If I have one boy and he's wearing green socks, then in order to quantify the probability of a second boy, do I need to know how many colors of socks the manufacturer makes?

As they say often on the blog Science-Based Medicine, you need a correlation AND a plausible causal relationship. There are many things about probability theory that are counter-intuitive, but irrelevant information?
Jens on May 27, 2010 7:06 AM at 7:06 am said:

Why would you count BB Tues/Tues only once? If you count b1Tues/b2Mon and b2Mon/b1Tues as two possibilities you should do the same for b1Tues/b2Tues and b2Tues/b1Tues.

So you get the expected result of equal probability, and have no weekday influence.
Bill Jefferys on May 27, 2010 7:17 AM at 7:17 am said:

I read it as "at least one is a boy born on Tuesday" so got 13/27.

I must say that Todd Stark's comment is very interesting. I'm not sure that it is troubling, however. The puzzle is why the information about Tuesday is significant when it seems intuitively not to be.
Bill Jefferys on May 27, 2010 7:28 AM at 7:28 am said:

I thought that the two comments by wtl (near the end) were quite perceptive.
Andrew Gelman on May 27, 2010 8:18 AM at 8:18 am said:

Jens (and others): You just have to work it out according to the laws of probability. Under the (simplified) assumptions, each of the following events has probability 1/49: Mon/Tues, Tues/Tues, and Tues/Mon. It's the same thing as why you're more likely to get 7 than 6 when you roll two dice.
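
The dice comparison can be made concrete by counting ordered outcomes. This is only an illustrative sketch; the `ways` counter is a name chosen here.

```python
from collections import Counter
from itertools import product

# Count how many ordered (first die, second die) outcomes give each total.
ways = Counter(a + b for a, b in product(range(1, 7), repeat=2))

print(ways[6], ways[7])   # 5 6
print(ways[2], ways[3])   # 1 2
```

A total of 6 includes the doubled outcome (3, 3), which appears once, just as Tues/Tues appears once among the 49 day pairs while Mon/Tues and Tues/Mon are separate outcomes.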
wcw on May 27, 2010 8:46 AM at 8:46 am said:

If you survey all two-child families, what is the incidence of two boys among all those with at least one boy born on Tuesday?
Tom Fid on May 27, 2010 9:01 AM at 9:01 am said:

Suppose you simplify to make both dimensions binary by replacing day of week with Tuesday or notTuesday, or restate as:

I flipped two coins. At least one was a penny that came up heads. What's the chance that I got two heads?

Then it's easy to diagram the possibilities, {hp,hn,tp,tn} x {hp,hn,tp,tn} (with p = penny, n = not). Then, regardless of the distribution of penny vs. not, it's fairly easy to see that knowing at least one is a heads-penny is different from knowing that at least one is heads, because those conditions rule out different parts of the matrix.

I suspect that naive intuition would frequently be wrong even without the seemingly irrelevant day of week information. In other words, if you said, "I flipped two coins; at least one came up heads. What's the chance that I got two heads?" most people would expect 50-50, as if you'd asked, "I flipped two coins; the first one came up heads. What's the chance that I got two heads?"

While it's easy to draw a picture, I still find it hard to put what's happening into words.
Tom Fid on May 27, 2010 9:04 AM at 9:04 am said:

Isn't it more like why you're more likely to get 3 than 2?
Todd Graves on May 27, 2010 9:31 AM at 9:31 am said:

Andrew, I think your answer is good but not quite complete in the sense that you haven't fully specified the assumptions that make your answer correct. Specifically, we don't know the design of the experiment that led to the data given by the statement 'One is a boy born on a Tuesday.' I think this is an interesting statistics puzzle (since we should think hard about what our data mean, and construct a model for the data generating mechanism) and that referring to it as a conditional probability puzzle glosses over some of its possibilities.

(A) Suppose I gave Gary the instructions "Gary, go up to Andrew and tell him how many children you have. If you have one or more sons born on a Tuesday, say 'One is a boy born on a Tuesday.' Otherwise say whatever you want." Then your answer of 13/27 is correct (under the assumptions that yield 4×49 equally likely possibilities a priori). Slightly different instructions lead to the 12/26 version.

(B) However, suppose I gave Gary the instructions "Gary, go up to Andrew and tell him how many children you have. If you have any sons, let X be the day of the week on which your eldest son was born, and say 'One is a boy born on a X.'" Then in fact Tuesday has nothing to do with it, and the answer is 1/3. Right?

One can come up with more pathological instructions for Gary that lead to many other answers. I agree (based on my experience with the way characters in puzzles like this behave) that experiment (A) is most likely to be the experiment Gary is participating in, but others may think that Gary is likely to be operating under (B).
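
Here is a rough sketch of how experiments (A) and (B) can be compared by enumeration. It assumes the 4×49 equally likely possibilities and treats each family as an ordered (elder, younger) pair; the function names are hypothetical labels for the two sets of instructions.

```python
from fractions import Fraction
from itertools import product

TUESDAY = 1
children = list(product("bg", range(7)))
families = list(product(children, children))   # (elder, younger), 196 cases

def says_tuesday_A(family):
    """Experiment (A): the statement is made whenever some son was born on a Tuesday."""
    return any(child == ("b", TUESDAY) for child in family)

def says_tuesday_B(family):
    """Experiment (B): the day named is the eldest son's birth day."""
    son_days = [day for sex, day in family if sex == "b"]
    return bool(son_days) and son_days[0] == TUESDAY

def prob_two_boys(says_it):
    """Pr(two boys | Gary's statement names Tuesday) under a given rule."""
    kept = [f for f in families if says_it(f)]
    boys = [f for f in kept if f[0][0] == "b" and f[1][0] == "b"]
    return Fraction(len(boys), len(kept))

p_a = prob_two_boys(says_tuesday_A)
p_b = prob_two_boys(says_tuesday_B)
print(p_a, p_b)   # 13/27 1/3
```

The same words from Gary lead to different answers because the rule that produced the words differs.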
ck on May 27, 2010 9:35 AM at 9:35 am said:

"Mon/Tues, Tues/Tues, and Tues/Mon. It's the same thing as why you're more likely to get 7 than 6 when you roll two dice."
Can anyone point to an intuitive explanation of this? I mean, I have accepted it as a fact of combinatorics but my mind rebels each time I encounter such a problem.
Phil on May 27, 2010 9:39 AM at 9:39 am said:

I like this a lot because even though I sort of understand it, I don't really understand it. I mean, I can work it through a la Andrew, but it still doesn't really make sense to me. It's a funny feeling.
Michael Margolis on May 27, 2010 9:40 AM at 9:40 am said:

We should update the "best comment" award to go to wtl, for "These Comments Are Missing The Point", which explains the puzzle quite well. The crux is, the additional information is removing part of the sample space and it is a part in which the events with girls are underrepresented. Whoever said it is a Monty Hall variant was right, although I thought not at first, since in his problem Monty knows something the player does not.

The really interesting question these puzzles raise is why our intuition is so wrong. For the question without a Tuesday, what we all want to say is 50-50. It seems we add an ordering to the question. We hear "one is a boy" which is the same as "the first one I've been told about is a boy" and so we think it means "the first born is a boy".

Is this caused by English syntax? Human cognition? An efficient algorithm for most problems likely to be shared by every intelligent species in the universe…?
Steven H. Noble on May 27, 2010 9:51 AM at 9:51 am said:

One thing to think about with these conditional probability puzzles is how the way the information is arrived at affects the final probability. For example, say you ring the doorbell of a home that you know has two children, and a boy answers the door. If you know by the neighbourhood that a son will always answer the door if there is one then the chance of two boys is 1/3. However, if this is known to be a more egalitarian neighbourhood where whoever is closer to the door answers it then the chance of two boys goes up to 1/2 (accounting for the necessary caveats that you noted in the post).

This example touches on Todd's comment, but doesn't fully resolve it. Just as every child is born on a day of the week, every child who answers a door also has a gender. How you found out that day matters.

With this in mind consider this slightly different story. Say all the boys who come from homes with two children are given a lottery ticket. Each ticket has a 1/7 chance of winning. You see a house has at least one winner (say winning means the house gets a free paint job, so by the fresh paint you know there is a winner but not how many). Obviously you know that this house contains at least one boy. But you also know that the houses that contain two boys are more likely to contain a winner since they have two tickets. So the two-boy houses should be overrepresented amongst the winners. This accounts for the 13/27 chance of two boys instead of 1/3.
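
The three stories above (a son always answers, a random child answers, the lottery) can be put through the same Bayes calculation. This is a sketch under the post's simplifying assumptions; `posterior_two_boys` and the likelihood lambdas are illustrative names, and the lottery tickets are assumed to win independently.

```python
from fractions import Fraction

families = ["bb", "bg", "gb", "gg"]     # equally likely under the post's assumptions
prior = Fraction(1, 4)

def posterior_two_boys(likelihood):
    """Pr(bb | observation), where likelihood(f) = Pr(observation | family f)."""
    total = sum(prior * likelihood(f) for f in families)
    return prior * likelihood("bb") / total

# A son always answers the door if there is one.
son_first = posterior_two_boys(lambda f: Fraction(int("b" in f)))
# Either child is equally likely to answer, and a boy is seen.
random_child = posterior_two_boys(lambda f: Fraction(f.count("b"), 2))
# Each boy holds a ticket that wins with chance 1/7; the house shows at least one win.
win = Fraction(1, 7)
lottery = posterior_two_boys(lambda f: 1 - (1 - win) ** f.count("b"))

print(son_first, random_child, lottery)   # 1/3 1/2 13/27
```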

Does that help? Is there a flaw?
Pat on May 27, 2010 9:56 AM at 9:56 am said:

I think the problem is the vagueness of the problem statement. 13/27 is the answer to the question, "What is the probability that I have two boys, at least one of whom was born on Tuesday?" And 1/2 is the answer to the question, "Given this boy born on Tuesday wearing green socks, what is the probability of this other child being a boy?"
Paul on May 27, 2010 10:00 AM at 10:00 am said:

What if the speaker's children had both been born on Mondays? From the setup of this calculation we're assuming he'd have said "I have two children. Neither is a boy born on a Tuesday." From my perspective the more likely statement would have been "I have two children. One is a boy born on a Monday". If he would have said the former the fact is important. If he would have said the latter, 'Tuesday' doesn't matter.

To get 13/27 you have to assume the statements are independent of what happened.

Imagine, instead, that I ask someone these questions: Do you have two children? Is one a boy born on Tuesday? Then we get 13/27, but it makes sense: if he's got two boys, it's more likely one is born on a Tuesday than if he only has one. I had no a priori reason to ask about Tuesday, so the fact that it was true is significant.

The puzzle requires you to make a specific and unusual assumption about the negative case, hence it feels unintuitive. If we assume this statement would only be made in the positive, we need to ignore all the matrix grids where there is no Tuesday (and possibly where there is no boy) and suddenly we get the results we expected: the date no longer matters.
Nick on May 27, 2010 10:14 AM at 10:14 am said:

The comment is interesting, but further thought makes me think that this shows how formalized probability is even more important, not less.

Thinking about the simpler problem, where we don't specify the "born on Tuesday" part, and just say they have at least one boy, the probability is 1/3. The reason that we want the probability to be 1/2 is because, in English, the difference between saying "I have looked at the family and they have at least one boy" and "I have looked at one of the children and he is a boy" is relatively small, and it is possible that the person who says one really means the other. The value of probability is that it requires you to say either 1) w is an element of {(B,G), (G,B), (B,B)} or 2) w is an element of {(B,G), (B,B)}. Very different. And if you say w is an element of {(BTues,GMon), (BTues,GTues), … ,(BTues,BSun)} then the answer is again made intuitive.

The reason why these problems might not occur often in the real world is because no one would say "I have looked at the family and they have at least one boy," (unless they were trying to be deliberately obtuse), they would say "The first born is a boy" or other version of the second statement, allowing us to use our intuition appropriately.

Where this comes into play is when the natural thing to say is in conflict with the intuition. For example, with the classic question about "a medical test is x% correct, and a disease impacts y% of the population, what is the probability that someone who tests positive actually has the disease?", wouldn't it be so much better, when communicating with laypeople, to say "If this tests positive there is a 40% chance you have the disease and if it tests negative there is a 0.005% chance that you have the disease" and never mention x%?
jonathan on May 27, 2010 10:57 AM at 10:57 am said:

I don't find puzzles like this interesting but they do show how phrasing matters; it both illuminates and hides whatever is actually going on. The reason I don't like these puzzles is that people think all statistics is just a game and that there actually is no underlying reality, just a way of phrasing the game.

I can give a million examples but the degree of mistrust is one of the first hurdles. For example, given the hubbub about illegal immigration and crime, I spent a half hour checking police crime statistics using actual police websites for a bunch of locations – such as Phoenix, Tucson & Mesa, AZ (the 3 largest cities) and El Paso & Laredo, TX. It was pretty darned obvious, without doing anything more than adding and subtracting, that crime is down, that crime hit a high point in the mid-1990's and that it is down all over the place in every category. The newspaper reports I found seem to have accurately reported that crime in Tucson is below 1980 levels and below 1963 levels in Mesa. El Paso, for example, reports monthly and crime in 2010 is the same or lower than in 2009 in 4 out of 5 police commands (and down in all the commands along the border). The very first responses to this mere reporting of numbers taken verbatim from the police was that statistics are just manipulated numbers, that I was playing a "game" with them to make a "leftist" point. You can see why I don't much like games that reinforce this perspective. Sure, they can be fun for exercising close reading skills but they give the wrong impression of reality.
It's not the he on May 27, 2010 11:13 AM at 11:13 am said:

What I find most interesting about the problem is how it compares to its traditional Monty Hall variant. I.e., I have two kids, one is a boy. What is the probability that the other one is a boy as well?

In this problem the answer is 1/3…

Which is counterintuitive to someone who's never seen it, sure.

But then you introduce the bit about Tuesday, and the probability becomes 13/27.

Isn't that kind of crazy?
Andrew Gelman on May 27, 2010 11:22 AM at 11:22 am said:

Jonathan:

Yes, I agree. I prefer transparency–and, when the correct answer is not transparent (this happens sometimes!), a clear explanation for what happened. When it comes to conditional probability, I prefer examples such as Pr (have disease | test positive), which are a bit more realistic than these boy-born-on-Tuesday problems.

On the other hand, the trick problems can be helpful in highlighting the implicit assumptions underlying many of our analyses (see my P.P.S. above).
Filip on May 27, 2010 11:47 AM at 11:47 am said:

No. You're only not getting 1/2 because you count Tue/Tue twice for boy-girl combo but not boy-boy combo. And you're only doing that because you're labeling b and g which means bg and gb are different but bb and bb are same. If you label the b's as b1 and b2, e.g., you'd have b1b2 and b2b1 being different, thus allowing you to count Tue/Tue twice for two boys as well, thus giving you 1/2 probability. And why should you label them b1/b2? To account for whom is born on the Tuesday, the older or younger.

Either that or throw the second Tue/Tue out for bg/gb as well, and then you also get 1/2.
MikeB on May 27, 2010 12:04 PM at 12:04 pm said:

Why should the date be considered ancillary? Introducing the date discretizes the possible outcomes and allows for redundant states. In this case, breaking up the births by days of the week creates the two redundant states (Boy_{1} and Tuesday, Boy_{2} and Tuesday) and (Boy_{2} and Tuesday, Boy_{1} and Tuesday). The unintuitive result just comes from the need to have only one description of that system, leaving 14 – 1 = 13 bb possibilities and a probability of 13 / 27 instead of the naive 14 / 28 = 1 / 2. If the description is made in finer intervals (days of the year, etc) then the result approaches 1 / 2, saturating only when births are recorded by the instant and the probability of a redundant state goes to zero.
J Smith on May 27, 2010 12:16 PM at 12:16 pm said:

The key with Tuesday is that, instead of having the element of interest "boy" grouped together into "boy-boy" pairs, you now have gender/day pairs, and in the boy-boy domain the element of interest (boy/Tuesday) co-occurs with itself in only 1 out of the 7 pairings.
K? O'Rourke on May 27, 2010 12:26 PM at 12:26 pm said:

Phil and Andrew – maybe getting used to math versus understanding it.

This is my quick R code "math"

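# 14 child types: sex ("b"/"g") crossed with day of week 1..7 (2 = Tuesday)
pp <- paste(sort(rep(c("b", "g"), 7)), rep(1:7, 2))
# all 14 x 14 = 196 ordered pairs of children, as strings like "b 2 g 5"
oo <- as.vector(outer(pp, pp, FUN = "paste"))
length(oo)                 # 196
# keep the pairs that include a boy born on day 2 (Tuesday)
cc <- oo[grepl("b 2", oo, fixed = TRUE)]
length(cc)                 # 27
# of those, count the pairs with no girl, i.e. two boys
sum(!grepl("g", cc))       # 13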

easy to see the answer but not much cause to believe it (though some of the 25+ comments may help some)

K?
js on May 27, 2010 12:47 PM at 12:47 pm said:

Todd Graves has the right answer. Unless we know the mapping between the state of the world and what is reported to us, we cannot answer this question.
afinetheorem on May 27, 2010 4:56 PM at 4:56 pm said:

Two comments….

First, as someone alluded to, the reason Tuesday helps you is that the probability there is a son born on Tuesday is an increasing function of the number of boys. If I ask you, "Do you have at least one son?" and you say yes, then I know the probability of two sons given two children is 1/3. If I then ask you, "Do you have a son born on Tuesday?", then "yes" means that you are more likely than before to have two sons (pr = 1/7 if you have one son, pr ~ .265 if you have two sons). So the probability of having two sons given one son is born on Tuesday is (1/3)(.265)/((2/3)(1/7)+(1/3)(.265)) = .265/((2/7)+.265) = .481 = 13/27. As some have noted, analogous to Monty Hall, this probability is contingent on the father *always* saying "I have or don't have a son born on Tuesday".

Second, the probability of two sons given one son is 1/3, not 1/2, contrary to what many commenters above have stated.
mike on May 27, 2010 5:41 PM at 5:41 pm said:

The state of the world has nothing to do with it. The probability is based on the information provided; in that sense the probability is subjective since it is based on the information a party has.

In the two situations Graves describes, Andrew has the same amount of information in both, while Todd has less information in case B. In all cases Gary has the most information, and for him the probability must be either 0% or 100%, depending on if he has both boys or not.

So three different people, with different sets of information, resulting in three different probabilities, all based on the same reality.
cynicismsyndicate on May 27, 2010 6:13 PM at 6:13 pm said:

I still don't understand where the idea that the birth order matters originates in this problem. if I were to say to you, "Balls come in red and blue, and the probability that any ball is red or blue is .5. I have two balls. (at least) One is red. What is the probability that I have two red balls?" the probability is 1/2, not 1/3, because one ball is given, making the second ball a functionally independent probability. I have not given you any reason to consider order.

This problem is not a variant of the Monty Hall Problem for that reason- in the MHP, there is a stated order of events. Choose a door, eliminate one, choose to keep or switch. In this case there is no stated or implied order, and therefore no reason to assume that BG and GB are any different. if we do assume they are, then B1B2 and B2B1 on tuesday must be considered different. Is there a reason from probability theory that demands we assume birth order matters when it isn't stated? Why set up the matrix BB, BG, GB, GG? Why not set it up BG, BB?
FH on May 27, 2010 7:11 PM at 7:11 pm said:

A previous post about false positive mammograms talked about doing thought experiments with various powers of 10. So …

Perform the following thought experiment. Buy 200 white styrofoam balls from the local hobby shop. Flip a coin — heads means paint a ball red, tails means paint a ball blue. After every 2 flips you have 2 colored balls. Place them in bins labeled {RR},{RB},{BB} where the order doesn't matter in the {..} notation. If you want to keep track of the order label 4 bins as (RR),(RB),(BR),(BB). Flip the coin 200 times. You will have roughly 25 {RR}, and 25 {BB}. If you agree with that, then you must agree that there will be roughly 50 {RB}.

This would be the same result if you have an infinitely large basket with half R balls and half B balls. Scoop up 2 balls at a time 100 times. Again, roughly you'll have 25 RR, 25 BB and the rest (50) RB.

Then in the statement of the problem, the words "At least 1 ball is R" means that you exclude the roughly 25 {BB} from consideration. So there are roughly 75 cases with at least 1 R. So the {RR} represent 1/3 of the cases under consideration.

So to answer your other question, about why order matters, I think it is implicit in generating the sample space.
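
The thought experiment can also be run as a quick simulation. This is a sketch only: it uses 100,000 pairs rather than 100 so that the proportion settles down, and the seed and variable names are arbitrary choices made here.

```python
import random

rng = random.Random(2010)    # fixed seed so the run is repeatable

# Paint two balls per trial: each is red (R) or blue (B) by a fair coin flip.
pairs = [(rng.choice("RB"), rng.choice("RB")) for _ in range(100_000)]

# "At least 1 ball is R" excludes the all-blue pairs.
with_red = [p for p in pairs if "R" in p]
both_red = [p for p in with_red if p == ("R", "R")]

share = len(both_red) / len(with_red)
print(round(share, 3))       # should land near 1/3
```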
FH on May 27, 2010 7:46 PM at 7:46 pm said:

regarding why order or labeling of the Red or Blue balls matters …

At the risk of confusing the heck out of folks, I think ultimately this is because the balls (or children) are distinguishable and can be labeled. If you look up the Wikipedia link for particle statistics, it talks about how for classical systems the "particle" can be labeled.

But for quantum systems, the "particles" or balls cannot be labeled, which means in that case the relevant states are really {BB}, {RB}, {RR}, and if it happened that the energy of the 3 states were the same, then they would be equally likely: the sample space would have equal probability among those 3 states, instead of equal among 4 states (in the classical case). For example the property B or R relevant in describing a particle's quantum state might be the particle's spin.

Presumably the design of certain conveniences of our modern world relies on this: the answer to (some variant of) the question "given that one photon is in state B, what is the chance that both are in state B?" is 1/2, i.e. the quantum answer, and not 1/3, the classical answer. Not being an engineer I can't give a ready example.
It's not the he on May 27, 2010 9:18 PM at 9:18 pm said:

A small generalization: Suppose

Pr(event A|BG)=x
Pr(A|GG)=0
Pr(A|BB)=1-(1-x)^2

Then,

Pr(BB|A)= (1-(1-x)^2)/(1-(1-x)^2+2x), which is a decreasing function of x

So if x=1 we have P(2 boys|A)=1/3, and as x tends to 0 we have P(2 boys|A)=1/2. In the case at hand, the event A is "boy is born on Tuesday" and for x=1/7 the answer is 13/27.

So it would seem that one way to maximize the probability of getting the answer right, is to be given information that is independent of our kids' genders and completely false (A=the moon is made out of cheese?).
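
As a quick check of the formula above, here is a small sketch that evaluates Pr(BB|A) exactly for a few values of x. The function name `pr_two_boys_given_A` is made up for this example, and it assumes BB, BG, GB, and GG are equally likely beforehand.

```python
from fractions import Fraction

def pr_two_boys_given_A(x):
    """Pr(BB | A) when Pr(A | one boy) = x and Pr(A | BB) = 1 - (1 - x)^2."""
    p_a_bb = 1 - (1 - x) ** 2
    # BB has prior weight 1; BG and GB together have weight 2; GG contributes nothing.
    return p_a_bb / (p_a_bb + 2 * x)

print(pr_two_boys_given_A(Fraction(1)))         # 1/3
print(pr_two_boys_given_A(Fraction(1, 7)))      # 13/27
print(pr_two_boys_given_A(Fraction(1, 1000)))   # 1999/3999
```

For x > 0 the expression simplifies to (2-x)/(4-x), which makes the decrease from 1/2 toward 1/3 easy to see.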
Erik Peter on May 27, 2010 10:22 PM at 10:22 pm said:

Seems like afinetheorem has really said all that needs to be said. Having a boy makes "gg" impossible. Having a boy born on a Tuesday is much more likely with "bb" than with "bg" or "gb". Interesting that you didn't work it out with Bayes.
Steve Sailer on May 28, 2010 12:13 AM at 12:13 am said:

I lost interest in puzzles about the time I became interested in statistics.

Why spend time on artificial constructs when there is a world of patterns out there that really exist that are not yet understood.
B. 'Nary' on May 28, 2010 12:15 AM at 12:15 am said:

Hi. An excellent puzzle. (I could not locate the comment by wtl that a couple others have referred to appreciatively)

The likelihood is more generally (2n-1)/(4n-1), where n is the number of possible 'states' in the added piece of data. So n is 7 when a day of the week is given, hence 13/27. If it were, say, the waxing/waning fortnight, then n is 2 and it's 3/7. Both are quite close to 0.5.

When nothing is specified, then n=1. In that case it is 1 / 3 … which seems like a huge difference!

Nary
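
The general formula can be compared with direct counting for a few values of n. This is an illustrative sketch assuming n equally likely states that are independent of sex; `by_counting` and `by_formula` are names chosen here.

```python
from fractions import Fraction
from itertools import product

def by_counting(n):
    """Pr(two boys | at least one boy in state 0), with n equally likely states."""
    children = list(product("bg", range(n)))
    kept = [f for f in product(children, children) if ("b", 0) in f]
    two_boys = [f for f in kept if f[0][0] == "b" and f[1][0] == "b"]
    return Fraction(len(two_boys), len(kept))

def by_formula(n):
    return Fraction(2 * n - 1, 4 * n - 1)

for n in (1, 2, 7, 365):
    assert by_counting(n) == by_formula(n)
    print(n, by_formula(n), float(by_formula(n)))
```

With n = 365 (a birthday rather than a weekday) the value is 729/1459, already very close to 1/2.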
Mark Palko on May 28, 2010 1:43 AM at 1:43 am said:

Martin Gardner
(October 21, 1914 – May 22, 2010)
K? O'Rourke on May 28, 2010 4:12 AM at 4:12 am said:

Erik Peter – Bayes _theorem_ is just conditional probability.

Here there is a joint distribution on possible quadruples and partial information on the outcome – so condition on that (best explained? as removing those quadruples that you now know cannot have happened and renormalize).

I have no idea though if it would be illuminating in any way to split this into a prior and likelihood …

K?
K? O'Rourke on May 28, 2010 4:17 AM at 4:17 am said:

mike – different representations trying to possibly represent the same reality and all representations are fallible (and hopefully get less wrong over time)

Liked the point you made – just trying to highlight it somewhat

K?
Erik Peter on May 28, 2010 4:47 AM at 4:47 am said:

It is illuminating because the result is not intuitive for many people. Working with Bayes' Theorem shows what seems to me to be the central point: having a boy born on a Tuesday is simply more likely with two boys than with one. If you use Bayes' Theorem this very fact appears clearly on the right side of the equation.

Counting and discarding outcomes is based on the same principle of course, but I see the key to understanding how this little riddle works more clearly the other way.
cynicismsyndicate on May 28, 2010 6:42 AM at 6:42 am said:

Thanks, @FH, i get the computation- maybe my problem in understanding is linguistics. How is saying:

"I have two children. One is a boy born on a Tuesday. What is the probability I have two boys?"

functionally any different from saying:

"I have a child. What is the probability it is a boy?"

it's not a conditional probability question, it's a single probability question. One child in the problem is given with probability=1. the only reason you assume conditional probability is because there are two stated outcomes in the problem, C1= B or G and C2 = B or G. But one of those outcomes is given, C1=B and (I assume) has no causal relationship to the sex of the other child. Therefore, there is really only one outcome in question and it is independent.

I fail to see how it is a conditional probability problem: there is no condition.
FH on May 28, 2010 7:00 AM at 7:00 am said:

One hopes this picture is worth 1000 words.
Bill Jefferys on May 28, 2010 7:20 AM at 7:20 am said:

The two comments by wtl are still there, but you have to click on page 2 to find them. Or click on "show all comments" at the top of the comments section.
J Smith on May 28, 2010 8:09 AM at 8:09 am said:

cynicismsyndicate – You are not pulling balls in the current example but rather ball pairs. In other words we are asking "out of all fathers with two children and at least one boy born on Tuesday what percent have two boys" (13/27). We are not asking "out of all boys born on tuesday what is the probability the sibling is a boy" (1/2).

In the first case it matters whether the elements (e.g. Tuesday boys) appear in pairs (if they are in pairs then two of the object counts only under one father). In the second case independence holds just as you say.
Joe Lucke on May 28, 2010 9:20 AM at 9:20 am said:

If P is the probability of having a boy (or any other event) and R is the probability of being born on Tuesday (or any other independent event),
then the probability Q of my having two boys given that I have one born on a Tuesday is
Q=(2P-PR)/(2-PR). Taking P=1/2: if R=1/7, then Q=13/27. If R=1 (born any day), Q=1/3. If R=0 (born on Feb 29, 2000), Q=1/2.
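The closed form above can be checked against a direct enumeration. The following is a minimal sketch, not the commenter's own code: the function names `lucke_q` and `brute_force_q` are made up for illustration, and it assumes the two children are independent, with sex and the extra trait independent of each other.

```python
from fractions import Fraction
from itertools import product


def lucke_q(p, r):
    """Closed form from the comment: Q = (2P - PR) / (2 - PR)."""
    return (2 * p - p * r) / (2 - p * r)


def brute_force_q(p, r):
    """P(two boys | at least one boy with the trait), by enumeration (needs r > 0)."""
    # Each child is (is_boy, has_trait); p = P(boy), r = P(trait), independent.
    child = [
        ((True, True), p * r),
        ((True, False), p * (1 - r)),
        ((False, True), (1 - p) * r),
        ((False, False), (1 - p) * (1 - r)),
    ]
    evidence = Fraction(0)
    two_boys = Fraction(0)
    for (c1, w1), (c2, w2) in product(child, repeat=2):
        if c1 == (True, True) or c2 == (True, True):
            evidence += w1 * w2
            if c1[0] and c2[0]:
                two_boys += w1 * w2
    return two_boys / evidence


half = Fraction(1, 2)
for r in (Fraction(1, 7), Fraction(1)):
    print(r, lucke_q(half, r), brute_force_q(half, r))
```

With P=1/2 both functions agree on 13/27 for R=1/7 and on 1/3 for R=1. The R=0 case is only meaningful as a limit, since the conditioning event then has probability zero.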
Matt Shotwell on May 28, 2010 1:35 PM at 1:35 pm said:

This slashdot article about "scientific impotence" made me think of some of the comments posted above. When probability runs counter to our preconceptions, we immediately question whether probability is relevant in this simple problem. This is clearly a case of "probabilistic impotence".
Michael Tobis on May 28, 2010 2:22 PM at 2:22 pm said:

P(BB | Btu)
= (P (Btu | BB) P (BB) ) / P (Btu)
= ( (13 / 49) (1/4) ) / (27 / 196 )
= 13 / 27

This is, however, a version of the Monty Hall problem. Your friend KNOWS what day one of his sons was born, just as Monty knows where the prize is and where the goat is.

So there is a difference, given that he posed the question. If you were allowed to specify the day of the week at random (and he were required to know the birth day-of-week of all sons) then Btu gives you some information. But since Monty gets to CHOOSE which door, and your friend gets to CHOOSE which day of the week, the situation differs. Here, intent matters, and the problem reduces, given your friend's intent, to

P( BB | B ) = 1/3

Monty has revealed more information because he does not choose randomly.

Suppose your friend had said "What is the probability that I have two boys given that I have no son born on a Tuesday?" With certainty, there is a day of the week that he might choose on which no son was born. So if he chooses deliberately, he has given you no information, and

P( BB | 1) = P( BB ) = 0.25

whereas if he chooses randomly, or you get to specify the excluded day of the week, the answer is

P (BB | ^Btu)
= (P (^Btu | BB) P (BB) ) / P (^Btu)
= ((36 / 49) (1/4 )) / (169 / 196)
= 36 / 169
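The Bayes terms in this comment can be obtained by counting. Here is a small sketch that assumes the "random day" reading, with sexes and weekdays uniform and independent, so that all 14 x 14 = 196 ordered pairs of children are equally likely. The helper `prob` and the event names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Assumption: day 2 stands for Tuesday; every (sex, day) type is equally likely.
children = list(product("bg", range(1, 8)))     # 14 child types
families = list(product(children, repeat=2))    # 196 ordered pairs


def prob(event, given=lambda f: True):
    """P(event | given), by counting equally likely families."""
    pool = [f for f in families if given(f)]
    return Fraction(sum(1 for f in pool if event(f)), len(pool))


is_bb = lambda f: f[0][0] == "b" and f[1][0] == "b"
has_btu = lambda f: ("b", 2) in f               # at least one Tuesday boy
no_btu = lambda f: not has_btu(f)

print(prob(has_btu, is_bb), prob(has_btu), prob(is_bb, has_btu))  # 13/49 27/196 13/27
print(prob(no_btu, is_bb), prob(no_btu), prob(is_bb, no_btu))     # 36/49 169/196 36/169
```

This covers only the case where the day is fixed independently of the family. The "deliberate choice" reading discussed above needs a model of how the speaker picks what to say.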

Similarly, if Monty chooses randomly and exposes a goat, he gives you no information, but if you know he chooses deliberately to expose a goat, your odds of winning the car double if you switch.

However, it's a bit different. Monty is pretending to give you no information, but is in fact giving you information about your first guess. Your friend is pretending to give you information, but in fact is giving you none!

In practice, my answer is 1/3 if the person asking the question is indifferent to whether you get it right.

If the person actually requires you to guess whether it is two boys or one of each, and is not indifferent to your selection, matters get very complicated…
Irina Popov on May 28, 2010 5:27 PM at 5:27 pm said:

The event "boy-Tuesday and girl-Monday" (in this order) is the same as "girl-Monday and boy-Tuesday", therefore some gender and day pairs should not be counted twice. I haven't worked out the probabilities yet. What do you think?
K? O'Rourke on May 29, 2010 6:24 AM at 6:24 am said:

Erik Peter – yes illuminating but illumination is in the mind of the yet to be illuminated

My guess is it would be more direct for most to point out the different percentages being thrown out

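R code "math"

# 14 child labels: "b 1" ... "b 7", "g 1" ... "g 7" (day 2 = Tuesday)
pp=paste(sort(rep(c("b","g"),7)),rep(1:7,2))
# all 196 ordered pairs of children, e.g. "b 2 g 5"
oo=outer(pp,pp,FUN = "paste")
oo=as.vector(oo)
length(oo)
# pairs with no boy born on day 2: these are the ones thrown away
throw.away=oo[-grep("b 2",oo)]

# keep only the two sexes of a pair: "b 2 g 5" becomes "b g"
foo=function(x)
paste(substr(x,1,1),substr(x,5,5))

# fraction of each sex combination that is thrown away
table(foo(throw.away))/table(foo(oo))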

K?
pr on May 29, 2010 9:11 AM at 9:11 am said:

wish to offer my 1 cent (the US is after all facing deflation). At first, I thought the probability was obviously one-half.

Wtl's explanation over at New Scientist was very, very illuminating, which reminded me of a common probabilistic fallacy described by John Allen Paulos in Innumeracy that involves the probability of general compared to specific outcomes: "some…event is likely to occur, whereas it's much less likely that a particular one will," such is the nature of conditional probability.

So, while the probability that any one child is a boy equals .5, the prob for a specific boy (red hair, blue eyes, given a brother born on Tuesday) is less than .5. While the question at first appears mis-specified and requires the solver to make assumptions, the "boy on Tuesday" problem is really a traditional, if obscure, conditional probability problem.
Rhian on May 29, 2010 1:12 PM at 1:12 pm said:

Assuming that everyone is happy with the following:

Q1: What is the probability that both children are boys given that the older child is a boy?
A1: 1/2

Q2: What is the probability that both children are boys given that at least one is a boy?
A2: 1/3

Then what is the intuition behind the following?

Q3: What is the probability that both children are boys given that at least one is a boy born on a Tuesday?
A3: 13/27

Intuition:
Q3 is "between" Q1 and Q2. In Q1 we identify which child is the boy; in Q2 we don't. In Q3, we come close to identifying which child is a boy by saying "it is the one born on a Tuesday". If the other child was also born on a Tuesday (quite unlikely), then we are in the Q2 situation; otherwise (more likely) we are in the Q1 situation. Thus A3 is close to A1, but shifted slightly towards A2.
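The three answers, and the "between Q1 and Q2" intuition, can be checked by enumeration. This is a sketch under the usual simplifying assumptions (equally likely sexes and weekdays, independent children); the first child in each pair is treated as the older one, and weekday 1 is an arbitrary stand-in for Tuesday.

```python
from fractions import Fraction
from itertools import product

TUESDAY = 1                                   # arbitrary index for Tuesday
kids = list(product("BG", range(7)))          # (sex, weekday) for one child
families = list(product(kids, repeat=2))      # (older, younger), 196 pairs


def p_two_boys(condition):
    """P(both boys | condition), by counting equally likely families."""
    pool = [f for f in families if condition(f)]
    both = [f for f in pool if f[0][0] == "B" and f[1][0] == "B"]
    return Fraction(len(both), len(pool))


tuesday_boy = lambda f: ("B", TUESDAY) in f
both_tuesday = lambda f: f[0][1] == TUESDAY and f[1][1] == TUESDAY

a1 = p_two_boys(lambda f: f[0][0] == "B")                 # older child is a boy
a2 = p_two_boys(lambda f: "B" in (f[0][0], f[1][0]))      # at least one boy
a3 = p_two_boys(tuesday_boy)                              # at least one Tuesday boy
print(a1, a2, a3)                                         # 1/2 1/3 13/27

# Split Q3 by whether both children were born on a Tuesday.
q2_like = p_two_boys(lambda f: tuesday_boy(f) and both_tuesday(f))
q1_like = p_two_boys(lambda f: tuesday_boy(f) and not both_tuesday(f))
print(q2_like, q1_like)                                   # 1/3 1/2
```

Of the 27 families with a Tuesday boy, 3 have both children born on a Tuesday and 24 do not, so A3 = (3/27)(1/3) + (24/27)(1/2) = 13/27.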
John Emerson on May 30, 2010 6:30 PM at 6:30 pm said:

I'm the kind of person whose intuition people are talking about when they say that these puzzles are counter-intuitive. I finally figured it out with quite a bit of help.

I think that a lot of difficulty comes from the specific verbal presentation chosen. The series of questions is actually not something that you'd ever find in real life, but it's been tweaked to seem almost like it is, so that people use their normal commonsense ways of dealing with everyday situations to deal with a carefully tweaked non-everyday situation. (How often would you ask "Is there a son born on Tuesday?" Never except in logic puzzles).

Suppose the following three questions were asked. Except for the first they're also not the kinds of questions that would ever really be asked,* and even more obviously so, but the way they're asked the puzzlement would be less likely to arise, I think.

How many children do you have? Two.
Do you have at least one boy? Yes.
Do you have a son born on a Tuesday? Yes.

I suspect that this series would be less likely to confuse people, and more likely to let them see why the Tuesday question makes a difference, because it makes it less likely for people to identify the son born on Tuesday with the first boy the father has by saying "a son".

Even if it wouldn't confuse people less, it makes it much easier to explain why the Tuesday question matters. You can just say "If the father has two boys he's more likely to answer the third question yes."

If I am right, the significance of this exercise is not to show that folk logic is defective, but to show that it's possible to fool people proficient in everyday thinking by devising highly unlikely situations never seen in real life and then carefully presenting them in a way encouraging them to treat it in everyday ways. (Alternatively, the significance is in phrasing the question in the way most likely to have people assume that it's the first "at least one" son who is or isn't born on Tuesday, rather than either one of them.)

The questions people expect and are prepared for in this kind of situation, because they're the best way to get the information people actually want, are:

Do you have any other kids?
Girls or boys?

The Tuesday question just never comes up. I can imagine a case when a prize would be given randomly to a Tuesday child, but you'd just tell them about it. You might ask the question if a finder's fee were offered for a Tuesday child, but the question would weird most parents out.

Another way to say it is that without special training people don't do well with lines of questioning which don't make sense, or can't be motivated, in terms of everyday situations and motives. This is in fact a real problem in certain real world instances, e.g. quantum theory, but dressing the questions up as much as possible to seem like everyday questioning, when it isn't, does fool people.
John Emerson on May 30, 2010 6:35 PM at 6:35 pm said:

At my link at the bottom E. W. Dijkstra argues that translating formal ideas into prose is often confusing and frequently creates misunderstanding, and in fact that translating back into prose returns to exactly the same problems formalization is intended to escape. His example is material implication:

"Consequently, I have come to the conclusion that it is a mistake to teach logic by translating formulae into prose, for that is precisely the vehicle from which we want to increase our distance………. Let me give two examples of how confusing our languages are. By the Law of the Counterpositive, A –> B is the same as ~B –> ~A. The implication is linguistically rendered by prefixing the antecedent with "if", e.g.["It will rain tomorrow if the wind does not turn."] a perfectly acceptable sentence. The counterpositive, however, yields:["The wind turns if it won't rain tomorrow."]a funny statement, to say the least. Evidently, the conjunctive "if" carries with it a whole extra-logical burden of before/after or cause/effect (a dichotomy for which there is no place in the inanimate world)."
Jason Dana on May 30, 2010 7:22 PM at 7:22 pm said:

Yes, 13/27, but if we know the boy born on Tuesday's name, the probability of 2 boys is 1/2. booyah
anerjee on May 30, 2010 11:59 PM at 11:59 pm said:

What if I add another piece of information — that the boy born on a tuesday was actually born exactly at 11pm. What is the probability that the other child is a boy?

It seems to me that the more specific we become about the birth time of the first kid, the closer the answer slides to 0.5.

I do not have an intuition about why that is so.
jdm on June 1, 2010 12:52 PM at 12:52 pm said:

"What if I add another piece of information — that the boy born on a tuesday was actually born exactly at 11pm."

If you have some additional condition X (e.g. "was born on Tuesday") with probability p (e.g. 1/7), the general result is

P(child 1 is a boy and X OR child 2 is a boy and X)

= 1/4 * (2p – p^2)/(p – p^2/4)

(see previous commenter "humidity"'s remarks).

if p = 0, so that the two children are uniquely identifiable, then P = 1/2 (e.g. suppose you have "one is a boy and his name is Ethan", or "one is a boy and he was born between 11:59:59 and midnight").

in the other extreme, if p = 1, there is nothing to distinguish the children, and you get P = 1/3 as you should.
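To see how the answer slides toward 1/2 as the condition gets more specific, the expression above can be evaluated for a few values of p. This is only a sketch: the function name is made up, and the hour and minute cases assume births are spread uniformly over the week, which is a simplification.

```python
from fractions import Fraction


def p_two_boys(p):
    """The expression from the comment, 1/4 * (2p - p^2) / (p - p^2/4), for 0 < p <= 1."""
    p = Fraction(p)
    return Fraction(1, 4) * (2 * p - p**2) / (p - p**2 / 4)


# Assumed probabilities of the extra condition X for a single child.
cases = {
    "no extra condition (p = 1)": Fraction(1),
    "born on a Tuesday": Fraction(1, 7),
    "born on a Tuesday, in a given hour": Fraction(1, 7 * 24),
    "born on a Tuesday, in a given minute": Fraction(1, 7 * 24 * 60),
}
for label, p in cases.items():
    value = p_two_boys(p)
    print(f"{label}: {value} (about {float(value):.6f})")
```

The expression simplifies to (2 - p)/(4 - p), which rises from 1/3 at p = 1 toward 1/2 as p shrinks, without reaching it for any p > 0.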

for 0
jdm on June 1, 2010 1:29 PM at 1:29 pm said:

P(child 1 is a boy and X OR child 2 is a boy and X)

above should read

P(child 1 is a boy AND child 2 is a boy |
child 1 is a boy and X OR child 2 is a boy and X
)

for 0

should read

for 0 < p < 1, 1/2 > P > 1/3
Drew Frank on June 3, 2010 11:07 AM at 11:07 am said:

While the answer — 13/27 — is straightforward to compute, I found the task of refining my intuition to match reality a more difficult and interesting challenge. Here is the best "explanation" I've come up with so far:

First, consider the simple problem when no "day of birth" information is given. There are four possible gender combinations for the two children: BB, BG, GB, GG. Suppose we learn fact X: "at least one child is a boy." This rules out the GG state, and the remaining states are all equally likely. Thus, p(BB|X) = 1/3.

Now suppose we learn fact Y: "at least one child is a boy born on a Tuesday." Again, GG is ruled out: p(GG|Y) = 0. In this case, however, the other options are no longer equally likely. This can be made rigorous using Bayes rule, but the intuition is that the BB state has "two chances" of producing a Tuesday-boy, whereas BG and GB have only one chance each of producing a Tuesday-boy.

Finally — and this has probably been mentioned in another comment — but it's worth noting that if the information given is Z: "at least one child is a boy and at least one child was born on a Tuesday," then p(BB|Z) = 1/3.
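The contrast between facts X, Y and Z can be made concrete by counting. The following is a minimal sketch under the usual assumptions (sexes and weekdays uniform and independent); `posterior` and the `fact_*` names are illustrative, and weekday 1 is an arbitrary stand-in for Tuesday.

```python
from fractions import Fraction
from itertools import product

TUE = 1                                       # arbitrary index for Tuesday
kids = list(product("BG", range(7)))          # (sex, weekday)
families = list(product(kids, repeat=2))      # 196 equally likely ordered pairs


def posterior(fact):
    """P(sex pattern | fact) for the three patterns that contain a boy."""
    pool = [f for f in families if fact(f)]
    result = {}
    for pattern in ("BB", "BG", "GB"):
        count = sum(1 for f in pool if f[0][0] + f[1][0] == pattern)
        result[pattern] = Fraction(count, len(pool))
    return result


fact_x = lambda f: "B" in (f[0][0], f[1][0])                  # at least one boy
fact_y = lambda f: ("B", TUE) in f                            # at least one Tuesday boy
fact_z = lambda f: fact_x(f) and TUE in (f[0][1], f[1][1])    # a boy, and a Tuesday child

for name, fact in (("X", fact_x), ("Y", fact_y), ("Z", fact_z)):
    print(name, posterior(fact))
```

Under Y the three patterns get 13/27, 7/27 and 7/27. Under Z each pattern keeps 13 of its 49 day pairs, so the patterns stay equally likely and p(BB|Z) = 1/3.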
ML on June 3, 2010 2:47 PM at 2:47 pm said:

I at first also arrived at 1/3 for BB|B and 13/27 for BB| B on Tuesday, but after thinking about this problem more carefully, I think these answers are not accurate given the problem.

If the person deliberately or randomly picks one of his kids to talk about, either younger or older, and describes that child in the same manner, the probability that the 2 kids have the same gender is .5. For example, if he has an older boy born on a Tuesday and a younger girl born on a Thursday, and he, deliberately or randomly, picks the older to talk about and then asks the question posed in the original problem, the answer would be .5.

The fact that he asked you the question meant that he already picked a specific kid to talk about, deliberately or randomly. Therefore the answer is the unlikely 1/2.

To see this more clearly, if all parents in the world who have two kids would come to you and ask this question: "I have a [gender of one of the kids, randomly or deliberately picked]. What's the chance that I have two [gender mentioned earlier]?" Then half of those parents will ask the question about boys, and half will be about girls, and within each half, half will have two of the same gender.

The answer would be 1/3 for BB|B or 13/27 for BB|B on Tuesday if the following happens: you randomly pick a gender, say boy, and ask that person, "do you have a boy [born on Tuesday]." Then the phrasing of the original puzzle follows, then you'll have 1/3 or 13/27.

This is different from Monty Hall in the sense that Monty Hall can and will always open an empty door. His knowledge changes the probability. In this case, this person reports a real gender of one of his kids, WHATEVER that gender is.
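The two scenarios in this comment can be compared exactly. This sketch models "volunteered" as a parent who picks one of the two children with probability 1/2 and describes that child, and "screened" as a listener who names the description first and keeps only families that match. Both models, and all names below, are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

TUE = 1                                       # arbitrary index for Tuesday
kids = list(product("BG", range(7)))          # (sex, weekday)
families = list(product(kids, repeat=2))      # 196 equally likely ordered pairs


def volunteered(statement, describe):
    """P(two boys | parent describes a randomly chosen child as `statement`)."""
    total = Fraction(0)
    both_boys = Fraction(0)
    for fam in families:
        for chosen in fam:                    # each child is chosen with probability 1/2
            if describe(chosen) == statement:
                total += Fraction(1, 2)
                if fam[0][0] == "B" and fam[1][0] == "B":
                    both_boys += Fraction(1, 2)
    return both_boys / total


def screened(statement, describe):
    """P(two boys | family answers yes to 'do you have a child like `statement`?')."""
    pool = [f for f in families if statement in (describe(f[0]), describe(f[1]))]
    both = [f for f in pool if f[0][0] == "B" and f[1][0] == "B"]
    return Fraction(len(both), len(pool))


sex_only = lambda kid: kid[0]                 # parent mentions only the sex
sex_and_day = lambda kid: kid                 # parent mentions sex and weekday

print(volunteered("B", sex_only), screened("B", sex_only))                    # 1/2 1/3
print(volunteered(("B", TUE), sex_and_day), screened(("B", TUE), sex_and_day))  # 1/2 13/27
```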
Kevin Knuth on June 3, 2010 5:30 PM at 5:30 pm said:

Michael Tobis above concisely laid out Bayes theorem for this problem. On my blog, I have the solution using Bayes Theorem presented in more detail for those who are interested.

http://kevinknuth.com/blog/2010/06/bayes-theorem-…

One way to think about this is to consider the two extremes. In one extreme, the person just says that he has two children. What is the probability that both are boys? Using a prior where boys and girls are equally probable, we get 1/4.

If he says one child is a boy, then that information allows you to infer that both are boys with probability 1/3.

In the other extreme, the person shows you his son and says he has another child. The probability that the other child is a boy is 1/2.

As the person adds information about one of the children who happens to be a boy, the probability increases from 1/3 to 1/2.
Russ Redford on June 4, 2010 4:04 AM at 4:04 am said:

It depends on what Foshee meant. If he meant, "I have two children. One is a boy. He was born on Tuesday", then the Tuesday is irrelevant information (I'm assuming that he would state the day whatever it was). However, he may be suggesting that if he did NOT have a boy born on Tuesday his statement would have been, "I have two children neither of whom is a boy born on Tuesday", or perhaps simply, "I have two children". That would certainly make a difference. (Imagine Foshee adding to the statement "… and if one was NOT a son born on a Tuesday I would not have said that one was a son".) If we assume, as I did, that no matter what day the boy was born Foshee would still have told us that he had a son (born on ?day) then the Tuesday is simply irrelevant information and the probability of two boys is 1/3.
Poul Bundgaard on June 10, 2010 1:27 PM at 1:27 pm said:

Foshee's calculation is fantastic because it has convinced a whole lot of people that their intuitions are completely fooled. But the problem is that the calculation is wrong, and the error is well hidden.
This will explain where the flaw is, and the key is: not all outcomes are necessarily equally weighted:

After the “two children, at least one boy” information we have 3 combinations of the children: BG, GB and BB, as we all probably know. Then let’s merge the weekday information into each specific child combination:
1 – Boy-Girl:
Foshee has told us that one of the children is a boy, and he’s born on a Tuesday.
In this case it’s 100% certain that he is speaking of the first child, even if the audience doesn’t know it (This is actually an important note, as you will see later).
The boy with the specified weekday is called BX (X because it’s irrelevant on which weekday the boy was born). So we get these 7 possible combinations: BX-GMon, BX-GTue, BX-GWed, BX-GThu, BX-GFri, BX-GSat and BX-GSun.

2 – Girl-Boy:
Same story: 7 possible combinations: GMon-BX, GTue-BX….

3 – Boy Boy
Now it’s getting interesting. Foshee now has to choose which boy’s weekday of birth he will reveal.
BUT – he can only choose one of them! Let’s say Foshee has no preference, so each boy has ONLY a 50% CHANCE of being picked.
Each boy delivers 7 combinations (BX-BMon…. and BMon-BX…), which gives 14 combinations, but because the chance of each boy being chosen is only 50%, the weight of each combination has to be DIVIDED BY 2!

And so the final odds of a 2-boy situation will add up to 1/3 – just as your intuition tells you.
Hamilton on June 17, 2010 10:47 AM at 10:47 am said:

The confusion occurs due to 3 basic errors:

If we ask the question:
“I have two children, one is a boy called Malcolm and he was born on a Tuesday”
This will not change the problem, but does make it a lot easier to explain.

Error 1
Why does it matter if Malcolm is older or younger than his sister, but not matter when he is older or younger than his brother? (Inconsistent)

Error 2
The first section was treated as though it made no difference if Malcolm was older or younger than his brother, but in the second section we suddenly find it does matter after all. (Inconsistent)

Error 3
In the case where both siblings are born on a Tuesday: It is acknowledged that Malcolm being older than his SISTER is different from him being younger. However, it is then assumed that Malcolm being older than his BROTHER is somehow identical to him being younger, leading to the elimination and only 6 outcomes in the final case. (Inconsistent)

Of course, it’s just 50% all the way through.
JeffJo on June 19, 2010 3:41 AM at 3:41 am said:

Problem 1: Mr. Smith tells you he has two children, and at least one is a boy. What is the probability he has two boys?

The mistake almost everybody makes is to treat this as a question about Mr. Smith. It is not. Probability is a property of a random process, not of a single instance of one, like Mr. Smith. This question is about the random process that brought Mr. Smith’s family to our attention. To solve it, we need to know what that process is for everyone, not just Mr. Smith. As stated, it is ambiguous, because it does not tell us what that process is. The problem’s solution is controversial, because different people make different assumptions to resolve the ambiguity, usually without even realizing it.

Here’s the indisputable – because it is also ambiguous – solution to problem #1: Let B be the probability that a randomly selected father of a boy and a girl would tell you about the boy instead of the girl. Call that event – that he tells you this fact, not that the fact itself is true – ALOB, for "at least one boy." Using the definition of conditional probability:

P(BB|ALOB) = P(BB and ALOB)/P(ALOB)
= [1*P(BB)] / [1*P(BB) + B*P(BG) + B*P(GB) + 0*P(GG)]
= (1/4)/(1/4 + B/4 + B/4)
= 1/(1+2*B)

So, if B=1, then P(BB|ALOB)=1/3. If B=1/2, then P(BB|ALOB)=1/2.

B=1 corresponds to a process where every father of two is required to tell us about boys before girls. This requirement is not something you can deduce from what Mr. Smith told you, it is an arbitrary assumption. I personally feel that B=1/2 is the best value to use, for the exact same reason that we say the probability a child is a boy is 1/2. When there are N symmetric outcomes for some part of the random process in a puzzle, we must assign the probability 1/N to each. Since this puzzle gave no reason why a man with two daughters would not say "at least one is a girl," we have to consider that response as a possibility for the PROCESS, but maybe not for this particular Mr. Smith. Since it is symmetric with "one is a boy" for a BG family, in my opinion we have to assume B=1/2, not B=1.

Incidentally, the same logic that leads to 1/3 here, is what leads to 1/2 in the Monty Hall problem. Which the same people will say is wrong.

Problem #2: Mr. Smith tells you he has two children, and at least one is a Tuesday boy. What is the probability he has two boys?

The solution (disputable only in what choices to model with variables, and the significant conclusions will not change if you use others) to problem #2: Keep the same B, and let T be the probability that a randomly selected father of a Tuesday boy and a non-Tuesday boy would tell you about the Tuesday boy. Of the 196 day+gender combinations, there is 1 with two Tuesday boys, 12 with one Tuesday boy and a non-Tuesday boy, 14 with one Tuesday boy and a girl, and 169 without a Tuesday boy. Thus:

P(BB|ALOTB) = P(BB and ALOTB)/P(ALOTB)
= [1*1/196 + T*12/196] / [1*1/196 + T*12/196 + B*14/196 + 0*169/196]
= (1 + 12T)/(1 + 12T + 14B)

So, if B=T=1, then the correct probability is 13/27. If B=T=1/2, the correct probability is again 1/2. Even those people who believe that B should be 1 do not find 13/27 intuitive. It's because they universally expect T to be 1/2 regardless of what they feel B is. They refuse to assume that Mr. Smith was required to tell us about a Tuesday boy over a Thursday Boy, or any girl, while making that assumption about "boy" vs. "girl." So, if you still think the answer is 1/3 to both problems, I suggest you examine the reasons you are making the arbitrary assumption that B=1 while you want T=1/2. Why not B=1/2 and T=1, so the answer is 13/20?
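Both formulas are easy to tabulate for the settings of B and T discussed here. This is a small sketch that simply evaluates the two expressions derived above with exact fractions; the function names are made up for illustration.

```python
from fractions import Fraction


def p_bb_given_alob(b):
    """Problem 1: P(BB | 'at least one boy' is said) = 1 / (1 + 2B)."""
    b = Fraction(b)
    return 1 / (1 + 2 * b)


def p_bb_given_alotb(b, t):
    """Problem 2: P(BB | 'at least one Tuesday boy' is said) = (1 + 12T) / (1 + 12T + 14B)."""
    b, t = Fraction(b), Fraction(t)
    return (1 + 12 * t) / (1 + 12 * t + 14 * b)


half = Fraction(1, 2)
for b, t in ((1, 1), (half, half), (1, half), (half, 1)):
    print(f"B={b}, T={t}: problem 1 -> {p_bb_given_alob(b)}, problem 2 -> {p_bb_given_alotb(b, t)}")
```

The third row is the combination the comment attributes to people who answer 1/3 to both problems: B=1 with T=1/2 gives 1/3 in each case.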
Lloyd Knox on June 23, 2010 2:39 PM at 2:39 pm said:

you are exactly right — that is where the error occurs.

my analysis:
if the statement of the problem has no information about the other child, the probability about the gender of the other child cannot change due to that information. only if the statement from the father can be interpreted as implying that the other child is NOT a boy born on a Tuesday, does the probability of the other child being a boy become 13/27.
Lloyd Knox on June 23, 2010 7:51 PM at 7:51 pm said:

Dropping the Tuesday thing makes this simpler, so let's try it. Some people argue that there are four possibilities, BB, BG, GB and GG and the information given eliminates only one of these possibilities, so P(BB|B) = 1/3. [The same reasoning leads to the 13/27 in the case we throw in the bit about Tuesday.] I disagree with the argument. If we are counting BG and GB as distinct outcomes, then one of them is eliminated as well, since one of them corresponds to the child we know to be a boy, actually being a girl.

For those of you who have found tables helpful, consider the table below. It has four entries. The information we are given knocks out the second column. It does not knock out any rows.

                  gender of child we are told about
                        B        G
other child   B
              G

The widespread intuition is correct: if no information relevant to the other child is given, then the odds that other child is a boy do not change as a result of that information.

As a friend pointed out, some implicit assumptions may be wrong: that the man sired one son MAY be relevant information if that indicates something about the distribution of his Y and X sperm.
Lloyd Knox on June 23, 2010 8:41 PM at 8:41 pm said:

Ack! OK. I just realized that the information we're given is "one of them is a boy", not something like "child X is a boy". So, indeed, we eliminate GG and nothing else. Wow!
For me, the resolution of the intuition problem is that we are NOT being told something about an individual child (in which case the information would have no bearing on the other child) but we are instead being told something about the pair of children, so probabilities for both are affected.

cool problem that engaged me for quite a while!
JeffJo on June 24, 2010 4:53 AM at 4:53 am said:

Lloyd, you are correct that the controversy surrounding this question is rooted in how we interpret "one of them is a (Tuesday) Boy." Does that mean "The parent was selected from a set of families specifically pre-screened for (Tuesday) Boys by observing both children, and was instructed to only mention one (Tuesday) Boy"? Or, does it mean "A random parent simply told us about one child he picked at random from his two"? And the statement of the puzzle is ambiguous with respect to this option. Either could be meant, but intuition tells us that the statement should have been more explicit if it meant the first, since it is far more complex and requires a lot of assumptions.

For 1/3 and 13/27 to be correct, there must be somebody pre-screening the parents who are allowed to talk to us, so that only those who can say "I have a boy (born on Tuesday)" are allowed to. And further, they can't be allowed to say "I have a girl (or a boy born on Friday)" even if it is true. If it is just a random parent speaking, who is allowed to make any statement that is true, the answer to either question is 1/2. Now, I'm not saying the statement can't mean this option, just that it is unreasonable. And it is its unreasonableness that leads to the answer changing when you add "Tuesday." Essentially, by requiring a rare kind of boy, this screening process favors two-boy families since there is (about) twice the chance of finding the rare one in them.

Just to emphasize this difference, let me apply the same reasoning that produces 1/3 and 13/27 to another famous "paradox." You are on a game show, and are offered the choice of three doors with a prize behind each. One is a car, the other two are goats. You pick Door #1, but before revealing your prize the host opens Door #3 to reveal a goat (he knows where the car is, and deliberately reveals a goat). He offers to let you switch to Door #2. Should you?

In the general population of the times this game is played, there are three equally-likely places the car could be. Call the possibilities C1, C2, and C3. But what the host revealed to us for this instance of the game rules out C3, so C1 and C2 are still possibilities. Since C1 and C2 occur equally often in the general population of games, should we conclude that C1 and C2 are still equally likely in this instance, based on the host's information?

Before you answer, note how I described this to parallel the Two Child Problem: There, the general population of two-child families has four equally-likely possibilities, BB, BG, GB, and GG. What the parent told us rules out GG. So, BB, BG, and GB are still possibilities. Should we conclude that BB, BG, and GB are still equally likely in this family, based on the parent's information?

The accepted answer for the Game Show Problem is "no." The host had to open Door #3 if the car was behind Door #2, but he had a choice if it was behind Door #1. So, no instances of C2 can be removed from the sample space. But some instances of C1 can. The problem statement doesn’t actually say whether they are or not. But we must assume the host uses his ability to choose, and chooses randomly when he can. So half of the instances of C1 must be ruled out, but none of the C2 instances can. Thus, the probability the car is behind Door #2 is 2/3, the probability it is behind Door #1 is 1/3, and we should switch.

It can't be any different in the Two Child Problem. In the BG and GB cases, the parent has the exact same choice the host did in his C1 case. As a result, we must assume that half of these cases are ruled out for this family, in addition to all the GG cases. But none of the BB cases can be ruled out this way. So the answer becomes P(BB)/(P(BB) + P(BG)/2 + P(GB)/2)=1/2. And notice that it doesn’t matter if the parent adds "born on Tuesday," so our intuition about the added information becomes correct.
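The Game Show calculation described above can be written out exactly. This is a sketch assuming, as in the comment, that the host never opens the contestant's door or the car's door and chooses uniformly at random when he has two options; the function name is illustrative.

```python
from fractions import Fraction


def monty_posterior(pick=1, opened=3):
    """P(car location | contestant picked `pick` and the host opened `opened`)."""
    doors = (1, 2, 3)
    weights = {}
    for car in doors:
        # Doors the host is allowed to open: not the pick, not the car.
        options = [d for d in doors if d != pick and d != car]
        if opened in options:
            # Prior 1/3 for the car, times the chance the host opens this door.
            weights[car] = Fraction(1, 3) * Fraction(1, len(options))
    total = sum(weights.values())
    return {car: w / total for car, w in weights.items()}


print(monty_posterior())   # {1: Fraction(1, 3), 2: Fraction(2, 3)}
```

The C1 case keeps only half its weight because the host could equally have opened Door #2, which is the same halving the comment applies to the BG and GB families.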
Darin Johnson on July 6, 2010 8:40 AM at 8:40 am said:

The Drunkard's Walk, by Mlodinow, discusses a similar question in the "girl named Florida" problem. In this case, the probability of the prior is much lower than 1/7 (i.e., the odds of finding anyone named Florida are small), and the chance of two girls is 1/2.

In this example, the probability of two girls named Florida (equivalent to two boys born on Tuesday) is assumed to be zero. So it doesn't help with that question.
JeffJo on July 7, 2010 11:38 AM at 11:38 am said:

It's "The Drunkard's Walk: How Randomness Rules Our Lives." As I recall, Mlodinow just ignores the problem of double names (see http://blogs.wsj.com/numbersguy/probability-quiz-…), and I believe he concludes the probability is very close to, but less than, 1/2. At least, he does in that blog. I assumed Foshee based his problem on Mlodinow's, and solved the name problem by changing it to Tuesday. I usually use "left handed" and say the probability a child has it is 1/5. That isn't right, but makes for easy mental calculation in percent.

An Italian bayesian named Giulio D'Agostini wrote a paper trying to handle the double name problem (see http://arxiv.org/abs/1001.0708). He concludes the probability is exactly 1/2. But he did it wrong by only disallowing two Floridas: two girls in a family can still be named Mary, or Kim, or Moonchild. If you do it right (a finite set of N names with any valid probability distribution and a convenient definition of the line between "common" and "uncommon"), the probability is actually slightly greater than 1/2 if the name is uncommon, less if it is common, and exactly 1/2 on the dividing line.
Hopefully Anonymous on July 7, 2010 4:26 PM at 4:26 pm said:

"The interesting question, I think, is how often do these sorts of tricky conditional probability problems arise in real life. I don't know the answer. (That is, I'm not trying to raise a rhetorical question and claim that these problems don't arise in real life. What I'm saying is that I don't know and would be interested in seeing how to think systematically about the question.)"

It would be nice to see a list of the greatest real world successes resulting from this level of probability parlor trick.

My sense is that machine learning often builds off this type of statistical analysis, so it can work accompanied by some type of optimization process.

Am I wrong?
Sam Ritter on July 8, 2010 5:48 AM at 5:48 am said:

I didn't read through the whole comment section, but it should be noted that we must use the information as it is presented. Based on the setup, the probability that child A or child B is a boy begins at 1/2 for each. We know that there are 2 children and one is a boy. The day of the week is irrelevant. Once we understand this, we no longer need to examine a full data set for each child with all the probabilities. We only need to calculate 1/2 the data set. Rather than 14/28 lowered by the 1/14 probability for a Tuesday boy giving us 13/27, we have a 1/2 probability of a Tuesday boy, giving us a 7/14 probability for a second boy.
JeffJo on July 8, 2010 8:13 AM at 8:13 am said:

@HA: Oddly enough, that's both a bad point, and a good point.

It's a bad point, because the underlying issue is valid in the real world (see below). Non-statisticians have a very difficult time applying intuition to conditional probability. That's a main theme in Mlodinow's book, which I highly recommend even though I think he blew this particular example.

It's a good point, because the form of the condition Foshee and Mlodinow assume is very unusual and misleading, and neither Foshee nor Mlodinow explains that. They assume something like a telephone poll, where the first question is "Do you have exactly two children, including at least one boy born on a Tuesday?" If the answer is no, the poll-taker hangs up and tries a new phone number. So the conditional probability is artificially skewed, rather than being realistically skewed.

Here's an example of unexpected results from conditional probability, that applies to real life: Say that a medical test has a 1% false-positive rate, and a 0% false negative rate. You take the test, and it is positive. What are the chances you have the disease?

Wrong answer: Anything near 99%.

Right idea, still wrong answer: It depends on how many people have the disease. Say only 1 in 10,000 do. If 100,000 people take the test, 10 will have the disease and test positive. But 99,990 will not have it, and about 1,000 of them will get a false positive result. So the probability that any one positive test means the testee has the disease is 10/1,010, or about 1%.

Right answer: You don’t test the entire population for a disease that affects only 1 in 10,000. You usually have to suspect a patient might have it before you go to the expense of testing. So the important figure in the calculation is not the fraction of all people who have it, but the fraction of people who take the test and have it. If 1 in 100 *tested* people have it, then 1000 of the 100,000 tested will get true positives. 99,000 will not and 990 of them will get false positives. The probability a positive is correct is 1000/1990, or about 50%.
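The arithmetic for the tested-population case can be checked with Bayes' rule. This is only a minimal sketch: the function name `posterior_given_positive` and its parameter names are made up for illustration, and it assumes the figures stated above (1% false-positive rate, 0% false-negative rate, 1 in 100 tested people having the disease).

```python
from fractions import Fraction

def posterior_given_positive(prevalence, false_positive_rate,
                             false_negative_rate=Fraction(0)):
    """P(disease | positive test), computed with Bayes' rule."""
    # Share of tested people who have the disease and test positive.
    true_positive = prevalence * (1 - false_negative_rate)
    # Share of tested people who are healthy but test positive anyway.
    false_positive = (1 - prevalence) * false_positive_rate
    return true_positive / (true_positive + false_positive)

# Assumed inputs: 1 in 100 tested people are ill, 1% false positives.
tested = posterior_given_positive(Fraction(1, 100), Fraction(1, 100))
print(tested, float(tested))  # 100/199, i.e. the same ratio as 1000/1990
```

Passing a different prevalence, such as `Fraction(1, 10000)`, gives the case where the whole population is tested.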

Surprised? The "parlor trick" helps to understand how conditional probability so radically affects these answers. The need to pay attention to how the test is applied, rather than just running the numbers on every possibility, is the difference between Foshee's answers of 1/3 and 13/27 (he ran the numbers on everybody), and mine of 1/2 and 1/2.
Derek O'Connor on July 21, 2010 at 5:19 am said:

Andrew,

You say: “The interesting question, I think, is how often do these sorts of tricky conditional probability problems arise in real life.”

Here is an interesting and important legal example:

http://www.weeklystandard.com/blogs/disconnecting…

which quotes a judge of the D.C. Circuit Court of Appeals rebuking another judge:

The circuit court's chief criticism of Judge Kessler's ruling involved her “failure to appreciate conditional probability analysis.” The circuit court explains “that although some events are independent (coin flips, for example), other events are dependent.” That is, if one event occurs, then this makes other events “more or less likely.”

Derek O'Connor
Hagen Hannemann on July 29, 2010 at 11:31 am said:

I think the reason the answer 13/27 seems so counterintuitive is that it's wrong.

Or, to be fair, it's the correct answer to the wrong question.

"Normal people" (i.e. without more than basic math education) do actually mean pretty much the same thing with "probability" as we do: "How often does the event happen, given lots of tries?"

The intuitive as well as the precise definition relies on counting cases, which is what you (Andrew) do, obviously. More precisely (true for the intuitive version as well, even when not conscious and explicitly stated): Counting events, what fraction falls into the category specified? (*)

But when the question is "what's the probability I have two boys", these criteria are clearly ONLY referring to the gender. Which means the question is asking "if we count the possible hypothetical boy/girl combinations, what fraction has two boys". It's not asking "if we count the possible hypothetical weekday/gender combinations, what fraction of them then has two boys".

What possibly adds to the confusion here is the intuitive(?) thought that when we ignore the added category "weekday" in our question, then counting with or without that category in place should give the same results. This is obviously not the case, though: the difference arises because the fraction of "collisions" that occur (as in the basic "two kids, at least one boy" or the "two dice, 2×6 vs 5 and 6" questions) gets reduced by introducing more total slots.

The fact that you're taking the additional information into account to generate additional categories (by adding the weekdays) generates an answer to a different question because the categories you are counting in are different.

This becomes painfully obvious when you change the "puzzle" like this: John states "I have two children, each of whom I randomly (with equal distribution) assigned to one of n completely meaningless categories. One of my children is a boy who got assigned to category 1. What's the probability I have two boys?" You obviously could do the same counting here and get the same (2n-1)/(4n-1) that Nary posted above. But nobody speaking normal language would want their question to mean "count in that number of categories"; everyone takes that question to mean exactly the same as if the statement about those meaningless categories hadn't been mentioned at all.

(*) If you don't believe that that's the same thing people intuitively mean, consider someone without any school education beyond the most basic looking at the sky and saying he thinks it's highly likely to rain that day. What he means by that – even if he wouldn't exactly phrase it like that – is that if he looks back at all the times he's seen similar clouds, a big fraction out of those had rain following shortly after. He categorizes the days with such weather into "rain followed" and "rain didn't follow".
Yannick Berker on August 2, 2010 at 7:42 am said:

For all of you who think that 13/27 is not an intuitive solution, and perhaps for some of those who still think 1/3 must be the correct answer simply because the birth day does not matter, I will try to give a very brief explanation why it indeed IS intuitive.

Conditional probability used here tells us that we only consider families with a boy born on Tuesday. This is a kind of pre-screening process as it is called in some of the above comments. Now why does this pre-screening change the very inner structure of the set of families that we deal with? In both problems, with and without the Tuesday condition, we deal with families of the following composition: BB, BG, GB. Now, without the Tuesday condition, it is easy to derive the solution of 1/3. Why does this change when applying the Tuesday condition?

Because the Tuesday pre-screening process favors families with two boys. They have almost double the chance of being included in this virtual study, simply because having a girl born on Tuesday does not help, but having a boy born on Tuesday does. So if the chance of being born on a Tuesday is 1/7, the chance of a BG (or GB) family being included is 1/7, too. However, the chance for a BB family is 1/7 + 1/7 - 1/7^2 = 13/49 (first boy + second boy - both boys). This gives a lot more weight to BB families, raising the odds for a BB family in the study with the Tuesday condition.
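The weighting argument can be turned into a few lines of arithmetic. This is a sketch under the comment's own assumptions (the four family types BB, BG, GB, GG are equally likely, each day has probability 1/7, and a family is included whenever it has at least one boy born on a Tuesday); the variable names are illustrative only.

```python
from fractions import Fraction

p_tuesday = Fraction(1, 7)

# Chance that a family of each (equally likely) type passes the
# "has at least one boy born on a Tuesday" screen. GG never passes.
inclusion = {
    "BG": p_tuesday,
    "GB": p_tuesday,
    # first boy + second boy - both boys (inclusion-exclusion)
    "BB": p_tuesday + p_tuesday - p_tuesday**2,
}

# Because the family types start out equally likely, the share of BB
# families among the screened ones is just the normalized BB weight.
p_bb = inclusion["BB"] / sum(inclusion.values())
print(p_bb)  # 13/27
```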
Andrew Gelman on August 2, 2010 at 8:20 am said:

I'm envious of those bloggers who regularly get 50-100 comments on every entry. Maybe I could reach this level if I post more on probability paradoxes?
Craig Goodrich on August 6, 2010 at 11:23 am said:

Yannick —

I'm not sure why you subtracted 7^-2 from the BB family probability; clearly having two boys both born on Tuesday does not disqualify a family from the sample, if we interpret the information as "at least one" rather than "exactly one".

And if we don't subtract 7^-2, we get

BG 1/7
GB 1/7
BB 2/7

— that is, 50% probability, which corresponds to our intuition.
JeffJo on August 11, 2010 at 1:02 pm said:

Yannick: You made just one mistake. It was when you said "Conditional probability used here tells us that we only consider families with a boy born on Tuesday." Because the condition is not merely that the family HAS a boy born on a Tuesday, but that the father TELLS US that fact when he only wants to tell us about one child. The event is not represented by just the existence of the children, but by what the father does when he has a choice. That is also a random variable here.

To screen for the correct condition, you need to assign a probability Q to the event where a father has one boy born on Tuesday, has another child who is not a boy born on Tuesday, and tells us about the first child. Of the 196 "kinds" of fathers, there are 26 who have to make this choice, and 12 of those have two boys. The 27th father doesn't have to make that choice, and also has two boys. That makes the correct answer, regardless of what you think Q is, (12Q+1)/(26Q+1).

The reason the 13/27 answer is not an intuitive solution has nothing to do with anything you mentioned. It is because Q=1 is an unintuitive value for this probability. Unless we are told WHY a father would always mention a boy born on Tuesday instead of a boy born on Thursday, or a girl born on Friday, the intuitive value for Q is 1/2.

But that means that the "standard" answer of 1/3 for the simpler problem where the father just says "One is a boy" is also wrong. It should also be 1/2.
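The (12Q+1)/(26Q+1) formula can be checked by brute force over the 196 kinds of fathers. The sketch below is one possible reading of the model described in this comment: a father with two Tuesday boys always makes the statement, and a father with exactly one Tuesday boy makes it with probability Q. The function name and the numeric day labels are invented for the example.

```python
from fractions import Fraction
from itertools import product

TUESDAY = 1  # arbitrary label; days are numbered 0..6
children = list(product("BG", range(7)))  # 14 equally likely kinds of child

def p_two_boys_given_statement(q):
    """P(two boys | father says 'one is a boy born on a Tuesday')."""
    said = Fraction(0)         # total weight of fathers making the statement
    said_and_bb = Fraction(0)  # weight of those who also have two boys
    for first, second in product(children, repeat=2):  # 196 kinds of fathers
        matches = [child == ("B", TUESDAY) for child in (first, second)]
        if all(matches):
            weight = Fraction(1)  # both are Tuesday boys: no choice to make
        elif any(matches):
            weight = q            # mentions the Tuesday boy with probability q
        else:
            weight = Fraction(0)  # cannot truthfully make the statement
        said += weight
        if first[0] == "B" and second[0] == "B":
            said_and_bb += weight
    return said_and_bb / said

for q in (Fraction(1), Fraction(1, 2)):
    answer = p_two_boys_given_statement(q)
    assert answer == (12 * q + 1) / (26 * q + 1)
    print(q, answer)  # prints "1 13/27", then "1/2 1/2"
```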
Luiz Pires on August 30, 2010 at 7:26 am said:

Let me add a twist to this…

What if the father then says – 'Wait, I misspoke – I meant to say Thursday, not Tuesday!' You would likely say that 'No problem, the probability for Thursday remains 13/27'.

Then he says, 'Wait, wait – I have to confess that I am terrible with details and don't actually remember the day of the week anymore. Fortunately, my wife has written the day of the week in a note inside this closed envelope.'

Might you not reasonably say: 'You know, regardless of the day of the week she has written in this note, the probability will remain 13/27'?

What if you open the envelope and it is written in a language you don't understand? What if the envelope is empty? Does the probability go back to 1/3?
Daniela Poehler on November 22, 2010 at 11:54 am said:

Maybe with this trick you will come (like me) to the conclusion that the Tuesday-boy example does not contradict but CONFIRMS our intuition (sorry for my English):
Take a boy born in August. For a 2nd boy you get (12+11)/(24+23)=23/47, which is even nearer to 1/2 than 13/27 and 1/3. Obviously, the more you know (details with a lower probability) about the boy, the nearer you come to 1/2.
Take the limiting case: you know EVERYTHING about the boy. The probability (mathematically/statistically calculated) will be 1/2.
Now take what your intuition takes: a living and really existing boy standing in front of you, maybe a boy born on a Tuesday in August with yellow shoes who likes football (and so on). You know very much more about him than just that he's a Tuesday boy. And both mathematics and your intuition say: probability for a 2nd boy: 1/2!
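The pattern in these numbers can be reproduced by counting. The sketch below assumes the "at least one boy in a given category" reading, with n equally likely categories (n=7 for weekdays, n=12 for months treated as equally likely, which is a simplification); the function name is hypothetical. It also checks each count against the (2n-1)/(4n-1) formula quoted earlier in the thread.

```python
from fractions import Fraction
from itertools import product

def p_two_boys_given_category(n):
    """P(two boys | at least one boy in category 0), by direct counting."""
    kids = list(product("BG", range(n)))  # 2n equally likely kinds of child
    # Keep only two-child families containing a boy in category 0.
    families = [f for f in product(kids, repeat=2) if ("B", 0) in f]
    both_boys = [f for f in families if f[0][0] == "B" and f[1][0] == "B"]
    return Fraction(len(both_boys), len(families))

for n in (1, 7, 12, 365):
    p = p_two_boys_given_category(n)
    assert p == Fraction(2 * n - 1, 4 * n - 1)
    print(n, p, float(p))  # 1/3, 13/27, 23/47, 729/1459: approaching 1/2
```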
JeffJo on December 17, 2010 at 9:52 am said:

It's an interesting theory, Daniela, but easily seen to be incorrect. If all we know about the boy is that he is the older of the two children, by your reasoning the answer should be somewhere in between 1/3 and 1/2; yet we know it is exactly 1/2 in that case.

The resolution of the paradox is simple: If the information is an answer to the direct question "Do you have a boy born on a Tuesday?", the probability is 13/27. If the information was chosen simply because it matched a child, the answer is 1/2. Knowing the information is not always enough to solve probability problems. You also need to know how the information was obtained.

The reason the issue causes so much controversy is because you have been misled by your teachers and many other so-called experts into believing the original problem, without "born on Tuesday," has a definitive answer. It doesn't. If they would just acknowledge, and address, the fact that it is not a well-formed question, there would be no such controversy. Many have done this, from Martin Gardner to Keith Devlin; but many more stubbornly refuse to.
Manoel Galdino on June 6, 2011 at 8:43 am said:

I know it's an old entry, but I just had an idea. As someone alluded to previously, the "problem" is related to the ordering, and to whether the question is about children or about fathers with children.

Assume I have one child and I say to you: It's a boy and he was born on a Tuesday. Then you meet me one year later and ask me about my child, and I say that I now have two children. But I'm in a hurry and go away without saying anything else. It happens that you come to visit me and decide to bring gifts for the kids. You were told one is a boy born on a Tuesday, and you are wondering what the probability is that I have two boys, in order to decide whether to buy two boys' gifts or one for a boy and one for a girl.

It happens that it's 1/2, not 13/27. So the reason we think it is 1/2 is that we know that the probability of being a boy is 1/2, and past events like the sex of a previous child cannot affect the probability of the new child being a boy. So we conclude that, given that one is already a boy, and this event (not the information, but the event) can't affect the probability for the new child, the probability is one half.

In the same way, the day of birth cannot affect the probability of having a boy, so we think of the events as being independent.

However, if we're told about two boys in a row, then the information about the day of birth affects the information we have about the sample space.
