## I don’t like this cartoon

Some people pointed me to this:

I am happy to see statistical theory and methods be a topic in popular culture, and of course I’m glad that, contra Feller, the Bayesian is presented as the hero this time, but . . . . I think the lower-left panel of the cartoon unfairly misrepresents frequentist statisticians.

Frequentist statisticians recognize many statistical goals. Point estimates trade off bias and variance. Interval estimates have the goal of achieving nominal coverage and the goal of being informative. Tests have the goals of calibration and power. Frequentists know that no single principle applies in all settings, and this is a setting where this particular method is clearly inappropriate. All statisticians use prior information in their statistical analysis. Non-Bayesians express their prior information not through a probability distribution on parameters but rather through their choice of methods. I think this non-Bayesian attitude is too restrictive, but in this case a small amount of reflection would reveal the inappropriateness of this procedure for this example.

In this comment, Phil defends the cartoon, pointing out that the procedure it describes is equivalent to the classical hypothesis-testing approach that is indeed widely used. Phil (and, by extension, the cartoonist) have a point, but I don’t think a sensible statistician would use this method to estimate such a rare probability. An analogy from a Bayesian perspective would be to use the probability estimate (y+1)/(n+2) with y=0 and n=36 for an extremely unlikely event, for example estimating the rate of BSE infection in a population as 1/38 based on the data that 0 people out of a random sample of 36 are infected. The flat prior is inappropriate in a context where the probability is very low; similarly the test with 1/36 chance of error is inappropriate in a classical setting where the true positive rate is extremely low.
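To make the (y+1)/(n+2) analogy concrete, here is a minimal sketch. The flat-prior estimate is the posterior mean under a Beta(1, 1) prior; the informative prior Beta(1, 9999) is purely an illustrative assumption (it is not from the post) standing in for "we already know this event is rare."

```python
from fractions import Fraction


def posterior_mean(y, n, a=1, b=1):
    """Posterior mean of a binomial rate under a Beta(a, b) prior.

    With a = b = 1 (flat prior) this is the (y + 1) / (n + 2) estimate.
    """
    return Fraction(y + a, n + a + b)


# Flat prior: 0 infections observed in a sample of 36.
flat = posterior_mean(0, 36)

# Hypothetical informative prior with mean 1/10000 (an assumption for illustration).
informative = posterior_mean(0, 36, a=1, b=9999)

print(flat)         # 1/38
print(informative)  # 1/10036
```

Under the flat prior, 36 clean observations still leave an estimate of 1/38, which is absurd for a rare disease; under a prior that encodes rarity, the same data barely move the estimate.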

The error represented in the lower-left panel of the cartoon is not quite a problem with the classical theory of statistics (frequentist statisticians have many principles and hold that no statistical principle is all-encompassing; see here, also the ensuing discussion), but perhaps it is a problem with textbooks on classical statistics, which typically consider the conditional statistical properties of a test (type 1 and type 2 error rates) without discussing the range of applicability of the method. In the context of probability mathematics, textbooks carefully explain that p(A|B) != p(B|A), and how a test with a low error rate can have a high rate of errors conditional on a positive finding, if the underlying rate of positives is low. But the textbooks typically confine this problem to the probability chapters and don’t explain its relevance to accept/reject decisions in statistical hypothesis testing. Still, I think the cartoon as a whole is unfair in that it compares a sensible Bayesian to a frequentist statistician who blindly follows the advice of shallow textbooks.
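The p(A|B) != p(B|A) point can be sketched with the cartoon's own detector, which lies only when two dice both come up six (probability 1/36). The prior probability of a nova used below (one in a billion) is a made-up number for illustration, not an astronomical estimate.

```python
def posterior_nova(prior, n_dice=2):
    """P(sun exploded | detector says yes), by Bayes' rule.

    Assumes the detector lies only when all n_dice show six and
    tells the truth otherwise.
    """
    p_lie = (1 / 6) ** n_dice
    p_yes_given_nova = 1 - p_lie   # detector tells the truth
    p_yes_given_no_nova = p_lie    # detector lies
    numerator = p_yes_given_nova * prior
    return numerator / (numerator + p_yes_given_no_nova * (1 - prior))


# Hypothetical prior: one-in-a-billion chance the sun exploded tonight.
print(posterior_nova(1e-9))

# Sanity check: with a 50/50 prior the posterior is just the detector's accuracy.
print(posterior_nova(0.5))
```

The test's error rate is 1/36 either way; what changes the answer is the base rate. With a tiny prior, a "yes" is almost certainly a false positive, even though p(yes | no nova) is small.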

As an aside, I also think the lower-right panel is misleading. A betting decision depends not just on probabilities but also on utilities. If the sun has gone nova, money is worthless. Hence anyone, Bayesian or not, should be willing to bet \$50 that the sun has not exploded.
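The utility argument can be written out as a rough sketch. The function and its parameter names are my own illustration; `value_of_money_if_nova` is an assumed knob that says how much a dollar is worth in a world where the sun has exploded (zero, per the argument above).

```python
def expected_utility_of_betting(p_nova, stake=50.0, value_of_money_if_nova=0.0):
    """Expected utility of betting `stake` that the sun has NOT exploded.

    If the sun has not exploded, the bettor wins `stake`.
    If it has, the bettor loses `stake`, but that loss is scaled by
    what money is worth after a nova.
    """
    win = (1 - p_nova) * stake
    lose = p_nova * (-stake * value_of_money_if_nova)
    return win + lose


# Whatever probability you assign to the nova, the bet never has negative
# expected utility when money is worthless after the explosion.
for p in (0.0, 0.5, 0.99, 1.0):
    print(p, expected_utility_of_betting(p))
```

So the willingness to bet tells you nothing about whether the bettor is Bayesian; it follows from the payoffs alone.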

1. Jeremy Fox says:

Reassuring that I’m not the only one who found the joke to be grounded in a misunderstanding, and that I didn’t find the joke unfunny just because frequentists like me were the butt of it!

2. Paul says:

I suppose I’m being too realistic, but I don’t recognize an equivalence between the xkcd example (two dice) and Phil’s medical case in which the same test is administered twice. For any medical test, we have good reason to believe that two trials on the same person will not be statistically independent. For example, the person could be one of, say, the 10% of the population who have a condition that is guaranteed to give a false positive every time.

… As for the xkcd example, I’ve tried to understand Mayo but I haven’t been able to understand her argument. Shouldn’t it simply be handled this way: “You have just tested and rejected the null hypothesis that ‘the sun has not gone nova and the two dice have not come up sixes’. Your conclusion is thus that ‘the sun has gone nova or two dice have come up sixes’. But you cannot conclude that ‘the sun has gone nova’ from this result.” (Or is that her position, too?)

• Phil says:

As I mentioned in my comment, the specific medical test I’m referring to has independent errors. However, feel free to picture a test with a 1/12 false positive rate but with correlated errors in sequential tests such that the probability of successive false positives is 1/36 instead of 1/144. Ok?
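For anyone who wants the arithmetic spelled out, here is a small sketch using only the numbers in the comment above (the variable names are illustrative):

```python
from fractions import Fraction

single_fp = Fraction(1, 12)            # false positive rate of one test
joint_fp_independent = single_fp ** 2  # both tests wrong, if errors were independent
joint_fp_correlated = Fraction(1, 36)  # both tests wrong, in the correlated scenario

# Implied chance that the second test is also a false positive,
# given that the first one was.
second_given_first = joint_fp_correlated / single_fp

print(joint_fp_independent)  # 1/144
print(second_given_first)    # 1/3
```

In the correlated scenario a second false positive follows a first one a third of the time rather than a twelfth of the time, which is what brings the joint error rate up to the cartoon's 1/36.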

3. Dan says:

There was also a roll over with the original comic that says:

“Detector! What would the Bayesian statistician say if I asked him whether the–[roll] I AM A NEUTRINO DETECTOR, NOT A LABYRINTH GUARD. SERIOUSLY, DID YOUR BRAIN FALL OUT? [roll]… yes.”

4. Ely Spears says:

I see professional statisticians doing this often. So I am happy to see something that mocks such efforts. It’s not a knock on all Frequentists, just that big chunk who use the methods like they are push-button procedures, without consideration for the consequences of their inference choices. The same could be done to mock Bayesians, but it would lack the extra punch that the Frequentist mocking has, since that is far more efficacious for shaming lazy practitioners. Yes, this is a caricature. A funny caricature.

Also, there was a nice post on R-Bloggers today about what exactly the Bayesian’s prior had to be under some mild assumptions. So the cartoon, like many at XKCD, is functioning on multiple levels.

• Andrew says:

I disagree with Larry’s cartoon. I see no evidence that Bayesian 95% intervals are less likely to cover the true value, compared to frequentist 95% intervals. (In fairness to Larry, he is just doing a parody, so I assume he is not meaning his cartoon to be taken seriously.)

• Larry Wasserman says:

Yes. I was trying to mock his ridiculous version of frequentist statistics
with a similarly ridiculous cartoon.

Regarding coverage, there are indeed cases where Bayesian intervals
don’t have frequency coverage: but that’s not a criticism of Bayes since
they (Bayes intervals) are not designed to have frequentist coverage.

Larry

• Andrew says:

Larry:

Bayes intervals are intended to have frequentist coverage, averaging over the prior! That’s not the same as conditional frequentist coverage, but it ain’t nothing.

• Larry Wasserman says:

Yes that’s true.
I meant coverage in the usual frequentist sense: i.e.
inf_theta P(theta in C) >= 1-alpha

Perhaps we should call this uniform coverage?

But I agree that if you average over the prior what you say is true.

Perhaps we should call this Bayesian coverage?

Larry

• Andrew says:

Larry:

I would call both of them coverage, but conditional on different things.
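A toy calculation may help separate the two notions of coverage discussed here. This is only a sketch under an assumed model that nobody in the thread proposed: theta ~ N(0, 1) and y | theta ~ N(theta, 1), so the posterior is N(y/2, 1/2) and the 95% posterior interval is y/2 ± 1.96·sqrt(1/2).

```python
import math

Z95 = 1.959963984540054  # two-sided 95% standard normal quantile


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def conditional_coverage(theta):
    """P(posterior interval covers theta | theta) in the assumed toy model."""
    half_width = Z95 * math.sqrt(0.5)
    # Writing y = theta + e with e ~ N(0, 1), the interval y/2 +/- half_width
    # covers theta exactly when |e - theta| <= 2 * half_width.
    return (std_normal_cdf(theta + 2 * half_width)
            - std_normal_cdf(theta - 2 * half_width))


def prior_averaged_coverage(n_grid=4000, lim=8.0):
    """Midpoint-rule average of the conditional coverage over the N(0, 1) prior."""
    step = 2 * lim / n_grid
    total = 0.0
    for i in range(n_grid):
        theta = -lim + (i + 0.5) * step
        density = math.exp(-0.5 * theta * theta) / math.sqrt(2 * math.pi)
        total += conditional_coverage(theta) * density * step
    return total


print(round(prior_averaged_coverage(), 3))  # coverage averaged over the prior
print(round(conditional_coverage(0.0), 3))  # theta at the prior mean
print(round(conditional_coverage(3.0), 3))  # theta far out in the prior's tail
```

In this toy model the prior-averaged coverage comes out at about 0.95, while the coverage conditional on theta is above 0.99 at the prior mean and well below 0.5 at theta = 3. Same interval, same word "coverage," but conditional on different things.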

• K? O'Rourke says:

I do recall Jim Berger arguing somewhere that average coverage really was not importantly less (in actual applications) but this paper perhaps makes the tradeoffs very clear and obvious –

Interval Estimation for Messy Observational Data Paul Gustafson and Sander Greenland

(Larry wasn’t Paul one of your students?)

• Entsophy says:

I posted this on Larry’s blog but it’s worth repeating. It’s helpful to move this away from philosophy and make a technical claim. Suppose you use the usual model y=m+error for the mass of the neutrino and you collect some data. When you compute the 95% CI with this data one of two things will happen: the interval will contain the true value or it won’t. Consider both cases separately,

95% CI CONTAINS THE TRUE VALUE: using the same data the Bayesian will get exactly the same result with a uniform prior on the whole real line. On the other hand, if the Bayesian uses a uniform prior over an interval [a,b] and the true mass lies in this interval, then the new 95% Bayesian interval will be even smaller than the 95% CI and still contain the true value!

Moreover, it’s not difficult to find such an [a,b] in real life. For example, you could have used [0, “mass of a hydrogen atom”] since it is probably known before any direct measurement that the mass of the neutrino lies in this interval.

95% CI DOESN’T CONTAIN THE TRUE VALUE: In this case the measurement errors are such that they’re misleading and pull the interval away from the true value. When the Bayesian repeats the calculation using a uniform prior on [a,b] they will find that some of the ‘truth’ in this prior will correct some of the ‘falsity’ in the error model and drag the interval back closer to the true value. In some cases this effect will be big enough to drag the 95% Bayesian interval back onto the true value.

In either case including true prior information gets us an interval that demonstrably brings us closer to the truth about the mass of the neutrino. So if your goal in life is to be wrong 5% of the time, then 95% CIs are the way to go. If you simply want to know the mass of the neutrino then do the Bayesian calculation.

• Entsophy says:

Incidentally, I might add that for a 95% CI to have actual 95% coverage you’d need to know something like “the errors over a very long sequence of trials have a histogram that looks like the assumed probability distribution for the errors.”

This assumption fails in real physical experiments quite a bit which is why the 95% CI’s don’t actually have 95% coverage very often.

On the other hand, every point I make in the above comment still holds true regardless of what the long-range histogram of errors looks like.

5. Memming says:

Here’s another related response to the comic with another comic. Could you comment on it as well?
http://normaldeviate.wordpress.com/2012/11/09/anti-xkcd/

6. […] A post from Andrew: I don’t like this cartoon […]

7. Doc says:

I agree the cartoon mocked the frequent misapplication of statistical theory by hacks, so I feel like the rest of the commentary missed the mark. In addition, as long as we are being pedantic, the sun is unlikely to ever go nova as it is too small.

• Andrew says:

Doc:

Sure—but then the comparison should be hack vs. non-hack, not frequentist vs. Bayesian.

• Ely Spears says:

But there’s a much wider built-in segment of Frequentist hacks than nearly any other category of hack in any other major discipline of science and mathematics. That kind of hack pervades biostatistics, econometrics, psychology, sociology, and political science. It leads to file drawer bias and publication bias.

There’s enough of a hack archetype borne out by practitioners specifically identifying as Frequentists to make the distinction of Frequentist-reasoning-thoughtlessly-from-p-values a thing to mock all on its own.

I agree it is a caricature, but it’s a useful and funny one.

• Dan says:

+1

• stringph says:

But particle physicists aren’t ‘frequentist hacks’ — they do blind analyses, they publish negative results, and their frequentist claims of having discovered new particles have a pretty good record of winning Nobels — and the cartoon seems to be set in a particle physics context.

It’s like Randall carefully took aim at the host of villains misusing significance tests and by some miracle of inaccuracy managed to hit the one good guy and no-one else.

8. Neil says:

Yeah, and while we’re at it, the characters in the cartoon aren’t 100% anatomically correct either …

• Andrew says:

Neil:

My problem is ultimately not with the cartoon. My problem is that there are practitioners and teachers of statistics who spread cartoonish ideas about statistical methods without recognizing these ideas are inaccurate. See our discussion here and here, for example. I worked for six years in a department where many of the other faculty dismissed my methods without even thinking about them, and who told students not to take my classes. They acted like Bayesian ideas were some sort of malignant virus.

This new cartoon was getting a lot of attention so I thought this would be a good opportunity to clear away one bit of confusion regarding statistical methods. As the philosopher Deborah Mayo has discussed, there seem to be a lot of people who believe these cartoonish caricatures of frequentist statisticians. As a statistician who generally does not use such methods, I think I am well situated to offer this criticism.

To put it another way, Polish jokes can be funny or unfunny. But when the spread of such jokes reinforces people’s mistaken attitudes about Polish people, it can make sense to be literal-minded and point out the fallacies behind the jokes.

• Neil says:

I think anyone who understands it already has quite a sophisticated understanding of statistics. Sophisticated enough to differentiate between caricature and commentary. I agree that cartoonish representations have no place in the classroom, but then xkcd is a comic, not a class text. I just think we should be grateful that there is a humourist who can create caricatures for us that are close enough to reality that they can generate debate on one of the leading statistics blogs. For that reason I do like that cartoon (quite a lot actually).

• Andrew says:

Neil:

Humor is fine, but negative caricatures can do harm. I have experienced this myself. Had the cartoon been presented as “bad statistician” vs. “good statistician” or “rote statistician” vs. “sensible statistician,” I would’ve been fine with it.

• UW says:

“I worked for six years in a department where many of the other faculty dismissed my methods without even thinking about them, and who told students not to take my classes. They acted like Bayesian ideas were some sort of malignant virus.”

I can understand that making you sensitive to Bayesian caricatures, but what is it with statisticians? I studied in Seattle recently and there are lots of mathematical equivalences between Bayesian and frequentist approaches, right? So there are two camps of people, both concerned with the incredibly specific task of ‘inference’, but from two different philosophical backgrounds… and they’re assholes to each other. I’ve met a lot of great statsy professors, wonderful people, but I’m pretty sure there’s an unreasonable amount of assholeness generated by frequentist/Bayesian hair splitting.

• Andrew says:

Uw:

Sure, those guys were assholes. But I don’t think statisticians in general are assholes. People just sometimes act like that in organizations. For example, there was a scandal reported in the newspaper a few months ago about abuses at some mental hospitals or care centers in New York State. Someone working there reported the problems, then he got attacked. The governor got into the act and defended the system, even though it was clearly a disaster (according to the news reports). Whistleblowers typically get slammed. It’s something about how people behave in these sorts of social structures. They get together and act like assholes. I’m sure you can find the same thing in other academic fields: biology, literature, whatever. Assholes will even turn up in a mild-mannered field such as real-estate sales, or so I gather from reading the collected works of David Mamet.

9. Almost everywhere Bayesians smack down frequentists, they are smacking down *bad* frequentists. Similarly, almost everywhere Frequentists smack down Bayesians, they are smacking down *bad* Bayesians. Larry Wasserman, for example, is multiply guilty of the latter. Probably time to stop participating in it all!

• Larry Wasserman says:

David. I don’t “smack down” Bayesians.
Bayesian inference is great for doing Bayesian inference.
Frequentist inference is great for doing Frequentist inference.
Any criticisms I have published are about people confusing the fact
that these two types of inference have different goals.

There is no reason why Bayes methods should have good frequency properties and
likewise there is no reason for frequentist methods to correctly represent degrees of beliefs.

Best wishes
Larry

I developed a love of statistics while studying psychology, and I have to admit the content often outstrips my knowledge so this is the first time I’ve commented.

I think that while the cartoon may be unfair to frequentists in the field of statistics I don’t think it’s unfair to many people who use frequentist statistics in other disciplines. It’s a fairly accurate assessment of what we were taught in psychology. If I queried a result I thought was illogical despite the significant result I was told “You’re thinking about it too much. Just trust the maths.”

11. Luke says:

I’ve just arrived to the field of statistics, so forgive my ignorance. I gravitate towards the Bayesian method because it seems to me that frequentists believe they are not subjecting their data to prior biases in the methods of obtaining that data (for example, inference on measured temperature; why not temperature squared, or log temperature?), while the Bayesians include an attempt to offset those biases with hypothetical prior distributions. Am I getting this right, or am I way off?

• Gustaf says:

Way off.

In practice, the difference between Bayesian and Frequentist lies not in the correctness of their results, but in which statistical “language” you feel most comfortable phrasing the question in; this depends on your own knowledge and schooling, the ease of translating the situation into priors and/or methods, what kind of answer you’d find most useful, and what kind of model is easiest to implement.

Feel free to gravitate towards one or the other, depending on your environment and taste; but don’t do it because you think one school are ignorant (say, by ignoring obvious biases).

12. […] might be considered biased, so I’m outsourcing the details of this complaint to a well-known Bayesian […]

the sun has gone ANOVA

14. C Mueller says:

I think that (perhaps) some readers of the recent XKCD cartoon are missing the point (or maybe the comic just works on so many levels that there are MANY points). In any case, here’s my take:

First some history. Here’s an XKCD cartoon that has a place on my office door ( http://xkcd.com/892/ ). The latest cartoon is ALSO on my office door. I teach a basic applied statistics course at the university level, and I interpret BOTH of those XKCD cartoons as a reminder that a little knowledge is a dangerous thing. It is important to me that my students don’t leave my class with only “a little knowledge”. For example, I want them to know the procedures AND their limitations. Isn’t it possible that Friday’s XKCD wasn’t really poking fun at frequentist statisticians but at anyone who blindly uses procedures without really understanding the logic behind them and without understanding their limitations?

By the way, there’s another one that I have shared with my students: http://xkcd.com/882/ This one (in my opinion) sends exactly the same message that I have ascribed to the others. Perhaps it’s the context of having seen (and liked) #882 and #892 that leads me to see the current one (#1132) in the way I do.

• I think 882 makes a very real, very straightforward point, very well.

• zbicyclist says:

If you change “frequentist” in the lower left panel to “researcher who took one stat course in grad school and needs to get some papers published before tenure review” it works for me. (other than being an impossibly clumsy caption)

And http://xkcd.com/882/ is such a great cartoon I’m willing to cut XKCD some slack.

15. Phil says:

Andrew, you are free not to like the cartoon and free to feel that it is unfair. I like the cartoon and I think it doesn’t have to be fair because it’s a cartoon. It wouldn’t be funny if it didn’t have some basis in reality, but I think it does have some basis in reality. All over the world, even as I type this, people are using hypothesis tests that ignore prior information.

I have to say that I am still baffled, after 20 years, at the very fact that there are “frequentists” and “Bayesians.” It’s as if physics had “Newtonians” and “Relativists.” Why not use the right approach and the right tool for the job, whichever toolkit it happens to be in?

Don’t try to answer that, if I don’t get it after 20 years I doubt I ever will.

16. Hey! I was kinda blindsided by the response to this comic.

I’m in the middle of reading a series of books about forecasting errors (including Nate Silver’s book, which I really enjoyed), and again and again kept hitting examples of mistakes caused by blind application of the textbook confidence interval approach.

Someone asked me to explain it in simple terms, but I realized that in the common examples used to illustrate this sort of error, like the cancer screening/drug test false positive ones, the correct result is surprising or unintuitive. So I came up with the sun-explosion example, to illustrate a case where naïve application of that significance test can give a result that’s obviously nonsense.

I seem to have stepped on a hornet’s nest, though, by adding “Frequentist” and “Bayesian” titles to the panels. This came as a surprise to me, in part because I actually added them as an afterthought, along with the final punchline. (I originally had the guy on the right making some other cross-panel comment, but I thought the “bet” thing was cuter.)

The truth is, I genuinely didn’t realize Frequentists and Bayesians were actual camps of people—all of whom are now emailing me. I thought they were loosely-applied labels—perhaps just labels appropriated by the books I had happened to read recently—for the standard textbook approach we learned in science class versus an approach which more carefully incorporates the ideas of prior probabilities.

I meant this as a jab at the kind of shoddy misapplications of statistics I keep running into in things like cancer screening (which is an emotionally wrenching subject full of poorly-applied probability) and political forecasting. I wasn’t intending to characterize the merits of the two sides of what turns out to be a much more involved and ongoing academic debate than I realized.

A sincere thank you for the gentle corrections; I’ve taken them to heart, and you can be confident I will avoid such mischaracterizations in the future!

At least, 95.45% confident.

• Andrew says:

Randall:

Yes, I think it makes a lot of sense to criticize particular frequentist or Bayesian methods rather than to criticize freq or Bayes statisticians. I don’t think anyone was offended by the cartoon, it just brought back some bad memories for me of people prejudging my work based on their crude preconceptions about silly Bayesians.

More importantly, now that both you and perhaps Scott Adams have commented on my blog, I am very happy. If only I could think of some way of getting Berke Breathed to comment here, I think I could just retire!

• Paul says:

Hear, hear!

• Phil says:

Next time make it Israelis and Palestinians, it’ll be less controversial.

• Larry Wasserman says:

Randall
I think we all loved the comic. But we statisticians are an argumentative lot.
But I owe you a huge thank you; my blog response, which I titled anti xkcd

http://normaldeviate.wordpress.com/2012/11/09/anti-xkcd/

I think I’ll give up posting on technical matters and try just posting comics.
On second thought, probably a bad idea.

Best wishes
Larry Wasserman

• BenE says:

Randall, to see how contentious the debate can be, take a look at E.T. Jaynes ‘PROBABILITY THEORY:
THE LOGIC OF SCIENCE’ textbook. Get a taste of it from the free draft available here: http://omega.albany.edu:8008/JaynesBook.html

• stringph says:

Hey Randall –

I just wanted to check that I understood the comic correctly. In order for it to make sense, you need for it to be set on another planet that is orbiting a star significantly heavier than the Sun in a late stage of main sequence evolution. Let’s call this planet Htrae. Furthermore, the lifetime of the Htraeian ‘stick people’ has to be only a few Htraeian days (which could be much longer than Earth days). So the chance of the star going nova within one ‘night’ would be significantly large, and the number of ‘nights’ on which the machine is activated in any one Htraeian’s lifetime is small. Only then does it make any sense to build a ‘detector’ like this. Having said which, it looks like a fairly cheap and basic device; I’d be prepared to shell out for the deluxe model with a larger number of dice.

• Antonio says:

+1

17. JS says:

But what if it was *three dice coming up all sixes??

19. Gasp! A web comic made a factually inaccurate joke for laughs!

The humanity!

• Andrew says:

John:

I probably shouldn’t respond to your comment because I already addressed it, but your can’t-you-take-a-joke attitude annoys me so much that I will repeat what I wrote above:

My problem is ultimately not with the cartoon. My problem is that there are practitioners and teachers of statistics who spread cartoonish ideas about statistical methods without recognizing these ideas are inaccurate. See our discussion here and here, for example. I worked for six years in a department where many of the other faculty dismissed my methods without even thinking about them, and who told students not to take my classes. They acted like Bayesian ideas were some sort of malignant virus.

This new cartoon was getting a lot of attention so I thought this would be a good opportunity to clear away one bit of confusion regarding statistical methods. As the philosopher Deborah Mayo has discussed, there seem to be a lot of people who believe these cartoonish caricatures of frequentist statisticians. As a statistician who generally does not use such methods, I think I am well situated to offer this criticism.

To put it another way, Polish jokes can be funny or unfunny. But when the spread of such jokes reinforces people’s mistaken attitudes about Polish people, it can make sense to be literal-minded and point out the fallacies behind the jokes.

Humor is fine, but negative caricatures can do harm. I have experienced this myself. Had the cartoon been presented as “bad statistician” vs. “good statistician” or “rote statistician” vs. “sensible statistician,” I would’ve been fine with it.

• I think this is the first time I’ve ever seen someone try to make the claim of math racism stick.

I’m also hoping it will be the last.

What makes racism horrible is that it’s questioning a person’s innate character for something genetic – something that is neither a problem nor in any way voluntary.

To suggest that there are harmful stereotypes against a branch of mathematics that most mathematicians can’t even name?

It’s borderline vulgar to invoke something like that in that fashion.

This isn’t “can’t you take a joke.” That would imply that you have in some way legitimately been wronged, and were just comically thin skinned about it.

There is no legitimate offense here.

• Andrew says:

John,

Fair enough. No offense taken at the cartoon. I just worry that the cartoon, even though a joke, could reinforce some confusions regarding statistical theory and methods. As a teacher of statistics, I like to reduce confusion!

When I said that the cartoon could “do harm,” I did not mean that it would do harm in any malicious way, merely that it could spread confusion that would hinder people’s attempts to learn and understand statistics, thus doing some small indirect amount of harm in the world. Not the biggest deal out there, but I felt it was part of the “service” aspect of my job to clarify the statistical issues here.

• Bob Logan says:

“I just worry that the cartoon, even though a joke, could reinforce some confusions regarding statistical theory and methods.”

If only we inhabited a world in which widespread “confusions regarding statistical theory and methods” not only existed but could also be reinforced (or otherwise) by a fairly recherché cartoon. I’d posit that in such a world Karl Rove would not be able to find work.

“[I]t could spread confusion that would hinder people’s attempts to learn and understand statistics”

On the contrary, my stumbling on this thread has only increased my desire to learn more about competing schools of statistics theory, in part to try to understand why Randall’s cartoon has sparked such passionate debate.

• David says:

John,

Andrew originally shared his thoughts on his own blog, which is directed to a rather specialized audience. For that professional community, I’d say he’s more qualified than you to say whether unproductive stereotypes exist or not. I think it’s perfectly fine for him to bring XKCD into his own blog as a discussion aid, and I hope he does it again in the future.

The fact that the discussion has now somehow migrated here is neat in my opinion.

20. I wanted to pimp my own writeup on the topic and ask for this august group’s opinion, also on whether any of this may be publishable: (http://stats.stackexchange.com/questions/2272/whats-the-difference-between-a-confidence-interval-and-a-credible-interval/2287#2287)

One issue that I try to get at there, that I have not seen discussed quite the same way elsewhere, is the notion that confidence intervals make mistakes that are correlated in the output (e.g. you can have observations that produce nonsense confidence intervals 100% of the time), while credibility intervals’ mistakes are correlated in the input (you can have values of the parameter that almost always yield terrible posteriors). If anybody has references on this (or any other comments) I would be very grateful. Thank you!

• Corey says:

I think your writeup is excellent — so excellent that I wanted to give a presentation based on it (with full credit given to you) a few months back . I wanted to email you to ask for your permission, but I couldn’t find contact info for you. (I didn’t know about StackExchange’s message notification system at the time.)

• unclebulg says:

* The comment by probabilityislogic is spot on; it’s possible – and I think quite common – to be Bayesian and believe in a fixed unknown true parameter value. The prior (and posterior) describe what You know about this parameter value.

* The way you have constructed the intervals in the second table isn’t clear; it appears there is more than one way to fulfill your confidence criteria. If your intervals are not uniquely best, which intervals are better/best? This is important; Frequentists who want more than just adequate confidence (to get out of the trap xkcd illustrates) require a measure of efficiency, broadly.

* Intro decision theory classes would point out randomized rules, which can sometimes be optimal. I don’t think they’ll help your intended reader, but a referee might well bring them up.

* “Type B jars happen only 25% of the time”. No they don’t, if the prior’s wrong, as in the previous paragraph, so this doesn’t fit well into the argument you develop.

* “And I never publish nonsense”. Who mentioned publication? No sane reviewer would let a silly empty confidence interval pass, so this isn’t realistic.

* “I can afford to give nonsense for this outcome”. This is also just silly, in practice. Statisticians of any flavor getting nonsense for reasonable data at least suggest workarounds, if not entirely new methods.

* Can you include an “I don’t know” option? Bayesians can do this if they are allowed to refuse to bet, frequentists can state only that the null hypothesis was not rejected. When there is lots of uncertainty it can be the most realistic option.

21. Peter says:

Would it enlighten you even more, if you knew that the historic origin of statistical math science lies in gambling ?

22. John Goodwin says:

I believe there was a Star Trek episode (original series), in which a man who was half Bayesian and half Frequentist fought a man who was, contrariwise, half Frequentist and half Bayesian. The winner got \$50 bucks invested by xkcd in the 21st century.

23. I think the cartoon hit the nail on the head. Due to the “crud factor”, the odds that some correlation in psychology is present purely because of crud are pretty high. Everyone uses p values in these situations. So unless frequentists start cutting swaths of death through the psych literature (and probably medical too), smashing nearly everything in sight, they are complicit in this state of affairs. The cartoon describes *precisely* this situation, where the dice are the crud factor (which is actually much higher than 1/36) and the sun going nova is some reasonably implausible hypothesis being tested.

25. Hui says:

OK, so then what method *would* a sensible frequentist use in this case?

• yonemoto says:

What astronomers do: tally up all the stars of a given spectral class within certain parameters of the sun’s, use some sort of equivalence to ergodicity (the counts over population reflect the counts over time), then calculate the probability that the sun will blow up as a function of age/size/whatever, and then multiply by 1/36.

Of course, there are all sorts of hidden Bayesian priors here, but the frequentist approach is to try to make those things as close to one (or zero) as possible. Technically, even rolling dice and calculating frequentist statistics by tallying results is Bayesian, because there is a “hidden prior” that “the universe follows the laws of statistics”, to which we’re assigning a value of ONE.

26. Wayne says:

Seems to me that the cartoon is actually making fun of those who worship the p-value.

27. Stuart Buck says:

Not quite on topic, but this is another one of the good XKCD statistics cartoons: http://xkcd.com/552/

• Andrew says:

Stuart:

Yeah, that’s a good one. Here’s my discussion (with Eric Loken) of that problem.

28. robweiss says:

I do have 3 dice that usually roll all sixes. (Yes, they’re all 6-sided, and yes the 6 sides have from 1 to 6 pips on them.) I bring them out every time I teach my Bayes class.

• Where did you get those? I need some for (a) class and (b) Atlantic City.

29. Gray Calhoun says:

For what it’s worth, I’m a frequentist (Econometrician, not Statistician) and I thought the comic was cute.

30. If the Sun has indeed gone nova and you have apparently an hour left to live, the \$50 might just help the Earth move for you ;-)

Betting is a good strategy here. If the sun exploded, you obviously have nothing to gain (or lose), so betting won’t hurt you. But if the sun didn’t explode, you win.

32. Robert Feyerharm says:

So why not run the neutrino detector a few dozen more times to ensure the “Yes” response isn’t a fluke?

• Mark says:

How many times should you run it? A dozen? Two dozen? More? The number of times you should run before you really believe the result depends directly on your prior probability for the sun going nova.

• Robert Feyerharm says:

Certainly the more the better, and in truth a dozen measurements may take too long if it’s a life or death situation where minutes count.

I would be extremely worried if the detector output three “Yes” responses in a row, whatever the prior probability. What if something happened to the sun, and the prior probability of going nova in the next hour just increased from <<.000001 to .99?
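
To put rough numbers on the exchange above, here is a minimal sketch of the Bayesian update for repeated “Yes” answers. It assumes the cartoon’s detector (it lies with probability 1/36) and, as an added assumption not stated in the thread, that repeated runs are independent. The function names `posterior_nova` and `runs_needed` and the example priors are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

P_LIE = Fraction(1, 36)  # cartoon's detector lies when both dice come up six


def posterior_nova(prior, n_yes, p_lie=P_LIE):
    """P(nova | n_yes consecutive 'Yes' answers), assuming independent runs."""
    like_nova = (1 - p_lie) ** n_yes  # detector told the truth every time
    like_fine = p_lie ** n_yes        # detector lied every time
    num = prior * like_nova
    return num / (num + (1 - prior) * like_fine)


def runs_needed(prior, target=Fraction(1, 2), p_lie=P_LIE):
    """Smallest number of consecutive 'Yes' answers pushing the posterior above target."""
    if not 0 < prior < 1:
        raise ValueError("prior must be strictly between 0 and 1")
    n = 0
    while posterior_nova(prior, n, p_lie) <= target:
        n += 1
    return n


if __name__ == "__main__":
    for prior in (Fraction(1, 10**6), Fraction(1, 10**12)):
        print(f"prior={float(prior):.0e}: runs needed = {runs_needed(prior)}, "
              f"posterior after 3 'Yes' = {float(posterior_nova(prior, 3)):.3g}")
```

Each “Yes” multiplies the odds of a nova by 35, so under these assumptions the number of runs needed grows only logarithmically as the prior shrinks, but it still depends on the prior, which is Mark’s point.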

33. Vince says:

The technical discussion following the cartoon is very informative. It suggests (to me at least) that statistics should not be used by social scientists as a common tool of analysis. It is way too complex to be used correctly by non-statisticians.

• Mike says:

Only social scientists? Why not biologists, chemists, physicists or any other field outside of statistics? Statistics can be misapplied in those fields as easily as in the social sciences.

Could social scientists learn how to properly think about the probabilities and assumptions surrounding the statistical methods they use? I would say they could.

Also, what tools of analysis would you have social scientists use, if not statistics?

This is a rather poor generalization of social scientists.

• Robert Feyerharm says:

“Could social scientists learn how to properly think about the probabilities and assumptions surrounding the statistical methods they use? I would say they could.”

Agreed. For starters, I think a threshold p-value of .05 is too high. Researchers should assume that there have been many prior researchers (100? 1,000?) who performed a similar test, obtained a p-value > .05, and decided not to publish. Maybe other factors should be considered as well – e.g., could lives potentially be saved if the research is made public even if the p-value=.04?
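
As a rough illustration of the unpublished-attempts concern, the sketch below computes how likely it is that at least one of many tests clears p < .05 by chance. It assumes every test is independent and that the null hypothesis is true in all of them; both are simplifying assumptions. The Šidák-style adjusted threshold is one standard correction added here for comparison, not something proposed in the comment, and the function names are hypothetical.

```python
def prob_at_least_one_false_positive(n_tests, alpha=0.05):
    """Chance that at least one of n_tests independent true-null tests has p < alpha."""
    return 1 - (1 - alpha) ** n_tests


def sidak_threshold(n_tests, family_alpha=0.05):
    """Per-test threshold keeping the overall false-positive chance at family_alpha."""
    return 1 - (1 - family_alpha) ** (1 / n_tests)


if __name__ == "__main__":
    for n in (1, 20, 100, 1000):
        print(f"{n:>5} tests: P(at least one p < .05) = "
              f"{prob_at_least_one_false_positive(n):.4f}, "
              f"adjusted threshold = {sidak_threshold(n):.2e}")
```

With 100 such attempts, a chance “significant” result somewhere is close to certain under these assumptions, which is why a fixed .05 threshold says little when earlier null results went unpublished.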
