Eric Archer forwarded this document by Nick Freemantle, “The Reverend Bayes—was he really a prophet?”, in the Journal of the Royal Society of Medicine:

Does [Bayes’s] contribution merit the enthusiasms of his followers? Or is his legacy overhyped? . . .

First, Bayesians appear to have an absolute right to disapprove of any conventional approach in statistics without offering a workable alternative—for example, a colleague recently stated at a meeting that ‘. . . it is OK to have multiple comparisons because Bayesians’ don’t believe in alpha spending’. . . .

Second, Bayesians appear to build an army of straw men—everything it seems is different and better from a Bayesian perspective, although many of the concepts seem remarkably familiar. For example, a very well known Bayesian statistician recently surprised the audience with his discovery of the P value as a useful Bayesian statistic at a meeting in Birmingham.

Third, Bayesians possess enormous enthusiasm for the Gibbs sampler—a form of statistical analysis which simulates distributions based on the data rather than solving them directly through numerical simulation, which they declare to be inherently Bayesian—requiring starting values (priors) and providing posterior distributions (updated priors). However, rather than being of universal application, the Gibbs sampler is really only advantageous in a limited number of situations for complex nonlinear mixed models—and even in those circumstances it frequently sadly just does not work (being capable of producing quite impossible results, or none at all, with depressing regularity). . . .

This looks negative, but if you read it carefully, it’s an extremely pro-Bayesian article! The key phrase is “complex nonlinear mixed models.” Not too long ago, anti-Bayesians used to say that Bayesian inference was worthless because it only worked on simple linear models. Now their last resort is to say that it only works for complex nonlinear models!

OK, it’s a deal. I’ll let the non-Bayesians use their methods for linear regression (as long as there aren’t too many predictors; then you need a “complex mixed model”), and the Bayesians can handle everything complex, nonlinear, and mixed. Actually, I think that’s about right. For many simple problems, the Bayesian and classical methods give similar answers. But when things start to get complex and nonlinear, it’s simpler to go Bayesian.

(As a minor point: the starting distribution for the Gibbs sampler is not the same as the prior distribution, and Freemantle appears to be conflating a computational tool with an approach to inference. No big deal: statistical computation does not seem to be his area of expertise. It’s just funny that he didn’t run it by an expert before submitting to the journal.)
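To make the distinction between starting values and priors concrete, here is a minimal sketch of a Gibbs sampler for a normal model with unknown mean and variance. Everything in it is an assumption made for illustration: the data are synthetic, the semi-conjugate priors (normal on the mean, inverse-gamma on the variance) are one convenient choice, and the function name and arguments are hypothetical. The prior enters the conditional distributions; the starting value only says where the chain begins and is thrown away with the burn-in.

```python
import random
import statistics


def gibbs_posterior_mean_mu(y, m0, s0_sq, a0, b0, start_mu,
                            n_iter=4000, burn_in=1000, seed=0):
    """Gibbs sampler for y_i ~ Normal(mu, sigma2) (illustrative sketch).

    Prior (part of the model): mu ~ Normal(m0, s0_sq),
    sigma2 ~ Inverse-Gamma(a0, b0).
    start_mu (part of the algorithm): where the chain begins.
    Returns the average of the post-burn-in draws of mu.
    """
    rng = random.Random(seed)
    n, y_sum = len(y), sum(y)
    mu = start_mu
    draws = []
    for t in range(n_iter):
        # sigma2 | mu, y is inverse-gamma: draw a gamma and invert it.
        shape = a0 + n / 2.0
        rate = b0 + 0.5 * sum((yi - mu) ** 2 for yi in y)
        sigma2 = 1.0 / rng.gammavariate(shape, 1.0 / rate)
        # mu | sigma2, y is normal with precision-weighted mean.
        post_var = 1.0 / (1.0 / s0_sq + n / sigma2)
        post_mean = post_var * (m0 / s0_sq + y_sum / sigma2)
        mu = rng.gauss(post_mean, post_var ** 0.5)
        if t >= burn_in:
            draws.append(mu)
    return statistics.fmean(draws)


# Synthetic data, generated only for this illustration.
data_rng = random.Random(42)
y = [data_rng.gauss(5.0, 2.0) for _ in range(30)]

diffuse = dict(m0=0.0, s0_sq=100.0 ** 2, a0=1.0, b0=1.0)
tight = dict(m0=0.0, s0_sq=0.1 ** 2, a0=1.0, b0=1.0)

# Same prior, very different starting values: the answers should agree.
from_low = gibbs_posterior_mean_mu(y, start_mu=-100.0, seed=1, **diffuse)
from_high = gibbs_posterior_mean_mu(y, start_mu=100.0, seed=2, **diffuse)
# Same starting value, different prior: the answer should move.
with_tight_prior = gibbs_posterior_mean_mu(y, start_mu=100.0, seed=3, **tight)

print(from_low, from_high, with_tight_prior, statistics.fmean(y))
```

Changing `start_mu` should leave the estimate essentially unchanged once the chain has converged, while changing the prior changes the posterior itself. That is the sense in which a starting value is not a prior.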

Also, I’m wondering about this “absolute right to disapprove” business. Perhaps Bayesians could file their applications for disapproval through some sort of institutional review board? Maybe someone in the medical school could tell us when we’re allowed to disapprove and when we can’t.

P.S. Yes, yes, I see that the article is satirical. But, in all seriousness, I do think it’s a step forward that Bayesian methods are associated with “complex nonlinear mixed models.” That’s not a bad association to have, since I think complex models are more realistic. To go back to the medical context, complex models can allow treatments to have different effects in different subpopulations, and can help control for imbalance in observational studies.

Update (2014):

There’s something that fascinates me about these aggressive anti-Bayesians: it’s not enough for them to simply restrict their own practice to non-Bayesian methods; they have to go the next step and put down Bayesian methods that they don’t even understand. This topic comes up from time to time on this blog, for example in discussing the uninformed rants of David Hendry (“I don’t know why he did this, but maybe it’s part of some fraternity initiation thing, like TP-ing the dean’s house on Halloween”), John DiNardo (“if philosophy is outlawed, only outlaws will do philosophy”), and various others (the Foxhole Fallacy).

I was also inspired to write an anti-Bayesian rant of my own (with discussion), and Christian Robert and I considered anti-Bayesianism in the classic probability text of Feller and elsewhere (see the article with discussion).

Misinformed anti-Bayesianism doesn’t look like it’s going away anytime soon, but on the plus side it seems to be moving toward the fringes. Bayes is here to stay, and I’m happy to see that non-Bayesian regularization is very popular too.

P.S. Just to remove any ambiguity here: I have no problem with *non*-Bayesians: those statisticians who for whatever combination of theoretical or applied reasons prefer not to use Bayesian methods in their own work. My problem is with *anti*-Bayesians who denigrate the Bayesian approach from a position of lack of understanding. As noted above, I’m happy to see that anti-Bayesianism has moved to the fringes, where it belongs. There’s always room in any discourse for a few extremists; it’s just not good if they have a lot of power.

…and then there are the Bayesian-only-ies.

The people who insist Bayesian is the *only* way to do things.

Admittedly it’s a minority but still.

What about Bayesians who think the current pedagogical set up should be reversed: namely, Bayesian methods should be what everyone sees when they first encounter statistics, while p-values, CI’s, and null hypothesis testing should be relegated to late grad school?

Personally, I appreciate it when anti-Bayesians state their beliefs forthrightly rather than hide behind some faux eclecticisms. If they think Bayes is bunk it’s better to hear the why’s and wherefore’s.

I never hear anyone give by far the best reason for favoring an eclectic approach to statistics, which is that Statistics is in flux and will likely look very different 50 or 100 years from now. Maybe most don’t think Statistics will change that much, but if it does, then having one view totally dominate currently is far more likely to lock in errors than anything else.

It does seem like Frequentists are played out though. Their progress depends strongly on the intuitive ability of their creative practitioners to see what the right answer is (at least roughly). After a century of Frequentists making intuition fueled progress it looks to me like they’ve reached the limits of what their intuition can achieve and they’re relegated at this point to merely making conceptually small improvements on their current slop.

Regarding your third paragraph, see footnote 1 here.

It makes a lot of sense to me to teach Bayesian inference right after probability theory. Of course it remains very important to teach classical concepts as well, so that students leave the program equipped with all the knowledge they need to access the literature in which frequentist statistics abound. I always make sure to cover such things as “the correct interpretation of the p-value” even when teaching Bayes.

“Of course it remains very important to teach classical concepts as well”

Sure, if you were going to replace every introductory statistics course with a Bayesian version. But few people would support forcing everyone in that way. So what about simply having a dual track option which is the reverse of today? You have the option to go straight Bayes and only encounter classical stat much later and then only if you want to.

There are plenty of people who have no need for CI’s, p-values et al in their daily stat work and can pick up the gist of it well enough to understand Frequentists on the fly if needed.

Give such people an option to bypass the blah altogether. Of course, Frequentists might be afraid of giving students that kind of choice, but I for one can live with that.

Sincere question: What fraction of practicing statisticians “have no need for CI’s, p-values et al in their daily stat work”?

Statisticians…. I don’t know, but let’s talk more about end users, Biologists, Chemists, Physicists, Engineers, Physicians, Ecologists, Economists, Psychologists… pretty much the only use they have for such things is to conform to the old-school expectations of journals. Not an ignorable issue, but not a fundamental one either.

If anyone chosen from those and similar groups knew pretty much only Bayesian statistics, they’d get plenty far along towards doing good research. In my opinion, with the advent of Bayesian computing languages like Stan and JAGS they would get a lot farther than they normally do by having the tools to build more complete models of their data.

You are talking of how things *ought* to be. Not of how things are.

Currently, I suspect that a Biologist trained without p-values or CI’s by his Stat. Courses would be quite annoyed & frustrated.

My point is that it may be a gross exaggeration to claim that practicing applied statisticians “have no need for CI’s, p-values et al in their daily stat work”.

Rahul,

See my comment below. I was talking about the current situation, not some future hypothetical one.

Currently Biologists just cringe when you mention statistics, and then maybe they calculate a correlation coefficient and confuse it with a linear regression coefficient and leave it at that.

The ones that introspect enough on the stats stuff generally wind up realizing that CIs and p values don’t answer the Biological questions they have, and they turn towards Bayesian stuff instead, typically at about mid-point in their career.

At least, that’s my anecdotal experience.

Rahul,

If you’re doing Pharmacology then maybe not many. But there are huge chunks of applied areas where such things never come up – even when Frequentists are the ones doing the work.

I knew a couple of guys that made tens of millions doing Bayesian Radar Target recognition for the military decades ago. They made that money because their methods were a big improvement over the Frequentist ad-hoc methods that had been used before.

But here’s the thing: neither group used p-values, CI’s or NHST. The entire Machine Learning world can get by without any of that blah as well.

So looking at the big picture there is a huge market for people who want to learn Bayes and never learn p-values, CI’s, or NHST.

Maybe. OTOH, the people wanting to do Machine Learning or Radar Target recognition do not usually turn to Statistics Departments to get their training. I might be wrong.

Most of the AI / Machine Learning guys I’ve met have a different pedigree.

The Radar guys were Ph.D.s in Statistics. Machine Learning types usually aren’t statistics folks, but Bayes is a common standard tool in Machine Learning whereas the Frequentist stuff isn’t.

Hence the demand to learn Bayes and bypass p-values. There are plenty of other examples as well. Those are just two that I’m familiar with.

Plus there is some segment of the population which really should learn the p-value, CI, NHST stuff but just doesn’t want to regardless of what consequences it has for their career. I’d have gladly taken that route if it was available.

Life is way too short to calculate, let alone explain or understand, Confidence Intervals.

There is the “Classical” (Named so even though Bayesian was first) vs Bayesian controversy. Within “Classical” there is the Frequentist (Neyman-Pearson) vs Inductivist (Fisher) controversy. Then there is the NHST hybrid built on top which takes error rates and pre-study significance levels from the frequentists, but null hypotheses and arbitrary p-values from Fisher, which are then interpreted as either/both error rates and posterior probabilities by the vast majority of statistics users and audiences. Are there any further sources of confusion you guys can add?

Information-theoretic approaches for model selection and multimodel averaging, which are reasonable approaches as far as they go, but which many ecologists have accepted as replacements for NHST while using them to do hypothesis testing by the back door …

I’m not a statistician, but I get annoyed with the breathy touting of Bayesian methods. For instance, pushing the teaching of it to introductory students.

Everyone who passes Calculus is a Statistician.

Nony: If it’s done properly, I believe it would be a major improvement – with no need to bash frequentist “best alternatives”.

As Rubin once put it – the best way to understand frequentist techniques is to think them through using Bayesian models/thinking.

But I also often get annoyed with the breathy touting of Bayesian methods to non-statisticians or even practicing statisticians outside academia – I believe that does a lot of damage.

Maybe we should call that misinformed pro-Bayesianism?

The thing about teaching Bayesianism to intro students is that ultimately there is a big gulf between the meaning of Bayesian probability and the meaning of Frequentist statistics… and if you don’t start out with this laid out ahead of time, it’s a major stumbling block to understanding later.

There’s nothing really fundamental about Frequentist statistics, the fact that it was the first area to be developed for applications widely does not make it some kind of essential basis for statistics… after all Ptolemy had an earlier version of mechanics than Newton, but we don’t teach it today just because it was first.

In my opinion the way to teach statistics is to start off with a discussion of what probability is about as a model of something. Does it model our degree of knowledge about a situation, or does it model external forces that cause “randomness” in the outcomes of experiments, or does it only model the process of selecting subsets of finite populations?? etc.

Once we acknowledge that different views exist, we can move forward with teaching how each one works. In my opinion Bayes deserves at least 1/2 of the intro course once that context is given.

Daniel,

“…the way to teach statistics is to start off with a discussion of what probability is about as a model of something.”

I think this is the right place to start thinking about this, but I want to re-phrase it, because I think doing so reveals the fundamental difficulty facing students (and us):

Bayesian and Frequentist are not different “kinds” of statistics, they are different metaphysics of statistics. By that I mean, they ground statistical concepts in different metaphors for how the world works. Personally, I would love to take a class that started with “Here are two fundamentally different metaphors for how statistical inference is grounded in the world, and here is how the assumptions that are distilled from these metaphors play out mathematically in estimation.” But I think that is a hard sell for students, and not just undergrads. Most people, in my experience, deeply believe that “science” is a thing that grasps on to the real world. When you start by saying “here are two totally different conceptions of that world, and you have to choose when/where one is applicable or not”, they get all freaked out. Even Philosophy courses, which are ostensibly about learning to critically engage with many different ways of interpreting the world, tend to get bogged down in arguments between “camps” looking for holes in the other approach, instead of engaging with the ideas themselves.

None of this is an argument against your teaching method per se, but it is a sort of pragmatic pedagogical concern, and those are (probably) valid kinds of concerns to have. Do you think it’s better to have students who can think about the grounding of probability in the world carefully, or who can do some basic statistical manipulations? If those students are going to be advanced practitioners at some point, maybe the former. But if they are going to read newspapers, do some Excel at work, and possibly read some science journalism, maybe the latter. Yeah?

jrc:

“When you start by saying “here are two totally different conceptions of that world, and you have to choose when/where one is applicable or not”, they get all freaked out.”

Exactly what happened in my webinar for Epidemiologists using an (almost) math free introduction to Bayesian statistics.

However, it did not happen in the earlier webinar for ASA members – most (with little to no experience with Bayes) were happy with only a couple complaints about not enough math :-(

Given there is some interest in such material, I’ll forward some of my slides and R code to Andrew if he will be kind enough to share.

This gives a high level description and some links to material.

http://magazine.amstat.org/blog/2013/10/01/algebra-and-statistics/

I remember encouraging my sister to take a statistics course at a junior college when she was on leave from Reed college, she was interested in Psych and so anyway she took the course, and she got an A, but she got an A mainly because she refused to use the stupid graphing calculator software and ground out the CIs and t tests etc by hand so she knew the formulas backwards and forwards and could apply them on the test.

She later went on to do a BS in Psych at Reed (with some SPSS stuff in her senior dissertation) and then went into a nurse practitioner program and is now treating people with serious mental illness issues. She’s a perfect example of someone who isn’t an advanced practitioner (of statistics) but who might be a good consumer of statistical information (say about drug outcomes and side effects and things).

Did she learn much by being able to grind out t-tests and CIs in those intro courses? I would say no. I doubt very much if she could interpret a confidence interval properly (as evidenced by the number of RESEARCH level people who have the same problem, discussed elsewhere in a recent post…). I doubt that she could interpret anything but the simplest linear regression with any degree of confidence in her own ability. To be fair, she isn’t primarily interested in modeling and data analysis. She’s quite intelligent and would undoubtedly do well if she had chosen to study those things in further courses.

As for her current fluency with CI and t-test calculations, I doubt very much that she could carry them out today, so she’d have to look them up, and I doubt very much if she could feel confident in knowing when to apply a t-test for example… so in the end she didn’t learn something of lasting value from grinding out confidence intervals etc in that first year course.

What would have been of lasting value, and what was I HOPING she would learn in that stats course when I recommended it to her?

1) a decent and intuitive sense of how variation affects our ability to measure and understand and compare things, variation from person to person, from group to group, from location to location, and etc

2) a sense of the limitations of prediction from data both in terms of approximation with residual noise, and how sample sizes and the number of sub-groups might affect the information content of experimental data.

3) some ability to collect data in a computer and graph it, do basic calculations with it. Most likely this would have been something like Excel but these days I’d hope we could get students doing exploratory analysis in R.

4) An understanding that there might be more than one way to describe things: a sense of what the creative process of modeling is about. How do simple modeling choices affect our outcomes.

5) A sense of how to design an experiment to collect data on something of relatively moderate complexity, maybe data you’d analyze with a multiple regression with 3 predictors or something, say BMI as influenced by age, sex, and smoking status or something.

I realize it was naive for me to think she’d get that kind of thing in a JC 1st semester stats class, but I don’t think it’s naive to say that she COULD have gotten that kind of stuff in a realistic course that was properly designed without the baggage of historical “we have to teach kids this stuff because we taught kids this stuff 20 years ago and now that’s what the current researchers do”…

Do we really want “informed citizens” to basically think like: “stats is a whole shitload of grinding out numbers that are completely opaque according to some rules I couldn’t really understand, so I just trust whatever a professional statistician says in the newspaper” because I think that’s what we’re producing with Stats 101 in the vast majority of cases.

Marine, I’m not normally a grammar nit, but your capitalization is off.

Others: You show a total disregard for pedagogy and a lack of love for students. You don’t savvy the needed insights of basic students (what they really need to do and where they are starting from). You remind me of the chemistry profs who advocate descriptive chemistry for first years instead of the baby p-chem approach. Or who want to teach calculus and diffeq’s with computer programming instead of as a set of rules, tricks, etc. At the end of the day, you don’t care about your students or really understand them. You just think about what you like. Well what you like may be a totally bad thing for an intro student.

And then that you can’t learn conditional probability without studying unconditional. Bleh. What a joke. Certainly easier to start with simpler picture and complicate later. Or do you start 11th grade chemistry class with Schroedinger’s equation?

Sure, because NHST, confidence intervals, and p values are so intuitive and simple for students to understand… that’s why professional statisticians regularly have long debates about them on this blog, and why medical doctors don’t understand the difference between standard error of the mean and standard deviation of the population distribution (see comments here: http://andrewgelman.com/2014/03/15/problematic-interpretations-confidence-intervals/)

The fact of the matter is that there are two interpretations of what probability is about. One of those interpretations (Bayes) is normally completely untaught to students, and yet it is the interpretation that most students naturally give to the Frequentist confidence intervals and things that they are taught. Consider that when you try to determine what students will find intuitive and what they will find completely impenetrable.

All the data I have points to the majority of professional scientists (non-statisticians) who use statistics in their research and were taught statistics using the standard intro courses about NHST and so forth completely missing what it’s about and treating it like a cargo-cult gatekeeper for scientific publication.

I second Entsophy and Daniel’s responses to Nony.

As it happens, today was the day that I started covering sampling and estimation in my intro to probability and statistics course for undergrad economics majors. There’s no way around it: repeated sampling is incredibly awkward to teach and very difficult to understand, even for the sharpest students. The only method I’ve found that seems to get the point across is to have my students carry out simulation experiments in R. Even still, they typically object when we start constructing confidence intervals using real data. Students find it strange that we justify a procedure for fixed data using a thought experiment that involves repeated sampling, even if they understand how the thought experiment operates on its own terms.
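A simulation experiment of the kind described above can be sketched in a few lines. The commenter works in R; the version below uses Python purely for illustration, and every specific in it is an assumption: the true mean, the known standard deviation, the sample size, and the function name are all hypothetical. It draws many samples from the same population, builds a 95% interval from each one, and counts how often the interval contains the true mean.

```python
import random
from statistics import NormalDist, fmean


def coverage(true_mean=10.0, sigma=3.0, n=25, n_reps=10_000,
             level=0.95, seed=1):
    """Fraction of repeated-sample z-intervals that contain true_mean.

    Assumes normal data with known sigma, so the interval is
    xbar +/- z * sigma / sqrt(n).
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(n_reps):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        xbar = fmean(sample)
        # The true mean is fixed; the interval is what varies across samples.
        if xbar - half_width <= true_mean <= xbar + half_width:
            hits += 1
    return hits / n_reps


estimated_coverage = coverage()
print(estimated_coverage)
```

The number printed should land close to 0.95, but it is a property of the procedure across repeated samples. It says nothing directly about any single interval computed from one fixed dataset, which is exactly the step students object to.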

In contrast to repeated sampling, teaching Bayes’ Rule and conditional probability is a piece of cake. Pretty much everyone “gets it” because the examples are so compelling and we can use trees and Venn diagrams to illustrate the conditioning. I think it would be great to teach a Bayesian intro course. We could cover much more interesting models and students’ natural inclinations would work with rather than against them. The problem is really other people’s expectations of what should be in an intro course. I end up having to teach something like “defense against the dark arts.” I say things like: “you need to understand hypothesis testing because it’s widely used but here’s why it’s often a pretty bad idea.” The natural question, of course, is “well what should I do instead?” But since it takes so much time to teach the frequentist material, there’s just not enough time to talk about Bayesian alternatives.
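For readers who have not seen the tree argument, here is a minimal sketch of Bayes’ Rule as a two-level probability tree. The scenario (a screening test) and all three input numbers are hypothetical, chosen only to make the arithmetic easy to follow; exact fractions are used so that no rounding gets in the way.

```python
from fractions import Fraction

# Hypothetical inputs, for illustration only.
p_disease = Fraction(1, 100)             # first branch: P(disease)
p_pos_given_disease = Fraction(95, 100)  # second branch: P(test+ | disease)
p_pos_given_healthy = Fraction(5, 100)   # second branch: P(test+ | healthy)

# Multiply along each path of the tree that ends in a positive test.
path_disease_pos = p_disease * p_pos_given_disease
path_healthy_pos = (1 - p_disease) * p_pos_given_healthy

# Conditioning on a positive test keeps only those two paths;
# Bayes' Rule is the share of that total contributed by the disease path.
p_disease_given_pos = path_disease_pos / (path_disease_pos + path_healthy_pos)

print(p_disease_given_pos, float(p_disease_given_pos))
```

With these made-up inputs the answer is 19/118, roughly 0.16: most positive results come from the much larger healthy branch, which is the point the tree makes visually.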

So. True. I’m going to start teaching introductory stats course next week and I just fear the question: “Why can’t I say the CI contains the true mean with 95% probability?” I believe I could explain that, but the follow up question will be: “Why would I ever want to look at a CI then?” …

My Stats teaching experience was horrendous, because I just flat didn’t believe any of the crap I was saying. I swore I’d never teach another stats class again as long as I lived. Because of that little experience, I don’t believe anyone should be put in the position of teaching something they think is crap.

But I don’t see why there can’t be a parallel all-Bayesian track for those who couldn’t care less about p-values, CI’s and so on. There’s definitely a market out there for that kind of thing.

Oh yeah, and the parallel track should just be called “statistics” without any qualifiers.

Yes! It has been said many times before, but “Bayesian statistics” is not a good name. (Not to mention that Bayes and Bajs (Swedish for poo) sound extremely similar). It makes it sound like it is some specialized statistical methodology (like multivariate statistics, non-parametric statistics), something for statistics nerds with too much time on their hands and certainly not for undergrads…

I teach CIs that way (that is, properly) and the follow up question is, indeed, “what? Why do I care about that?” I teach a Bayes lecture where I try to clarify the difference, but the curriculum needs a complete overhaul to avoid these sorts of confusing situations.

Nony,

There is no way that a ground up Bayesian introduction to Statistics is more complicated or sophisticated than a Frequentist one. Absolutely none. It’s conceptually and mathematically simpler at that level and students take to it far easier than CI’s and whatnot.

Moreover, when it comes to “failing to live up to the hype” Classical Statistics is the all time champ. Go back and look at the way people talked about classical statistics from about 1940-1970 or so. It was far more oversold than Bayes ever was.

The current set up is a proven disaster. Anyone insisting on the status quo currently is taking the radical position.

Hmm…

I’m a bit of a weird case and certainly not representative of the general student.

But at the intro level I found both the classical and the Bayesian approach extremely confusing and ultimately unconvincing.

To anyone who is used to the more rigorous undergrad math classes, classical statistics I think will feel quite jarring. It is chock full of words that, despite being very suggestive, don’t really mean much (“confidence”, “likelihood” etc…) and it feels extremely disjointed and ad hoc. Every problem seemingly requires a different procedure and not even the criteria of what constitutes a “good” test seem very consistent (for example: sometimes unbiasedness seems to matter a great deal, other times we apparently don’t care at all – I’m sure there is higher logic to that but I certainly didn’t get it).

Bayesian statistics then, at first, seems extremely welcoming, because it gives you this wonderful unified methodology, which can so easily be extended to any kind of problem. But it comes at a price, and that is the dark art of choosing a prior. Now I’m sure that in higher classes you learn how to choose priors well, but at the intro level what you usually get is either “choose a uniform/flat prior to let the data speak for itself” (to which I respond: why are you a Bayesian?) or “choose a conjugate prior, so you don’t have to do a lot of math” (to which again I respond: why are you a Bayesian then?). In general, it seemed just too jarring for me to have an approach that relied on using your prior information (which is a great idea) but then instead told you that you should just do what’s convenient to the problem. Now, I realise that if my sample size is big “enough” and my prior is flat “enough” then it won’t matter, but in that case I might just work out a confidence interval using the classical method, rename it into a credibility interval, create the method of “bayefrequenism” and call it a day.

Maybe if you focus a class solely on Bayesian methods you can spend a lot of time on choosing priors very well, but I think students will have as much difficulty with that as they have with interpreting CIs, because it just seems so much more art than science.

Anonymous:

Regarding “the dark art of choosing a prior”: Let me tell you about the dark art of choosing a likelihood function.

It’s true though. Choosing a prior is a weak spot. Agreeing on a consensus prior is extremely hard.

IMO a key point of meta-confusion is that statistics students and consumers simply expect statistics to do far more than it can. Classical statistics really can’t offer simple, rote-applicable, assumption-light, “objective” answers to many of the actual questions you’d actually want to answer, even though it’s sometimes packaged this way. (Hence the games with terminology to make it seem like you are getting something a bit different than you actually are.) Bayesian statistics is arguably more transparent in what it actually tries to tell you, but that just pushes the dirt around somewhat: the whole prior business, for instance, is arguably ugly and unclear (“dark art” is not wildly unfair).

So if an introductory course – Bayesian or classical – leaves a student thoroughly _satisfied_, I’d tend to be rather suspicious. Against reasonable and common incoming expectations, I’d expect a good course to leave a good student significantly disappointed/frustrated one way or another.

I just don’t see the choice of prior as all that hard in most cases. In some high-dimensional cases, like priors on functions and so forth, it can be tricky, but for the most part frequentist models involving distributions over functions are a much worse, horrible mess.

The key is to understand that a prior simply encodes which values of parameters you think are relatively more reasonable compared to others. It has nothing to do with the frequency of anything observable and there are many different priors which are valid and some of them are more informative than others.

If the results of your analysis depend strongly on your choice of prior, this is basically informing you that you need more data or data of a different kind. In cases where only extremely sparse data are available, frequentist methods will tell you nothing much of interest, yet you may have to make important decisions based on only that scant data… so your priors will be important, as they should be. It’s the machinery telling you that you can’t get something for nothing.
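The point about prior sensitivity and sample size can be checked with a small sketch. It assumes a binomial proportion with a conjugate Beta(a, b) prior, for which the posterior mean is (a + successes) / (a + b + trials). The three priors and both datasets are hypothetical, picked only to contrast a tiny sample with a large one.

```python
def beta_binomial_posterior_mean(a, b, successes, trials):
    """Posterior mean of a proportion under a Beta(a, b) prior."""
    return (a + successes) / (a + b + trials)


# Three hypothetical priors that disagree with each other.
priors = {
    "flat Beta(1, 1)": (1, 1),
    "skeptical Beta(2, 8)": (2, 8),
    "optimistic Beta(8, 2)": (8, 2),
}


def spread(successes, trials):
    """Gap between the largest and smallest posterior mean across priors."""
    means = [beta_binomial_posterior_mean(a, b, successes, trials)
             for a, b in priors.values()]
    return max(means) - min(means)


spread_small = spread(3, 5)        # sparse data: the prior matters a lot
spread_large = spread(600, 1000)   # plentiful data: the priors nearly agree

print(spread_small, spread_large)
```

When the spread across reasonable priors is large, that is the signal described above: the data alone are not enough to settle the question.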

In most problems I work with, the choice of prior doesn’t matter much as long as the prior is reasonable. The “8 schools” problem is a good example: you can use a t3 distribution instead of a normal, or put different limits on the prior for the within-school variance, and unless you do something crazy you’ll end up with about the same answer. Not all problems are like that, but most problems I encounter are like that.

@Phil:

If your models are relatively insensitive to your choice of prior, is there still a big advantage in going Bayesian? I’m not sure. This is a point that confuses me. If you have a model whose outcome does not care much about the nature of the prior information going in, what good is the Bayesian nature of the model?

To clarify: I’m no anti-Bayesian. I just love Bayesian models in settings where the prior can be reasonably well and directly determined from the data at hand, e.g. some machine learning, pattern recognition, or similar models.

What puzzles me is Bayesian models where there’s no consensus on a prior nor a good way to “automatically” determine one. There I’m confused about their utility.

Rahul, the limit to comment nesting means I can’t reply to you below, but this is a reply to your question: what good is Bayesian modeling if it doesn’t depend strongly on the prior?

I think the biggest advantage is that Bayes gives you an in-principle standard procedure for doing inference on any type of model with any structure at all… In particular, the hierarchical modelling in Bayesian inference gives you huge utility. Here the details of the prior may not matter so much; what matters is that there IS a prior for the hyperparameters, which connects them together in a way that allows sharing of information between groups. And you can tune the nesting structures to meet the needs of what you know about the data and the processes that generated them.

So, ultimately to me the biggest practical advantage to Bayesian statistics is that it gives you a way to work with models that are meaningful and tuned to the particular problem at hand in a very general way. Frequentist methods seem significantly less flexible in doing this.
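To make the "sharing of information between groups" concrete, here is a small sketch of partial pooling in a normal hierarchical model. As a simplification, it treats the hyperparameters `mu` and `tau` as known; a full Bayesian analysis would give them priors and average over them. All numbers and the function name `partial_pool` are hypothetical.

```python
def partial_pool(y, sigma, mu, tau):
    """Conditional posterior means of group effects given hyperparameters.

    Assumed model: y_j ~ Normal(theta_j, sigma_j), theta_j ~ Normal(mu, tau).
    Each estimate is a precision-weighted compromise between the group's own
    data and the population mean mu.
    """
    pooled = []
    for y_j, s_j in zip(y, sigma):
        w_data = 1.0 / s_j ** 2   # precision of the group's own estimate
        w_prior = 1.0 / tau ** 2  # precision contributed by the population level
        pooled.append((w_data * y_j + w_prior * mu) / (w_data + w_prior))
    return pooled


# Hypothetical group estimates and standard errors, for illustration only.
group_means = [12.0, 3.0, -4.0]
group_se = [2.0, 5.0, 10.0]
print(partial_pool(group_means, group_se, mu=4.0, tau=3.0))
```

Groups with noisy estimates are pulled strongly toward the shared mean, while well-measured groups stay close to their own data.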

I was talking about people who encountered Bayes before ever hearing about classical statistics. If you do it reversed you’ll spend 90% of your time struggling to undo the damage from the muddle that is classical statistics.

“For instance, pushing the teaching of it to introductory students.”

I agree. Physics students should also be advanced at aristotelian mechanics, at least up to a third year level before we introduce them to the more challenging Newton’s laws.

I would have loved being taught Bayesian statistics in my first statistics course! Instead I spent years being confused by classical statistics (and I’m still a bit confused, to be honest). Personal opinion: I don’t see why having Bayesian statistics in an introductory course would be more confusing than frequentist statistics, given the right teaching material.

“given the right teaching material.”

This is crucial. Most Bayesian inference courses at the moment are aimed at senior undergraduates and above, and assume they’re confident with mathematics and classical statistics. Obviously that wouldn’t work. However I don’t see why basic discrete parameter inference (that conveys the fundamental ideas), and some elementary BUGS, would be beyond first years.
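As an illustration of what "basic discrete parameter inference" might look like at that level, here is a hedged sketch: a coin whose bias is assumed to be one of nine candidate values, updated after a hypothetical 7 heads and 3 tails. The data, the candidate grid, and the function name `discrete_posterior` are all made up for the example.

```python
def discrete_posterior(prior, likelihood):
    """Bayes' rule over a finite set of hypotheses: posterior is prior times likelihood, normalized."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: v / total for h, v in unnormalized.items()}


# Candidate values for the probability of heads, with equal prior weight on each.
candidates = [i / 10 for i in range(1, 10)]
prior = {p: 1 / len(candidates) for p in candidates}

# Hypothetical data: 7 heads and 3 tails.
heads, tails = 7, 3
likelihood = {p: p ** heads * (1 - p) ** tails for p in candidates}

posterior = discrete_posterior(prior, likelihood)
for p in candidates:
    print(p, round(posterior[p], 3))
```

Nothing here needs calculus: the whole update is a multiply and a renormalize, which is the fundamental idea the comment refers to.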

It’s not about what you would love!!! That’s the problem with you people (said with a Ross Perot tone). It’s about introductory students.

And you don’t need to do repeated sampling in R. Just deal hands of cards. Shuffle. Re-deal. We get the picture…and it’s more tactile and fun.

Actually, I used both a “real” experiment with coins and a simulation of repeated sampling in R. It was meant to demonstrate the law of large numbers, though. I’m pretty sure both together were useful; cards and coins are nice as an example but they might be difficult to translate into the kind of data and analysis students have to understand and use themselves later. This was in a tutorial for the introduction to (applied) statistics course for sociologists and political scientists, though. I think I also managed to explain to most of them what CIs and p-values actually mean, and the exams, as far as I could see them, were mostly correct in these areas. Nonetheless, it’s correct that students intuitively assume that a level of significance of 95 % means that their estimates or their model are correct with probability 95 %… However, I don’t think Bayesian Statistics will help with all the confusion. Statistics is confusing and we are intuitively quite bad at assessing probabilities. I don’t see how this would be different for Bayesian Statistics?

I’m also not sure about exchanging frequentist statistics for Bayesian statistics in introductory stats courses. At least not for the social sciences. In Germany students actually need to learn frequentist statistics first. In their studies, and in most fields where they might need to use their statistics knowledge later in life, they have to know some standard analysis techniques and need to be able to understand the basics of classical inference. I didn’t encounter even one paper using Bayesian Statistics in my whole B.A. and M.A. studies in Sociology, and I took every course even weakly oriented towards statistics or empirical research. Only due to my B.A. minor in Mathematics and the Data Analysis book from Andrew am I now starting to become more and more interested in Bayesian Statistics. Thus I think at least in the social sciences there is not really any way to teach Bayesian Statistics in the intro courses, but it would be helpful to have at least advanced courses about Bayesian Statistics… But then I’m not sure if this would be my primary concern. There are important fields not tackled in the education which might be more important (Data Management, Missing Values, Non-Linear Models and especially Hierarchical Models). And I’m not even speaking about the problems of linking theory and empirical research. My first idea for an advanced course I would teach would be “Interaction and Contextual Effects in Theory and Empirical Analysis” or something like that, as a) this is something highly important for sociological research (I actually don’t think that there are many interesting social phenomena where interactions and/or contextual effects play no role) and b) interactions are often misinterpreted even in publications (interpreting coefficients as marginal effects, e.g.), and contextual effects probably need something like hierarchical modeling, which nobody learns in the course of their studies here (if not by themselves), and I think someone who specializes in quantitative methods should at least be able to read/interpret hierarchical models.

Nony and Daniel:

You can actually introduce and do Bayesian analysis using a mix of dice and playing cards to represent a prior and a data generating model. (I did it with a group of civil servants as my example of a training exercise in a teaching course, and it was assessed as successful.)

But it misses or skips over the (more important) issue jrc raised above http://andrewgelman.com/2014/03/18/wacky-anti-bayesians/#comment-155476 in that the dice and playing cards represent themselves (as a game) and not something else of empirical interest.

Sure: I do physical simulation experiments in class too. (Actually, we did Andrew’s candy example in class today!) But if you want to look at the coverage of a confidence interval it starts to become a little unwieldy. You certainly couldn’t repeat the experiment enough times in class to show that the procedure “works.” This is where I have the students go home and code it up in R. (A valuable side-effect is that it really forces them to make sure they understand the formula for constructing the interval.)
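A rough sketch of such a coverage simulation follows. It is written in Python rather than R, and it assumes normal data with a known standard deviation so that a simple z-interval applies. The parameter values and the function name `ci_coverage` are illustrative, not taken from any actual assignment.

```python
import random
from statistics import NormalDist


def ci_coverage(true_mean=10.0, sd=3.0, n=20, reps=5000, level=0.95, seed=1):
    """Fraction of repeated samples whose z-interval for the mean contains true_mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for a 95% interval
    half_width = z * sd / n ** 0.5               # known-sd interval: mean +/- z * sd / sqrt(n)
    hits = 0
    for _ in range(reps):
        sample_mean = sum(rng.gauss(true_mean, sd) for _ in range(n)) / n
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / reps


print(ci_coverage())
```

The returned fraction should land near the nominal level. Writing the `half_width` line is exactly the step that forces students to understand the interval formula.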

When I said that “I would love to teach a Bayesian Intro Course” I meant “based on my experiences teaching undergrad introductory statistics, I think it would be highly desirable to teach a Bayesian Intro Course because the students would learn more and be less confused.” My motivation is pedagogical.

There are many ways to unpack the phrase “what intro students need.” For one thing there are lots of different kinds of intro students: some will never take another statistics course and others will go on to graduate study. There’s also the question of what other courses will expect intro students to know, as Daniel mentions below. And here we have a kind of vicious cycle: students study frequentist statistics, students become researchers and use frequentist statistics, researchers teach undergraduates and expect that they know … frequentist statistics! My view is that we’re stuck in a bad equilibrium here, but even if you disagree it seems pretty clear that there are historical reasons for why we teach what we do that don’t always align well with what’s most useful for students to know.

I guess the weak version of my claim is merely that it would be great if there were more experimentation in the teaching of intro statistics but it’s not clear how to achieve this while leaving the system of prerequisites as is.

Frank, I think we should probably distinguish between different kinds of “intro courses”. Although I personally would have liked a highly mathematical and rigorous Statistics course even in Sociology, that’s probably not the best way to go. From your website I assume that you are teaching econometrics? How did the students react to the candy example?

Anyway, there are at least two different kinds of introductory statistics courses; we might want to label them “mathematical statistics” and “applied statistics”. Mathematical statistics usually can build on basic knowledge of Calculus and (Linear) Algebra, while the applied statistics courses cannot assume much if any mathematical background. There are of course some fields in-between, e.g. Economical Mathematics (“Wirtschaftsmathematik”), where you have some mathematical background but probably not as much as pure mathematics students have. However, the question is: What do we actually want the students to learn in an applied statistics course? I think the primary aim is to gain some critical understanding of statistical concepts and the ability to read and interpret empirical research papers, and to give some background for further courses where they will learn how to do some analysis themselves. As in most fields most papers and researchers are using frequentist statistics, there is no way we could ignore them even if we wanted to. At the moment our* approach is to teach simplified frequentist statistics with a very short introduction to the meaning of probabilities and the idea of repeated sampling and frequentist inference. Afterwards an introduction to t-tests, ANOVA, linear regression, factor analysis and crosstabulation (and a very short intro to logistic regression) follows. I think nobody here actually believes this simplified version of frequentist statistics, but it’s meant not to confuse the students too much. Students who actually want to do empirical research then can take a few advanced undergrad courses (Regression Analysis and Graphical Presentation with R or Stata in the last semesters, courses oriented on a research topic where the students need to analyze some survey data**). They actually have an SPSS course before they take the stats course, but I’ve no idea why anyone would do it that way.

So for applied stat courses like for sociologists, political scientists, psychologists and maybe also for economics, what do we actually want to accomplish with our intro courses? And how would it help to include Bayesian Statistics in them?

*It’s not really my decision but anyway…

**There has also been an advanced course that included some basic linear algebra and taught for example how factor analysis works but I think the course is no longer taught.

> enough times in class to show that the procedure “works.”

It just takes some creative thinking. In my class at Duke (~100 students) I printed out summary statistics, each drawn under the same model (and each printed on a separate sheet). Hand them around; the students calculate the CI and raise a hand if the _true_ value is in the CI.

(p.s. at the time I was not _allowed_ to ask them to use software to do this.)

In the Bayes thing above, in groups of 3, one was a _god_ who drew the truth randomly from the prior, generated the original data, and showed just the data. Then another repeatedly drew from the prior and the third generated data given that prior draw; if it matched the original data, that value drawn from the prior was kept to get the posterior. At the end the truth was checked to see if it was in the posterior. (Loosely from memory.)

But to go much further, something like R would likely be necessary.
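The group exercise described above (draw from the prior, simulate data, keep the draw only if the simulated data match the observed data) can be sketched in code, here in Python rather than R. The three candidate success probabilities, the observed count, and the function names `rejection_posterior` and `exact_posterior` are assumptions made up for illustration. The exact posterior is computed alongside as a check.

```python
import random


def rejection_posterior(observed, n_trials, prior_values, n_proposals=20000, seed=0):
    """Approximate the posterior by keeping prior draws whose simulated data match `observed`."""
    rng = random.Random(seed)
    kept = {v: 0 for v in prior_values}
    for _ in range(n_proposals):
        theta = rng.choice(prior_values)  # draw a parameter from the (uniform) prior
        simulated = sum(rng.random() < theta for _ in range(n_trials))  # simulate data given theta
        if simulated == observed:         # keep the draw only on an exact match
            kept[theta] += 1
    total = sum(kept.values())
    return {v: count / total for v, count in kept.items()}


def exact_posterior(observed, n_trials, prior_values):
    """Exact posterior under the same uniform prior, for comparison."""
    unnormalized = {v: v ** observed * (1 - v) ** (n_trials - observed) for v in prior_values}
    total = sum(unnormalized.values())
    return {v: u / total for v, u in unnormalized.items()}


values = [0.2, 0.5, 0.8]  # hypothetical candidate success probabilities
print(rejection_posterior(3, 5, values))
print(exact_posterior(3, 5, values))
```

Exact matching only works when the data are discrete and small, as in the classroom version. With richer data most proposals are rejected, which is one reason software becomes necessary to go further.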

There’s always room in any discourse for a few extremists; it’s just not good if they have a lot of power.

Spoken like a Bayesian :)

I wonder if Gelman’s current Bayesian sensibilities would have been considered extremist in, say, 1950? How about 1995 Berkeley?

As I mentioned in a blog post a couple of months ago, Bayesian statistics was considered some combination of stupid, wrong, and useless by many UC Berkeley statistics professors in the early and mid-1990s. Andrew’s work was not respected there, just because no Bayesian methods or approaches were respected. Andrew can tell you too, but I saw it firsthand. It was astonishing.

He who laughs last laughs longest.

Well hopefully that Gelman fellow can land on his feet.

I’m not sure it makes a whole lot of sense to talk about extremists in a partly mathematical field the way it does in politics or religion. Statisticians make claims most of which are either right or wrong. Whether they’re unpopular or radical is beside the point.

It’s the ignorance that grates, and not the anti-Bayesianism itself. In the past it was far easier for anti-Bayesians to spout wrong claims which had already been corrected many times, without taking the slightest trouble to learn anything more about Bayes, and without receiving much censure of any kind. Now it’s not so easy for them to get away with this.

A big chunk of this is just communication and time constraints though. Just about everyone, including Frequentists, complains about the same lazy behavior on the part of their adversaries.

I always consider it important in discussions like these to distinguish between the level of explaining what probabilities mean, and the level of methods of inference.

My impression is that most people who apply statistics have a frequentist idea of what probabilities mean, even many of those who prefer Bayesian methods (there is quite an amount of evidence for this in the Bayesian literature where people interpret their likelihoods; and Andrew contributes to this with his ideas on model checking).

But then a prior probability distribution on the set of parameters is conceptually something very different (except in “empirical Bayes” situations where even parameters are in some way drawn from a population), and indeed adds a layer of difficulty to stats teaching.

That’s right almost everyone does think of likelihoods as frequencies, but not everyone.

http://www.entsophy.net/blog/?p=241

“it’s not enough for them to simply restrict their own practice to [Bayesian] methods; they have to go the next step and put down [non-] Bayesian methods that they don’t even understand. This topic comes up from time to time on this blog”

– Edited to reflect this comments section :-)

“put down [non-] Bayesian methods that they don’t even understand”

Bayes currently is taught as a voluntary addition on top of years of Classical Statistical training. Consequently, all Bayesians understand classical methods.

I think you’re exaggerating. In many MS & PhD programs learning Bayes is not voluntary, and hasn’t been for a long time.

Also, training in classical statistics for years doesn’t, unfortunately, mean that one necessarily understands it. Even if one later goes on to prefer Bayesian methods.

Many (most?) introductory statistics textbooks (even those that assume a knowledge of calculus) barely acknowledge the existence of Bayesian methods. Pity that, since being exposed to different schools of thought at this early stage should actually be beneficial to most students.

When learning an all new concept and one that is “hard”, it is not beneficial to be exposed to two different frameworks. Have you ever been an intro student? A non brilliant one? worked with a class of them? You don’t teach someone learning back handsprings, two different methods for roundoffs. You give them one. They are beginners! Have some empathy, have some clue.

They should only learn one. And since the Bayesian version is simpler mathematically and conceptually (to anyone who has never encountered p-values and CI’s before), they should learn the Bayesian one.

Well I went to a top 10 stat grad school and the only “requirement” to learn bayes was a third year “statistical inference” sequence which was 80% Frequentist and 20% Bayesian. That sequence wasn’t required for MS but I believe was for Ph.D.s. That despite the fact that 50% of the profs were Bayesian.

No one understands classical statistics because it’s not understandable. But Bayesians can apply frequentist methods and are familiar with all the key theorems, and more importantly, have almost certainly done a great deal of hands-on classical statistics.

This might all change in the future (and easily could), but as of right now Frequentists as a whole have nothing even approaching that level of familiarity with Bayes, hence the existence of strong anti-Bayesian diatribes from people who know very little about it beyond some three page philosophical discussion they read somewhere.

“No one understands classical statistics because it’s not understandable.” Brilliant! Andrew should steal this tagline.

Anyways, the best statisticians I’ve met or read never bother to label themselves as either Bayesians or Frequentists and just move on doing what they have to do; ideology be damned.

This was me commenting; forgot to type my name. Sorry!

You went to one top 10 grad school, years ago, and not the other nine. You’re not even sure that the Bayesian material was required for PhDs at your school, back in the day.

Currently ASA specifically includes Bayes in its undergrad curriculum recommendations and notes how useful Bayes is in practice for MS statisticians.

Can we have this argument later, when you have some actual data?

I take it back. Your right.

I meant ” you’re right”
