Bayes at the end

John Cook noticed something:

I [Cook] was looking at the preface of an old statistics book and read this:

The Bayesian techniques occur at the end of each chapter; therefore they can be omitted if time does not permit their inclusion.

This approach is typical. Many textbooks present frequentist statistics with a little Bayesian statistics at the end of each section or at the end of the book.

There are a couple ways to look at that. One is simply that Bayesian methods are optional. They must not be that important or they’d get more space. The author even recommends dropping them if pressed for time.

Another way to look at this is that Bayesian statistics must be simpler than frequentist statistics since the Bayesian approach to each task requires fewer pages.

My reaction:

Classical statistics is all about summarizing the data.

Bayesian statistics is data + prior information.

On those grounds alone, Bayes is more complicated, and it makes sense to do classical statistics first. Not necessarily p-values etc., but estimates, standard errors, and confidence intervals for sure.

21 thoughts on “Bayes at the end”

  1. Perhaps people need some sense of dealing with batches of numbers (distributions) and how probabilities facilitate that before dealing with batches of probabilities.

    But prior * data probability = posterior, then marginalize to what you are interested in to get an interval (always the same _recipe_) –

    that seems like it should be easier.

    But checking the coverage of the intervals over repeated use for various distributions of true parameters – i.e., confidence-interval-type properties of credible intervals – definitely does seem the way to go.

    Again, it would be nice to have evidence…

    K?
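The recipe in this comment – prior times data probability, normalize, then read off an interval – can be sketched on a grid in a few lines of Python. A minimal sketch; the binomial example and all numbers are illustrative, not from the discussion:

```python
import numpy as np

# Grid over a binomial success probability theta
theta = np.linspace(0.001, 0.999, 999)

# Prior: flat over the grid (a Beta(1,1) prior)
prior = np.ones_like(theta)
prior /= prior.sum()

# Data: 7 successes in 10 trials; likelihood at each grid point
k, n = 7, 10
likelihood = theta**k * (1 - theta)**(n - k)

# The recipe: prior * likelihood, normalized, is the posterior
posterior = prior * likelihood
posterior /= posterior.sum()

# Interval: central 95% region from the posterior's cumulative distribution
cdf = np.cumsum(posterior)
lo = theta[np.searchsorted(cdf, 0.025)]
hi = theta[np.searchsorted(cdf, 0.975)]
print(lo, hi)
```

The same three lines of arithmetic (multiply, normalize, accumulate) work for any one-parameter model on a grid, which is the "always the same recipe" point.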

  2. As a newbie in Bayesian statistics (I'm finishing my PhD in Political Science using hierarchical Bayes), I have to say that I never understood regression models well until I started learning Bayesian inference.

    In Bayesian modeling, it's all about probability. You must have some knowledge of probability distributions, conditional probability, joint probability etc.

    I mean, it seems more natural to think of
    y = a + bx + e, e ~ N(0,1) as:

    y ~ N(a + bx, 1). But once you are here, it's quite easy to extend to hierarchical Bayes, while in the other setting it is pretty annoying. You have to think in terms of fixed effects, random effects, etc., which is confusing.

    In conclusion, maybe it's easier to learn frequentist first and Bayesian later, but more complex models (and here I'm thinking of simple regression) are easier in Bayesian settings than in frequentist ones.
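Manoel's point that the two ways of writing the regression are the same model can be checked by simulation. A small sketch; the coefficient values are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = 1.5, 2.0
x = rng.normal(size=1000)

# Error-term view: y = a + b*x + e, with e ~ N(0, 1)
e = rng.normal(size=1000)
y1 = a + b * x + e

# Distributional view: y ~ N(a + b*x, 1) -- same model, written directly
y2 = rng.normal(loc=a + b * x, scale=1.0)

# Both datasets recover the same slope and intercept (up to noise)
print(np.polyfit(x, y1, 1))
print(np.polyfit(x, y2, 1))
```

The second form makes the sampling distribution explicit, which is exactly what makes the hierarchical extension (putting a distribution on a and b in turn) feel natural.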

  3. Jared: Good catch. That was the book.

    I was also surprised at the size of the book. I've only skimmed it, but it seems to have as much substance as some larger contemporary textbooks without being difficult to read.

  4. I found it much easier to understand Bayesian posteriors than confidence intervals. I think the problem for me was that frequentist treatments of estimation in texts tend to be organized around hypothesis testing rather than inference. Getting your head around multiple repetitions of experiments requires a conceptual backbend to understand what's actually a pretty simple concept.

    I think it's critical to understand sampling variance first, though. It's actually pretty easy with a fistful of dice and some paper. Or a computer. Judging by the urns (OK, pretzel jars) of (ping-pong) balls and bags of dice in Andrew's office, I'm pretty sure that's what he does.

    I really liked the approach at the start of Jim Albert and Jay Bennett's book Curve Ball (which was aimed at laymen and very light on formulas). They start with simulations from baseball games like All Star Baseball (the one with the spinner). Seeing 20 simulated seasons for a batter with a 0.312 batting ability is very enlightening if you've never thought about sampling variation before.
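The kind of simulation Bob describes takes only a few lines to reproduce. A sketch, assuming 500 at-bats per season (a round number of my choosing, not from Curve Ball):

```python
import numpy as np

rng = np.random.default_rng(42)
ability = 0.312   # the batter's true "batting ability"
at_bats = 500     # roughly a full season

# 20 simulated seasons: each season's observed average is hits / at-bats
hits = rng.binomial(at_bats, ability, size=20)
averages = hits / at_bats
print(averages.min(), averages.max())
```

The spread between the best and worst simulated season is typically several dozen points of batting average, even though the underlying ability never changes — the sampling-variation lesson.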

  5. "On those grounds alone, Bayes is more complicated, and it makes sense to do classical statistics first."

    At MCMSki 3, Sander Greenland claimed that the only effective way he has found for conveying Bayesian ideas to students is to teach them simultaneously with frequentist ideas. He specifically mentioned being somewhat shocked that AG, the author of the best-known Bayesian text (or words to that effect), told SG that AG teaches frequentist techniques* first.

    * I know that AG doesn't think it makes sense to call estimation techniques per se "frequentist". I'm just reporting what SG said.

  6. Corey:

    I do teach Bayes in my intro class now! Because Sander convinced me (when I ran into him at a meeting a few months ago). Here's the order of topics:

    – Probability distributions
    – Law of large numbers and central limit theorem (stated, not proved; this is a non-calculus-based course)
    – Estimates and standard errors
    – (Classical) confidence intervals
    – (Classical) hypothesis tests
    – Bayesian inference

    Following Sander's advice, I teach Bayesian inference as a weighted average of prior and data estimates. I don't teach it as conditional probability, and I don't make the connection between base rates and prior probabilities. I just can't figure out how to work it in.

    But I do feel the need to teach estimates, standard errors, and the central limit theorem first.

    I do a bunch of simulations, but seeing Bob's comment above makes me think I should be doing more. The trouble is that most students don't have any particular intuition as to what a .312 batting average is, so I'll have to come up with more convincing examples. I'm still working on it!
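The "weighted average of prior and data estimates" presentation Andrew mentions can be written out directly for the normal case. A sketch with made-up numbers; the inverse-variance weighting below is the standard conjugate-normal result, not necessarily the exact form Sander or Andrew uses in class:

```python
# Bayesian inference as a weighted average of prior and data estimates,
# for a normal prior combined with a normal data estimate.
prior_est, prior_se = 160.0, 10.0   # prior estimate and its standard error
data_est, data_se = 170.0, 5.0      # data estimate and its standard error

# Weights are inverse variances (precisions)
w_prior = 1 / prior_se**2
w_data = 1 / data_se**2

post_est = (w_prior * prior_est + w_data * data_est) / (w_prior + w_data)
post_se = (w_prior + w_data) ** -0.5
print(post_est, post_se)
```

The posterior estimate always lands between the prior and data estimates, closer to whichever has the smaller standard error, and its standard error is smaller than either — which is the whole pedagogical point of this presentation.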

  7. I'm just struggling to understand how this can be. N.b. I've read a few of your books including "Teaching Statistics".
    In an intro class, where you are concerned about teaching even conditional probability, you nevertheless include (as priority #5) classical hypothesis tests!?! Do your students really come away with a useful ongoing understanding of what "accepting" or "rejecting" a hypothesis in this sense means? Forgive me, I've met neither you nor any of your students, but my sample is dozens of masters-or-higher statistics graduates, and relative to my experience this seems an incredible achievement.

  8. Bxg:

    No, I don't think my students come away with a really useful ongoing understanding of what "accepting" or "rejecting" a hypothesis means. I just don't know how to teach this material any better!

    I do have some alternatives:

    BDA: We do it all from first principles, using calculus. That works.

    ARM: No calculus, we do it all using computing and simulation. But I don't know if I could teach an intro course like this.

    I just don't know. I don't think the current standard intro course works at all (even in its best incarnations, such as in Dick De Veaux's book). But I'm still struggling with how best to replace it. Maybe it would be fine to just teach an intro course out of a simplified version of parts 1 and 2 of ARM, but I didn't think of doing it that way. Maybe that would be the best way to go, but I can't turn around on this right now. I'll think about it after the current draft of the book is done.

    P.S. One of the pleasures of (a) being tenured and (b) writing for a blog (rather than a journal) is that I can directly say what I'm thinking, including all uncertainties. When you write for a journal it seems that you have to suppress a lot of honesty in order for your paper to get accepted. This is a bad thing for a statistician!

  9. Andrew, John:
    I learned a lot from that book and still occasionally use it as a reference for theory.

    I'm glad that my program started with the simpler stuff to get us going and then moved on to tomes like ARM and Elements of Statistical Learning, which are my primary applied references.

  10. That "don't tell someone a story – get them to tell it to themselves" stuff.

    Estimates, errors, and the weighted-average stuff (general least squares) are likely something people sort of already have, or can pick up most easily. The whole Cochrane Collaboration runs on it.

    Now, I was fairly negative on SG’s early drafts on “how conventional frequentist methods can be used to generate Bayesian analyses” (summary comment repeated below) but probably underestimated the value of a simple (though possibly misleading) view of things in terms of things people already sort of grasp.

    But I do agree with Manoel that y ~ N(a + bx, 1) is simpler, and on the log scale it defines the quadratic functions that underlie general least squares. But a bridge is needed. I have been thinking of making a tutorial that maps general least squares methods to parabolas, their addition, and paths over these (using high school algebra).

    One of my best examples of statisticians not understanding statistics, and in particular the central limit theorem, is when they strongly complain about skewed raw-data outcomes in a meta-analysis of multiple studies that uses study means. It is the distribution of the study means that needs to not be highly skewed, and with reasonable sample sizes even very skewed raw data will still give fairly symmetric study means. Of course this is one of the reasons why general least squares methods work well in practice.

    K

    2005 comment to SG

    But if you allow more general non-linear modeling with optimization, the following toy code shows "everything". The code was stolen from Charles Geyer, but it is very similar to what I used in my ASA meta-analysis courses and to what Venables and Ripley promote for more flexible implementations of generalized linear models in chapter 16 of MASS. Maybe a bit too hard for epidemiologists, and certainly not as safe as iteratively re-weighted least squares. But more importantly, I think you have to discuss posterior modes versus posterior means, and posterior credible intervals that don't come from posterior mode +/- 1.96 * posterior SD (even small amounts of non-quadratic components in the posteriors will create distracting differences between Bugs output and your simple approach). This may even be a big problem for the nuisance parameters, where the usual Bayes approach is to integrate them out of the posterior, whereas classical and "usual software" Bayes will maximize them out (i.e., a numerical Laplace approximation replacing an expectation with a maximum).
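The central-limit-theorem point about study means above can be illustrated with a quick simulation (this is separate from the Geyer toy code referred to in the 2005 comment, which is not shown here; the lognormal choice and sample sizes are mine):

```python
import numpy as np

rng = np.random.default_rng(1)

# Very skewed raw outcomes within each of 200 studies of 50 subjects
n_studies, n_per_study = 200, 50
raw = rng.lognormal(mean=0.0, sigma=1.0, size=(n_studies, n_per_study))

def skew(x):
    """Sample skewness: mean of standardized values cubed."""
    z = np.asarray(x, dtype=float).ravel()
    z = (z - z.mean()) / z.std()
    return (z**3).mean()

# Study means are far more symmetric than the raw data they summarize
study_means = raw.mean(axis=1)
print(skew(raw), skew(study_means))
```

The raw data are strongly right-skewed, but averaging 50 observations per study shrinks the skewness roughly by a factor of sqrt(50) — so complaining about skewed raw outcomes when the analysis runs on study means misses the point.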

  11. Did he know he convinced you? If so, he really ought to have mentioned it during his presentation. I was left with the impression that he just canvassed you on the subject.

  12. Maybe I forgot the first step?

    I had suggested 3 steps as an alternative to weighted-average presentations of Bayes – and have had opportunities to try them out, but always skipped step 1.

    Maybe I shouldn't have.

    Now, I strongly agree with Andrew that "I don't think the current standard intro course works at all", which is why I keep putting these ideas out.

    K?

    3 steps from 2005

    1. For a large sample of genders and heights, define a JOINT empirical probability distribution and explore it a bit through various calculations. Now explore conditional and marginal probability calculations – given that someone is male, what is the posterior probability of height, etc. Now add weight, so one can go through joint and marginal conditional probabilities.

    2. Now do the same for the joint probability of disease presence and diagnostic test result(s) with a large sample. Now go to a sample of selected disease-present/absent cases, so that the probability of disease presence cannot be estimated – add borrowed/subjective priors, and talk a bit about how one would determine the credibility of those priors.

    3. Now do a Bayesian analysis of a fairly small two-group RCT with binary outcomes and a subjective prior (maybe also optimistic and pessimistic priors as a sensitivity analysis) – but ALL calculations would be done by drawing from the joint distribution of (Pc, Pt, 2x2 tables) and then conditioning on the outcomes observed in the trial (quite doable for small 2x2 tables with about 10,000,000 draws from the joint). The reason for the awkward simple Monte Carlo calculation is to keep everything in step 3 exactly the same as in step 1.
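Step 3's "draw from the joint, then condition on the observed outcomes" is simple rejection sampling. A sketch; the uniform priors and observed counts below are placeholders for the subjective priors and trial data in the text, and fewer draws than the 10,000,000 mentioned are used to keep it quick:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical small trial: 3/10 events in control, 1/10 in treatment
n_c = n_t = 10
y_c_obs, y_t_obs = 3, 1

# Draw from the joint distribution of (Pc, Pt, 2x2 table):
# uniform priors stand in for the subjective priors in the text
n_draws = 1_000_000
p_c = rng.uniform(size=n_draws)
p_t = rng.uniform(size=n_draws)
y_c = rng.binomial(n_c, p_c)
y_t = rng.binomial(n_t, p_t)

# Condition on the outcomes actually observed in the trial:
# keep only the draws whose simulated 2x2 table matches the data
keep = (y_c == y_c_obs) & (y_t == y_t_obs)
post_diff = p_t[keep] - p_c[keep]
print(keep.sum(), post_diff.mean())
```

The kept draws of (Pc, Pt) are an exact sample from the posterior, and everything is the same joint-then-condition move as in step 1 — which is precisely why the awkward brute-force calculation is pedagogically useful.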

  13. A colleague just passed on this New Yorker article to me "The Truth Wears Off"

    For lack of a better forum for asking, I will ask here: has the "decline effect" talked about in this article been noted anywhere in the Orthodox Statistics vs Bayesian debate?

  14. AG: "The old methods were pretty simple. We can do a lot more now."

    True, but I recall my UG control theory prof, J. Boyd Pearson, being asked when we'd learn about the Z transform. His response (heavily paraphrased; I think he was more concise): "It's just the Laplace transform with z replaced by exp(st). Just solve for s, plug it in, re-derive all the results in terms of z, and you'll have it." You don't need thick books if your explanations are that concise (and your "left as an exercise for the reader" is that comprehensive). :-)

  15. Allan E: a bunch of stuff is going on.

    Regression to the mean (_notable_ discoveries are unlikely to be underestimates), non-independent replications, low-power studies, incomplete reporting of replication attempts, earlier (as SG called it) confirmatory bias (a rush to publish seemingly similar results, or even data dredging to get those similar results) …

    Doing good science with real replication is hard and expensive and very error prone.

    CS Peirce did point out that you can never stop doing RCTs because the world evolves and changes – but change in the direction opposite of what we initially perceived does seem _silly_ (notwithstanding the problems above).

    K?

  16. I agree with Bill because I probably followed the same curriculum as you. I too remember J. Boyd Pearson saying "It's just the Laplace transform with z replaced by exp(st). Just solve for s, plug it in, re-derive all the results in terms of z, and you'll have it." Better to hear it twice than once!

Comments are closed.