A month ago I (Aki) started a series of tweets about “scientific books which have had big influence on me…”. They are partially in time order, but I can’t remember the exact order. I may have forgotten some, and some stretched the original idea, but I can recommend all of them.

I have collected all those book tweets below and fixed only some typos. These are my personal favorites, and there are certainly many great books I haven’t listed. Please share your own favorite books, with a short description of why you like them, in the comments.

I start to tweet about scientific books which have had big influence on me…

- Bishop, Neural Networks for Pattern Recognition, 1995. The first book where I read about Bayes. I learned a lot about probabilities, inference, model complexity, GLMs, NNs, gradients, Hessian, chain rule, optimization, integration, etc. I used it a lot for many years.
Looking again at the contents, it is still a great book, although naturally some parts are a bit outdated.

- Bishop (1995) referred to Neal, Bayesian Learning for Neural Networks, 1996, from which I learned about sampling in high dimensions, HMC, prior predictive analysis, evaluation of methods and models. Neal’s free FBM code made it easy to test everything in practice.
- Jaynes, Probability Theory: The Logic of Science, 1996: I read this because it was freely available online. There is not much for practical work, but plenty of argumentation on why using Bayesian inference makes sense, which I did find useful when I was just learning Bayes.
15 years later I participated in a reading circle with mathematicians and statisticians going through the book in detail. The book was still interesting, but not that spectacular anymore. The discussion in the reading circle was worth it.

- Gilks, Richardson & Spiegelhalter (eds), Markov Chain Monte Carlo in Practice (1996). Very useful introductions to different MCMC topics by Gilks, Richardson & Spiegelhalter Ch1, Roberts Ch3, Tierney Ch4, Gilks Ch5, Gilks & Roberts Ch6, Raftery & Lewis Ch7.
And with special mentions to Gelman on monitoring convergence Ch8, Gelfand on importance-sampling leave-one-out cross-validation Ch9, and Gelman & Meng on posterior predictive checking Ch11. My copy is worn out from heavy use.

- Gelman, Carlin, Stern, and Rubin, Bayesian Data Analysis, 1995. I just loved the writing style, and it had so many insights and plenty of useful material. During my doctoral studies I also did about 90% of the exercises as self-study.
I considered using the first edition when I started teaching Bayesian data analysis, but I thought it was maybe too much for an introductory course, and it didn’t have model assessment and selection, which is important for me.

This book (and its later editions) is the one I have re-read most, and when re-reading I keep finding things I didn’t remember being there (I guess I have a bad memory). I still use the latest edition regularly, and I’ll get back to these later editions later.

- Bernardo and Smith, Bayesian Theory, 1994. Great coverage (although not complete) of the foundations and axioms of Bayesian theory, with emphasis that actions and utilities are an inseparable part of the theory.
They admit problems of theory in continuous space (which seem to not have a solution that would please everyone, even if it works in practice) and review general probability theory. They derive basic models from simple exchangeability and invariance assumptions.

They review utility and discrepancy based model comparison and rejection with definitions of M-open, -complete, and -closed. This and Bernardo’s many papers had a strong influence on how I think about model assessment and selection (see, e.g., http://dx.doi.org/10.1214/12-SS102).

- Box and Tiao, Bayesian Inference in Statistical Analysis, 1973. Wonderful book, if you want to see how difficult inference was before MCMC and prob. programming. Includes some useful models, and we used one of them as a prior in a neuromagnetic inverse problem http://becs.aalto.fi/en/research/bayes/brain/lpnorm.pdf
- Jeffreys, Theory of Probability, 3rd ed, 1961. Another book with historical interest. The intro and estimation part are sensible. I was very surprised to learn that he wrote about all the problems of Bayes factor, which was not evident from the later literature on BF.
- Jensen, An Introduction to Bayesian Networks, 1996. I’m travelling to Denmark, which reminded me about this nice book on Bayesian networks. It’s out of print, but Jensen & Nielsen, Bayesian Networks and Decision Graphs, 2007, seems to be a good substitute.
- Dale, A History of Inverse Probability: From Thomas Bayes to Karl Pearson, 1991. Back to historically interesting books. Dale has done a lot of great research on the history of statistics. This one helps to understand the Bayesian-Frequentist conflict in the 20th century.
The conflict can be seen in, e.g., Lindley writing in 1968: “The approach throughout is Bayesian: there is no discussion of this point, I merely ask the non-Bayesian reader to examine the results and consider whether they provide sensible and practical answers”.

McGrayne, The Theory That Would Not Die: How Bayes’ Rule Cracked The Enigma Code, Hunted Down Russian Submarines, & Emerged Triumphant from Two Centuries of Controversy, 2011 is more recent and entertaining, but based also on much of Dale’s research.

- Laplace, Philosophical Essay on Probabilities, 1825. English translation with notes by Dale, 1995. Excellent book. I enjoyed how Laplace justified the models and priors he used. Considering the clarity of the book, it’s strange how little these ideas were used before the 20th century.
- Press & Tanur, The Subjectivity of Scientists and the Bayesian Approach, 2001. Many interesting and fun stories about progress of science by scientists being very subjective. Argues that Bayesian approach at least tries to be more explicit on assumptions.
- Spirer, Spirer & Jaffe, Misused Statistics, 1998. Examples of common misuses of statistics (deliberate or inadvertent) in graphs, methodology, data collection, interpretation, etc. Great and fun (or scary) way to teach common pitfalls and how to do things better.
- Howson & Urbach, Scientific Reasoning: The Bayesian Approach, 2nd ed, 1999. Nice book on Bayesianism and philosophy of science: induction, confirmation, falsificationism, axioms, Popper, Lakatos, Kuhn, Cox, Good, and contrast to Fisherian & Neyman-Pearson significance tests.
There are also 1st ed 1993 and 3rd ed 2005.

- Gentle, Random Number Generation and Monte Carlo Methods, 1998, 2.ed 2003. Great if you want to understand or implement: pseudo rng’s, checking quality, quasirandom, transformations from uniform, methods for specific distributions, permutations, dependent samples & sequences.
- Sivia, Data Analysis: A Bayesian Tutorial, 1996. I started teaching a Bayesian analysis course in 2002 using this thin, very Jaynesian book, as it had many good things. Afterward I realized that it missed too much of the workflow for students to be able to do their own projects.
- Gelman, Carlin, Stern, & Rubin, BDA2, 2003. This hit the spot. Improved model checking, new model comparison, more on MCMC, and new decision analysis made it at that time the best book for the whole workflow. I started using it in my teaching the same year it was published.
Of course it still had some problems, like using DIC instead of cross-validation, effective sample size estimate without autocorrelation analysis, etc., but the additional material I needed to introduce in my course was minimal compared to what any other book would have required.

My course included the chapters 1-11 and 22 (with varying emphasis), and I recommended for students to read other chapters.

- MacKay, Information Theory, Inference, and Learning Algorithms, 2003. Super clear introduction to information theory and codes. Has also excellent chapters on probabilities, Monte Carlo, Laplace approximation, inference methods, Bayes, and ends up with neural nets and GPs.
The book is missing the workflow part, but it has many great insights clearly explained. For example, in the Monte Carlo chapter, I love how MacKay tells when the algorithms fail and what happens in high dimensions.

Before the 2003 version, I had been reading also drafts which had been available since 1997.

- O’Hagan and Forster, Bayesian Inference, 2nd ed, vol 2B of Kendall’s Advanced Theory of Statistics, 2004. A great reference on all the important concepts in Bayesian inference. Fits well between BDA and Bayesian Theory, and one of my all-time favorite books on Bayes.
Covers, e.g., inference, utilities, decisions, value of information, estimation, likelihood principle, sufficiency, ancillarity, nuisance, non-identifiability, asymptotics, Lindley’s paradox, conflicting information, probability as a degree of belief, axiomatic formulation, …

finite additivity, comparability of events, weak prior information, exchangeability, non-subjective theories, specifying probabilities, calibration, elicitation, model comparison (a bit outdated), model criticism, computation (MCMC part is a bit outdated), and some models…

- Rasmussen and Williams, Gaussian Processes for Machine Learning, 2006. I was already familiar with GPs through many articles, but this became a much-used handbook and course book for us. The book is exceptional in that it also explains how to implement stable computation.
It has a nice chapter on Laplace approximation and expectation propagation conditional on hyperparameters, but has only Type II MAP estimate for hyperparameters. It has an ML flavor overall, and I know statisticians who have difficulties following the story.

The book was very useful when writing GPstuff. It’s also available free online.

- Gelman & Hill, Data Analysis Using Regression and Multilevel/Hierarchical Models, 2006. I was already familiar with the models and methods in the book, but I loved how it focused on how to think about models, modeling and inference, using many examples to illustrate concepts.
The book starts from simple linear models and has the patience to progress slowly, not going too early into details on computation, and it works surprisingly well even if Bayesian inference comes only after 340 pages.

Gaussian linear model, logistic regression, generalized linear models, simulation, model checking, causal inference, multilevel models, Bugs, Bayesian inference, sample size and power calculations, summarizing models, ANOVA, model comparison, missing data.

I recommended the book to my students after BDA2 and O’Hagan & Forster, as it seemed to be a good and quick read for someone who knows how to do the computation already, but I couldn’t see how I would use it in teaching as Bayesian inference comes late and it was based on BUGS!

More recently re-reading the book, I still loved the good bits, but also was shocked to see how much it was encouraging to wander around in a garden of forking paths. AFAIK there is a new edition in progress which updates it to use more modern computation and model comparison.

- Harville, Matrix Algebra From a Statistician’s Perspective, 1997. 600 pages of matrix algebra with focus on that part of matrix algebra commonly used in statistics. Great book for people implementing computational methods for GPs and multivariate linear models.
Nowadays, with the Matrix Cookbook online, I use it less often to check simpler matrix algebra tricks, but my students still find it useful as it goes deeper and has more derivations in many topics.

- Gelman and Nolan, Teaching Statistics: A Bag of Tricks, 2002 (2.ed 2017). A large number of examples, in-class activities, and projects to be used in teaching concepts in intro stats course. I’ve used ideas from different parts and especially from decision analysis part.
- Spiegelhalter, Abrams & Myles, Bayesian Approaches to Clinical Trials and Health-Care Evaluation, 2004. This was a helpful book for learning basic statistical issues in clinical trials and health-care evaluation, and how to replace “classic” methods with Bayesian ones.
Medical trials, sequential analysis, randomised controlled trials, ethics of randomization, sample-size assessment, subset and multi-center analysis, multiple endpoints and treatments, observational studies, meta-analysis, cost-effectiveness, policy-making, regulation, …

- Ibrahim, Chen & Sinha, Bayesian Survival Analysis, 2001. The book goes quickly to the details of model and inference and thus is not an easy one. There has been a lot of progress in models and inference afterwards, but it’s still a very valuable reference on survival analysis.
- O’Hagan et al, Uncertain Judgments: Eliciting Experts’ Probabilities, 2006. A great book on the very important but too often ignored topic of eliciting prior information. A must read for anyone considering using (weakly) informative priors.
The book reviews psychological research that shows, e.g., how the form of the questions affects the experts’ answers. The book also provides recommendations on how to do elicitation better and how to validate the results of elicitation.

Uncertainty & the interpretation of probability, aleatory & epistemic, what is an expert?, elicitation process, the psychology of judgment under uncertainty, biases, anchoring, calibration, representations, debiasing, elicitation, evaluating elicitation, multiple experts, …

- Bishop, Pattern Recognition and Machine Learning, 2006. It’s quite different from the 1995 book, although it covers mostly the same models. For me there was not much new to learn, but my students have used it a lot as a reference, and I also enjoyed the explanations of VI and EP.
Based on the contents and the point of view, the name of the book could also be “Probabilistic Machine Learning”

Due to the theme “influence on me”, it happened that all books I listed were published 2006 or earlier. After that I’ve seen great books, but those have had less influence on me. I may later make a longer list of more recent books I can recommend, but here are some as a bonus:

- McGrayne, The Theory That Would Not Die: How Bayes’ Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant from Two Centuries of Controversy, 2012. Entertaining book about history of Bayes theory.
- Gelman, Carlin, Stern, Dunson, Vehtari & Rubin, Bayesian Data Analysis, 3rd ed, 2013. Obviously a great update of the classic book.
- Särkkä, Bayesian Filtering and Smoothing, 2013. A concise introduction to non-linear Kalman filtering and smoothing, particle filtering and smoothing, and to the related parameter estimation methods from the Bayesian point of view.
- McElreath, Statistical Rethinking: A Bayesian Course with Examples in R and Stan, 2015. Easier than BDA3 and well written. I don’t like how the model comparison is presented, but after reading this book, just check my related articles which were mostly published after this book.
- Goodfellow, Bengio & Courville, Deep Learning, 2016. This deep learning introduction has enough of a probabilistic view that I can also recommend it.
- Stan Development Team, Stan Modeling Language: User’s Guide and Reference Manual, 2017. It’s not just Stan language manual, it’s also full of well written text about Bayesian inference and models. There is a plan to divide this in parts, and one part would make a great text book.

I’ve read more than these and the list was just the ones I enjoyed most. I think people, myself included, read fewer books now that it’s easier to find articles, case studies, and blog posts on the internet. Someday I’ll make a similar list for the top papers I’ve enjoyed.

This is a great list to have online.

The books that have had an influence on me personally are more commonly used in the signal processing/communication theory higher education. Some of them are not in the list:

– Papoulis, “Probability, Random Variables and Stochastic Processes”: Provides an intuitive yet compactly rigorous introduction to the theory of probability, statistical inference and (discrete time) stochastic processes. In graduate school, I remember going through the chapters of this book several times.

– Cover & Thomas, “Elements of Information Theory”: Again a graduate textbook which has a wide coverage of information theory and related topics.

One book which shaped my research vision later on was

– R. G. Cowell, A. P. Dawid, S. L. Lauritzen, and D. J. Spiegelhalter, Probabilistic Networks and Expert Systems. Springer, 1999: A very good book on probabilistic graphical models and inference algorithms, particularly for developing a fundamental understanding of Markovian relations on general graphs.

I don’t remember reading Papoulis, but now that you mention it, I remember reading bits of Cover & Thomas, and all of Cowell et al. (the reason I had forgotten them is not that they are not good, it’s just that I haven’t used them much, and I have a limited memory).

I think the same. I’ve seen too much wrong, out-of-date, or not-even-wrong stuff stated with authority in textbooks. I say that as someone who actually did read them cover to cover for classes at one point.

The most recent thing I remember coming across was a neuroscience book which claimed something like “there are x neurons in the human brain and y connections between them” without offering any citation. How was it measured, do other methods of estimation give similar values, etc? All the actual scientific content was missing, replaced by authoritative claims I was supposed to memorize. At some point I just stopped bothering with textbooks altogether due to anti-scientific stuff like that. And of course, all here know about how NHST was created purely by stats 101 textbook authors who got confused between N-P and Fisher.

Some textbooks do a good job of listing the citations. Others…..do not. And it boggles my mind.

Interesting list.

My priority has been to understand what these books suggest. And if these resources haven’t yet settled some of the intellectual challenges we face, then we need to discuss their strengths and drawbacks. Much writing is so poor that it becomes laborious to distill the argument without its excruciating mapping.

MacKay – Me too. Best book ever. Well not quite, but good enough to keep on the bedside (the man’s enlarged my mind – as Dennis Hopper said in other circumstances).

Beyond that, there are other books on the list that I would also agree to, but I won’t agree in detail.

Books not on the list, a bit further from statistics, but relevant for building complex mathematical software

Computational math

Strang – Computational Science

Programming (David MacKay had many strengths – programming wasn’t among them that I could see).

Gries – Science of programming

Structure and interpretation (of course)

Lots of other things, but certainly MacKay, SICP, Gries and Strang are books I would run into a burning house to save

MacKay’s book has exceptionally clear MATLAB/Octave code for algorithms like HMC.

Yesterday I discovered the amazing Sharon Bertsch McGrayne, and the story she tells about the appalling but of course typical antagonism to Bayesianism, and the many and significant vital products based on Bayes. I was staggered. I’ve not read her book “yet”:

“The Theory That Would Not Die: How Bayes’ Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant from…” https://www.amazon.co.uk/Theory-That-Would-Not-Die/dp/0300188226

…but I’ve seen her Youtube video:

https://www.youtube.com/watch?v=2o-_BGqYM5U

I didn’t realise it wasn’t just ignorance but stubbornness that blocked Bayes for so long. Interestingly it is mirrored by and overlaps with the antagonism by biological geneticists to investigation by computer simulation. I shall be writing about the big exception: MCMC, but I didn’t realise even this was given such a hard time. I can only guess how much damage the persecution of Bayesianists has done to science/stats/civilisation in general, but I’ve discovered for myself that just anti-computer-simulation has crippled our understanding of the Darwinian mechanism by an unnecessary 40 years.

Even from her video we can see that one of the great heroes of computational science, Keith Hastings (who I believe invented the name Monte Carlo) pretty well had his career terminated for his efforts.

This putting of enormous effort into opposing the best discoveries seems unfortunately to go beyond “recurring theme” status; it seems to be one of the main activities of scientists. We are now familiar with Fisher’s claims that smoking does not cause cancer, but he also claimed cancer causes smoking, while doing as much as anyone else to block Bayes. Haldane’s 1957 paper, currently adhered to in pure form by huge numbers of geneticists, has done even more to block our understanding of genetics than Fisher did to block stats. It’s important that everyone entering science and maths is told the truth: that the biggest problem in science is scientists.

Strangetruther:

Regarding unreasonable and ignorant resistance to Bayesian methods, see this article with Christian Robert and our rejoinder to discussion.

P.S. According to Wikipedia, Nicholas Metropolis came up with the term “Monte Carlo method” in the 1940s.

From your first paper:

But you don’t provide any source for this. Do you have any quotes in mind from Fisher that make you think he is actually “anti-Bayesian” vs some distinction like “anti-uninformed priors”, etc?

Thanks for the two papers – I expect they will be very useful!

Yes, yes, indeed, indeed, thanks, it was Metropolis. I realised I’d got that wrong a few hours after writing it! It couldn’t be from the later Hastings work because “Monte Carlo” was a deliberately uninformative code word, in the style of the early cold war and presumably influenced in the wake of WWII. By 1970 they wouldn’t have bothered. In fact originally I’d thought Metropolis and Hastings were code words directly influenced by “Manhattan” and “Monte Carlo” :-) .

It’s been awhile since I read him closely but Fisher never really seemed anti-Bayesian to me. I remembered this quote, maybe he changed his mind when he got older?

Fisher, R A (1958). “The Nature of Probability”. Centennial Review. 2: 261–274. http://www.york.ac.uk/depts/maths/histstat/fisher272.pdf

IIRC, he was much more anti-Neyman-Pearson (power, alternative hypothesis, accepting hypotheses, etc) than anti-Bayes. It’s more the NHST-advocates, who are defending the indefensible and pretty much know nothing of how their methods were developed, that are anti-Bayes.

Fisher does not seem to have been an NHST-advocate or user, in fact (unlike Student and Neyman) I have always seen him be very careful not to draw conclusions beyond the precise null hypothesis being tested. In fact, once again IIRC, the smoking issue you bring up was precisely him pointing out alternative reasons for rejecting the null hypothesis that cancer rates were equal amongst smokers and non-smokers. Also, I’d be careful with allowing any medical claim at all to have the status as “fact” in your worldview…

I’d love to discuss more if you have quotes/references that disagree. This is mostly from memory of back when I first discovered the true nature of NHST.

> Fisher that make you think he is actually “anti-Bayesian” vs some distinction like “anti-uninformed priors”, etc?

Interesting, I think that might apply to CS Peirce, but I have not done the scholarship necessary to make such a claim.

Fisher, like almost anyone else, would use Bayes if he had what Don Fraser has called genuine prior knowledge, where the variation in parameters is aleatory rather than epistemic, i.e. there is a physical basis for the variation (cancer baseline rates).

For instance, Andrew and I quote Reid and Cox – “It seems to be already widely supported for probability generating models for data “[providing an] explicit description in idealized form of the physical, biological, . . . data generating process,” that is essentially “to hypothesize a data generating mechanism that produces observations as if from some physical probabilistic mechanisms”

While in contrast we claimed that “We believe scientific research would be more effective if statistics was viewed instead as primarily about conjecturing, assessing, and adopting idealized representations of reality, predominantly using probability generating models for both parameters and data …”

Now, David later by email disagreed strongly with the value of ever using a probability model; doing so for quantifying their [epistemic] uncertainty “is often (nearly always?) a bad idea”. I do believe that he is summarizing Fisher’s view here.

If you read the paper linked by Carlos Ungil and follow the refs to see what Fisher wrote it is pretty clear he just didn’t like assuming uniform distributions for the prior.

(1921). “On the ‘Probable Error’ of a Coefficient of Correlation Deduced from a Small Sample.” Metron, 1: 3-32.

(1925a). Statistical Methods for Research Workers. Edinburgh: Oliver & Boyd, 1 edition.

https://projecteuclid.org/download/pdf_1/euclid.ba/1340370565

Abstract. Ronald Fisher believed that “The theory of inverse probability is founded upon an error, and must be wholly rejected.” This note describes how Fisher divided responsibility for the error between Bayes and Laplace. Bayes he admired for formulating the problem, producing a solution and then withholding it; Laplace he blamed for promulgating the theory and for distorting the concept of probability to accommodate the theory. At the end of his life Fisher added a refinement: in the Essay Bayes had anticipated one of Fisher’s own fiducial arguments.

Yea, that agrees with my memory. His problem wasn’t really with Bayes’ theorem, it was with assuming uniform priors without any justification.

He also had a problem with non-frequentist interpretations of probability.

Any quotes? In my mind Fisher is labeled an “inductivist” which is distinct from a “frequentist”. Definitely he disagreed with the many aspects of frequentist statistical practice I mentioned.

There are a couple of quotes on that paper:

(from a 1922 review of Keynes’ Treatise on Probability)

“To the statistician probability appears simply as the ratio which a part bears to the whole of a (usually infinite) population of possibilities. Mr. Keynes adopts a psychological definition. It measures the degree of rational belief to which a proposition is entitled in the light of given evidence.”

(from the discussion on a 1935 paper)

“following Bayes, and, I believe, most of the early writers, but unlike Laplace, and others influenced by him in the nineteenth century, I mean by mathematical probability only that objective quality of the individual which corresponds to frequency in the population, of which the individual is spoken of as a typical member.”

The latter is available here: https://www.scribd.com/document/329753413/The-Logic-of-Inductive-Inference

Another quote from it:

“Although some uncertain inferences can be rigorously expressed in terms of mathematical probability, it does not follow that mathematical probability is an adequate concept for the rigorous expression of uncertain inferences of every kind. (..) If it appears in inductive reasoning, as it has appeared in some cases, we shall welcome it as a familiar friend. More generally, however, a mathematical quantity of a different kind, which I have termed mathematical likelihood, appears to take its place as a measure of rational belief when we are reasoning from the sample to the population.”

Thanks Anoneuoid – just from these comments a possibility is emerging (for me) that Fisher may well have changed over the years. WWII seems to have been a huge boost for Bayes – but only amongst those who could get past the enormous secrecy effort caused by its huge usefulness in the war. I presume Fisher had the contacts to get past it.

Also, his dislike of priors seems very old fashioned to those of us brought up with iterative techniques. Things were different in the pre-computing days; today I for one tend to see “priors”, to the extent that I understand them which isn’t far, as just being the initial values you plop in because you’ve got to start somewhere. Lucky initialisation might speed things up but their details won’t matter much in the end. The deep influence of computers can easily be underestimated. Just two good old boys from my U/G days are a good example. J Maynard Smith said “any idiot can write a computer program” – and he meant it!, and dear Margaret Boden is I think uninfluenced by some crucial subtleties of experience that writing a rattling good cognition program or two could lend. (Not certain about the last one, but strong suspicions. Hope she doesn’t read this!)

“I’d love to discuss more if you have quotes/references that disagree.” Thanks Anon! I may collect some more info but I don’t expect to become worthy of an expert’s extended time on the topic of Fisher! – though I will take up your offer if I can get away with it! Mind you, I never expected to get where I’ve got in the last few months. It was only last month that I discovered Sewall / Wright were two names of the same person, not two people! Even if I can pose as a genetics expert it would be years before I could get to grips properly with stats, and these days I ought to be planning and assigning my remaining years carefully :-S

“I presume Fisher had the contacts to get past it”

I’ve a feeling Sharon B M said in her Youtube talk that Fisher had had discussions with Turing – who used Bayes in his decryption work.

Ok, but all I was wondering is what made you say this:

You shouldn’t need any additional info collection. Is it just an impression you got from somewhere or what?

From Sharon B M’s Youtube talk.

Assuming “science” is a noun and that it is derived from “scientia” my two recent-ish favorites are:

“Knowledge and its Limits”, by Timothy Williamson

and

“Causality”, by Judea Pearl

I have used Gerd Gigerenzer’s book “Calculated Risks” successfully in a Freshman/Sophomore honors college course I taught at two universities on making good decisions. This course was designed for general students, not for math majors or statisticians, although several people who took the course have gone into statistics. I like this book because it has very intuitive ideas that can readily be used by people who go into fields where decision-making and explaining risks is important. I have had several former students tell me later that this course was useful to them when they got to medical school, for example, a field where people have to learn how to make life-and-death decisions frequently.

One of the innovative features of Gigerenzer’s book is his use of what he calls “natural frequencies” to explain the basic ideas of Bayesian inference without using Bayes’ theorem, in a way that appeals to lay people and which can easily be explained to people even without writing anything down, in simple cases. See here:

https://en.wikipedia.org/wiki/Gerd_Gigerenzer#Risk_communication

I have often run into people who hated their experience in a statistics class (often on airplanes where you have a seat-companion who asks you what you do, and when they learn you are a statistician, say how much they hated the course they took). I have used that opportunity many times to give simple examples, just in words, of how different the Bayesian approach is. For example, I can say (following an example in Gigerenzer’s book, so these numbers may have changed) that a mammogram, in the usual population tested, is 90% accurate: if the subject has breast cancer, the test will be positive 90% of the time, and if not, it will be negative 90% of the time. Then I ask (cheating of course) “What’s the probability that the subject has breast cancer?” “90%” is the usual reply (which, according to Gigerenzer, is what a large fraction of oncologists surveyed also answer). But then I note the prior: In that population, only 1% have cancer. So, out of a group of 1000 women (this is using Gigerenzer’s “natural frequencies” method) 10 will have cancer, of which 9 will be positive. Of the remaining 990 women who do not have cancer, 99 will be positive (false positive). So 9/(9+99) or about 8% actually have the disease. No mention of Bayes’ theorem. No writing stuff down, and it’s intuitively clear what’s happening. Most of the people I’ve had this discussion with are quite interested.
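The natural-frequencies arithmetic in the mammogram example can be sketched in a few lines of Python. This is only an illustration using the numbers quoted above (1% prevalence, a test that is positive 90% of the time with cancer and negative 90% of the time without, a group of 1000 women); the function name and its parameters are made up for this sketch and do not come from Gigerenzer’s book.

```python
def natural_frequencies(population, prevalence, sensitivity, specificity):
    """Count true and false positives in a hypothetical screened group.

    All names here are illustrative assumptions, not from the book.
    """
    with_disease = population * prevalence            # 10 of 1000 women
    without_disease = population - with_disease       # the remaining 990
    true_positives = with_disease * sensitivity       # 9 of the 10
    false_positives = without_disease * (1.0 - specificity)  # 99 of the 990
    # Share of positive tests that belong to women who do have cancer.
    p_disease_given_positive = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, p_disease_given_positive


tp, fp, ppv = natural_frequencies(1000, 0.01, 0.90, 0.90)
print(f"{tp:.0f} true positives, {fp:.0f} false positives, "
      f"P(cancer | positive test) = {ppv:.1%}")
```

The ratio computed at the end is the same 9/(9+99) as in the verbal version, so it is Bayes’ theorem written as counts rather than probabilities.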

The book does not discuss loss functions, which I covered in my course using other reading materials. He may have discussed this in his more recent (2014) popular book, “Risk Savvy: How to Make Good Decisions”, but I haven’t read it.

Thanks for that recommendation: Calculated Risks

Gigerenzer has done a lot of good stuff — but I did once catch one place where he botched it — see http://www.ma.utexas.edu/blogs/mks/2015/02/03/another-mixed-bag-gigerenzers-mindless-statistics/

> No writing stuff down, and it’s intuitively clear what’s happening.

Yes and no.

If you can walk someone through Galton’s two stage quincunx it will likely be true that it’s very clear to them what is happening.

But that does not mean that they will or even can connect the underlying logic (how to profitably learn from observations) to anything else – especially pragmatic Bayesian analyses with modeling justifications, criticism, predictive checking and adequate processing of posterior quantities. A point first made to me by David Spiegelhalter.

For instance, just providing a sense of a reality beyond direct access (always the case in statistics instead of math) that has to be represented (as a model to do any analysis) creates a lot of confusion – https://galtonbayesianmachine.shinyapps.io/GaltonBayesianMachine/

I don’t think anyone has fully worked through a full undergrad non-math course starting with just Bayes, sticking with it and introducing sampling distributions as simply an aspect of prior/posterior predictive quantities. Recently read what Tim Hesterberg (The Amer Stat 2015) argues needs to be covered to safely teach the bootstrap in undergrad and I suspect it won’t be as hard for many.

Forgot to mention: Martha (Smith)’s link raises, I think, similar concerns – e.g. acquiring a good facility with statistical thinking.

As I mentioned to John Ioannidis at recent Yale Reproducibility conference, we have to cultivate better quantitative skills at undergraduate level. Obviously if academics themselves make mistakes in statistical reasoning, as Rex Kline has pointed out, something is amiss.

Correction: Should have written:

Then I ask (cheating of course) “Suppose the test is positive? What’s the probability that the subject has breast cancer?”