We’re not even halfway through with January, but the new year’s already rung in a new book with lots of Stan content:

- Richard McElreath (2016)
*Statistical Rethinking: A Bayesian Course with Examples in R and Stan*. Chapman & Hall/CRC Press.

This one got a thumbs up from the Stan team members who’ve read it, and Rasmus Bååth has called it “a pedagogical masterpiece.”

The book’s web site has two sample chapters, video tutorials, and the code.

The book is based on McElreath’s R package `rethinking`, which is available from GitHub with a nice README on the landing page.

If the cover looks familiar, that’s because it’s in the same series as Gelman et al.’s *Bayesian Data Analysis*.

Also, class lectures are available here: https://www.youtube.com/playlist?list=PLDcUM9US4XdMdZOhJWJJD4mDBMnbTWw_z

Man (and woman), it is a really sweet book, I swear. I haven’t actually touched the physical book yet, but I’ve been reading drafts of it for the last two years. While it is a very hands-on and “pragmatic” book, one thing I particularly like about it is that it does not shy away from discussing the philosophical basis of Bayesian data analysis (“philosophical basis” might sound a bit fuzzy here, but it’s very clear in the book!). That this basis is Jaynes’ “Probability theory” doesn’t make it worse :)

Well, he’s not shy about basing his work on Jaynes. There is one criticism though. Why not just refer to frequency distributions as “frequency distributions”, denote them with f(), and admit frankly that they’re empirical quantities we’re trying to predict, no different in principle from a meteorologist predicting temperatures or political scientists predicting vote totals? Reserve probabilities p() solely for modeling and determining the consequences of uncertainties.

I think adopting such notation will be the tipping point for Bayesian statistics, because 90% of the endless sad pit of confusion and despair that is present day statistics just melts away if you simply don’t use the same notation for frequencies as you do for probabilities.

Book looks great but not too impressed by the typo on the first page!! (Preface, second paragraph, page xi)

But that isn’t important – it looks really useful.

I used McElreath’s book (in draft form) last Spring and am using it again this Spring for my Bayesian Statistics for the Social Sciences class that I teach in the Quantitative Methods in the Social Sciences M.A. program at Columbia. It is a great book.

Hi,

Was really looking forward to reading this on my flight but unfortunately the kindle version comes with corrupted font. Tested it across multiple devices: android, pc, ipad.

I wish someone would figure out a way to make more academic books readable on the Kindle.

I am reading the McElreath book on the VitalSource bookshelf. This is quite an improvement over Kindle, although it still has some annoying aspects.

In general CRC Press is doing a better job than many other stats publishers by releasing books on Kindle that look exactly like the print version; Springer has outperformed CRC Press (recently?) by allowing people to just buy the pdf and read it like a regular pdf file.

Another surprise was the cost of the McElreath book on Kindle; even BDA3 is 10 Euros cheaper. CRC Press should reduce the online books’ prices. If I assign the McElreath book to students here in Potsdam, many will not be able to afford it.

Yes, I’m hearing a lot about the corrupted Kindle version. I’ve let CRC Press know, but I don’t think they actually produce the Kindle edition, so I’m not sure how many subcontractor steps it will take until it is corrected.

Jaynesians,

At the moment, while I don’t think I “buy into” the Maximum Entropy Principle, I do think it is interesting.

But I was watching some of McElreath’s video lectures and something struck me as odd. It sounded like he was saying that if you want a prior with support on the real line and a finite variance, then the MaxEnt prior will be the normal distribution. This isn’t accurate, right?

I’m no MaxEnt expert, but it seems to me that the more precise statement would be that if you want a prior with support on the real line and the only other thing you know is that the variance is a particular number then, the normal with this variance is the MaxEnt distribution.
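As a rough numerical illustration of that statement, here is a minimal Python sketch comparing the closed-form differential entropies of three familiar distributions that share the same variance. The function names and the choice of comparison distributions are illustrative assumptions, not anything from the book or the lectures.

```python
import math

# Closed-form differential entropies (in nats) for three distributions,
# each parameterized so that its standard deviation equals `sd`.

def normal_entropy(sd):
    # Normal(mu, sd): h = 0.5 * ln(2 * pi * e * sd^2)
    return 0.5 * math.log(2 * math.pi * math.e * sd ** 2)

def laplace_entropy(sd):
    # Laplace with scale b has variance 2 * b^2, so b = sd / sqrt(2)
    b = sd / math.sqrt(2)
    return 1 + math.log(2 * b)

def uniform_entropy(sd):
    # Uniform of width w has variance w^2 / 12, so w = sd * sqrt(12)
    width = sd * math.sqrt(12)
    return math.log(width)

sd = 1.0
for name, h in [("normal", normal_entropy(sd)),
                ("laplace", laplace_entropy(sd)),
                ("uniform", uniform_entropy(sd))]:
    print(f"{name:8s} entropy at sd={sd}: {h:.4f}")
```

With the variance held fixed, the normal should come out with the largest entropy of the three, which is consistent with (though of course not a proof of) the MaxEnt claim.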

But my real question is, how often could that situation come up, really? I mean if you think you can specify the variance, then how hard would it be to also elicit a mean absolute deviation? What if I started with specifying this MAD and got a bit lazy and didn’t say anything about the variance?

Is there any software out there that helps you determine MaxEnt distributions for cases where you have more than one moment constraint or more complicated constraints?

JD

JD: You’ll likely enjoy the derivations in Chapter 9.

If there is a finite variance, then there is also a mean. You get that moment by implication, which is why it isn’t listed as a constraint. If you assume mean absolute deviation but say nothing about variance, the maxent dist is exponential.

I mainly use maxent in the course to derive likelihoods (aka data priors), not parameter priors. So that’s why I don’t focus on fixed distributions, but rather conditional distributions. Hopefully that makes the issue clearer.

In the general case, you can specify nearly anything about the distribution and there might still be a maximum entropy distribution that satisfies that constraint (there doesn’t always have to be). The general case for specifying known values for various moments has been more or less worked out. But you could specify other things: the pdf has peaks at 0 and 1, the mean value is 2, the 95th percentile is 5 and q(x) has interquartile range 1 to 4.5 for some given strange nonlinear function q, or whatever.

Getting the maximum entropy distribution for a sufficiently weird set of constraints like that might require numerical approximations or something similar, like writing the log density in a basis expansion and solving numerically for the coefficients.
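The simplest version of "solve numerically for the coefficients" can be sketched in a few lines of Python. This is an assumed toy setup (Jaynes' dice-style problem: a discrete variable on 1..6 with only its mean constrained), not a general-purpose MaxEnt solver. The log probability is linear in a single basis function, log p(x) = λx − log Z, and the hypothetical helper `maxent_pmf` finds λ by bisection.

```python
import math

def maxent_pmf(values, target_mean, lo=-50.0, hi=50.0, tol=1e-12):
    """Maximum entropy pmf on `values` subject to a mean constraint.

    The solution has the form p(x) proportional to exp(lam * x); the
    mean is increasing in lam, so bisection on lam is enough.
    """
    def pmf(lam):
        exps = [lam * v for v in values]
        m = max(exps)                      # subtract max for numerical stability
        weights = [math.exp(e - m) for e in exps]
        z = sum(weights)
        return [w / z for w in weights]

    def mean(lam):
        return sum(v * p for v, p in zip(values, pmf(lam)))

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return lam, pmf(lam)

faces = [1, 2, 3, 4, 5, 6]
lam, p = maxent_pmf(faces, target_mean=4.5)
print("lambda:", round(lam, 4))
print("pmf:", [round(pi, 4) for pi in p])
```

With more constraints there is one coefficient per constraint, and the one-dimensional bisection would have to be replaced by a multivariate root finder or optimizer.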

In many cases, even if that’s the true set of information you have, you could work with a simpler problem (i.e. just the peaks at 0 and 1 and the mean value 2 might be enough to get useful results). In some sense the reason the normal distribution is so useful and common is that it’s one of the “simplest” maximum entropy distributions (i.e. it contains very little information), especially if you are hierarchically modeling the value of the standard deviation.

But yes, you can specify a mean absolute deviation and get a Laplace-type distribution too. That turns out to be the Bayesian interpretation of the “LASSO”, I guess. I often use exponential distributions for priors over parameters whose approximate order of magnitude I know (i.e. “on this scale it’s a positive number about 3”, so exponential(1/3.0), with rate 1/3 and mean 3, is the max-ent prior).
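Both claims here can be spot-checked with closed-form entropies. The Python sketch below is only an illustration under assumed comparison distributions (the function names are made up for this example): it compares a Laplace against a normal with the same mean absolute deviation, and an exponential against a half-normal and a uniform on the positive reals with the same mean of 3.

```python
import math

# --- Real line, fixed mean absolute deviation (MAD) about the center ---

def laplace_entropy_mad(mad):
    # Laplace with scale b has MAD = b, and h = 1 + ln(2 * b)
    return 1 + math.log(2 * mad)

def normal_entropy_mad(mad):
    # Normal has MAD = sd * sqrt(2 / pi), so sd = mad * sqrt(pi / 2)
    sd = mad * math.sqrt(math.pi / 2)
    return 0.5 * math.log(2 * math.pi * math.e * sd ** 2)

# --- Positive reals, fixed mean ---

def exponential_entropy(mean):
    # Exponential with rate 1 / mean: h = 1 + ln(mean)
    return 1 + math.log(mean)

def half_normal_entropy(mean):
    # Half-normal has mean sigma * sqrt(2 / pi); h = 0.5 * ln(pi * sigma^2 / 2) + 0.5
    sigma = mean * math.sqrt(math.pi / 2)
    return 0.5 * math.log(math.pi * sigma ** 2 / 2) + 0.5

def uniform_entropy(mean):
    # Uniform on [0, 2 * mean] has the requested mean; h = ln(2 * mean)
    return math.log(2 * mean)

print("MAD = 1: laplace", round(laplace_entropy_mad(1.0), 4),
      "vs normal", round(normal_entropy_mad(1.0), 4))
print("mean = 3: exponential", round(exponential_entropy(3.0), 4),
      "vs half-normal", round(half_normal_entropy(3.0), 4),
      "vs uniform", round(uniform_entropy(3.0), 4))
```

In each comparison the distribution named in the comment (Laplace for a fixed MAD, exponential for a fixed positive mean) should have the larger entropy.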

This looks great! Just ordered it. thx.

5 Star rating.

It took me a while to get a chance to sample the online lectures – excellent, full of sensible insight put in ways (metaphors) most likely to cause (some arguably useful) understanding by non-statistical grad students.

And no overdone frequency approach bashing!

I just bought this book; so far I have found it interesting and useful reading. On the topic of regression, in several chapters McElreath uses height as the dependent variable and weight as the predictor. As a former teacher of statistics and biostatistics, I think these variables should be interchanged. Most people are concerned about their weight for a God (or Nature) given height. In biostatistics, body mass index (BMI) is often discussed as a measure of overweight and obesity, for example (BMI = weight (kg) / [height (m)]²). And medical doctors consult index tables of weight for the given height of their patients.

Because of its Bayesian approach, I consider McElreath’s book a must-read for statisticians. Paraphrasing DV Lindley: Bayesian statistics is the statistics of the 21st century.

Great book. However, for some reason my copy of the book is missing a large chunk of Chapter 6. I contacted CRC but they ended up giving me an ebook code for VitalSource. Not exactly what I was hoping for, and now I cannot share the book with my students. A printed copy is much more pleasant to read than anything onscreen.

This book is a gem!

I am a systematic portfolio manager trading in the futures markets and this work has not only generated a lot of new ideas but has me questioning prior work at a fundamental level.

Can I use this book to apply Bayesian statistics to a typical RCT?

Thank you!

anoop