Robert and Casella’s book on Monte Carlo Methods with R

I remember many years ago being told that political ideologies fall not along a line but on a circle: if you go far enough to the extremes, left-wing communists and right-wing fascists end up looking pretty similar.

I was reminded of this idea when reading Christian Robert and George Casella’s fun new book, “Introducing Monte Carlo Methods with R.”

I do most of my work in statistical methodology and applied statistics, but sometimes I back up my methodology with theory or I have to develop computational tools for my applications. I tend to think of this sort of ordering:

Probability theory – Theoretical statistics – Statistical methodology – Applications – Computation

Seeing this book, in which two mathematical theorists write all about computation, makes me want to loop this line in a circle. I knew this already–my own single true published theorem is about computation, after all–but I tend to forget. In some way, I think that computation–more generally, numerical analysis–has taken some of the place in academic statistics that was formerly occupied by theorem-proving. I think it’s great that many of our more mathematically minded probabilists and statisticians can follow their theoretical physicist colleagues and work on computational methods. I suspect that applied researchers such as myself will get much more use out of theory as applied to computation, as compared to traditionally more prestigious work on asymptotic inference, uniform convergence, mapping the rejection regions of hypothesis tests, M-estimation, three-armed bandits, and the like.

Don’t get me wrong–I’m not saying that computation is the only useful domain for statistical theory, or anything close to that. There are lots of new models to be built and lots of limits to be understood. Just, for example, consider the challenges of using sample data to estimate properties of a network. Lots of good stuff to do all around.

Anyway, back to the book by Robert and Casella. It’s a fun book, partly because they resist the impulse to explain everything or to try to be comprehensive. As a result, reading the book requires the continual solution of little puzzles (as befits a book that introduces its chapters with quotations from detective novels). I’m not sure if this was intended, but it makes it a much more participatory experience, and I think for that reason it would also be an excellent book for a course on statistical computing.

For an example of an ambiguity or puzzle: on pages 127-128, there is an example of optimization of the likelihood and log-likelihood. However, it is never explained why these yield different optima, nor is the code actually given for the graphs that are displayed. Let me emphasize here that I am not stating this as a criticism; rather, Robert and Casella are usefully leaving some steps out for the reader to chew over and fill in.
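To give a flavor of one way this can happen (this is my own toy illustration in Python, not the book's R code, and the book's example may well differ): with even moderately large samples, the raw likelihood is a product of hundreds of densities and underflows to zero in double precision, so a numerical optimizer sees a flat surface, while the log-likelihood remains perfectly well behaved.

```python
import numpy as np

# Toy illustration (not from the book): maximize a normal likelihood in mu
# over a grid, once on the raw scale and once on the log scale.
rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=1.0, size=1000)

def likelihood(mu):
    # Product of 1000 densities: underflows to 0.0 in double precision
    return np.prod(np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2 * np.pi))

def log_likelihood(mu):
    # Sum of log densities: numerically stable
    return np.sum(-0.5 * (x - mu) ** 2) - len(x) * 0.5 * np.log(2 * np.pi)

grid = np.linspace(0.0, 4.0, 401)
raw = np.array([likelihood(m) for m in grid])
ll = np.array([log_likelihood(m) for m in grid])

print(raw.max())             # 0.0 -- every grid point underflows, so argmax is meaningless
print(grid[np.argmax(ll)])   # close to the true mean of 2.0
```
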

The good news is that there’s an R package (mcsm) that comes with the book and includes all the code, so the interested reader can always go in there to find what they need.

I noticed a bunch of other examples of this sort, where the narrative just flows by and, as a reader, you have to stop and grab it. Lots of fun.

One other thing: the book is not beautiful. It has an ugly mix of fonts and many of the graphs are flat-out blurry. Numbers are presented to 7 significant figures. Maybe that’s ok, though, in that these displays look closer to what a student would get with raw computer output. The goal of the book is not to demonstrate ideal statistical practice (or even ideal programming practice) but rather to guide the student to a basic level of competence and to give a sense of the many intellectual challenges involved in statistical computing. And that, this book does well. The student can do what’s in the book and then is well situated to move forward from there.

I think the book would benefit from a concluding chapter, or an epilogue or appendix, on good practice in statistical computation. Various choices are made for pedagogical reasons in earlier chapters that could, if uncorrected, leave a wrong impression in readers’ minds. Beyond the aforementioned significant digits and ugly graphs, I’m thinking of choices such as the Langevin algorithm in chapter 6 (which I understand has lots of practical problems and can most effectively be viewed as a special case of hybrid sampling); or the discussion of hierarchical models without the all-important (to me) redundant multiplicative parameterization; or the use of a unimodal distribution to approximate the likelihood function from Cauchy data; or the overemphasis (from my perspective) of importance sampling, which is a great conceptual tool but is close to dominated by Metropolis-Hastings in practice. (As I wrote back in 1991, for some reason people view importance sampling as exact and MCMC as approximate, but importance sampling is not exact at all.)
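To make that last point concrete, here is a minimal self-normalized importance-sampling sketch (my own toy Python example, not from the book or from my 1991 article): even in a deliberately favorable case, the estimate is a ratio of weighted Monte Carlo sums, approximate in exactly the same sense that MCMC is.

```python
import numpy as np

# Estimate E[theta^2] = 1 under a standard normal target, drawing from a
# heavier-tailed Cauchy proposal (a favorable setup for importance sampling).
rng = np.random.default_rng(0)
theta = rng.standard_cauchy(200_000)        # proposal draws
target = np.exp(-0.5 * theta**2)            # unnormalized N(0,1) density
proposal = 1.0 / (1.0 + theta**2)           # unnormalized Cauchy density
w = target / proposal                       # importance weights
est = np.sum(w * theta**2) / np.sum(w)      # self-normalized estimate
print(est)  # approximately 1.0, with Monte Carlo error of order 1/sqrt(n)
```
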

P.S. To clarify that last point: I’m down on straight importance sampling, but I agree that the ideas of importance sampling, as applied to more complicated algorithms such as particle filtering and sequential Monte Carlo, are important. It’s possible that Casella and Robert made this point in their book and I missed it.

P.P.S. Will Christian and Gareth write a paper together? Robert and Roberts, it’s a natural combination, no?

8 thoughts on “Robert and Casella’s book on Monte Carlo Methods with R”

  1. "Robert and Casella are usefully leaving some steps out for the reader to chew over and fill in."

    Are the answers somewhere? I like a good puzzle, but expect the answers to be in the back of the book / on a website, etc.

    On a similar note, I noticed Johannes Ledolter has posted the answers to problems found in Abraham and Ledolter's Statistical Methods for Forecasting (1983): http://www.biz.uiowa.edu/mansci/faculty/ledolter….

    This gives a little more life to a book that otherwise would sit on my shelf due to its age. Like many applied people, I find the fastest way to solve problems is by analogy to some problem I've seen/solved before.

  2. Do you really feel that MCMC dominates new versions of importance sampling, like population Monte Carlo? It seems like Roberts keeps publishing papers where importance sampling dominates MCMC by a large margin. Maybe these are narrow applications. I'd be interested in a blog post with your thoughts on this.

  3. John: You might be right; you're thinking at a more sophisticated level than I was. I wasn't thinking of methods such as particle filtering that use importance weights; I was thinking of simple importance sampling and importance resampling, two methods that I think can be useful conceptually but aren't so great in practice. Recall also that horrible adaptive rejection sampling that seemed so clever at the time but over the years has caused a million problems in Bugs.

  4. The emphasis on systematic and random (simulation) approximation, and their mix (aka variance reduction), has been very elegantly embraced in mathematics, especially by Chebyshev (there is, e.g., a book-length account of Chebyshev systems), but largely missed in academic statistics until maybe recently.

    In the 1990s, I did an informal survey of a sort, in that I wanted to get discrete distributions on a small number of possibilities (-1,0,1 or -3,-1,0,1,3) that mimicked, say, the Normal distribution, to help teach non-statisticians about distributions, and especially sampling distributions. So I asked most of the theoretical statisticians I knew how one would do this, and no one seemed to know (though some thought there should be an answer somewhere). I eventually found something called the von Mises step-function approximation, which referenced back to Gauss rules and, more generally, Chebyshev systems.

    It is simple to construct a discrete approximation on n points that matches the first 2n − 1 moments of a continuous distribution. Knowing this, I would then ask around to see who knew about it – other than those with an interest in numerical integration or early attempts to approximate posteriors, almost no one had heard of it. My guess is that such approximations did not seem elegant enough or particularly relevant. It is, though, maybe the best introduction to non-trivial Monte Carlo methods and posterior simulation. My favourite introduction is the Hammersley and Handscomb book from many years ago.

    Some minor points

    “7 significant figures” – that actually helps when writing on computational methods and their implementation (better for checking reproducibility)

    “from Cauchy data” – did they rule out/check for multi-modality?

    Importance sampling (the Radon–Nikodym derivative – which I never will learn to spell – being the exact but purely mathematical version) is my guess for the eventual “winner” over MCMC

    Keith
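The n-point moment-matching construction Keith describes is classical Gaussian quadrature; here is a minimal sketch (my own illustration in Python, using the probabilists' Gauss–Hermite rule in numpy) for the standard normal. With n = 3 it yields a three-point distribution on 0 and ±√3 that matches the normal's first five moments.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# n-point discrete distribution matching the first 2n-1 moments of N(0,1),
# via the probabilists' Gauss-Hermite quadrature rule.
n = 3
nodes, weights = hermegauss(n)      # rule for the weight function exp(-x^2/2)
probs = weights / weights.sum()     # normalize quadrature weights to probabilities

# For n = 3: nodes are 0 and +/-sqrt(3), with probabilities 2/3 and 1/6.
for k in range(2 * n):              # moments k = 0..5 of N(0,1): 1, 0, 1, 0, 3, 0
    print(k, round(float(np.sum(probs * nodes**k)), 10))
```
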

  5. The Bugs example is interesting. I'd love to see a comparison of all the Bugs archive models, using different sampling strategies.

    One thing I've definitely struggled with reading most MCMC books is that it is unclear what my default computation technique should be. I haven't read Robert's new book, but I assume it's the same.

    Given the simple problems I've faced, even univariate Metropolis and slice sampling have always proved adequate, but I'm sure I'm several orders of magnitude slower than I need to be. Reading Roberts's work, I get the feeling I should be implementing population Monte Carlo ideas. Reading people like Jun Liu makes me feel I should be using some hybrid MCMC approach. Reading some of your work, I get the feeling I should reparameterize. Given that most applied folks aren't really interested in computation, some stronger default advice would definitely be nice.

  6. I am a computational biologist and was intrigued by the statement about "the challenges of using sample data to estimate properties of a network." Could you elaborate a bit?

    _Taku
