What to read to catch up on multivariate statistics?

Henry Harpending writes:

I am writing to ask you for a recommendation of something I can read to catch up on multivariate statistics. I am happy with random processes and linear algebra since they are important in population genetics. My last encounter with real statistics was several decades ago.

Recently I have had to dip my toes into real multivariate statistics again and I am completely lost. I can’t, for example, figure out how a random effects model is different from what we used to call “partialing out” nuisance covariates. I have a hard time concentrating on exactly what a “BLURP” model is because the name is so silly.

Can you recommend something accessible to me that would put me on track?

My reply: if you’re interested particularly in random effects models, I will (parochially) refer you to my own book with Jennifer Hill. You can jump straight to the chapters on multilevel modeling.

If the question is about traditional multivariate methods such as factor analysis, principal components, etc., that I don’t really know! But I think my book would be a good start.

Do readers have any suggestions for a good book, preferably model-based, on multivariate methods such as factor analysis, principal components, etc.?


  1. Chris G says:

    I learned factor analysis from one of Geoffrey McLachlan’s books, either Finite Mixture Models or The EM Algorithm and Its Extensions. I don’t recall which at the moment but probably the latter. I found both books quite useful in general. I also had a copy of Johnson and Wichern, Applied Multivariate Statistical Analysis, on my desk. It was useful and, relative to McLachlan’s texts, seemed more targeted to an intro audience. (My recollection of J&W is that the text-to-equation ratio was lower than ideal.)

    My previous employer owned all three of the aforementioned books and I no longer have any of them at hand. Every few weeks I find myself wishing I did – not the particular texts so much as wishing I had a few good reference texts on my shelf – so I’m in the market for a book (or two) which covers multivariate methods. Andrew, your book with Hill as well as Bayesian Data Analysis are on my short list. I understand a new edition of the latter is due out in July. Should I wait for the new edition?

  2. konrad says:

    From a Machine Learning and in particular a Probabilistic Graphical Model perspective, I feel compelled to mention “Pattern Recognition and Machine Learning” by Christopher Bishop.

    • Chris G says:

      I’m not familiar with the book but Tipping and Bishop’s “Probabilistic Principal Components Analysis”, J. R. Statist. Soc. B (1999), vol. 61, Part 3, pp. 611-622 is an excellent paper.

  3. I’d recommend Applied Multivariate Statistical Analysis by Johnson and Wichern.

  4. Jim says:

    I second Andrew’s recommendation of his book. It’s very clear and actually fun to read.

    The other book worth buying is Mostly Harmless Econometrics. It takes a pretty different view of causal inference (only attempt it when you have an experiment/tenure-getting instrument), but is entertaining and well written.

  5. Steve says:

    ‘Numerical Ecology in R’ by Borcard et al. is a nice overview of methods with practical examples – not really model based but covers all the standard methods used in environmental science, engineering, etc.

  6. edi says:

    I like ‘Legendre & Legendre (2012). Numerical Ecology.’ for multivariate stats.
    For a more model-based multivariate analysis, the work of David Warton looks promising.

  7. Anonymous says:

    A long time ago I read Seber’s book and Gnanadesikan’s book; they may be outdated by now, but I liked them.
    Hopefully you can find them in a library or as a third-hand copy.

  8. Jeremy Miles says:

    I’d suggest “The Essence of Multivariate Thinking” by Lisa Harlow.

  9. Sam says:

    There are classics such as T.W. Anderson’s and Johnson and Wichern. There is a fairly up-to-date volume titled Modern Multivariate Statistical Techniques by Alan Izenman, which covers the usual topics such as dimensionality reduction, clustering, regression, etc., as well as selected topics in machine learning.

  10. Frank says:

    I’ve always been a fan of Mardia, Kent & Bibby’s “Multivariate Analysis.” Some may find it a little old-fashioned but the exposition is extremely clear. I find it both charming and amusing that there’s a chapter in the book called “Econometrics.”

  11. Wolfgang says:

    Bernhard Flury and Hans Riedwyl, Computergestützte Analyse mehrdimensionaler Daten, G. Fischer, 1983.
    According to Alan Izenman’s website, where the dataset can be found, the English translation seems to be:
    Flury, B. and Riedwyl, H. (1988).
    Multivariate Statistics: A Practical Approach, Cambridge University Press.

    The book covers basic multivariate statistics through one extended example, namely distinguishing counterfeit Swiss banknotes from real ones.

    I am Austrian, not Swiss, so there is no misguided patriotism involved in recommending a book by Swiss authors.
    (Hans Riedwyl, by the way, spent many years in Bloomington, Indiana and, ironically, did not die from being struck by lightning or hit by a tornado, but in the Italian Alps.)

  12. Mike says:

    Dillon and Goldstein’s book Multivariate Analysis remains one of the clearest expositions I’ve ever read. It’s really great and covers all the classic techniques, and rumor has it that Dillon is working on an update. Harry Harman’s Modern Factor Analysis is also lucid and very thorough. That said, both books were published more than 25 years ago.

  13. My wife has a paper on principal component analysis that is under peer review for publication right now. She suggests that, when it comes to PCA, “Principal Component Analysis” by Wold, Esbensen and Geladi is a great place to start. Here is a link to the paper.