Ben Hansen recommended this book and course by Daniel Kaplan to me. It looks pretty good. I’ve only looked at the website, not the book itself, and I’m sure I’d find lots of places to disagree with it on details, but the general flow seemed reasonable, and I liked that there are lots of course materials to go with it. Does anyone have any experience with this book? Is it the way to go (for now)?

## “Statistical Modeling: A Fresh Approach”

Looks interesting. “Learn R” is on my to-do list. In addition to being a useful reference, it appears it could serve as a friendly introduction to R.

If the goal is to teach statistical modeling (as the title suggests) I suppose it is as good as any other (skimming it, I liked the illustrations).

If the goal is to teach social scientists how to do research I think it is highly inadequate (though I doubt this is a goal the author had in mind).

^ What titles would you recommend to teach social scientists how to do research?

I am not sure I am happy with anything out there, but basically something along the lines of King, Keohane & Verba (which I still think is the best book I read in graduate school).

Here are some topics in no particular order:

1. Causal theories: DAGs (e.g. Morgan and Winship, Pearl);

2. Concepts: (e.g. Goertz);

3. Measurement: (e.g. Surveys, scaling, remote sensing, NLP);

4. Lab practice: Lab books, literate programming, reproducibility, protocols;

5. Design of field and lab experiments: E.g. Gerber and Green minus all the potential outcomes nonsense (see counterfactuals and DAGs in point 1 above);

6. Design of observational studies (e.g. Rosenbaum, sample selection, matching, etc.);

7. Graphics, presentation, and style: Reporting checklists (STROBE, CONSORT, etc);

8. Analysis and statistical modeling;

9. Publication process.

In my view a good research study, and certainly one in causal inference, is heavy on design and light on statistics, unless it is really messed up. I also prefer to put the math in the formal theoretical model and not the analysis part. Good designs can rely on simple contrasts.

Descriptive inference and prediction are different, and better suited to statistical modeling. But other than predicting electoral outcomes, or conflict, I haven’t seen many predictive models used _actively_ in poli sci. Otherwise models are used primarily for descriptive purposes, which is fine, but that is not the sum total of “research”. Or, in the worst case, models and unstructured specification searches are used for causal inference.

I see it every day: people going through huge data sets to find interesting relations that are then reported as tests of a causal theory. I have heard professors recommend this strategy to their students. And who knows, maybe they are right.

Oh lord, I thought this might be April Fool’s. For a book named “Statistical Modeling: a fresh approach”, I’m shocked on opening the table of contents to find topics that I don’t consider “modeling” (confidence intervals, displaying variation, hypothesis testing, etc.) and nothing “fresh” in the sense of deviation from the standard curriculum. I tend to see “statistical modeling” as a second course starting with linear regression, and moving to logistic and GLM, and maybe finding its way to GAM; a section around cluster analysis, factor analysis, structural models; random effects; decision trees, boosting, bagging, etc.; neural networks; lasso and regularization; models for observational data such as propensity scores; not to forget the various modeling steps such as variable selection, model comparison, model validation techniques.

Kaiser:

I see what you mean, but I don’t think the book is necessarily as bad as you think. First, I think that conf intervals, hyp tests, etc, can be a useful part of statistical modeling—if presented as a way of recognizing predictive and modeling uncertainty. Second, I agree that logistic regression, etc., are great, but this book is supposed to be a first-semester intro.

For a Stats 101 book, it’s pretty good for the parts I looked at. I was just thinking about something else altogether when the book is called “Statistical modeling”.

My general concern with our Stats 101 curriculum is that there is too much packed into one semester. I’d love for the class to spend at least 30% of the time in the computer lab working with real data sets, but lab time is often inefficiently spent, and it is expensive considering the amount of material we aim to cover in the intro class.

Kaiser: “I’d love for the class to spend at least 30% of the time in the computer lab working with real data sets”

I’d like to phrase that differently: “I’d love for the class to spend 30% of the time in the research lab, carrying out experiments, generating their own data sets, and doing their own analysis.”

Even social scientists might benefit from carrying out experiments with plants, flies, or even at the local cafeteria. That way they can learn how messy the process of generating your own data is.

Is there really a need for yet another Stat Modelling book?

I’m a bit jaded by books that promise a “fresh approach”. A bit like those “Completely change your life in 4 hours” self-help books sold at airports.

Rahul:

It’s not that there’s a need for “yet another” statistics book. It’s that there’s currently _no_ book or class material that I like for teaching a general introductory statistics course. So if something new can come along that’s a little bit better than what currently exists, I’m happy.

I agree that there are probably enough “learn statistics through R” books. But a good textbook is something different.

I’ll speculate that there will be no book or class material you like for teaching a general intro. stat course.

Unless you write it yourself.

I might prove to be wrong. If so, please do a post the day you find that there is indeed something you like for that purpose! :)

Rahul:

That may be. But there are statistics textbooks that I like, written by people other than myself. So I see no reason to think there can’t be an introductory statistics book that I like, written by someone else.

The book has cogent explanations, and I liked the photos and use of readily-loaded examples for R.

My favorite quote so far: “It’s often said, “Correlation is not causation.” True enough. But it’s an odd thing to say, like saying, “A movie is not a train.””

I can’t say anything about the content of this book, but what’s the deal with that cover? Bamboo? Why use a totally random and unrelated picture as a stats book cover? I really don’t get the point; it is not even a piece of art that could be interesting to look at. It will certainly enter my personal list of worst stats book covers. At least it is not the worst, since No. 1 on the list is, and probably always will be: http://www.indiana.edu/~kruschke/DoingBayesianDataAnalysis/

That’s an easily-fixed problem.

I’ve used this book to teach the very basics of data analysis using R (Hooray for RStudio server and the RStudio team!) and a bit of statistical inference in a sophomore level political science class at the University of Illinois.

I am a fan of the book because it enables students to very quickly get to work with data, to make plots of relationships, and to sidestep memorizing canned regression table assumptions in favor of bootstrap and permutation based approaches. Whereas the idea of a standard error may be vague to many students in the first few years of college, producing distributions by hand makes this concept very clear and concrete. My students arrive with high school math and no programming skills and they leave able to download data, make some tables, make some plots, fit some linear models, and to produce bootstrap confidence intervals and permutation tests (all in the context of a linear model). I tell the students that this course alone cannot be understood to enable them to “learn statistics” but that it is a good start that really ought to be followed by other courses. I think that the students really enjoy posing simple social science questions about relationships and then discovering whether said relationships show up when they smooth their plots. So, the book enables me to kick off the learning of statistics by enabling fun and discovery to happen in the classroom — and many feel proud to be learning the tools of modern data analysis.
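For readers who want to see roughly what the bootstrap and permutation approaches described above look like in code, here is a minimal sketch. It is written in Python with NumPy rather than the R used in the course, it runs on simulated data, and it is not taken from Kaplan's book. The function names (`slope`, `bootstrap_ci`, `permutation_p_value`), the sample size, and the simulated true slope of 2 are all assumptions made for illustration.

```python
import numpy as np


def slope(x, y):
    """Least-squares slope from the simple linear model y ~ 1 + x."""
    return np.polyfit(x, y, 1)[0]


def bootstrap_ci(x, y, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the slope (resampling cases)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    slopes = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # draw n rows with replacement
        slopes[b] = slope(x[idx], y[idx])
    tail = (1 - level) / 2
    low, high = np.quantile(slopes, [tail, 1 - tail])
    return low, high


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the null of no x-y relationship."""
    rng = np.random.default_rng(seed)
    observed = abs(slope(x, y))
    extreme = 0
    for _ in range(n_perm):
        # Shuffling y breaks any real link between x and y.
        if abs(slope(x, rng.permutation(y))) >= observed:
            extreme += 1
    # Add-one correction so the p-value is never exactly zero.
    return (extreme + 1) / (n_perm + 1)


# Simulated data (an assumption for illustration): true slope of 2 plus noise.
rng = np.random.default_rng(42)
x = rng.normal(size=100)
y = 2.0 * x + rng.normal(size=100)

estimate = slope(x, y)
ci_low, ci_high = bootstrap_ci(x, y)
p_value = permutation_p_value(x, y)
print(f"slope={estimate:.2f}, 95% CI=({ci_low:.2f}, {ci_high:.2f}), p={p_value:.4f}")
```

The point of the exercise is that the spread of the resampled slopes plays the role of the standard error, so the idea becomes something students can plot and look at rather than a formula to memorize.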

Kaplan himself uses the book to teach freshmen across the natural, physical, and social sciences. So, it is meant to be very introductory. And thus, it works well for my students, too.

I haven’t grilled Kaplan about it, but, in using the book, I felt like Kaplan had been inspired by George Cobb’s piece, “The Introductory Statistics Course: A Ptolemaic Curriculum?” (http://www.escholarship.org/uc/item/6hb3k0nz).

What books would you recommend as a follow up to this one as self study? I’m a biology grad student who has never had a stats course, but would like to become proficient in statistical modeling. Like many biology students, my exposure to stats has (sadly) been piecemeal, in the context of labs and lecture courses. I’m frustrated because I don’t see the connection between the various statistical tests I’ve encountered and an actual process of building models that explain real world phenomena.