Early this afternoon I made plans to teach a new course on sampling, maybe next spring, with the primary audience being political science Ph.D. students (although I hope to get students from statistics, sociology, and other departments). Columbia already has a sampling course in the statistics department (which I taught for several years); this new course will be centered around political science questions. Maybe the students can start by downloading data from the National Election Studies and General Social Survey and running some regressions; then we can back up and discuss what is needed to go further.

About an hour after discussing this new course with my colleagues, I (coincidentally) received the following email from Mike Alvarez:

If you were putting together a reading list on sampling for a grad course, what would you say are the essential readings? I thought I’d ask you because I suspect you might have taught something along these lines.

I pointed him to my blog posts on the topic, to which Mike replied:

I wasn’t too far off your approach to teaching this. I agree with your blog posts that the Groves et al. book is the best basic text on survey methodology that is currently out there. On sampling I have in the past relied on some sort of nonlinear combination of Kish and a Wiley text by Levy and Lemeshow, though that was unwieldy for students. I’ll have to look more closely at Lohr; my impression of it when I glanced at it was like yours, that it sort of underrepresented some of the newer topics.

I think Lohr’s book is great, but it might not be at quite the right level for political science students. I want something that is (a) more practical and (b) more focused on regression modeling rather than following the traditional survey sampling textbook approach of just going after the population mean. I like the Groves et al. book but it’s more of a handbook than a textbook. Maybe I’ll have to put together a set of articles. Also, I’m planning to do it all in R. Stata might make more sense but I don’t know Stata.

Any other thoughts and recommendations would be appreciated.

Have you looked at Lumley's "Complex Surveys"? It seems to have the practical approach you're looking for, including coverage of regression/modeling, and it's all based in R (using Lumley's own "survey" package). At the very least, it's a handy reference.

Just out of curiosity how would this course relate to Chapter 7 of BDA? Would it be an expansion and more detailed elaboration? I found that chapter particularly interesting but perhaps a bit sparse.

I really enjoyed Lumley's book as well; I agree with Jerzy.

You might look at "Applied Survey Data Analysis" by a bunch of U Michigan people. It covers a good range of topics and comes with code for just about every package.

[not that I'm arguing against the use of my book, but it may be a little aggressively design-based for Andrew's tastes. I'm looking at implementing things like Mr. P. to ameliorate this in the future].

Professor Gelman, you should teach Sample Survey this fall!

I would suggest something that can add a nice hands-on character to your course.

Have your students digest published descriptions of sampling methods for real surveys appearing in the literature.

I think that if they do this carefully then they will discover various things that they wouldn't necessarily get out of textbook treatments of sampling:

1. Descriptions of sampling methodologies are often not detailed enough for readers to understand what has actually been done.

2. In some cases the procedures seem to be ambiguous in the sense that two different teams might follow the same procedures and still sample in very different ways. I don't mean that the units selected might differ between the teams due to selecting different random numbers along the way but that sometimes the procedures won't nail down a unique unit to be sampled. Field teams are then left with a lot of discretion which they will resolve in varying ways.

3. The calculations made, e.g., of confidence intervals, might not be consistent with the sampling procedures that have been used.
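
To make point 3 concrete, here is a minimal Python sketch (not taken from any of the surveys or books mentioned in this thread). It assumes a simulated one-stage cluster sample with equal-sized clusters and no finite population correction; the function names and the simulated data are hypothetical and only for illustration. It compares a standard error that treats every respondent as an independent draw with one based on the variation between cluster means.

```python
import math
import random
import statistics


def simulate_cluster_sample(n_clusters=30, cluster_size=20, seed=1):
    """Simulate outcomes in which units in a cluster share a common effect."""
    rng = random.Random(seed)
    clusters = []
    for _ in range(n_clusters):
        cluster_effect = rng.gauss(0.0, 1.0)  # shared by all units in the cluster
        clusters.append(
            [cluster_effect + rng.gauss(0.0, 1.0) for _ in range(cluster_size)]
        )
    return clusters


def naive_se(clusters):
    """Standard error that (wrongly) treats the data as a simple random sample."""
    values = [y for cluster in clusters for y in cluster]
    return statistics.stdev(values) / math.sqrt(len(values))


def cluster_se(clusters):
    """Standard error based on between-cluster variation of the cluster means.

    Assumes equal-sized clusters and ignores any finite population correction.
    """
    means = [statistics.mean(cluster) for cluster in clusters]
    return statistics.stdev(means) / math.sqrt(len(means))


clusters = simulate_cluster_sample()
estimate = statistics.mean(y for cluster in clusters for y in cluster)

for label, se in [("naive SRS", naive_se(clusters)),
                  ("cluster-based", cluster_se(clusters))]:
    low, high = estimate - 1.96 * se, estimate + 1.96 * se
    print(f"{label:>14}: SE = {se:.3f}, 95% interval = ({low:.2f}, {high:.2f})")

# Ratio of variances: a rough design effect for this simulated sample.
print("approximate design effect:",
      round((cluster_se(clusters) / naive_se(clusters)) ** 2, 1))
```

When outcomes are correlated within clusters, as in this simulation, the naive interval comes out much narrower than the cluster-based one. That is the kind of mismatch between reported calculations and the sampling procedure described in point 3.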

Many readers of surveys, including many apparently sophisticated ones, seem to roll over and suspend all critical faculties at the mention of the word "random". Yet, if you scratch the surface, you see that lots of sampling schemes used in practice are really quite weak, are often ill-specified, and may have implications for the calculations that haven't been followed through.

Obviously, you will want your students to come out of your course able to, e.g., calculate confidence intervals for complex samples. But you'd also hope that they won't throw all this training out the window when it comes to reading survey methodologies.

I could supply you with some specific examples if you want.

Mike Spagat