Survey analysis in R

Posted on April 30, 2009 9:13 AM by Andrew

There’s a lot of good stuff here (from Thomas Lumley). It’s all classical stuff (no small-area estimation, no Mister P, i.e. multilevel regression and poststratification, etc.), but the classical stuff is still pretty useful. The “survey” package for R looks pretty good; in particular, it allows you to specify the survey design, which is a big step beyond simply specifying survey weights.
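To make the weights-versus-design point concrete, here is a minimal Python sketch. It is not the R survey package's code, and the numbers are invented for illustration. It assumes a stratified sample with simple random sampling without replacement within each stratum, arranged so that every unit has the same weight (20). From the weights alone, the data look exactly like one simple random sample of 5 out of 100. The sketch compares the textbook stratified variance of the estimated total with the variance you get by ignoring the strata. The helper names are hypothetical.

```python
from fractions import Fraction

def sample_variance(values):
    """Hypothetical helper: unbiased sample variance s^2 (divisor n - 1), exact."""
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def srswor_total_variance(N, values):
    """Hypothetical helper: estimated variance of N * ybar under SRS without replacement."""
    n = len(values)
    return N ** 2 * (1 - Fraction(n, N)) * sample_variance(values) / n

# Invented stratified sample: stratum -> (population size N_h, sampled values).
# Both strata have inclusion probability 1/20, so every weight is 20.
strata = {"A": (60, [1, 3, 5]), "B": (40, [10, 14])}

# Design-aware variance: add up the per-stratum SRSWOR variances
var_stratified = sum(srswor_total_variance(N, ys) for N, ys in strata.values())

# Weights-only variance: treat all the data as a single SRS of 5 from 100
N_total = sum(N for N, _ in strata.values())
all_values = [v for _, ys in strata.values() for v in ys]
var_weights_only = srswor_total_variance(N_total, all_values)

print(var_stratified, var_weights_only)  # prints: 10640 53770
```

Both variances refer to the same estimated total (60 × 3 + 40 × 12 = 660). Only the design information changes the standard error, here by roughly a factor of five in variance.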

I’d also like to recommend Sharon Lohr’s book from 1999. When’s the second edition coming out?

3 thoughts on “Survey analysis in R”

Keith O'Rourke on April 30, 2009 at 5:00 AM said:

Great to hear about good stuff like this that includes vignettes, worked examples and lots of documentation.

A couple of speculations as more of this stuff becomes available (related to Alex's past post on universities being "challenged" by online courses):

Would someone with the usual single course in survey sampling and access to this package (and maybe the short course) be able to compete with those who specialized in survey sampling and made it their career?

That is, will this kind of available material facilitate a "Renaissance Man" approach in statistical practice and education?

Keith

Thomas Lumley on May 1, 2009 at 3:27 PM said:

Andrew,

Thanks for the link and comments. I agree that small-area estimation is an important missing area. I don't know enough about it at the moment, although there are people nearby who do, if I can pick their brains.

-thomas

Keith:
I think 'compete with' is a strange way to put it.

The goal of the software is to reduce the need for expertise in the technical computation of sampling-weighted estimates, and allow people to focus on the difficult issues such as non-response, confounding, framing issues in questionnaire design, etc., etc. These issues aren't specific to complex probability samples; they are just as important in census data or panel studies or pretty much any other data analysis context.

The classical theory for sampling weights is not rocket science, and a lot of what takes up space in textbooks is computational shortcuts for working out the Horvitz–Thompson estimator in special cases such as cluster sampling or stratified sampling. Unless these provide intuition about the estimators (which they often don't), only specialists should need to know these computational formulas. My forthcoming book (to accompany the package) doesn't cover most of the formulas and uses the space this frees up to cover regression, data visualization, calibration of weights, and other issues that often get shortchanged in books on classical survey analysis.
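To illustrate the point that the stratified-sampling formulas are special cases of the Horvitz–Thompson estimator, here is a minimal Python sketch with made-up data. It assumes simple random sampling without replacement within each stratum, so the inclusion probability in stratum h is n_h / N_h. The general estimator divides each sampled value by its inclusion probability and adds the results. The textbook stratified shortcut instead sums N_h times the stratum sample mean. Both give the same total. The function and variable names are hypothetical; `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

def horvitz_thompson_total(values, inclusion_probs):
    """Hypothetical helper: Horvitz-Thompson estimate of a total, sum of y_i / pi_i."""
    return sum(Fraction(y) / p for y, p in zip(values, inclusion_probs))

# Made-up stratified sample: stratum -> (stratum population size N_h, sampled values)
strata = {"A": (100, [2, 4, 6, 8]), "B": (50, [10, 20])}

values, probs = [], []
for N_h, ys in strata.values():
    pi_h = Fraction(len(ys), N_h)  # inclusion probability n_h / N_h under SRSWOR
    values.extend(ys)
    probs.extend([pi_h] * len(ys))

# General estimator: no special-casing of the design beyond the pi_i
ht = horvitz_thompson_total(values, probs)

# Stratified shortcut: sum over strata of N_h * (sample mean in stratum h)
shortcut = sum(N_h * Fraction(sum(ys), len(ys)) for N_h, ys in strata.values())

print(ht, shortcut)  # prints: 1250 1250
```

Once the inclusion probabilities are known, the same one-line estimator covers any design. The special-case formula is only a rearrangement of it.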

Keith O'Rourke on May 3, 2009 at 3:37 PM said:

> 'compete with' is a strange way to put it.
Perhaps I should have said "enable them to do the analyses [almost] as well as experts in survey sampling".

But it's the delegation to software of the technical barriers and tasks needed for the analyses, so that more general issues and talents can be brought to bear on the important aspects of questions, that I believe is the real promise of vignettes and compendiums (a.k.a. fully worked-out examples with lots of documentation and references).

And then there is also less need to be distracted by the details of technical components and implementations in courses and textbooks, as people can get this, when they need it, by referring to the examples as fully worked out in the vignettes.

So the point is being able to focus on the idea of getting the appropriate averages for the desired populations by properly accounting for the sampling via importance sampling (i.e., weighting by a Radon–Nikodym derivative), rather than being able to derive the computational shortcuts for working out the Horvitz–Thompson estimator in [numerous] special cases.
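One way to read the importance-sampling remark: the weight 1/pi_i undoes each unit's unequal chance of appearing in the sample, just as an importance weight undoes sampling from the "wrong" distribution. The minimal Python sketch below makes two assumptions: Poisson sampling, where each unit enters the sample independently with probability pi_i, and a tiny invented population. It enumerates every possible sample and computes exact expectations. The weighted (Horvitz–Thompson) total averages to the true population total, and the unweighted sum does not. All names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Invented population values y_i and Poisson-sampling inclusion probabilities pi_i
y = [3, 5, 8, 20]
pi = [Fraction(1, 2), Fraction(1, 4), Fraction(3, 4), Fraction(1, 5)]

def expected_value(estimator):
    """Hypothetical helper: exact expectation over all 2^N possible Poisson samples."""
    total = Fraction(0)
    for included in product([False, True], repeat=len(y)):
        # Probability of this particular sample under independent inclusion
        prob = Fraction(1)
        for inc, p in zip(included, pi):
            prob *= p if inc else 1 - p
        sample = [i for i, inc in enumerate(included) if inc]
        total += prob * estimator(sample)
    return total

def unweighted_sum(sample):
    """Naive estimator: just add up the sampled values."""
    return sum(Fraction(y[i]) for i in sample)

def ht_total(sample):
    """Importance-style weighting: each sampled unit counts 1 / pi_i times."""
    return sum(y[i] / pi[i] for i in sample)

true_total = sum(y)
print(true_total, expected_value(ht_total), expected_value(unweighted_sum))
# prints: 36 36 51/4
```

The unweighted sum has expectation equal to the sum of pi_i × y_i (here 51/4). Dividing by pi_i cancels that factor term by term, which is the whole unbiasedness argument for the Horvitz–Thompson estimator.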

So, I think we are agreeing.

Keith
