I say all samples are bad, but they are bad in different ways and to different degrees all the time. But yes, I agree that some books might leave students with the impression that we just shouldn’t do what they call non-probability sampling at all. I think that’s not what you see so much in methods books (at least in social research) because most people are not probability sampling at all. Lots of focus groups are really helpful for understanding things. Lots of times someone does a study inside a single school and it helps understand things at other schools.

It all does depend on who the audience is too. The first time through a topic you’re definitely not going to get all the complexity; it wouldn’t be appropriate. To me part of the problem is that often students don’t get the second or third pass through the material.

PRNGs, after thorough testing (such as the Dieharder suite of tests), give as close to a priori known probability sampling as you’re going to get.

I started thinking J(ao) would not consider this to be truly “a priori”, since you used data from testing the algorithm plus whatever background info was used to come up with it. This sounds silly, so I am probably wrong.

1. I agree that a lot of times we do know a fair amount. In just about every sampling problem I’ve heard of, the people doing the sampling know a lot about the population. And I have seen some real-world samples that are true probability samples: these are samples not of people but of records. For example, we have a few thousand files of legal cases and we sample 100 of them at random for audits. Nonresponse and missing data are not a problem here. So I overstated it when I wrote, “just about every real survey is a non-probability sample.” I should say, “just about every real survey of people is a non-probability sample.” There’s lots of sampling that’s not of people (although that’s not the topic of Salganik’s book).

2. You write, “The problem with saying that all samples are bad . . .” But I never said all samples are bad. That’s one of the problems with textbooks: they can leave students with the impression that a non-probability sample is “bad.” Non-probability samples can be just fine; we use them all the time.
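To make the records example in point 1 concrete, here is a minimal sketch of that kind of audit draw. The frame size of 3,000, the seed, and the function name `audit_sample` are assumptions for illustration only (the comment says just "a few thousand files"). The point it shows is that, with a complete list of records and no nonresponse, every file's inclusion probability is known before any data are collected.

```python
import random

def audit_sample(record_ids, n, seed=None):
    """Draw a simple random sample of n records without replacement."""
    rng = random.Random(seed)          # seeded PRNG so the draw can be reproduced
    return rng.sample(list(record_ids), n)

# Hypothetical sampling frame: case files numbered 1..3000 (assumed size).
frame = range(1, 3001)
sample = audit_sample(frame, 100, seed=2024)

# Every file has the same, known chance of being audited: n / N.
inclusion_probability = 100 / len(frame)
```

With records there is no one to refuse or be unreachable, so the design probability `n / N` is also the realized one, which is exactly what fails in surveys of people.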

My point was not that students should be doing RDD sampling. I only brought up RDD sampling to point out that even that sampling method, which seems like pure probability sampling, isn’t. I agree that probability sampling can be part of a sampling design, and it’s something worth teaching, but in a practical book about data collection I think it’s important to emphasize that real-world surveys are not probability samples, because the probability of inclusion in the sample is generally not known.

In the probability sample category, yes, we know it all goes wrong in practice, but at least there is an attempt to think about probability and randomness in practice.

Then we might consider: what if we randomly select class sections and then students within them, and in that case, what about no-shows and people who have dropped? Or we might do an email survey, but what about people who never check their college email? And we’ll likely only get 20% based on past experience. Is it better to do a massive email blast or to, yes, randomly select a list of maybe 250 and really work hard to try to get them to respond, so that possibly we get 75%? (And by the way, what are the odds we could get access to the needed sampling frame?) All things that are worth discussing and experimenting with. (For teaching I like SAMP since it lets us look at results with nonresponse factored in … and the convenience sample has no nonresponse really, because the concept is not even meaningful.)
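One way to put numbers on the 20% versus 75% trade-off is to ask how far off the respondents could be if nothing is assumed about the nonrespondents. The sketch below uses the standard worst-case bounds for a proportion; the 60% "yes" rate among respondents and the function name `worst_case_bounds` are made-up values for illustration, and sampling error is ignored entirely.

```python
def worst_case_bounds(p_respondents, response_rate):
    """Range for the proportion in the full selected sample when nothing
    is assumed about nonrespondents (they could all be 'no' or all 'yes')."""
    low = p_respondents * response_rate       # every nonrespondent is a 'no'
    high = low + (1.0 - response_rate)        # every nonrespondent is a 'yes'
    return low, high

# Hypothetical: 60% of respondents say 'yes' in both designs.
blast_low, blast_high = worst_case_bounds(0.60, 0.20)        # email blast, 20% respond
target_low, target_high = worst_case_bounds(0.60, 0.75)      # 250 selected, 75% respond

blast_width = blast_high - blast_low
target_width = target_high - target_low
```

Under these assumptions the blast only pins the answer down to somewhere between about 0.12 and 0.92, while the worked list of 250 narrows it to about 0.45 to 0.70. The width of the range is simply one minus the response rate, so a bigger blast does nothing to shrink it.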

So in a methods textbook it is indeed helpful to distinguish between probability samples and non-probability samples, at least from my perspective.

RDD is expensive, and most people, including most college faculty, do not have access to it, nor is it appropriate for what they are doing.

What I meant were a priori point probabilities, of course…. A priori… y’know, before any data, and without any uncertainty.

Maybe I do not understand. Does this exist? Some expectations about the universe are built into your genes. True a priori probability seems like something outside the realm of human, or any other lifeform’s, experience.

it’s not clear that even in the archetypal examples there are anything like a priori probabilities.

Stuff like this only comes from those who don’t actually collect and process data. If you take a bunch of heart rate readings from people that are above 10^5 beats per minute, what would you think?

going deeper than that:

Nonresponse is a big issue, but the number of phones on which a person can be reached ranges from 0 to infinity and has nontrivial numbers of people at both 0 and out in the 4, 5 or 6 range, so “probability of selection” is unknown with RDD.
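A rough sketch of why the phone count matters: assume, purely for illustration, that each phone number is dialed independently with the same small probability (`dial_fraction`, a made-up value here). Then a person's chance of being reached depends on how many phones they answer, which the sampler does not know in advance.

```python
def inclusion_probability(k_phones, dial_fraction):
    """Chance that at least one of a person's k phones is dialed,
    assuming each number is dialed independently with equal probability."""
    return 1.0 - (1.0 - dial_fraction) ** k_phones

dial_fraction = 0.001  # hypothetical share of all numbers that get dialed

# Selection probability for people reachable on 0 to 6 phones.
probs = {k: inclusion_probability(k, dial_fraction) for k in range(7)}

# People with no phone can never be selected; people with several phones
# are roughly k times as likely to be selected as people with one.
ratio_4_to_1 = probs[4] / probs[1]
```

Even in this idealized setup, the selection probability is known only conditional on `k_phones`, and that number has to be asked of respondents after the fact, so it is missing for everyone who does not respond.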
