Nate Delaney-Busch writes:
I’m a PhD student of cognitive neuroscience at Tufts, and a question came up recently among my colleagues about the difficulty of random sampling in cases of highly controlled stimulus sets, and I thought I would drop a line to see if you had any reading suggestions for us.
Let’s say I wanted to disentangle the effects of word length from word frequency on the speed at which people can discriminate words from pseudowords, controlling for nuisance factors (say, part of speech, age of acquisition, and orthographic neighborhood size – the number of other words in the language that differ from the given word by only one letter). My sample can be a couple hundred words from the English language.
What’s the best way to handle the nuisance factors without compromising random sampling? There are programs that can automatically find the most closely matched subsets of larger databases (if I bin frequency and word length into categories for a factorial experimental design), but what are the consequences of experimental items essentially becoming a fixed factor? Would it be preferable to just take a random sample of the English language, then use a hierarchical regression to deal with the nuisance factors first? Are there measures I can use to quantify the extent to which chosen sampling rules (e.g. “nuisance factors must not significantly differ between conditions”) constrain random sampling? How would I know when my constraints start to become a real problem for later generalization?
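The hierarchical-regression option mentioned here can be sketched in a few lines: enter the nuisance factors in a first block, then the variables of interest in a second block, and look at the increment in R². The sketch below uses purely synthetic item-level data (all variable names, effect sizes, and sample sizes are invented for illustration, not taken from any real lexical database):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic item-level data (all numbers made up for illustration):
# two nuisance covariates plus the two correlated variables of interest.
n = 200
neighborhood = rng.normal(size=n)          # orthographic neighborhood size
aoa = rng.normal(size=n)                   # age of acquisition
length = rng.normal(size=n)                # word length
freq = 0.6 * length + rng.normal(size=n)   # frequency, correlated with length
rt = 2.0 * aoa + 1.5 * length - 1.0 * freq + rng.normal(size=n)  # "response time"

def r_squared(predictors, y):
    """R^2 from an OLS fit, with an intercept column added."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

# Block 1: nuisance factors only.
r2_nuisance = r_squared([neighborhood, aoa], rt)
# Block 2: add the variables of interest; the increment in R^2 is the
# variance they explain over and above the nuisance factors.
r2_full = r_squared([neighborhood, aoa, length, freq], rt)
print(round(r2_full - r2_nuisance, 3))
```

Note that this handles the nuisance factors statistically rather than by stimulus matching, so the sample itself can stay random; the cost is that the increment for length and frequency is only as interpretable as the model is well specified.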
Another way to ask the same question would be how to handle correlated variables of interest like word length and frequency during sampling. Would it be appropriate to find a sample in which word length and frequency are orthogonal (e.g. if I wrote a script to take a large number of random samples of words and use the one where the two variables of interest are the least correlated)? Or would it be preferable to just take a random sample and try to deal with the collinearity after the fact?
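The “write a script to take many random samples and keep the least correlated one” idea is easy to make concrete. Here is a minimal sketch against a hypothetical lexicon (the lexicon size, the length–frequency relationship, and all parameters are invented assumptions, not real norms):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical lexicon: each "word" has a length and a log frequency,
# correlated in the population (shorter words tend to be more frequent).
n_lexicon = 5000
length = rng.integers(3, 12, size=n_lexicon).astype(float)
log_freq = -0.15 * length + rng.normal(scale=1.0, size=n_lexicon)

best_idx, best_r = None, np.inf
for _ in range(2000):                      # draw many candidate samples...
    idx = rng.choice(n_lexicon, size=200, replace=False)
    r = abs(np.corrcoef(length[idx], log_freq[idx])[0, 1])
    if r < best_r:                         # ...keep the least-correlated one
        best_idx, best_r = idx, r

pop_r = abs(np.corrcoef(length, log_freq)[0, 1])
print(f"population |r| = {pop_r:.2f}, selected sample |r| = {best_r:.2f}")
```

Two things this sketch makes visible: the selected sample typically reduces but does not eliminate the correlation when it is strong in the population, and the winning sample is by construction no longer a simple random sample, which is exactly the tension raised in the question.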
I don’t have any great answers except to say that in this case I don’t know that it makes sense to think of word length or word frequency as a “treatment” in the statistical sense of the word. To see this, consider the potential-outcomes formulation (or, as Don Rubin would put it, “the RCM”). Suppose you want the treatment to be “increase word length by one letter.” How do you do this? You need to switch in a new word. But the effect will depend on which word you choose. I guess what I’m saying is, you can see how the speed of discrimination varies by word length and word frequency, and you might find a model that predicts well, in which case maybe the sample of words you use might not matter much. But if you don’t have a model with high predictive power, then I doubt there’s a unique right way to define your sample and your population; it will probably depend on what questions you are asking.
Delaney-Busch then followed up:
For clarification, this isn’t actually an experiment I was planning to run – I thought it would be a simple example that would help illustrate my general dilemma when it comes to psycholinguistics.
Your point on treatments is well-taken, though perhaps hard to avoid in research on language processing. It’s actually one of the reasons I’m concerned with the tension between potential collinearity and random sampling in cases where two or more variables correlate in a population. Theoretically, with a large random sample, I should be able to model the random effects of item in the same way I could model the random effects of subject in a between-subjects experiment. But I feel caught between a rock and a hard place when, on the one hand, a random sample of words would almost certainly be collinear in the variables of interest, but on the other hand, sampling rules (such as “generate a large number of potential samples and keep the one that is the least collinear”) undermine the ability to treat item as an actual random effect.
If you’d like, I would find it quite helpful to hear how you address this issue in the sampling of participants for your own research. Let’s say you were interested in teasing apart the effects of two correlated variables – education and median income – on some sort of political attitude. Would you prefer to sample randomly and just deal with the collinearity, or constrain your sample such that recruited participants had orthogonal education and median income factors? How much constraint would you accept on your sample before you start to worry about generalization (i.e. worry that you are simply measuring the fixed effect of different specific individuals), and is there any way to measure what effect your constraints have on your statistical inferences/tests?
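On the question of measuring how bad the collinearity actually is, one standard quantity is the variance inflation factor: for each predictor, regress it on the others and compute 1/(1−R²), which says how much the sampling variance of that coefficient is inflated relative to the orthogonal case. A minimal sketch, using synthetic education/income numbers (the correlation strength and sample size are assumptions for illustration):

```python
import numpy as np

def vif(X):
    """Variance inflation factors for the columns of predictor matrix X:
    VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing column j
    on all the other columns (plus an intercept)."""
    n, k = X.shape
    out = []
    for j in range(k):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ beta
        r2 = 1 - resid.var() / X[:, j].var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(2)
education = rng.normal(size=500)
income = 0.7 * education + rng.normal(scale=0.7, size=500)  # correlated predictor
X = np.column_stack([education, income])
print(vif(X))  # both VIFs well above 1, reflecting the shared variance
```

The VIF quantifies the cost of leaving the collinearity in a random sample; it does not, by itself, answer the deeper question of what constrained sampling does to generalization, but it at least puts a number on one side of the trade-off.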