Shane Frederick shares some observations regarding junk survey responses:
Obviously, some people respond randomly. For open-ended questions, it is pretty easy to determine the fraction who do so. In some research I did with online surveys, “asdf” was the most common response and “your mama” was 9th. This fraction is small (maybe 1-2%). But the fraction of random responses is harder to identify (and is likely higher) for items with binary and multichotomous response options, since many respondents must realize their random responding can go undetected. Hence, you can’t use the random response rate from open-ended questions to assess this. You can do other things to try to estimate it (like asking “Is 8+4 less than 3?” YES NO). But two problems remain: the fraction saying YES is a blend of random and perverse responding, and both of these things vary across items. Dramatically.
I put up a few questions on Google Consumer Surveys with large samples. Random + perverse response rates differ dramatically:
Do you have a fraternal twin? YES NO
4% Yes. *Pretty close to truth*
Do you have an identical twin? YES NO
8% Yes. *Pretty far from truth, but funnier to lie about?*
Is 8+4 less than 3? YES NO
11% Yes. *Profound innumeracy, confusion, or just fucking with me?*
Were you born on the planet Neptune? YES NO
17% Yes. *Perhaps using it metaphorically, as in “My friends say I’m a weird guy”?*
In a recently published paper I [Frederick] averred that you could just multiply the fraction of people who endorse something crazy by the number of response options to estimate the fraction of random responders. But this is obviously wrong.
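To see why the naive estimator breaks down, here's a quick sketch (my own illustration, not Frederick's code) applying it to the YES rates quoted above. The logic: if a fraction r of respondents pick uniformly at random among k options, a crazy option should be endorsed by about r/k, so the naive estimate is r = rate × k. Run on the two "crazy" items, it implies two incompatible random-responding rates, which is the tell that some of those endorsements are perverse rather than random.

```python
# Naive estimator: if a fraction r of respondents answer uniformly at
# random among k options, a "crazy" option gets endorsed by about r/k.
# Inverting gives r_hat = endorsement_rate * k.
# YES rates below are from the Google Consumer Surveys examples above;
# both items are binary, so k = 2.
items = {
    "Is 8+4 less than 3?": 0.11,
    "Born on planet Neptune?": 0.17,
}

for item, yes_rate in items.items():
    r_hat = yes_rate * 2  # naive implied random-responding rate
    print(f"{item:25s} implied random rate = {r_hat:.0%}")

# Output: 22% vs. 34%. A single underlying random-responding rate
# can't generate both, so some endorsements must be perverse
# (deliberate) rather than random -- hence "obviously wrong".
```

The inconsistency across items is the whole point: the estimator assumes all crazy endorsements come from one uniform random-responding process, and the data reject that.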
So, basically, I’m not sure what to do. You could look at response latencies or something, but then you end up imposing arbitrary thresholds, which is unsatisfying, much like removing outliers without any good evidence that the responses are insincere.
My reply: These responses are hilarious. I believe there is some literature on this sort of thing, but I’m not the expert on it. I’ve looked a bit into list experiments (you can search my blog; I have a post with a title like, A list of reasons not to trust list experiments), but it seems there’s a lot of information in the actual responses here. Maybe you could learn something by regressing these on demographics, and also by checking whether the same people who give wrong answers on some of these items give wrong answers on others.