Andy Nathan had a question about ordered predictor variables:
While Eric Yu and I were mooshing around with the current issues I’m dealing with in my research, I came up with a question I wanted to ask you. To some extent this was stimulated by reading a textbook about IRT, where it is emphasized that IRT allows us to posit equal intervals between test items when we order them by difficulty. A benefit of this is that it produces a mathematically tractable interval scale.
It occurred to me that in social statistics we highly value, and get good mileage out of, those interval scale data that we possess, such as age, education, and income. But — here’s my question — these data are not truly interval in substantive nature. With income, for example, in a given living environment, say NYC, the interval between an income of 15,000 a year and 20,000 a year is much larger than the interval between 150,000 and 155,000 a year. When it comes to age, one year is for social, psychological, and political purposes a bigger interval from 18 to 19 than from 38 to 39. When it comes to education, certain years of schooling are years of great transformation, like first grade, and others are years of less dramatic change in one’s socially or politically relevant capabilities, like fifth or sixth grade. It follows that every dollar, year, or grade in school is not equal to every other dollar, year, or grade in school in predicting outcomes in attitudes, social behavior, or political behavior.
Has anybody noticed this fact and endeavored to, so to speak, rescale some of these variables, perhaps by using available empirical evidence (which might be different in different social environments) to weight different intervals in the scale so that they lose what is now a deceptive quality of equality?
My reply:
There are ordered logit and probit models that allow categories (e.g., income categories) to be ordered, but with spacing estimated from the data. These are standard tools in generalized linear modeling. There are also nonparametric versions that will transform a continuous response (e.g., income) to “stretch out” parts of the scale and shrink others. And of course there are simpler tools like log and square root transformations that will stretch and shrink the scale. Finally, for variables like age, one can also include non-monotonic transformations (e.g., maybe the young and old are more liberal, and the middle-aged are more conservative), which can be done by discretizing the scale and using indicator variables.
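To make the two simplest options concrete, here is a minimal sketch in Python using only NumPy. The income values echo the ones in the question; the ages and the bin edges (30, 45, 65) are made up for illustration and are not estimated from any data. It shows how a log transformation stretches the low end of the income scale relative to the high end, and how discretizing age produces indicator variables that a regression could use to fit a non-monotonic pattern.

```python
import numpy as np

# Incomes (dollars per year) taken from the example in the question.
income = np.array([15_000, 20_000, 150_000, 155_000], dtype=float)

# On the log scale, equal ratios (not equal dollar gaps) become equal intervals,
# so the same $5,000 step counts for more at the bottom than at the top.
log_income = np.log(income)
low_gap = log_income[1] - log_income[0]    # 15,000 -> 20,000
high_gap = log_income[3] - log_income[2]   # 150,000 -> 155,000
print(f"log-scale gap, low incomes:  {low_gap:.3f}")
print(f"log-scale gap, high incomes: {high_gap:.3f}")

# Hypothetical ages and arbitrary bin edges (assumptions for illustration).
age = np.array([18, 19, 25, 38, 39, 52, 70])
edges = np.array([30, 45, 65])             # bins: <30, 30-44, 45-64, 65+

# Integer bin index (0..3) for each person.
age_bin = np.digitize(age, edges)

# One indicator (0/1) column per age bin. Each bin gets its own coefficient
# in a regression, so the fitted relationship need not be monotonic in age.
age_indicators = (age_bin[:, None] == np.arange(len(edges) + 1)).astype(int)
print(age_indicators)
```

In a regression with an intercept, one would typically drop one of the indicator columns to serve as the baseline category. The choice of bin edges is itself a modeling decision; the ones above are placeholders.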