I saw an analysis recently that I didn’t like. I won’t go into the details, but basically it was a dose-response inference, where a continuous exposure was binned into three broad categories (terciles of the data) and the probability of an adverse event was computed for each tercile. The effect and the sample size were large enough that the terciles were statistically significantly different from each other in probability of adverse event, with the probabilities increasing from low to mid to high exposure, as one would predict.
I didn’t like this analysis because it is equivalent to fitting a step function. There is a tendency for people to interpret the (arbitrary) tercile boundaries as meaningful thresholds, even though the underlying dose-response relation has to be continuous. I’d prefer to start with a linear model and then add nonlinearity from there with a spline or whatever.
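To make the step-function point concrete, here is a minimal Python sketch on simulated data. Everything in it is hypothetical: the uniform exposure, the smooth logistic dose-response curve, the sample size, and the helper names `fit_logistic`, `tercile_predict`, and `linear_predict`. The tercile summary implies a fitted curve that is flat within each bin and jumps at the data-dependent cutpoints, while a logistic regression that is linear in the exposure changes smoothly. A spline basis could be added as extra columns of the design matrix from there.

```python
import numpy as np

def expit(z):
    """Logistic function: maps log-odds to probabilities."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, n_iter=25):
    """Logistic regression by Newton-Raphson; X must include an intercept column.
    Returns coefficient estimates and their standard errors."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta)
        info = X.T @ (X * (p * (1 - p))[:, None])  # Fisher information
        beta = beta + np.linalg.solve(info, X.T @ (y - p))
    # Recompute the information at the final estimate for standard errors.
    p = expit(X @ beta)
    info = X.T @ (X * (p * (1 - p))[:, None])
    return beta, np.sqrt(np.diag(np.linalg.inv(info)))

rng = np.random.default_rng(0)
n = 5000
x = rng.uniform(0.0, 3.0, size=n)            # hypothetical continuous exposure
y = rng.binomial(1, expit(-3.0 + 1.2 * x))   # hypothetical smooth dose-response

# Tercile summary: cut at the empirical 1/3 and 2/3 quantiles.
cuts = np.quantile(x, [1 / 3, 2 / 3])
bins = np.digitize(x, cuts)                  # 0, 1, 2 = low, mid, high
tercile_rates = np.array([y[bins == k].mean() for k in range(3)])

def tercile_predict(x_new):
    """Curve implied by the tercile summary: flat within bins, jumps at the cuts."""
    return tercile_rates[np.digitize(x_new, cuts)]

# Alternative: logistic regression linear in the exposure as measured.
X = np.column_stack([np.ones(n), x])
beta, se = fit_logistic(X, y)

def linear_predict(x_new):
    """Smooth fitted probability from the linear-logistic model."""
    return expit(beta[0] + beta[1] * x_new)

eps = 1e-6
for c in cuts:
    print(f"cut {c:.2f}: tercile {tercile_predict(c - eps):.3f} -> "
          f"{tercile_predict(c + eps):.3f}, linear {linear_predict(c - eps):.3f} -> "
          f"{linear_predict(c + eps):.3f}")
```

The jumps printed for the tercile curve come entirely from where the sample quantiles happened to fall; the smooth fit shows no break at those points, because the simulated truth has none.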
At this point I stepped back and thought: Hey, the divide-into-three analysis does not literally assume a step function. It doesn’t assume anything at all; it’s just a data summary! People discretize input variables all the time! So why am I complaining?
I justify my complaints on two levels. First, on the grounds of interpretation: my applied colleagues really were interpreting the three-category model in terms of thresholds. The three categories were “0 to A”, “A to B”, and “B to infinity”, and somebody really was saying something about the effect of exposure at level A or at level B. Which just ain’t right.
My second issue is statistical efficiency. You can say that the categorical-input model is nothing but a summary, an estimate of averages, but by binning like this you lose statistical efficiency, because all the variation in exposure within each bin is thrown away. And you become a slave to “statistical significance”: there’s the temptation to butcher your analysis and throw away tons of information, just so you can get a single clean, statistically significant result.
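A small simulation can illustrate the efficiency point. This is a sketch under assumed settings (normal exposure, n = 300, a modest true slope on the logit scale, 1000 replications, all hypothetical). It compares the power of a Wald test on the slope when the exposure enters as measured against the same one-parameter trend test after the exposure is replaced by its tercile code 0/1/2. Both tests use the same simulated data in each replication, so the only difference is the discretization. The helper names `fit_logistic` and `slope_z` are illustrative, not from any library.

```python
import numpy as np

def expit(z):
    """Logistic function: maps log-odds to probabilities."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, n_iter=25):
    """Logistic regression by Newton-Raphson; X must include an intercept column.
    Returns coefficient estimates and their standard errors."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta)
        info = X.T @ (X * (p * (1 - p))[:, None])  # Fisher information
        beta = beta + np.linalg.solve(info, X.T @ (y - p))
    p = expit(X @ beta)
    info = X.T @ (X * (p * (1 - p))[:, None])
    return beta, np.sqrt(np.diag(np.linalg.inv(info)))

def slope_z(x, y):
    """Wald z-statistic for the slope of a logistic regression of y on x."""
    X = np.column_stack([np.ones(len(x)), x])
    beta, se = fit_logistic(X, y)
    return beta[1] / se[1]

rng = np.random.default_rng(1)
n, reps, slope = 300, 1000, 0.3   # hypothetical simulation settings
hits_cont = hits_bin = 0
for _ in range(reps):
    x = rng.normal(size=n)                          # continuous exposure
    y = rng.binomial(1, expit(-1.0 + slope * x))    # smooth true dose-response
    codes = np.digitize(x, np.quantile(x, [1 / 3, 2 / 3])).astype(float)  # 0/1/2
    hits_cont += int(abs(slope_z(x, y)) > 1.96)     # two-sided test, about 5% level
    hits_bin += int(abs(slope_z(codes, y)) > 1.96)

power_cont = hits_cont / reps
power_bin = hits_bin / reps
print(f"power, exposure as measured: {power_cont:.2f}")
print(f"power, tercile code 0/1/2:   {power_bin:.2f}")
```

As a rough rule of thumb for small effects, the squared correlation between a normal exposure and its tercile code (about 0.79) acts like a discount on the sample size, so the binned test should come out noticeably less powerful on the same data.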
P.S. The more categories you have, the less of a concern it is to discretize. And sometimes your data come in discrete form (see here, for example).