A colleague writes,
hi andrew,
here’s a small question from a physicist friend:
can you point to a good reference on why regressing y/x against x is a bad thing to do…?
My short answer: it’s not necessarily a bad thing to do at all. It depends on the context: what are x and y?
My context-free comment is that if you’re considering y/x, perhaps x and y are both positive, in which case it might make sense to work with log(x) and log(y). Then the regression of log(y/x) on log(x) is the same as the regression of log(y) on log(x), with 1 subtracted from the slope (since log(y/x) = log(y) - log(x)).
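A quick numerical check of that identity, as a minimal sketch: the simulated data, the seed, and the coefficient values below are arbitrary assumptions chosen only for illustration.

```python
import numpy as np

# Simulated positive x and y (arbitrary, illustrative values)
rng = np.random.default_rng(0)
x = rng.lognormal(mean=2.0, sigma=1.0, size=200)
y = np.exp(0.5 + 0.7 * np.log(x) + rng.normal(0.0, 0.3, size=200))

log_x, log_y = np.log(x), np.log(y)

# np.polyfit with degree 1 returns [slope, intercept]
slope_y, intercept_y = np.polyfit(log_x, log_y, 1)
slope_ratio, intercept_ratio = np.polyfit(log_x, np.log(y / x), 1)

# The two slopes differ by exactly 1; the intercepts agree
print(slope_y, slope_ratio, slope_y - slope_ratio)
print(intercept_y, intercept_ratio)
```

The fitted values and residuals of the two regressions carry the same information; only the slope is shifted by 1.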
P.S. I think it’s ok for me to make fun of physicists since I majored in physics in college and switched to statistics because physics was too hard for me.
Let's see. Ordinarily, a linear regression is:
y = b0 + b1*x + e
where e is assumed to be iid normal.
But you want to regress y/x on x?
y/x = b0 + b1*x + e
y = b0*x + b1*x*x + e*x
so now you really are fitting a quadratic model with no intercept, and the error term on the y scale, e*x, is no longer iid normal: its standard deviation is proportional to x. Doesn't sound like a model I want to fit too often.
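One way to see what this model amounts to: least squares of y/x on x gives the same coefficients as weighted least squares of y on x and x^2 with no intercept and weights 1/x^2. The sketch below checks this on simulated data; the sample size, the range of x, and the values of b0 and b1 are assumptions made up for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
x = rng.uniform(1.0, 10.0, size=n)   # positive x, arbitrary range
b0, b1 = 2.0, 0.5                    # illustrative coefficients
e = rng.normal(0.0, 1.0, size=n)     # iid normal error on the y/x scale
y = x * (b0 + b1 * x + e)            # i.e. y/x = b0 + b1*x + e

# Ordinary least squares of y/x on (1, x)
A = np.column_stack([np.ones(n), x])
coef_ratio, *_ = np.linalg.lstsq(A, y / x, rcond=None)

# Weighted least squares of y on (x, x^2), no intercept, weights 1/x^2.
# WLS is OLS after scaling each row by sqrt(weight) = 1/x.
B = np.column_stack([x, x**2])
w_sqrt = 1.0 / x
coef_wls, *_ = np.linalg.lstsq(B * w_sqrt[:, None], y * w_sqrt, rcond=None)

# Residuals on the y scale spread out as x grows
resid_y = y - B @ coef_ratio
spread_low = np.abs(resid_y[x < 5.5]).mean()
spread_high = np.abs(resid_y[x >= 5.5]).mean()

print(coef_ratio, coef_wls)
print(spread_low, spread_high)
```

So the y/x regression is not exotic: it is a quadratic-through-the-origin fit that downweights observations with large x. Whether that weighting is sensible depends on what x and y are.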
Andrew,
I'm surprised that you would say that this is ok. I distrust regressions like this because, even if this is the correct form, measurement error in x means that when the measured x is too high, y/x will be systematically too low, thereby generating a spurious result.
Alex
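Alex's concern can be illustrated with a small simulation. This is only a sketch under stated assumptions: the ratio y/x is constructed to be unrelated to the true x, and x is observed with multiplicative lognormal error. The error model and all numerical values are made up for the illustration and are not taken from the comment.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
x_true = rng.uniform(1.0, 10.0, size=n)

# y/x_true is 3 times small noise, with no dependence on x_true
y = 3.0 * x_true * np.exp(rng.normal(0.0, 0.1, size=n))

# Assumed measurement error: x is observed with multiplicative noise
x_obs = x_true * np.exp(rng.normal(0.0, 0.3, size=n))

# np.polyfit with degree 1 returns [slope, intercept]
slope_true, _ = np.polyfit(x_true, y / x_true, 1)
slope_obs, _ = np.polyfit(x_obs, y / x_obs, 1)

print(slope_true)  # close to zero: no real relationship
print(slope_obs)   # negative: induced by the error in x
```

The same error appears in the denominator of the outcome and in the predictor, so the two are negatively correlated even when nothing real is going on.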
Surely a very general class of time series models in logs would involve you doing something rather like regressing y/x on x. The problems Alex notes about correlated residuals are there but they're not exactly insoluble or even all that serious.
Alex,
It all depends on what x and y are. For example, in studying representation of large and small states or provinces within countries, we've run regressions of log(y/x) on log(x), where the units are states or provinces, x = state population, and y = number of legislators in the state (or y = federal spending in the state). y/x is per-capita representation or spending, and it makes sense to look at that.
Now maybe the physicist had a different example in mind, but that was my point: it depends on what x and y are.
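In the representation example, the slope of log(y/x) on log(x) has a direct reading: a slope of 0 means representation is proportional to population, and a slope of -1 means every unit gets the same number of legislators regardless of size. A minimal sketch of the two extremes, using hypothetical populations invented for the illustration (not real data):

```python
import numpy as np

# Hypothetical state populations (made up for illustration)
population = np.array([0.6e6, 1.5e6, 4.0e6, 9.0e6, 25.0e6])
log_pop = np.log(population)

# Case 1: every state gets the same number of legislators (2 each)
equal_seats = np.full(population.shape, 2.0)
slope_equal, _ = np.polyfit(log_pop, np.log(equal_seats / population), 1)

# Case 2: seats exactly proportional to population (1 per 500,000 people)
proportional_seats = population / 5.0e5
slope_prop, _ = np.polyfit(log_pop, np.log(proportional_seats / population), 1)

print(slope_equal)  # -1 up to floating-point error
print(slope_prop)   # 0 up to floating-point error
```

An estimated slope between -1 and 0 would then indicate that small units are overrepresented per capita, but less than under equal representation.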