Ratio estimates are common in statistics. In survey sampling, the ratio estimate uses y/x to estimate Y/X, where x and y are totals of sample measurements and X and Y are population totals.
In textbook sampling examples, the denominator X will be an all-positive variable, something that is easy to measure and is, ideally, close to proportional to Y. For example, X is last year’s sales and Y is this year’s sales, or X is the number of people in a cluster and Y is some count.
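As a minimal sketch of the textbook case, here is the ratio estimate in Python. The sales figures and the function name `ratio_estimate` are made up for illustration; the only ingredients taken from the discussion above are the sample totals x and y and the population total X, which is assumed to be known.

```python
def ratio_estimate(sample_x, sample_y, population_x_total):
    """Return (y/x, estimated population total of Y)."""
    x = sum(sample_x)  # sample total of the auxiliary variable
    y = sum(sample_y)  # sample total of the outcome
    if x <= 0:
        # The textbook setting assumes an all-positive denominator.
        raise ValueError("sample total of x must be positive")
    ratio = y / x
    return ratio, ratio * population_x_total


# Hypothetical data: last year's and this year's sales for four sampled stores.
last_year = [120.0, 80.0, 200.0, 100.0]
this_year = [132.0, 84.0, 230.0, 104.0]

ratio, estimated_total = ratio_estimate(
    last_year, this_year, population_x_total=50_000.0
)
print(f"y/x = {ratio:.3f}, estimated Y = {estimated_total:.0f}")
```

The guard on x is there because this sketch only makes sense in the all-positive setting described above.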
Ratio estimation doesn’t work so well if X can be either positive or negative.
More generally we can consider any estimate of a ratio, with no need for a survey sampling context. The problem with estimating Y/X is that the very interpretation of Y/X can change completely if the sign of X changes.
Everything is ok for a point estimate: you get X.hat and Y.hat, you can take the ratio Y.hat/X.hat, no problem. But the inference falls apart if you have enough uncertainty in X.hat that you can’t be sure of its sign.
This problem has been bugging me for a long time, and over the years I’ve encountered various examples in different fields of statistical theory, methods, and applications. Here I’ll mention a few:
- LD50
- Ratio of regression coefficients
- Incremental cost-effectiveness ratio
- Instrumental variables
- Fieller-Creasy problem
LD50
We discuss this example in section 3.7 of Bayesian Data Analysis. Consider a logistic regression model, Pr(y=1) = invlogit (a + bx), where x is the dose of a drug given to an animal and y=1 if the animal dies. The LD50 (lethal dose, 50%) is the value x for which Pr(y=1)=0.5. That is, a+bx=0, so x = -a/b. This is the value of x for which the logistic curve goes through 0.5 so there’s a 50% chance of the animal dying.
The problem comes when there is enough uncertainty about b that its sign could be either positive or negative. If so, you get an extremely long-tailed distribution for the LD50, -a/b. How does this happen? Roughly speaking, the estimate for a has a normal distribution, the estimate for b has a normal distribution, so their ratio has a Cauchy-like distribution, in which it can appear possible for the LD50 to take on values such as 100,000 or -300,000 or whatever. In a real example (such as in section 3.7 of BDA), these sorts of extreme values don’t make sense.
The problem is that the LD50 has a completely different interpretation if b>0 than if b<0. If b>0, then x is the point at which any higher dose has a more than 50% chance of killing. If b<0, then any dose lower than x has a more than 50% chance to kill. The interpretation of the model changes completely. LD50 by itself is pretty pointless, if you don’t know whether the curve goes up or down. And values such as LD50=100,000 are pretty meaningless in this case.
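To see the long tails directly, here is a rough simulation sketch. It assumes independent normal uncertainty for a and b, which is a simplification (in a fitted logistic regression the two are typically correlated), and the means and standard deviations are invented for illustration rather than taken from the BDA example.

```python
import random


def simulate_ld50(a_mean, a_sd, b_mean, b_sd, n_draws=100_000, seed=0):
    """Draw a and b from independent normals and return sorted draws of -a/b."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        a = rng.gauss(a_mean, a_sd)
        b = rng.gauss(b_mean, b_sd)
        if b == 0.0:
            continue  # essentially never happens, but avoids dividing by zero
        draws.append(-a / b)
    draws.sort()
    return draws


def quantile(sorted_values, p):
    """Crude empirical quantile of an already sorted list."""
    index = min(len(sorted_values) - 1, int(p * len(sorted_values)))
    return sorted_values[index]


# Hypothetical settings: slope clearly positive vs. slope of uncertain sign.
well_identified = simulate_ld50(a_mean=-1.0, a_sd=0.5, b_mean=2.0, b_sd=0.2)
sign_uncertain = simulate_ld50(a_mean=-1.0, a_sd=0.5, b_mean=0.2, b_sd=0.2)

for label, draws in (("b clearly positive", well_identified),
                     ("sign of b uncertain", sign_uncertain)):
    summary = [round(quantile(draws, p), 2) for p in (0.01, 0.25, 0.5, 0.75, 0.99)]
    print(label, summary)
```

In the second setting the slope is only one standard deviation from zero, and the extreme quantiles of -a/b blow up even though nothing about the uncertainty in a has changed.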
Ratio of regression coefficients
Here’s an example. Political scientist Daniel Drezner pointed to a report by James Gwartney and Robert A. Lawson, who wrote:
Economic freedom is almost 50 times more effective than democracy in restraining nations from going to war. In new research published in this year’s report, Erik Gartzke, a political scientist from Columbia University, compares the impact of economic freedom on peace to that of democracy. When measures of both economic freedom and democracy are included in a statistical study, economic freedom is about 50 times more effective than democracy in diminishing violent conflict. The impact of economic freedom on whether states fight or have a military dispute is highly significant while democracy is not a statistically significant predictor of conflict.
What Gartzke did was run a regression and take the coefficient for economic freedom and divide it by the coefficient for democracy. Now I’m not knocking Gartzke’s work, nor am I trying to make some smug slam on regression. I love regression and have used it for causal inference (or approximate causal inference) in my own work.
My only problem here is that ratio of 50. If beta.hat.1/beta.hat.2=50, you can bet that beta.hat.2 is not statistically significant. And, indeed, if you follow the link to Gartzke’s chapter 2 of this report, you find this:
The “almost 50” above is the ratio of the estimates -0.567 and -0.011. (567/11 is actually over 50, but I assume that you get something less than 50 if you keep all the significant figures in the original estimate.) In words, each unit on the economic freedom scale corresponds to a difference of 0.567 on the probability (or, in this case, I assume the logit probability) of a militarized interstate dispute, while a difference of one unit on the democracy score corresponds to a difference of 0.011 on the outcome.
A factor of 50 is a lot, no?
But now look at the standard errors. The coefficient for the democracy score is -0.011 +/- 0.065. So the data are easily consistent with a coefficient of -0.011, or 0.1, or -0.1. All of these are a lot less than 0.567 in absolute value. Even if we put the coef of economic freedom at the low end of its range in absolute value (say, 0.567 - 2*0.179 = 0.2) and put the coef of the democracy score at the high end (say, 0.011 + 2*0.065 = 0.14), even then the ratio is still 1.4, which ain’t nothing. (Economic freedom and democracy score both seem to be defined roughly on a 1-10 scale, so it seems plausible to compare their coefficients directly without transformation.) So, in the context of Gartzke’s statistical and causal model, his data are saying something about the relative importance of the two factors.
But, no, I don’t buy the factor of 50. One way to see the problem is: what if the coef of democracy had been +0.011 instead of -0.011? Given the standard error, this sort of thing could easily have occurred. The implication would be that democracy is associated with more war. Could be possible. Would the statement then be that economic freedom is negative 50 times more effective than democracy in restraining nations from going to war??
Or what if the coef of democracy had been -0.001? Then you could say that economic freedom is 500 times as important as democracy in preventing war.
The problem is purely statistical. The ratio beta.1/beta.2 has a completely different meaning according to the signs of beta.1 and beta.2. Thus, if the sign of the denominator (or, for that matter, the numerator) is uncertain, the ratio is super-noisy and can be close to meaningless.
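Here is a quick back-of-the-envelope check in Python, using the estimates and standard error quoted above. It assumes the democracy coefficient can be treated as normally distributed around its estimate, which is my simplification and not something taken from Gartzke's chapter; the alternative denominator values are hypothetical, chosen to lie well within one standard error of the estimate.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Estimates quoted in the text.
econ_freedom = -0.567
democracy, democracy_se = -0.011, 0.065

# Assumption: normal uncertainty centered at the estimate.
prob_sign_flip = 1.0 - normal_cdf((0.0 - democracy) / democracy_se)
print(f"Pr(democracy coefficient > 0) is roughly {prob_sign_flip:.2f}")

# Hypothetical denominators, all well within one standard error of -0.011.
ratios = {denom: econ_freedom / denom for denom in (-0.011, -0.001, 0.011)}
for denom, value in ratios.items():
    print(f"democracy coef = {denom:+.3f} -> ratio = {value:8.1f}")
```

Under that normal approximation the sign of the denominator is not far from a coin flip, and tiny shifts in the denominator swing the ratio across hundreds of units.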
Incremental cost-effectiveness ratio
Several years ago Dan Heitjan pointed me to some research on the problem of comparing two treatments that can vary on cost and efficacy.
Suppose the old treatment has cost C1 and efficacy E1, and the new treatment has cost C2 and efficacy E2. The incremental cost-effectiveness ratio is (C2-C1)/(E2-E1). In the usual scenario in which cost and efficacy both increase, we want this ratio to be low: the least additional cost per additional unit of efficacy.
Now suppose that C1,E1,C2,E2 are estimated from data, so that your estimated ratio is (C2.hat-C1.hat)/(E2.hat-E1.hat). No problem, right? No problem . . . as long as the signs of C2-C1 and E2-E1 are clear. But suppose the signs are uncertain (that could happen), so that we are not sure whether the new treatment is actually better, or whether it is actually more expensive.
Consider the four quadrants:
1. C2 > C1 and E2 > E1. The new treatment costs more and works better. The incremental cost-effectiveness ratio is positive, and we want it to be low.
2. C2 > C1 and E2 < E1. The new treatment costs more and works worse. The incremental cost-effectiveness ratio is negative, and the new treatment is worse no matter what.
3. C2 < C1 and E2 > E1. The new treatment costs less and works better! The incremental cost-effectiveness ratio is negative, and the new treatment is better no matter what.
4. C2 < C1 and E2 < E1. The new treatment costs less and works worse. The incremental cost-effectiveness ratio is positive, and we want it to be high (that is, a great saving in cost for only a small drop in efficacy).
Consider especially quadrants 1 and 4. An estimate or a confidence interval for the incremental cost-effectiveness ratio is meaningless if you don’t know what quadrant you’re in.
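The four cases can be written out as a small Python sketch. The function name `classify_icer` and the cost and efficacy numbers are hypothetical; the point is only that the same numerical ratio can land in quadrant 1 or quadrant 4 and mean opposite things.

```python
def classify_icer(cost_old, effect_old, cost_new, effect_new):
    """Return (quadrant, ratio, reading) for the incremental cost-effectiveness ratio."""
    d_cost = cost_new - cost_old
    d_effect = effect_new - effect_old
    if d_cost == 0 or d_effect == 0:
        raise ValueError("on a quadrant boundary; the four-way reading does not apply")
    ratio = d_cost / d_effect
    if d_cost > 0 and d_effect > 0:
        return 1, ratio, "costs more, works better: lower ratio is better"
    if d_cost > 0 and d_effect < 0:
        return 2, ratio, "costs more, works worse: new treatment is worse"
    if d_cost < 0 and d_effect > 0:
        return 3, ratio, "costs less, works better: new treatment is better"
    return 4, ratio, "costs less, works worse: higher ratio is better"


# Hypothetical treatments with the same ratio but opposite interpretations.
pricier_and_better = classify_icer(1000.0, 10.0, 1500.0, 12.0)
cheaper_and_worse = classify_icer(1000.0, 10.0, 500.0, 8.0)
print(pricier_and_better)
print(cheaper_and_worse)
```

Both calls give a ratio of 250 per unit of efficacy, so a point estimate or interval for the ratio alone cannot tell the two situations apart.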
Here are the references for this one:
Heitjan, Daniel F., Moskowitz, Alan J. and Whang, William (1999). Bayesian estimation of cost-effectiveness ratios from clinical trials. Health Economics 8, 191-201.
Heitjan, Daniel F., Moskowitz, Alan J. and Whang, William (1999). Problems with interval estimation of the incremental cost-effectiveness ratio. Medical Decision Making 19, 9-15.
Instrumental variables
This is another ratio of regression coefficients. For a weak instrument, the denominator can be so uncertain that its sign could go either way. But if you can’t get the sign right for the instrument, the ratio estimate doesn’t mean anything. So, paradoxically, when you use a more careful procedure to compute uncertainty in an instrumental variables estimate, you can get huge uncertainty estimates that are inappropriate.
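A rough simulation sketch of the weak-instrument case, in Python. Everything here is an assumption made for illustration: a single instrument z, an unobserved confounder, a true effect of 1, and the simple ratio-of-covariances form of the instrumental variables estimate, cov(z, y) / cov(z, x).

```python
import random


def covariance(u, v):
    """Sample covariance of two equal-length lists."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / (len(u) - 1)


def iv_estimate(first_stage, rng, true_effect=1.0, n=200):
    """Simulate one dataset and return cov(z, y) / cov(z, x)."""
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    c = [rng.gauss(0.0, 1.0) for _ in range(n)]  # unobserved confounder
    x = [first_stage * zi + ci + rng.gauss(0.0, 1.0) for zi, ci in zip(z, c)]
    y = [true_effect * xi + ci + rng.gauss(0.0, 1.0) for xi, ci in zip(x, c)]
    return covariance(z, y) / covariance(z, x)


def middle_90_width(first_stage, n_reps=500, seed=1):
    """Width of the central 90% of estimates across repeated simulated datasets."""
    rng = random.Random(seed)
    estimates = sorted(iv_estimate(first_stage, rng) for _ in range(n_reps))
    return estimates[int(0.95 * n_reps)] - estimates[int(0.05 * n_reps)]


strong_width = middle_90_width(first_stage=1.0)   # denominator far from zero
weak_width = middle_90_width(first_stage=0.05)    # denominator's sign is uncertain
print(f"strong instrument: {strong_width:.2f}, weak instrument: {weak_width:.2f}")
```

With the weak first stage, the estimated denominator is within sampling noise of zero, so the spread of the ratio across datasets is far wider than with the strong instrument.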
Fieller-Creasy problem
This is the name in classical statistics for estimating the ratio of two parameters that are identified with independent normally distributed data. It’s sometimes referred to as the problem of the ratio of two normal means, but I think the above examples are more realistic.
Anyway, the Fieller-Creasy problem is notoriously difficult: how can you get an interval estimate with close to 95% coverage? The problem, again, is that there aren’t really any examples where the ratio has any meaning if the denominator’s sign is uncertain (at least, none that I know of; as always, I’m happy to be educated further by my correspondents). And all the statistical difficulties in inference here come from problems where the denominator’s sign is uncertain.
So I think the Fieller-Creasy problem is a non-problem. Or, more to the point, a problem that there is no point in solving. Which is one reason it’s so hard to solve (recall the folk theorem of statistical computing).
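For the curious, here is a sketch of the classical Fieller construction in Python, assuming two independent normal estimates with known standard errors. Applying it to the regression coefficients quoted earlier is only illustrative, since coefficients from the same regression are generally correlated and that covariance is ignored here. The function name and return convention are my own.

```python
import math


def fieller_set(a, se_a, b, se_b, z=1.96):
    """Confidence set for a/b from independent normal estimates a and b.

    Solves (a - rho*b)^2 <= z^2 * (se_a^2 + rho^2 * se_b^2) for rho.
    Returns (kind, low, high), where kind is "interval" (low <= rho <= high),
    "two_rays" (rho <= low or rho >= high), or "whole_line".
    """
    quad = b * b - z * z * se_b * se_b    # coefficient of rho^2
    lin = -2.0 * a * b                    # coefficient of rho
    const = a * a - z * z * se_a * se_a   # constant term
    if quad == 0:
        raise ValueError("boundary case (half-line) not handled in this sketch")
    disc = lin * lin - 4.0 * quad * const
    if disc <= 0:
        # Only reachable when quad < 0: the inequality holds for every rho.
        return "whole_line", -math.inf, math.inf
    root = math.sqrt(disc)
    r1 = (-lin - root) / (2.0 * quad)
    r2 = (-lin + root) / (2.0 * quad)
    low, high = min(r1, r2), max(r1, r2)
    return ("interval" if quad > 0 else "two_rays"), low, high


# Hypothetical case where the denominator is clearly nonzero.
clear_sign = fieller_set(a=1.0, se_a=0.1, b=2.0, se_b=0.1)
# The two coefficients quoted earlier, treated (as an assumption) as independent.
unclear_sign = fieller_set(a=-0.567, se_a=0.179, b=-0.011, se_b=0.065)
print(clear_sign)
print(unclear_sign)
```

When the denominator is not distinguishable from zero at the chosen level, the set stops being a bounded interval, which is the same sign problem showing up in the classical machinery.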
P.S. This all-statistics binge is pretty exhausting! Maybe this one can count as 2 or 3 entries?