I happened to come across this post from 2011, which in turn is based on thoughts of mine from about 1993.

It’s important and most of you probably haven’t seen it, so here it is again:

Ratio estimates are common in statistics. In survey sampling, the ratio estimate is when you use y/x to estimate Y/X (using the notation in which x,y are totals of sample measurements and X,Y are population totals).

In textbook sampling examples, the denominator X will be an all-positive variable, something that is easy to measure and is, ideally, close to proportional to Y. For example, X is last year’s sales and Y is this year’s sales, or X is the number of people in a cluster and Y is some count.
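A minimal sketch of the textbook ratio estimate, using made-up sales figures. The sample ratio y/x estimates Y/X, and multiplying by the known population total X gives an estimate of Y. The function name and the numbers are hypothetical.

```python
import numpy as np

def ratio_estimate_total(y_sample, x_sample, X_population_total):
    """Ratio estimate of the population total Y: (y / x) * X,
    where y and x are sample totals and X is the known population total."""
    y = np.sum(y_sample)
    x = np.sum(x_sample)
    return (y / x) * X_population_total

# Hypothetical data: last year's sales (x) and this year's sales (y) for 4 sampled stores,
# with last year's total over all stores known to be 1000.
x_s = np.array([10.0, 20.0, 30.0, 40.0])
y_s = np.array([11.0, 22.0, 33.0, 44.0])
print(ratio_estimate_total(y_s, x_s, 1000.0))
```

Everything here depends on x being comfortably positive; nothing in the formula warns you if it isn't.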

Ratio estimation doesn’t work so well if X can be either positive or negative.

More generally we can consider any estimate of a ratio, with no need for a survey sampling context. The problem with estimating Y/X is that the very interpretation of Y/X can change completely if the sign of X changes.

Everything is ok for a point estimate: you get X.hat and Y.hat, you can take the ratio Y.hat/X.hat, no problem. But the inference falls apart if you have enough uncertainty in X.hat that you can’t be sure of its sign.

This problem has been bugging me for a long time, and over the years I’ve encountered various examples in different fields of statistical theory, methods, and applications. Here I’ll mention a few:

– LD50

– Ratio of regression coefficients

– Incremental cost-effectiveness ratio

– Instrumental variables

– Fieller-Creasy problem

LD50

We discuss this in section 3.7 of Bayesian Data Analysis. Consider a logistic regression model, Pr(y=1) = invlogit(a + bx), where x is the dose of a drug given to an animal and y=1 if the animal dies. The LD50 (lethal dose, 50%) is the value x for which Pr(y=1)=0.5. That is, a+bx=0, so x = -a/b. This is the value of x for which the logistic curve goes through 0.5, so there’s a 50% chance of the animal dying.

The problem comes when there is enough uncertainty about b that its sign could be either positive or negative. If so, you get an extremely long-tailed distribution for the LD50, -a/b. How does this happen? Roughly speaking, the estimate for a has a normal dist, the estimate for b has a normal dist, so their ratio has a Cauchy-like dist, in which it can appear possible for the LD50 to take on values such as 100,000 or -300,000 or whatever. In a real example (such as in section 3.7 of BDA), these sorts of extreme values don’t make sense.
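To see the long tail, here is a quick simulation sketch. The normal distributions for a and b below are made up for illustration (they are not the BDA bioassay numbers); the only thing that matters is that b has a non-trivial chance of being negative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws = 100_000

# Hypothetical sampling/posterior distributions: the intercept is well determined,
# but the slope b is uncertain enough that its sign could go either way.
a = rng.normal(loc=1.0, scale=0.3, size=n_draws)
b = rng.normal(loc=1.0, scale=1.0, size=n_draws)

ld50 = -a / b

print("share of draws with b < 0:", np.mean(b < 0))
print("2.5%, 50%, 97.5% quantiles of LD50:", np.quantile(ld50, [0.025, 0.5, 0.975]))
print("largest |LD50|:", np.max(np.abs(ld50)))
```

The draws with b near zero are what produce the huge values, and the draws with b < 0 flip the sign of the LD50 entirely.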

The problem is that the LD50 has a completely different interpretation if b>0 than if b<0. If b>0, then x is the point at which any higher dose has a more than 50% chance of killing. If b<0, then any dose lower than x has a more than 50% chance to kill. The interpretation of the model changes completely. LD50 by itself is pretty pointless if you don’t know whether the curve goes up or down. And values such as LD50=100,000 are pretty meaningless in this case.
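A small sketch of the point: the two hypothetical parameter pairs below give exactly the same LD50, but on opposite sides of it the death probabilities are reversed. The helper names are made up for this illustration.

```python
import numpy as np

def invlogit(z):
    return 1.0 / (1.0 + np.exp(-z))

def ld50_and_direction(a, b):
    """LD50 = -a/b for Pr(death) = invlogit(a + b*x), plus which side of it is more lethal."""
    if b == 0:
        raise ValueError("b = 0: dose has no effect, so there is no LD50")
    x50 = -a / b
    side = "doses above LD50 kill more than half" if b > 0 else "doses below LD50 kill more than half"
    return x50, side

# Same LD50, opposite meanings
for a, b in [(1.0, 2.0), (-1.0, -2.0)]:
    x50, side = ld50_and_direction(a, b)
    print(f"a={a}, b={b}: LD50={x50}, {side}; "
          f"Pr(death) at LD50+1 = {invlogit(a + b * (x50 + 1)):.3f}")
```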

Ratio of regression coefficients

Here’s an example. Political scientist Daniel Drezner pointed to a report by James Gwartney and Robert A. Lawson, who wrote:

Economic freedom is almost 50 times more effective than democracy in restraining nations from going to war. In new research published in this year’s report [2005], Erik Gartzke, a political scientist from Columbia University, compares the impact of economic freedom on peace to that of democracy. When measures of both economic freedom and democracy are included in a statistical study, economic freedom is about 50 times more effective than democracy in diminishing violent conflict. The impact of economic freedom on whether states fight or have a military dispute is highly significant while democracy is not a statistically significant predictor of conflict.

What Gartzke did was run a regression and take the coefficient for economic freedom and divide it by the coefficient for democracy. Now I’m not knocking Gartzke’s work, nor am I trying to make some smug slam on regression. I love regression and have used it for causal inference (or approximate causal inference) in my own work.

My only problem here is that ratio of 50. If beta.hat.1/beta.hat.2=50, you can bet that beta.hat.2 is not statistically significant. And, indeed, if you follow the link to Gartzke’s chapter 2 of this report, you find this:

The “almost 50” above is the ratio of the estimates -0.567 and -0.011. (567/11 is actually over 50, but I assume that you get something less than 50 if you keep all the significant figures in the original estimate.) In words, each unit on the economic freedom scale corresponds to a difference of 0.567 on the probability (or, in this case, I assume the logit probability) of a militarized industrial dispute, while a difference of one unit on the democracy score corresponds to a difference of 0.011 on the outcome.

A factor of 50 is a lot, no?

But now look at the standard errors. The coefficient for the democracy score is -0.011 +/- 0.065. So the data are easily consistent with a coefficient of -0.011, or 0.1, or -0.1. All of these are a lot less than 0.567. Even if we put the coef of economic freedom at the low end of its range in absolute value (say, 0.567 – 2*0.179 = 0.2) and put the coef of the democracy score at the high end (say, 0.011 + 2*0.065=0.14)–even then, the ratio is still 1.4, which ain’t nothing. (Economic freedom and democracy score both seem to be defined roughly on a 1-10 scale, so it seems plausible to compare their coefficients directly without transformation.) So, in the context of Gartzke’s statistical and causal model, his data are saying something about the relative importance of the two factors.
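The arithmetic in the paragraph above, as a quick script using the estimates and standard errors quoted from Gartzke’s table (the +/- 2 s.e. ranges are the same rough ones used in the text):

```python
# Estimates and standard errors as quoted above
b_econ, se_econ = -0.567, 0.179
b_demo, se_demo = -0.011, 0.065

print("point ratio:", b_econ / b_demo)  # about 51.5

# Rough +/- 2 s.e. ranges
econ_lo, econ_hi = b_econ - 2 * se_econ, b_econ + 2 * se_econ
demo_lo, demo_hi = b_demo - 2 * se_demo, b_demo + 2 * se_demo
print("economic freedom range:", (econ_lo, econ_hi))  # both ends negative
print("democracy range:", (demo_lo, demo_hi))         # straddles zero

# Conservative comparison: smallest plausible |econ| over largest plausible |demo|
conservative = abs(econ_hi) / max(abs(demo_lo), abs(demo_hi))
print("conservative ratio:", conservative)
```

Without rounding 0.209 down to 0.2, the conservative ratio comes out nearer 1.5 than 1.4; either way it is nowhere near 50.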

But, no, I don’t buy the factor of 50. One way to see the problem is: what if the coef of democracy had been +0.011 instead of -0.011? Given the standard error, this sort of thing could easily have occurred. The implication would be that democracy is associated with more war. Could be possible. Would the statement then be that economic freedom is negative 50 times more effective than democracy in restraining nations from going to war?? Or what if the coef of democracy had been -0.001? Then you could say that economic freedom is 500 times as important as democracy in preventing war.

The problem is purely statistical. The ratio beta.1/beta.2 has a completely different meaning according to the signs of beta.1 and beta.2. Thus, if the sign of the denominator (or, for that matter, the numerator) is uncertain, the ratio is super-noisy and can be close to meaningless.
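One way to see how noisy the ratio is: simulate the two estimates from normal distributions with the quoted means and standard errors and look at the ratio. This is a sketch under a loud assumption: the two estimates are treated as independent, which they surely are not in the actual regression.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Assumption: independent normal sampling distributions with the quoted s.e.'s
beta_econ = rng.normal(-0.567, 0.179, n)
beta_demo = rng.normal(-0.011, 0.065, n)
ratio = beta_econ / beta_demo

print("share of draws with a negative ratio:", np.mean(ratio < 0))
print("5%, 50%, 95% quantiles:", np.quantile(ratio, [0.05, 0.5, 0.95]))
print("share with |ratio| > 100:", np.mean(np.abs(ratio) > 100))
```

Roughly 40-some percent of the draws put the ratio on the other side of zero, which is the sign ambiguity in the paragraph above.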

Incremental cost-effectiveness ratio

Several years ago Dan Heitjan pointed me to some research on the problem of comparing two treatments that can vary on cost and efficacy.

Suppose the old treatment has cost C1 and efficacy E1, and the new treatment has cost C2 and efficacy E2. The incremental cost-effectiveness ratio is (C2-C1)/(E2-E1). In the usual scenario in which cost and efficacy both increase, we want this ratio to be low: the least additional cost per additional unit of efficacy.

Now suppose that C1,E1,C2,E2 are estimated from data, so that your estimated ratio is (C2.hat-C1.hat)/(E2.hat-E1.hat). No problem, right? No problem . . . as long as the signs of C2-C1 and E2-E1 are clear. But suppose the signs are uncertain (that could happen), so that we are not sure whether the new treatment is actually better, or whether it is actually more expensive.

Consider the four quadrants:

1. C2 > C1 and E2 > E1. The new treatment costs more and works better. The incremental cost-effectiveness ratio is positive, and we want it to be low.

2. C2 > C1 and E2 < E1. The new treatment costs more and works worse. The incremental cost-effectiveness ratio is negative, and the new treatment is worse no matter what.

3. C2 < C1 and E2 > E1. The new treatment costs less and works better! The incremental cost-effectiveness ratio is negative, and the new treatment is better no matter what.

4. C2 < C1 and E2 < E1. The new treatment costs less and works worse. The incremental cost-effectiveness ratio is positive, and we want it to be high (that is, a large saving in cost for only a small drop in efficacy).

Consider especially quadrants 1 and 4. An estimate or a confidence interval for the incremental cost-effectiveness ratio is meaningless if you don’t know what quadrant you’re in.
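A sketch of the quadrant problem with hypothetical numbers: a new treatment in quadrant 1 and another in quadrant 4 can have the same ICER, even though in one case you want the ratio low and in the other you want it high. The function names are made up for this illustration.

```python
def icer(c1, e1, c2, e2):
    """Incremental cost-effectiveness ratio (C2 - C1) / (E2 - E1)."""
    return (c2 - c1) / (e2 - e1)

def quadrant(c1, e1, c2, e2):
    dc, de = c2 - c1, e2 - e1
    if dc > 0 and de > 0:
        return 1, "costs more, works better: want ICER low"
    if dc > 0 and de < 0:
        return 2, "costs more, works worse: new treatment dominated"
    if dc < 0 and de > 0:
        return 3, "costs less, works better: new treatment dominates"
    if dc < 0 and de < 0:
        return 4, "costs less, works worse: want ICER high"
    return 0, "on an axis: no change in cost or in efficacy"

# Hypothetical: the old treatment costs 1000 with efficacy 1.0
for c2, e2 in [(2000.0, 1.1), (0.0, 0.9)]:
    print(quadrant(1000.0, 1.0, c2, e2), "ICER =", icer(1000.0, 1.0, c2, e2))
```

Both ICERs come out at about 10,000 per unit of efficacy; the number alone can’t tell you which situation you’re in.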

Here are the references for this one:

Heitjan, Daniel F., Moskowitz, Alan J. and Whang, William (1999). Bayesian estimation of cost-effectiveness ratios from clinical trials. Health Economics 8, 191-201.

Heitjan, Daniel F., Moskowitz, Alan J. and Whang, William (1999). Problems with interval estimation of the incremental cost-effectiveness ratio. Medical Decision Making 19, 9-15.

Instrumental variables

This is another ratio of regression coefficients. For a weak instrument, the denominator can be so uncertain that its sign could go either way. But if you can’t get the sign right for the instrument, the ratio estimate doesn’t mean anything. So, paradoxically, when you use a more careful procedure to compute uncertainty in an instrumental variables estimate, you can get huge uncertainty estimates that are inappropriate.
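A simulation sketch of the weak-instrument case, with a made-up data-generating process: one instrument z, a treatment t that z barely moves, and an unobserved confounder u. The single-instrument IV estimate is written as the ratio (Wald) form, reduced form over first stage, so the denominator is the first-stage covariance.

```python
import numpy as np

rng = np.random.default_rng(2)

def wald_iv_estimate(z, t, y):
    """Ratio (Wald) IV estimate: cov(z, y) / cov(z, t), i.e. reduced form over first stage."""
    first_stage = np.cov(z, t)[0, 1]
    reduced_form = np.cov(z, y)[0, 1]
    return reduced_form / first_stage, first_stage

n, n_reps = 200, 2000
true_effect = 1.0
pi = 0.05  # hypothetical weak instrument: z barely moves the treatment

estimates, first_stages = [], []
for _ in range(n_reps):
    z = rng.normal(size=n)
    u = rng.normal(size=n)  # unobserved confounder
    t = pi * z + u + rng.normal(size=n)
    y = true_effect * t + u + rng.normal(size=n)
    est, fs = wald_iv_estimate(z, t, y)
    estimates.append(est)
    first_stages.append(fs)

estimates = np.array(estimates)
first_stages = np.array(first_stages)
print("share of replications with a negative first stage:", np.mean(first_stages < 0))
print("5% and 95% quantiles of the IV estimate:", np.quantile(estimates, [0.05, 0.95]))
```

With this setup the first-stage estimate comes out with the wrong sign in something like 30% of replications, and those are exactly the replications where the IV estimate stops meaning anything.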

Fieller-Creasy problem

This is the name in classical statistics for estimating the ratio of two parameters that are identified with independent normally distributed data. It’s sometimes referred to as the problem of the ratio of two normal means, but I think the above examples are more realistic.

Anyway, the Fieller-Creasy problem is notoriously difficult: how can you get an interval estimate with close to 95% coverage? The problem, again, is that there aren’t really any examples where the ratio has any meaning if the denominator’s sign is uncertain (at least, none that I know of; as always, I’m happy to be educated further by my correspondents). And all the statistical difficulties in inference here come from problems where the denominator’s sign is uncertain.
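The classical construction here is Fieller’s: the confidence set for the ratio is the set of values theta for which (num - theta*den)^2 <= q^2 (se_num^2 + theta^2 se_den^2), a quadratic inequality in theta. A sketch, assuming independent normal estimates. When the denominator is not clearly away from zero, the set is either two unbounded rays or the whole line, which is the sign problem showing up in the math. Plugging in the economic freedom and democracy numbers as if they were independent (they aren’t; this is just an illustration) gives two rays that exclude roughly (-3.9, 3.1) but contain both +50 and -50.

```python
import math

def fieller_set(num, se_num, den, se_den, q=1.96):
    """Fieller confidence set for num/den, assuming independent normal estimates.

    Returns (kind, lower, upper):
      "interval"   -> the set is [lower, upper]
      "complement" -> the set is (-inf, lower] U [upper, +inf)
      "everything" -> the set is the whole real line
    """
    # (num - theta*den)^2 <= q^2 (se_num^2 + theta^2 se_den^2)  <=>  A theta^2 + B theta + C <= 0
    A = den**2 - q**2 * se_den**2
    B = -2.0 * den * num
    C = num**2 - q**2 * se_num**2
    disc = B**2 - 4.0 * A * C
    if A == 0:
        raise ValueError("boundary case: the set is a half-line; not handled in this sketch")
    if disc < 0:
        # Only possible when A < 0: the quadratic is always negative, so every ratio is covered
        return "everything", -math.inf, math.inf
    r = math.sqrt(disc)
    lo, hi = sorted(((-B - r) / (2 * A), (-B + r) / (2 * A)))
    if A > 0:
        return "interval", lo, hi  # denominator clearly nonzero: ordinary bounded interval
    return "complement", lo, hi    # denominator's sign uncertain: two unbounded rays

print(fieller_set(-0.567, 0.179, -0.011, 0.065))  # economic freedom / democracy
print(fieller_set(5.0, 1.0, 10.0, 1.0))           # well-determined denominator
print(fieller_set(0.1, 1.0, 0.1, 1.0))            # nothing determined
```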

So I think the Fieller-Creasy problem is a non-problem. Or, more to the point, a problem that there is no point in solving. Which is one reason it’s so hard to solve (recall the folk theorem of statistical computing).

Fairly standard practice for cost-effectiveness analyses is to look at the sampling or posterior distribution of the incremental costs and effects on the cost-effectiveness plane and to calculate the probability that the new treatment is cost-effective. Being cost-effective isn’t a matter of which quadrant you’re in (although this may be important for other reasons) but rather of which side you’re on of a line through the origin that bisects the plane, with slope equal to the willingness to pay per unit of efficacy.
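A sketch of that calculation using net monetary benefit, wtp * deltaE - deltaC, which is positive exactly on the cost-effective side of the line deltaC = wtp * deltaE. The willingness-to-pay value and the simulated draws are hypothetical.

```python
import numpy as np

def prob_cost_effective(delta_cost, delta_effect, wtp):
    """Share of draws on the cost-effective side of the line delta_cost = wtp * delta_effect,
    i.e. with positive net monetary benefit wtp * delta_effect - delta_cost."""
    nmb = wtp * np.asarray(delta_effect) - np.asarray(delta_cost)
    return np.mean(nmb > 0)

# Same ICER (10,000 per unit of efficacy), opposite quadrants, opposite verdicts at wtp = 20,000
print(prob_cost_effective([1000.0], [0.1], wtp=20_000))    # quadrant 1: 1.0
print(prob_cost_effective([-1000.0], [-0.1], wtp=20_000))  # quadrant 4: 0.0

# Hypothetical draws whose cloud straddles all four quadrants
rng = np.random.default_rng(3)
dc = rng.normal(500.0, 1000.0, 10_000)
de = rng.normal(0.05, 0.1, 10_000)
print(prob_cost_effective(dc, de, wtp=20_000))
```

Because the criterion never divides by deltaE, the draws that fall on the other side of zero cause no trouble.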

So I actually recently encountered this issue when trying to build an orthogonal regression technique for data with a strong phylogenetic signal (traditionally, phylogenetic signal is handled by having a non-trivial covariance matrix on the residuals of a regression model).

The issue was that, for orthogonal regression, both the lines y = 0 and x = 0 are acceptable (and null) outcomes. Getting appropriate estimates for the variance of our regression coefficient (essentially a/b) was a bit challenging. The standard way to get a variance estimate is through jackknifing, but dropping out a single value can greatly change the ratio a/b. This led to occasionally having abnormally large or small variances.

My solution was to consider the value a/b not as the ratio of two quantities but as an angle on the unit circle (i.e. a = sin(phi), b = cos(phi), for some angle phi). We can then get an error estimate for the angle, which in our case seemed to follow a von Mises (i.e. circular normal) distribution, and use that angle to determine whether the coefficient a/b was substantially different from either the line y = 0 or x = 0.

This approach seemed to work pretty well, and gave reasonable confidence intervals for the ratio even when the confidence interval spanned both negative and positive ratios. I wouldn’t be surprised if a similar approach might work for other problems.
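A rough sketch of the angle idea, not the commenter’s actual code. It assumes plain total-least-squares orthogonal regression via the SVD (no phylogenetic covariance), jackknifes the angle of the fitted line instead of the slope, and measures deviations modulo pi so that angles near 0 and near pi count as close. It does not fit a von Mises distribution. The function names and the data are hypothetical.

```python
import numpy as np

def tls_angle(x, y):
    """Angle in [0, pi) of the orthogonal-regression (total least squares) line.
    The slope is tan(angle); the line y = 0 is angle 0 and x = 0 is angle pi/2."""
    centered = np.column_stack([x - x.mean(), y - y.mean()])
    # The first right-singular vector is the direction of the best-fitting line
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    dx, dy = vt[0]
    return np.arctan2(dy, dx) % np.pi  # a line has no direction, so angles are mod pi

def wrap_axial(d):
    """Wrap an angle difference into [-pi/2, pi/2), so 0.01 and pi - 0.01 count as close."""
    return (d + np.pi / 2) % np.pi - np.pi / 2

def jackknife_angle(x, y):
    """Full-data angle and its leave-one-out jackknife standard error."""
    n = len(x)
    full = tls_angle(x, y)
    loo = np.array([tls_angle(np.delete(x, i), np.delete(y, i)) for i in range(n)])
    dev = wrap_axial(loo - full)  # deviations measured around the circle
    se = np.sqrt((n - 1) / n * np.sum((dev - dev.mean()) ** 2))
    return full, se

# Hypothetical data close to the vertical line x = 0, where the slope ratio blows up
rng = np.random.default_rng(4)
y = np.linspace(-3, 3, 30)
x = rng.normal(0.0, 0.1, size=30)
angle, se = jackknife_angle(x, y)
print("angle:", angle, "jackknife s.e.:", se, "slope tan(angle):", np.tan(angle))
```

The angle and its standard error stay well behaved even when tan(angle) is enormous or its sign is in doubt.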

Can some of this issue be avoided by putting a prior on the ratio itself? If you were being vague I guess you would still want to allow positive or negative values, which means the interpretation issue could still arise, but you could presumably avoid the unrealistically large ratios.

In the LD50 example you usually know whether the thing you’re giving is a toxin or an elixir-of-life. So informative priors on X (here, the slope b) will suffice most of the time.

The situation with b=0 means that there’s NO effect of dosing. That’s unlikely: the main argument is that dosing people with 16 tons of anything will kill them just by sheer crushing force. Unfortunately, that argument doesn’t work for the asymptotic drug-like dosing regime (i.e., less than 1 kg).

As the model in BDA3 section 3.7 is defined so that x is the log dose, b<0 also implies that the probability of death approaches 1 as the dose approaches 0 (that is, as x goes to minus infinity). Even if you didn’t know anything about the implications of the thing you’re giving, you most likely would know whether the experimental setting is such that everyone dies without it or such that everyone survives without it.

With a model that predicted some more reasonably behaving P(death|no dose), a perhaps more natural definition of LD50 would be the dose which halves the survival probability compared to no dose, in which case an elixir-of-life would have an undefined LD50.

For the war, econ and democracy example, my first thought was that calculating a ratio is quite meaningless indeed. If the confidence interval for the democracy coefficient includes 0, the confidence interval for the ratio is unbounded, running off to both minus and plus infinity.

However, here’s something that could be useful:

Why don’t we look at the interaction coefficient, beta(Democracy*Econ)? If that is positive, we know that in the area where beta(Democracy) is close to zero Democracy is still reducing war through Econ, so we’ll estimate a positive ratio. If the interaction coefficient is negative, the ratio is negative too.

Another slightly more calculusy approach:

beta(Econ) / beta(Democracy) = (dWar/dEcon)/(dWar/dDemo) = (dWar/dEcon)/(dWar/dEcon * dEcon/dDemo) ~ dDemo/dEcon.

Basically, in the area where democracy doesn’t correlate much with war, it may still correlate with economy. If the correlation is positive, so should be the ratio.

I’m not sure if either the interaction or the econ/demo correlation are useful or meaningful, but maybe someone with a bit more time to do the algebra on the partial derivatives can figure a useful way to estimate the ratio without the pesky beta(Demo)=dWar/dDemo in the denominator.

For the war/econ/democracy example, why not use a boosted generalized linear or additive modeling framework and use the selection probabilities of base learners with democracy in them vs. econ as a gauge for how predictive those variables are?

Yep, it sounds like a good idea in general to use something other than a naive linear model if the naive linear model is giving you a 0 denominator.

For the instrumental variable case, seems like in most cases you could impose a very strong prior on the sign of the effect of the instrument on treatment assignment. It would be weird to be confident enough that you had an instrument to perform an IV analysis but unsure of its sign.

“The “almost 50” above is the ratio of the estimates -0.567 and -0.011. (567/11 is actually over 50, but I assume that you get something less than 50 if you keep all the significant figures in the original estimate.) In words, each unit on the economic freedom scale corresponds to a difference of 0.567 on the probability (or, in this case, I assume the logit probability) of a militarized industrial dispute, while a difference of one unit on the democracy score corresponds to a difference of 0.011 on the outcome.”

The worse problem here is the units of measurement! If economic freedom has a standard deviation of about 0.2 and democracy has a standard deviation of about 10, then in standardized terms the magnitudes of the two effects are about equal!
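A quick check of the units point. The effect of a one-standard-deviation change in a predictor is its coefficient times its standard deviation. Both sets of standard deviations below are hypothetical, not taken from the data.

```python
# Raw coefficients quoted in the post
coef = {"economic freedom": -0.567, "democracy": -0.011}

def per_sd_effects(sd):
    """Effect of a one-standard-deviation change in each predictor: coefficient * SD."""
    return {k: coef[k] * sd[k] for k in coef}

# Two hypothetical sets of predictor standard deviations
print(per_sd_effects({"economic freedom": 10.0, "democracy": 0.2}))
print(per_sd_effects({"economic freedom": 0.2, "democracy": 10.0}))
```

The per-SD effects come out about equal only when economic freedom is the predictor with the small standard deviation; with the SDs the other way around, the gap grows to a factor in the thousands.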

This is perhaps an interesting reading on the ratio subject in regression analysis: Kronmal, Richard A. (1993). Spurious correlation and the fallacy of the ratio standard revisited. Journal of the Royal Statistical Society, Series A 156(3), 379-392.