
Testing for significant difference between regression coefficients of two different models from the same sample population

Charles Warne writes:

A colleague of mine is running logistic regression models and wants to know if there’s any sort of test that can be used to assess whether a coefficient of a key predictor in one model is significantly different from that same predictor’s coefficient in another model that adjusts for two other variables (which are significantly related to the outcome). Essentially she wants to statistically test for confounding, and while my initial advice was that a single statistical test isn’t really appropriate, since confounding is something we make an educated judgment about given a range of factors, she is still keen to see if this can be done. I read your 2006 article with Hal Stern, “The difference between ‘significant’ and ‘not significant’ is not itself statistically significant,” which included the example (p. 328) where evidence for a difference between the results of two independent studies was assessed by summing the squares of the standard errors of each and taking the square root to give the standard error of the difference (se = 14). My question is whether this approach can be applied to my colleague’s situation, given that both logistic regression models are based on the same sample of individuals and are therefore not independent. Is there an adjustment that can be used to produce more accurate standard errors for non-independent samples, or should I not be applying this approach at all? Is there a better way this problem could be tackled?

My reply: No, you wouldn’t want to take the two estimates and treat them as if they were independent. My real question, though, is why your colleague wants to do this in the first place. It’s not at all clear what question such an analysis would be answering.

P.S. Warne adds:

I completely agree with your question as to why my colleague would want to do this in the first place. Her rationale is that she wants to show that, once controlling for the newly added variable, there is a significant change in the coefficient of the other variable; i.e., there are two models:

model 1 : y = a + bx1

model 2 : y = a + bx1 + cx2

…and she wants to show there is a significant change in the coefficient ‘b’ between the two models.

I said that the multivariate model 2 speaks for itself and that a practically/clinically important change in ‘b’ from model 1 to model 2 (what counts as important depends on the substantive area of interest), combined with evidence of a significant association between ‘c’ and the outcome, shows that variable x2 is confounding the association between x1 and y.

I’m not sure what to say. My first thought is that, yes, model 2 has it all, and there’s not much to be learned by comparing the coefficients in the two models. But part of me does think that something can be learned from the comparison.

As to the question of how to get a standard error: b.hat[model2] – b.hat[model1] is a linear function of the data, y, so it shouldn’t be too hard to work out a formula for its standard error. Or you could do it by bootstrapping.
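[A minimal sketch of the bootstrap idea, with simulated numbers of my own invention; it uses linear rather than logistic regression for simplicity, but the same resample-and-refit logic carries over. The key point is to refit both models on each resample, so the dependence between the two estimates is preserved; `coef_diff` is a hypothetical helper name. — ed.]

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: x2 is correlated with x1 and affects y,
# so the coefficient on x1 changes between the two models.
n = 500
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)
y = 1.0 + 0.5 * x1 + 0.8 * x2 + rng.normal(size=n)

def coef_diff(y, x1, x2):
    """b.hat[model2] - b.hat[model1] for the coefficient on x1."""
    X1 = np.column_stack([np.ones_like(x1), x1])
    X2 = np.column_stack([np.ones_like(x1), x1, x2])
    b1 = np.linalg.lstsq(X1, y, rcond=None)[0][1]
    b2 = np.linalg.lstsq(X2, y, rcond=None)[0][1]
    return b2 - b1

# Case-resampling bootstrap: draw rows with replacement and
# refit BOTH models on every resample.
boot = np.empty(2000)
for i in range(2000):
    idx = rng.integers(0, n, size=n)
    boot[i] = coef_diff(y[idx], x1[idx], x2[idx])

print("estimated difference:", coef_diff(y, x1, x2))
print("bootstrap s.e.:", boot.std(ddof=1))
```

Treating the two estimates as independent and adding their variances would get this standard error wrong, which is why the rows are resampled jointly.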


  1. Kaiser says:

It sounds like she has two nested models; with a deviance test she can determine which is the better fit, and select that model. This answers the question of whether the covariates are needed, that's how I read it. And she got on the wrong track by comparing coefficients of the other variable from two regressions of the same data!
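[A sketch of the deviance (likelihood-ratio) test Kaiser mentions, with made-up data; in practice you would use `anova(m1, m2, test="Chisq")` in R or an equivalent routine, but here the logistic fit is hand-rolled with Newton-Raphson so the example is self-contained. For a chi-squared with 2 degrees of freedom the survival function is exactly exp(-x/2), which avoids needing a stats library. — ed.]

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_logit(X, y, iters=25):
    """Logistic regression by Newton-Raphson; returns (coefs, log-likelihood)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ beta))
        W = p * (1 - p)
        # Newton step: beta += (X' W X)^-1 X' (y - p)
        beta += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - p))
    p = 1 / (1 + np.exp(-X @ beta))
    return beta, np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Simulated data in which the two extra covariates really matter.
n = 1000
x1, x2, x3 = rng.normal(size=(3, n))
eta = 0.5 * x1 + 0.8 * x2 + 0.8 * x3
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-eta))).astype(float)

ones = np.ones(n)
_, ll_small = fit_logit(np.column_stack([ones, x1]), y)
_, ll_big = fit_logit(np.column_stack([ones, x1, x2, x3]), y)

deviance_drop = 2 * (ll_big - ll_small)
p_value = np.exp(-deviance_drop / 2)  # chi-squared sf, df = 2
print(deviance_drop, p_value)
```

Note this tests whether the added covariates improve the fit, which, as Kaiser says, is a different question from whether the coefficient on x1 "changed significantly."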

  2. Keith O'Rourke says:

Agree it's a bit weird, but confounding loosely requires both a dependence between covariate and outcome (you have conventional evidence for this) and a dependence between the covariate and the variable of interest (the "key predictor").

Evidence of no dependence is not the same as lack of evidence for dependence, and the former is usually what's lacking.

    To tighten up the looseness here you might want to read the appropriate chapter in the Rothman, Greenland and Lash Epi text or Pearl's papers.


  3. Peter Flom says:

    I also don't understand why someone would test this. Yet another abuse of significance tests, I think. How BIG is the difference in coefficients? How MEANINGFUL is it?

  4. george says:

    Charles: in reply to your Q2, probably not. (Pearl Chap 6.2: Why There Is No Statistical Test for Confounding, Why Many Think There Is, and Why They Are Almost Right). I believe there was also a recent AmStat News editorial pointing out that small p-values were not directly relevant to causal inference.

  5. charles says:

Thanks, everybody, for your comments; I agree this type of test is not appropriate. When assessing the amount by which a regression coefficient changes after adjusting for an additional variable, it is better to focus on the difference in the effect size itself and compare confidence intervals around each estimate.

    In response to Andrew's query about it not being clear what type of question this analysis would be answering, my colleague was interested in whether or not it could be shown that "coefficient A changed by X amount when controlling for variable B and that this difference was statistically significant" but maybe testing for the statistical significance of a difference between coefficients is a flawed approach.

    George: how can i get your link to work, do i need to download another program to run it?

  6. Gustaf says:

Collinearity is tricky. I had a situation where we were interested in exposure time and species (A and B) (and some other variables). It turned out, though, that only species A covered the whole range of exposure time (1-6, a continuous variable); species B only covered 1-(2,3 with very few data points). So an effect of exposure time could mask the effect of species (e.g. if there is a strong effect in species A, it might show up as an effect of exposure time). In the end we tested the effect of species using data with exposure time = 1, and the effect of exposure time using data with species A (i.e. two tests, assuming that there is no interaction). Based on that result (species A had an effect but not exposure time), we included only species in the final model. I don't know if this was done correctly and I have never seen anything similar.

  7. JF says:

    (This might have been answered by the time this shows up, but)

    Charles, the paper is a postscript file. A popular choice for programs is here; you'll need both the interpreter and viewer.

  8. dk says:

    Why is this so obviously a bad idea? Can’t one learn something by estimating how much varying x1 affects y in the two models? Subtract one estimated value from the other, compute the standard error for the difference, and one has some information on “how much” of the x1 effect is attributable to x2. That could be meaningful depending on what is being modeled here. E.g., if x1 explains variance in x2, one is finding out how much of x1’s total effect on y is mediated by x2, if one has reason to believe that the causal relations in the world work this way. (Bracing to be slammed; it's okay, well worth it to learn something!)

  9. Simon says:

I have the same statistical problem. My DV is a repeated measure from one sample population under three different conditions. I want to test whether the condition is a significant moderator of the impact of one predictor (X) on the DV. Based on three regression models, the coefficient of X on the DV is .45 in Condition 1, .08 in Condition 2, and .07 in Condition 3. It looks like the moderation effect is significant. How can I test that? Is it a bad idea?

  10. george says:

dk: non-collapsibility of odds ratios is a problem for the approach you suggest. Even if x1 and x2 are independent, the 'b' parameter estimated by Warne's model 1 has (in general) a different value from 'b' in model 2.

    The difference between the estimates can certainly be significant – after all, they're estimating different quantities – but that doesn't mean confounding is going on.
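[George's point can be checked with a quick simulation (my own, with made-up effect sizes): even when x1 and x2 are independent, so there is no confounding at all, the marginal odds ratio for x1 is attenuated relative to the conditional one. — ed.]

```python
import numpy as np

rng = np.random.default_rng(1)
b, c = 1.0, 2.0                    # conditional log-odds effects of x1, x2
x2 = rng.normal(size=1_000_000)    # independent of x1 by construction

def p(x1):
    """P(y = 1 | x1, x2) under the logistic model."""
    return 1 / (1 + np.exp(-(b * x1 + c * x2)))

# Marginal risks: average over the distribution of x2
# (valid here precisely because x1 and x2 are independent).
p0, p1 = p(0).mean(), p(1).mean()
marginal_or = (p1 / (1 - p1)) / (p0 / (1 - p0))
conditional_or = np.exp(b)

print("conditional OR:", conditional_or)
print("marginal OR:", marginal_or)  # noticeably smaller, with no confounding
```

So a nonzero difference between the model-1 and model-2 estimates of 'b' does not, by itself, demonstrate confounding.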

  11. Rob says:

I think there is a problem with the question: parameters are only identified up to scale, and there are two moving parts between the two models. In some sense, the question is unidentified: four quantities, two estimates, and an inequality relating two of them (the sigmas).

b1 and b2 are not what they seem, because of the variance normalization: b1 = beta.hat1 / sigma1 while b2 = beta.hat2 / sigma2. The problem is that the change from b1 to b2 is a function of both the difference between beta.hat1 and beta.hat2 and the difference between sigma1 and sigma2, and both change (except under the sharp null that x2 is irrelevant; if that were true, the question would not have been asked).

We know that sigma2 has to be smaller than sigma1. That helps us some, but I think that is as far as we can go. We know a bit more (though perhaps not germane to the question) because c and b2 have the same scaling factor, but that is still 5 unknowns and 3 knowns. I can imagine bounding it with an exact assumption or distribution about sigma1 vs. sigma2, which we could calculate/integrate over, but then the answer just reflects that assumption/distribution, and I have no idea what would be a reasonable shape. What did I mess up?

  12. dk says:

    hmmm…. interesting.

Hey, Simon: maybe I'm misunderstanding, but your issue, as I'm reading it, is quite different and much simpler. If you want to know whether a treatment or condition *moderates* the impact of a predictor, just include the predictor, the treatment, and the predictor x treatment interaction term in one model. The coefficient for the predictor will tell you the impact of the predictor in the untreated group, and the coefficient for the predictor x treatment interaction term will tell you the extent to which the predictor's influence is different in the treatment group. (This assumes only 2 treatments and a dummy variable: 0 for untreated and 1 for treated. You have 3 "conditions," so you should have 2 dummy treatment/condition variables, and two corresponding predictor x treatment/condition interaction terms, in which case your coefficients compare the effect of the predictor in each of the 2 treatments/conditions included in the regression equation to the effect of the predictor in the excluded condition.) The nested models described in the original post don't address a moderator or interaction effect between "x1" and "x2"; they contemplate some other relationship, maybe mediation, maybe confounding, between "x1" and "x2."
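[dk's dummy-coding recipe, sketched with simulated numbers loosely matching Simon's coefficients (.45, .08, .07); this illustration uses a linear model and ignores the within-subject correlation in Simon's repeated-measures design, which a real analysis would need to handle. — ed.]

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
cond = rng.integers(0, 3, size=n)        # conditions 0, 1, 2
x = rng.normal(size=n)
slopes = np.array([0.45, 0.08, 0.07])    # true per-condition effect of x
y = 1.0 + slopes[cond] * x + 0.3 * rng.normal(size=n)

# Dummy coding with condition 0 as the baseline.
d1 = (cond == 1).astype(float)
d2 = (cond == 2).astype(float)
X = np.column_stack([np.ones(n), x, d1, d2, x * d1, x * d2])
beta = np.linalg.lstsq(X, y, rcond=None)[0]

# beta[1] -> slope of x in the baseline condition
# beta[4] -> how much the slope differs in condition 1 vs. baseline
# beta[5] -> how much the slope differs in condition 2 vs. baseline
print(beta[1], beta[4], beta[5])
```

A standard t-test (or an F-test on the two interaction terms jointly) then answers "are the slopes different across conditions?" within a single model, rather than by comparing three separate fits.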

  13. Rob says:

After my impossibility claim earlier, I should be a bit more optimistic. An easy answer to the identification problem I posed previously is just to linearize. If model 2 is the truth and the predictions under the logit are not too extreme (say, between about 0.2 and 0.8; in other words, the model is not TOO discriminating), you can use a linear model as a first-order Taylor approximation (which should be pretty good in that range). It provides a quick and easy answer. With only two (RHS) variables, this is a pretty straightforward ANOVA problem. Regressing y on x2 alone gives you a quantity you want; so does y on x1 alone. Including both gives you something, and a simple algebraic decomposition works out the answer. Standard errors by (case-resampling) bootstrapping of the linear system should give an approximate answer, up to linearization error.

  14. Simon says:

    dk, thank you so much for your comments!

But I am not sure I can use your method to test the moderation, because this is a within-subject design (no control/treatment group). The DV is a repeated measure for one sample. There are three models:
    Time 1: Y=a+b1x1+b2x2+…e
    Time 2: Y=a+b1'x1+b2'x2+…e
    Time 3: Y=a+b1''x1+b2''x2+…e

I hope to test whether b1, b1', and b1'' are significantly different from one another… I don't know how to do that….

  15. dk says:

    Simon, you'll find a very much on-point discussion of how to analyze your design (including the advantages of appropriately specified multivariate regression with interaction terms over repeated measure ANOVA) in Judd, C. M. (2000). Everyday Data Analysis in Social Psychology: Comparisons of Linear Models. In H. T. Reis & C. M. Judd (Eds.), Handbook of research methods in social and personality psychology (pp. 370-392). New York: Cambridge University Press.

  16. Simon says:

    dk, thank you so much for your recommendation. I'll definitely check that book!

  17. charles says:

    JF and George – thanks for the help regarding the Pearl paper

I think I may have found a solution for calculating a standard error for the linear function (b.hat[model2] – b.hat[model1]), testing for the significance of the difference between b in each model, based on what the literature on 'mediation' says about how to test for it.

    For those interested, Preacher & Hayes have written about mediation and how to test for it using the Sobel test, and also via bootstrapping methods.

    The first paper is "Preacher, K. J., & Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, and Computers, 36, 717-731."

    downloadable from

    ..and the second paper is "Preacher, K. J., & Hayes, A. F. (2008). Asymptotic and resampling strategies for assessing and comparing indirect effects in multiple mediator models. Behavior Research Methods, 40, 879-891."

    downloadable from

    I think this is one solution to the problem, though no doubt there are others!
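[For reference, the Sobel test Charles mentions reduces to a short formula: the indirect effect is a*b, where a is the effect of X on the mediator and b the effect of the mediator on Y controlling for X, with standard error sqrt(b²·se_a² + a²·se_b²). A small sketch (function names are mine); note Preacher & Hayes themselves recommend bootstrapping over the Sobel normal approximation in small samples. — ed.]

```python
import math

def sobel_z(a, se_a, b, se_b):
    """Sobel z statistic for the indirect effect a*b.

    a, se_a: effect of X on the mediator and its standard error.
    b, se_b: effect of the mediator on Y (controlling for X) and its SE.
    """
    se_ab = math.sqrt(b**2 * se_a**2 + a**2 * se_b**2)
    return (a * b) / se_ab

def sobel_p(z):
    """Two-sided p-value under the normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2))

z = sobel_z(0.5, 0.1, 0.4, 0.1)
print(z, sobel_p(z))
```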

  18. Andrea says:

    You could take a look at the paper "A Statistical Test for the Equality of Differently Adjusted Incidence Rate Ratios" by Hoffman et al. (American Journal of Epidemiology, 2008). It's not exactly what your colleague wants to do, but it may give you a hint on how to face this problem (or, at least, it could give a boost to this discussion).

  19. Leigh says:

    May I ask a relevant question? I really need some help.
    I am estimating a logit model — the response is Yes (1) or No (0). There are a set of independent variables selected, based on prior studies. I partitioned the sample into three categories under the conjecture that coefficients on the independent variables would be different, in magnitude and/or in statistical significance. The simplified model is: Decision = A+A*factor+B+B*factor+C+C*factor+ year_fixed effects+error_term. When invoking proc logistic in SAS, the intercept term is suppressed.
I found that the coefficient on A*factor is significantly positive whereas the coefficients on B*factor and C*factor are positive but insignificant. I tried two alternatives. (1) I estimated an alternative model making C the benchmark, so the coefficient on A*factor (B*factor) becomes the incremental effect. (2) I also estimated three separate logit models by category (A, B, and C). The results are similar. May I draw the conclusion that a given unit change in factor will lead to a larger increase in the likelihood of the event for observations in the A category?
    To put my question another way: my objective is to understand whether the decision (Yes or No) is more sensitive to a one-unit change in factor when an observation belongs to a given category, for instance, A vs. B. To achieve this goal, what statistical test would you recommend?

    Thank you very much!

  20. Peter B. says:

There was some debate about the best way to do this between Clifford Clogg and Paul Allison in 1995 in the American Journal of Sociology 100(5). Note that neither of these authors thought it was an odd thing to be trying to do. I've done it and found that the formulas proposed in the articles produced results using my data that I found odd. I don't recall the details, but I remember that something came out negative that should have been positive. I ended up following advice posted by Clogg to statalist and bootstrapping the SEs.

  21. John M. says:

    Here's an article from sociology that tackles precisely the original question posed:

  22. Joel says:

OK – here's my variation. I have several models predicting crime. They all have the same independent variables, but they have different dependent crime variables (e.g. robbery, shoplifting, assault, etc.). Thus they look something like this:

    Y1=a+b1x1+b2x2+…e etc.

    I want to test whether the same independent variables have different predictive properties between models, e.g. is the predictive power of gender greater in one model than another to a statistically significant extent?

    I'm clueless how to do it. Please can anybody help?

  23. Philip says:

What about trying to compare the coefficients of two regressors in a single regression with regard to their relative importance in impacting the dependent variable? If I find that both regressors are significant in determining the dependent variable, can I infer that one regressor is more important in impacting the dependent variable than the other if its coefficient is of higher magnitude?