Covariate Adjustment in RCT / Model Overfitting in Multilevel Regression

Makoto Hanita writes:

We have been discussing the following two issues amongst ourselves, then with our methodological consultant for several days. However, we have not been able to arrive at a consensus. Consequently, we decided to seek an opinion from nationally known experts. FYI, we sent a similar inquiry to Larry Hedges and David Rogosa . . .

1) We are wondering if a post-hoc covariate adjustment is a good practice in the context of RCTs [randomized clinical trials]. We have a situation where we found a significant baseline difference between the treatment and the control groups in 3 variables. Some of us argue that adding those three variables to the original impact analysis model is a good idea, as that would remove the confound from the impact estimate. Others among us argue that a post-hoc covariate adjustment should never be done, on the grounds that those covariates are correlated with the treatment, which makes the analysis model a quasi-experimental one. For your information, we all agree on the use of pre-specified covariates in RCTs for the purpose of variance reduction. What we cannot agree on is the use of covariates for the purpose of equating two groups in the context of RCTs.

2) Despite our disagreement on #1, we went ahead and tried fitting the model including those 3 additional covariates to our data. Some of us are suspicious of the results of this analysis because of possible model overfitting. The situation involves fitting the model below to a sample of 14 schools and 800 students. As you can see, there are 5 fixed effects at the school level to be estimated. The analysis was done on multiply-imputed data (5 imputations were made). From what I have read, assessing model overfitting is rather difficult for multilevel models. In our case, we have the additional complexity of multiply-imputed data:

PostTest_ij = b0 + b1(Treatment_j) + b2(School Cov1_j) + b3(School Cov2_j) + b4(School Cov3_j) + b5(PreTest_ij) + b6(FRL%_j) + u_j + e_ij

My reply:

1. In general, yes, I think it’s a good idea to adjust for covariates. On the other hand, 5 predictors is a lot when n=14. Ultimately I think the right approach would be to use informative priors on the coefficients, to partially pool them toward zero and thus compromise between the two alternatives of no adjustment and simple least-squares adjustment. I doubt you’d want to do that, though, and I imagine you’d also be uncomfortable with an approximation such as fitting the adjusted model and then dividing the adjustment coefficients by 2 (a simple compromise). In your case my inclination would be to adjust for the 3 variables. I expect there’s some work in the statistics literature on what to do in this sort of problem.

2. Your model seems reasonable. As noted above, I might consider informative priors on the b’s.
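To make the link between an informative prior and the divide-by-2 shortcut concrete, here is a minimal sketch, not a recommended analysis. It assumes each least-squares coefficient is approximately normal around the true value with a known standard error, puts a Normal(0, prior_sd^2) prior on it, and ignores correlations among the coefficient estimates. The function name and the numbers in the usage lines are hypothetical.

```python
import math


def shrink_toward_zero(b_ols, se, prior_sd):
    """Posterior mean and sd for one coefficient under a Normal(0, prior_sd^2) prior.

    Assumption: the least-squares estimate b_ols is roughly Normal(b, se^2),
    and coefficients are treated one at a time (correlations are ignored).
    """
    if se <= 0 or prior_sd <= 0:
        raise ValueError("se and prior_sd must be positive")
    # Fraction of the least-squares estimate that survives the pooling.
    weight = prior_sd**2 / (prior_sd**2 + se**2)
    post_mean = weight * b_ols
    post_sd = math.sqrt(weight) * se
    return post_mean, post_sd


# Hypothetical adjustment coefficient: estimate 0.4 with standard error 0.2.
halved = shrink_toward_zero(0.4, 0.2, prior_sd=0.2)    # prior sd equal to the se
nearly_ols = shrink_toward_zero(0.4, 0.2, prior_sd=5.0)  # weak prior, little pooling
print(halved, nearly_ols)
```

Under these assumptions, dividing a coefficient by 2 is what you get when the prior standard deviation equals the standard error of the estimate. A tighter prior pulls the coefficient closer to zero (no adjustment), and a wider one leaves it close to the least-squares value.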

Possibly more importantly, I’d suggest allowing the coefficient for pre-test to interact with treatment and to vary by school. Pre-test seems to be your only individual-level predictor so you might as well get more out of it. And, in my experience, pre-test coefficients are systematically different in treatment and control groups.
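One way to read this suggestion, in the notation of the model in the question, is sketched below. The symbols b_7 (the pre-test by treatment interaction) and v_j (a school-specific deviation in the pre-test slope) are my additions for illustration and do not appear in the original model.

```latex
\begin{aligned}
\text{PostTest}_{ij} ={}& b_0 + b_1\,\text{Treatment}_j + b_2\,\text{SchoolCov1}_j
  + b_3\,\text{SchoolCov2}_j + b_4\,\text{SchoolCov3}_j \\
& + \left(b_5 + b_7\,\text{Treatment}_j + v_j\right)\text{PreTest}_{ij}
  + b_6\,\text{FRL\%}_j + u_j + e_{ij}
\end{aligned}
```

Here the pre-test slope is b_5 + v_j in control schools and b_5 + b_7 + v_j in treatment schools, so b_7 captures the systematic difference between the two groups. With only 14 schools, the variance of v_j would presumably be estimated imprecisely.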

We can see if Hedges and Rogosa make the same recommendations as I do.

1 thought on “Covariate Adjustment in RCT / Model Overfitting in Multilevel Regression”

  1. “We are wondering if a post-hoc covariate adjustment is a good practice in the context of RCTs [randomized clinical trials]”

    Just for the record: AFAIK, the letter combination “RCT” is usually used to indicate a “randomized controlled trial”, not a “randomized _clinical_ trial”…
