Cross-validation to check missing-data imputation

Aureliano Crameri writes:

I have questions about a technique you and your colleagues described in your papers: cross-validation (“Multiple Imputation with Diagnostics (mi) in R: Opening Windows into the Black Box,” with reference to Gelman, King, and Liu, 1998). I think this is the technique I need for my purpose, but I am not sure I understand it correctly. I want to use multiple imputation to estimate the outcome of psychotherapies based on longitudinal data. First I have to demonstrate that multiple imputation gives me unbiased estimates. The bias I expect is overestimation of the outcome for dropouts.

I will test my imputation strategies with a series of simulations (delete values, impute, compare with the original). Given the complexity of the statistical analyses, I think I need at least 200 cases, but I don’t have that many cases without any missing values. My data have missing values in several variables. The proportion of missing values is below 30%.

So I would proceed as follows. I set up the chained equations and generate 30-40 imputations. Next I fill in the missing values with the mean of the imputed values. With this “repaired” dataset I start the simulations. Among other things, I delete the outcome of 30 successful and 30 unsuccessful cases. Then I impute again with the same regression equations as before. Finally I pool the results according to Rubin’s rules and compare the pooled coefficients with the coefficients estimated from the repaired dataset. This lets me test whether the chained equations can discriminate between successful and unsuccessful courses of therapy.

Does such a procedure correspond to cross-validation in your sense?
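For readers who have not seen it, the pooling step mentioned in the letter can be sketched in a few lines. This is a minimal illustration of Rubin's rules for a single coefficient, not the code from the mi package: the pooled estimate is the average of the per-imputation estimates, and the total variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance. The function name `pool_rubin` and the numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets using Rubin's rules.

    estimates: the coefficient estimated on each imputed dataset
    variances: the squared standard error of that coefficient on each dataset
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    q_bar = mean(estimates)       # pooled point estimate
    w = mean(variances)           # within-imputation variance
    b = variance(estimates)       # between-imputation variance (m - 1 denominator)
    t = w + (1 + 1 / m) * b       # total variance
    return q_bar, t


# Made-up estimates and squared standard errors from m = 4 imputed datasets.
est = [1.0, 1.2, 0.8, 1.0]
var = [0.04, 0.05, 0.03, 0.04]
pooled, total_var = pool_rubin(est, var)
print(pooled, total_var ** 0.5)   # pooled coefficient and its standard error
```

In the check described above, the pooled coefficient would be compared with the coefficient estimated from the dataset before the outcomes were deleted.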

My reply: Yup. Except for one thing. When you do your first imputation, don’t fill in missing entries with the mean of the imputed values. Just use one of the completed datasets (with random imputations) that you created. You can do the whole thing 10 times with different random imputations, but that probably won’t make a difference.
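Here is a toy simulation of why the starting dataset should use random imputations rather than their mean. It is a sketch under simplifying assumptions, not the procedure from the letter: one predictor, one outcome, outcomes deleted completely at random, and a single regression instead of chained equations. The random fill adds residual noise but, unlike a proper multiple imputation, does not also draw the regression coefficients. Averaging many imputations is treated here as roughly equivalent to filling in the predicted value. All names and numbers are made up.

```python
import random
from statistics import mean


def ols(xs, ys):
    """Least-squares intercept, slope, and residual standard deviation."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sigma = (sum(r * r for r in resid) / (len(xs) - 2)) ** 0.5
    return intercept, slope, sigma


rng = random.Random(1)
n = 2000
x = [rng.gauss(0, 1) for _ in range(n)]
y = [1.0 + 0.5 * xi + rng.gauss(0, 1) for xi in x]   # complete "true" data

# Delete the outcome for a random 30% of cases and fit on the rest.
missing = set(rng.sample(range(n), k=int(0.3 * n)))
obs = [i for i in range(n) if i not in missing]
a, b, sigma = ols([x[i] for i in obs], [y[i] for i in obs])

# (1) Fill with the predicted mean; (2) fill with one random draw.
y_mean_fill = [a + b * x[i] if i in missing else y[i] for i in range(n)]
y_random_fill = [
    a + b * x[i] + rng.gauss(0, sigma) if i in missing else y[i]
    for i in range(n)
]

# Residual variance of the outcome regression in each version of the data.
resid_var_true = ols(x, y)[2] ** 2
resid_var_mean = ols(x, y_mean_fill)[2] ** 2
resid_var_random = ols(x, y_random_fill)[2] ** 2
print(resid_var_true, resid_var_mean, resid_var_random)
```

In this setup the mean-filled data come out less noisy than the real data, because every imputed point sits exactly on the regression line. A simulation started from such a dataset would make the imputation model look better than it is. The randomly imputed dataset keeps roughly the right amount of residual variation.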