Combining results from multiply imputed datasets

Aaron Haslam writes:

I have a question regarding combining the estimates from multiply imputed datasets. In the third edition of BDA, at the top of page 452, you mention that with Bayesian analyses all you have to do is mix together the simulations. I want to confirm that this means you simply combine the posteriors from the MCMC runs on the different datasets? For instance, in a current study I am working on, I have 5 imputed datasets with the missing outcome data imputed. I would generate individual posteriors for each of these datasets, mix them together to obtain a combined posterior, and then calculate the summary statistics on this combined posterior.

I replied that yes, that is what I would do. But then I thought I’d post here in case anyone has other thoughts on the matter.
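To make the recipe concrete, here is a minimal sketch in Python. The fit_model function is a hypothetical stand-in for whatever sampler you actually use (Stan, PyMC, etc.); the only real work is pooling the draws and summarizing the pooled sample with quantiles rather than a normal approximation, since the mixture across imputations need not be normal.

import numpy as np

def fit_model(data, n_draws=1000, rng=None):
    # Hypothetical stand-in for your actual sampler: returns posterior
    # draws of a scalar parameter (here, the mean) for one dataset.
    rng = np.random.default_rng() if rng is None else rng
    return rng.normal(loc=np.mean(data),
                      scale=np.std(data) / np.sqrt(len(data)),
                      size=n_draws)

rng = np.random.default_rng(42)

# Five imputed copies of the dataset (faked here; in practice these come
# from your imputation procedure, with the missing outcomes filled in
# differently each time).
imputed_datasets = [rng.normal(10.0, 2.0, size=100) for _ in range(5)]

# Fit the model separately to each imputed dataset ...
draws_per_dataset = [fit_model(d, rng=rng) for d in imputed_datasets]

# ... then "mix together the simulations": just pool all the draws.
combined = np.concatenate(draws_per_dataset)

# Summarize the pooled posterior with quantiles, not mean +/- sd.
print("posterior median:", np.median(combined))
print("95% interval:", np.percentile(combined, [2.5, 97.5]))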

Comments

  1. That works for me too. I also found that, while the chains for a single imputed dataset converge, the chains do not necessarily converge across imputed datasets, which is to be expected.

    But the approach I like most is to add an imputation model to your model of interest and estimate everything at once (see the sketch after these comments). This approach should best capture the effect of the uncertainty about the imputed data points on the parameter estimates, though it might be slow if you have tens of thousands of missing data points.
    Guido

  2. I believe Jerry Reiter has a paper on this subject.
    Zhou, X. and Reiter, J. P. (2010). A note on Bayesian inference after multiple imputation. The American Statistician, 64, 159–163.
    http://www2.stat.duke.edu/~jerry/Papers/tas10.pdf

    In short, yes: Zhou and Reiter (2010) recommend simply combining the posterior samples from the imputed datasets. That said, they recommend larger values of m (the number of imputed datasets), as small values (e.g., m = 5) seem to give uncertainty intervals that are too narrow.

  3. From what I read of Zhou and Reiter and of Gelman, combining draws from the MCMC chains for the different imputed datasets gives a reasonable approximation of the uncertainty due to missing data.

    Zhou and Reiter state that moments of this combined distribution, such as the mean and variance, cannot be used to summarize it unless normality can be assumed, which makes sense: the pooled draws form a mixture of the per-dataset posteriors, so quantile-based intervals are the safer summary.

    But given that using more than 10 imputed datasets can make even small data-analysis problems quite large, would it make sense to use a non- or semi-parametric prior to incorporate the uncertainty over the measurements? Given the large amount of work on priors in Bayesian inference, it seems that it should be possible to encode this information as a prior distribution in a Bayesian model rather than as data in the likelihood.

  4. Hello. I’m a beginner with regard to this issue and unsure how to mix individual posteriors to obtain a combined posterior. Is there any literature that explains this in plain English? Thank you.
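Following up on Guido’s suggestion in comment 1, here is a minimal, self-contained sketch of the joint approach: a toy regression with some predictor values missing, where the missing values are treated as extra unknowns and updated inside a hand-rolled Gibbs sampler. Everything here is illustrative rather than Guido’s actual setup, and the step for tau uses a plug-in value rather than a full conjugate draw; in practice you would write the same model in Stan or PyMC and let the sampler do the work.

import numpy as np

rng = np.random.default_rng(0)

# Simulate data: y = a + b*x + noise, then delete ~20% of the x values.
n, a_true, b_true, sigma_true = 200, 1.0, 2.0, 1.0
x = rng.normal(5.0, 1.5, size=n)
y = a_true + b_true * x + rng.normal(0.0, sigma_true, size=n)
miss = rng.random(n) < 0.2
x_obs = np.where(miss, np.nan, x)

n_iter = 2000
draws = np.empty((n_iter, 3))                  # columns: a, b, sigma

# Initialize the missing x values at the observed mean.
x_cur = np.where(miss, np.nanmean(x_obs), x_obs)
a, b, sigma = 0.0, 1.0, 1.0

for t in range(n_iter):
    # Imputation model for x: x_i ~ N(mu_x, tau^2), flat prior on mu_x.
    tau = x_cur.std()                          # plug-in spread (simplification)
    mu_x = rng.normal(x_cur.mean(), tau / np.sqrt(n))

    # Impute each missing x_i from its full conditional, which combines
    # the imputation model with the regression likelihood for y_i.
    prec = 1.0 / tau**2 + b**2 / sigma**2
    mean = (mu_x / tau**2 + b * (y[miss] - a) / sigma**2) / prec
    x_cur[miss] = rng.normal(mean, 1.0 / np.sqrt(prec))

    # Regression step given the current "complete" data (conjugate
    # updates under a flat prior on (a, b, log sigma)).
    X = np.column_stack([np.ones(n), x_cur])
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    sigma2 = np.sum(resid**2) / rng.chisquare(n - 2)   # scaled inv-chi^2 draw
    sigma = np.sqrt(sigma2)
    a, b = rng.multivariate_normal(beta_hat, sigma2 * XtX_inv)

    draws[t] = a, b, sigma

burn = n_iter // 2
print("posterior means (a, b, sigma):", draws[burn:].mean(axis=0))

Because the missing values are re-imputed at every iteration, the posterior draws of a, b, and sigma automatically reflect the imputation uncertainty; there is no separate pooling step.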
