New Multiple Imputation R Package “mi” (beta release)

Posted on December 22, 2008 5:02 AM by masanao

We recently uploaded to CRAN the multiple imputation package “mi”, which we have been developing.

The aim of the mi package is to make multiple imputation transparent and easy to use. To that end, it has a few characteristics that we believe are valuable:
1. Graphical diagnostics of imputation models and convergence of the imputation process.
2. Use of bayesglm to treat the issue of separation.
3. Imputation model specification that resembles how you would fit a regression model in R.
4. Automatic detection of some problematic characteristics in the given dataset, with alerts to the user.
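
To make the separation issue in item 2 concrete, here is a minimal sketch in base R. The toy data are made up for illustration, and the commented-out call assumes the arm package (which provides bayesglm) is installed; it is not meant to show how mi calls bayesglm internally.

```r
# Toy data with complete separation: y is 0 whenever x < 0 and 1 whenever x > 0.
x <- c(-3, -2, -1, 1, 2, 3)
y <- c(0, 0, 0, 1, 1, 1)

# Ordinary maximum likelihood: the slope estimate runs off toward infinity,
# and glm() warns that fitted probabilities of 0 or 1 occurred.
fit_ml <- suppressWarnings(glm(y ~ x, family = binomial))
coef(fit_ml)

# With arm installed, a weakly informative prior should keep the estimate
# finite (left commented out because arm is an extra dependency):
# fit_bayes <- arm::bayesglm(y ~ x, family = binomial)
# coef(fit_bayes)
```

The same problem can arise inside an imputation model whenever a predictor perfectly predicts a binary variable, which is presumably why a regularized fit is useful there.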

Please give it a try if you have any dataset that has missingness.
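
Before trying any imputation package, it can help to see how much is actually missing. The following is a small base R sketch on a hypothetical data frame (the variable names and values are invented); it does not use mi's own diagnostics.

```r
# Hypothetical toy data frame with some missing values.
df <- data.frame(
  age    = c(23, NA, 41, 35, NA),
  income = c(50, 62, NA, 48, 55),
  group  = c("a", "b", "b", NA, "a")
)

# Number and proportion of missing values per variable.
n_missing <- colSums(is.na(df))
prop_missing <- colMeans(is.na(df))

# Number of rows with at least one missing value.
n_incomplete <- sum(!complete.cases(df))

print(n_missing)
print(round(prop_missing, 2))
print(n_incomplete)
```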

Also, we are still improving the package, so your input is most welcome.

One caution: if you are using a big dataset with a large amount of missingness across many variables, it may take some time for the process to converge. We admit it is not the fastest imputation package on the market.

However, once we get the basics down, speeding things up should not be so difficult. So please bear with it for now.

There are directions in which we plan to expand, such as imputation of time-series cross-sectional data and hierarchical data, but for now these features are not part of the package.

Happy Holidays!!

5 thoughts on “New Multiple Imputation R Package “mi” (beta release)”

Barry on December 27, 2008 at 9:45 am said:

Thanks! My goals for the next few months are:

Start using R (I've read the basic documentation)
Do some MI analyses
Do some item response theory analyses (particularly Rasch modeling).

Antony Unwin on December 31, 2008 at 5:25 am said:

1) The histograms look like they are out of focus. Is it really necessary to plot all three histograms together? It might be better just to plot the complete histogram with the imputed histogram shaded in as if highlighted.

2) You mention that mi may be slow to converge for a large dataset. The missing pattern plot will also have a problem with large datasets. A missing value plot with interaction, as Mondrian has, is better then (and may be better for small datasets too).

Andrew Gelman on December 31, 2008 at 11:54 am said:

Antony,

Thanks for the comments. We're hoping that if we put this stuff out there, people will take it and improve upon it. Or, better still, copy the best of our ideas and incorporate them in their own software.

Edward Ratzer on January 8, 2009 at 1:42 pm said:

I thought I would try to find out more about MI via Wikipedia but I notice that http://en.wikipedia.org/wiki/Multiple_imputation is rather sparse. Can you add something?

Thanks,

Ed.

Ron Fredericks on March 28, 2009 at 10:45 am said:

I thought your readers might be interested in watching this new video as an introduction to R.

The R and Science of Predictive Analytics: Four Case Studies in R – the Video: http://www.lecturemaker.com/2009/02/r-kickoff-video/

Panel of four recognized R users from industry:

Bo Cowgill, Google
Itamar Rosenn, Facebook
David Smith, Revolution Computing
Jim Porzak, The Generations Network

Moderator and co-chair of Bay Area R User Group:

Michael E. Driscoll, Dataspora LLC
