New Multiple Imputation R Package “mi” (beta release)

We recently uploaded to CRAN the multiple imputation package “mi”, which we have been developing.

The aim of the mi package is to make multiple imputation transparent and easy to use. To that end, it has a few characteristics that we believe are valuable.
1. Graphical diagnostics of imputation models and convergence of the imputation process.
2. Use of bayesglm to treat the issue of separation.
3. Imputation models are specified in much the same way as you would fit a regression model in R.
4. Automatic detection of some problematic characteristics in the given dataset, with alerts to the user.
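
To give a rough idea of what item 4 means, here is a minimal base R sketch of the kind of checks one might run on a dataset before imputing. It does not use mi's own functions: the toy data frame, the variable names, and the two red flags shown (a column with no observed values, a column with no variation) are assumptions made for illustration and are not necessarily the checks mi performs.

```r
# Toy data frame; the variable names and values are made up for illustration.
dat <- data.frame(
  age    = c(23, NA, 31, 45, NA, 52),
  income = c(40, 55, NA, 61, 48, NA),
  region = c("a", "a", "a", "a", "a", "a"),  # constant column
  score  = rep(NA_real_, 6)                  # entirely missing column
)

# Number and percentage of missing values per variable.
n_missing   <- colSums(is.na(dat))
pct_missing <- round(100 * colMeans(is.na(dat)), 1)

# Two simple red flags (assumed examples, not necessarily what mi checks):
# columns with no observed values at all ...
all_missing <- names(dat)[n_missing == nrow(dat)]

# ... and columns whose observed values never vary.
is_constant <- vapply(dat, function(x) {
  obs <- x[!is.na(x)]
  length(obs) > 0 && length(unique(obs)) == 1
}, logical(1))
constant <- names(dat)[is_constant]

# Report the summary and any warnings to the user.
print(data.frame(n_missing, pct_missing))
if (length(all_missing) > 0) {
  message("All values missing: ", paste(all_missing, collapse = ", "))
}
if (length(constant) > 0) {
  message("No variation: ", paste(constant, collapse = ", "))
}
```

A variable flagged by either check carries no information an imputation model could use, so it makes sense to catch it before any models are fit.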

Please give it a try if you have a dataset with missing values.

We are still in the process of improving the package, so your input is most welcome.

One caution: if you are using a big dataset with a lot of missingness across many variables, it may take some time for the process to converge. We admit it is not the fastest imputation package on the market.

However, once we get the basics down, speeding things up should not be so difficult. So please bear with it for now.

We plan to expand the package in several directions, such as imputation of time-series cross-sectional data and hierarchical data. For now, though, these features are not part of the package.

Happy Holidays!!

5 thoughts on “New Multiple Imputation R Package “mi” (beta release)”

  1. Thanks! My goals for the next few months are:

    Start using R (I've read the basic documentation)
    Do some MI analyses
    Do some item response theory analyses (particularly Rasch modeling).

  2. 1) The histograms look like they are out of focus. Is it really necessary to plot all three histograms together? It might be better just to plot the complete histogram with the imputed histogram shaded in as if highlighted.

    2) You mention that mi may be slow to converge for a large dataset. The missing pattern plot will also have a problem with large datasets. A missing value plot with interaction, as Mondrian has, is better then (and may be better for small datasets too).

  3. Antony,

    Thanks for the comments. We're hoping that if we put this stuff out there, people will take it and improve upon it. Or, better still, copy the best of our ideas and incorporate them in their own software.

  4. I thought your readers might be interested in watching this new video as an introduction to R.

    The R and Science of Predictive Analytics: Four Case Studies in R – the Video: http://www.lecturemaker.com/2009/02/r-kickoff-video/

    Panel of four recognized R users from industry:

    Bo Cowgill, Google
    Itamar Rosenn, Facebook
    David Smith, Revolution Computing
    Jim Porzak, The Generations Network
    Moderator and co-chair of Bay Area R User Group:

    Michael E. Driscoll, Dataspora LLC
