Using partial pooling when preparing data for machine learning applications

Geoffrey Simmons writes:

I reached out to John Mount/Nina Zumel over at Win Vector with a suggestion for their vtreat package, which addresses many common challenges in preparing data for machine learning applications.
The default behavior for impact coding high-cardinality variables had been a naive Bayes approach, which I found problematic due to its multimodal output (assigning probabilities close to 0 and 1 for low-sample-size levels). This seemed like a natural fit for partial pooling, so I pointed them to your work/book and demonstrated its usefulness from my own experience and applications. It’s now the basis of a custom-coding enhancement to their package.
You can find their write-up here.
Cool. I hope their next step will be to implement it in Stan.
It’s also interesting to think of Bayesian or multilevel modeling being used as a preprocessing tool for machine learning, which is sort of the flipped-around version of an idea we posted the other day, on using black-box machine learning predictions as inputs to a Bayesian analysis. I like these ideas of combining different methods and getting the best of both worlds.
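To make the contrast concrete, here is a minimal sketch of the idea (not vtreat’s actual implementation; the data, the column names x and y, and the use of lme4 are all illustrative assumptions): a naive per-level impact code versus a partially pooled one from a random-intercept model.

```r
# Illustrative sketch only; data and column names are made up.
library(lme4)

set.seed(1)
d <- data.frame(
  x = sample(sprintf("level_%02d", 1:50), 500, replace = TRUE)  # high-cardinality categorical
)
d$y <- rnorm(500) + ifelse(d$x == "level_01", 1, 0)

# Naive impact code: per-level mean minus the grand mean.
# Rare levels get extreme, unreliable estimates.
grand <- mean(d$y)
naive <- tapply(d$y, d$x, mean) - grand

# Partial pooling: a random-intercept model shrinks each level's
# effect toward zero in proportion to how little data it has.
fit <- lmer(y ~ 1 + (1 | x), data = d)
pooled <- ranef(fit)$x[["(Intercept)"]]

# The levels with the most extreme naive estimates (often the
# rarest) show the largest shrinkage under partial pooling.
comparison <- data.frame(naive = naive, pooled = pooled)
head(comparison[order(-abs(comparison$naive)), ])
```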

1 thought on “Using partial pooling when preparing data for machine learning applications”

  1. Definitely have some Stan projects in the pipeline.

    Also, it is fun to try to sneak some well-founded Bayesian methods into machine learning (and evidently also vice versa). I think there is a lot to be gained.

    Finally, I really suggest that R users working on machine learning or predictive modeling try out vtreat; it can be game-changing (it makes messy real-world data behave almost as well as example data).
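For readers who want to try it, here is a minimal sketch of typical vtreat use (the data frame and column names are hypothetical), using the package’s standard design/apply pair for a numeric outcome, designTreatmentsN() and prepare():

```r
# Illustrative sketch; data and column names are made up.
library(vtreat)

set.seed(2)
d <- data.frame(
  x1 = sample(c("a", "b", "c", NA), 20, replace = TRUE),  # categorical with missing values
  x2 = c(rnorm(18), NA, NA),                              # numeric with missing values
  y  = rnorm(20)                                          # numeric outcome
)

# Learn treatment plans from the data...
plan <- designTreatmentsN(d, varlist = c("x1", "x2"), outcomename = "y")

# ...then apply them to get a clean, all-numeric frame
# (missing values imputed and flagged, levels encoded).
d_treated <- prepare(plan, d)
head(d_treated)
```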
