Kaggle Kernels

Anthony Goldbloom writes:

In late August, Kaggle launched an open data platform where data scientists can share data sets. In the first few months, our members have shared over 300 data sets on topics ranging from election polls to EEG brainwave data. It’s only a few months old, but it’s already a rich repository for interesting data sets.

It’s also a nice place to share reproducible data science. We have built a tool called Kaggle Kernels, which allows data scientists and statisticians to share notebooks and scripts in Python or R on top of the data. If you find an analysis you want to extend, you can “fork it,” which gives you a reproducible version without going through the pain of replicating the author’s environment. It’s useful for learning new techniques (by forking and playing with others’ code), for sharing your side project with a large community, and for drawing attention to your research and storing it in a way that can be easily reproduced.

He adds:

We don’t support Stan yet but we inevitably will.

Sooner rather than later, I hope!

P.S. Jamie Hall of Kaggle writes:

We’ve got RStan and PyStan ready to go in Kernels now. It would be fantastic to see some examples of the best ways to use them.

P.P.S. Aki has made a Kaggle notebook Bayesian Logistic Regression with rstanarm, and it works just fine.


  1. Jamie Hall says:

    We’ve got RStan and PyStan ready to go in Kernels now. It would be fantastic to see some examples of the best ways to use them.

  2. You’re in luck. We have all sorts of examples and a lot of doc:

    The manual has a lot of detail and covers a lot of modeling techniques with examples. The example models repo translates a lot of popular data sets and books in specific domains like cognitive science and ecology. The case studies have fully worked examples. Then RStan and PyStan themselves both have doc—I know there’s a vignette for RStan with a lot of examples.

    • Jamie Hall says:

      That’s great, thanks! It would be fantastic to see some of these techniques used in Kernels. In the past we’ve found that packages really take off with our community when there are executable and forkable examples to play with and build on, even if the docs have been around for a while.

  3. Alexia says:

    I’m using SAS and R in combination. Is there anything like Kaggle Kernels that works for SAS? I suspect it’s unlikely ever to happen, given that SAS costs money and is a nightmare to install with its millions of optional parts (I have no idea what they do), but one can wish.

  4. Ben Goodrich says:

    I should think that Kaggle would also want to have rstanarm and brms available for use with these Kernel things. Email me if you need any help getting them installed and set up.
