PyStan!

Stan is written in C++ and can be run from the command line and from R. We’d like for Python users to be able to run Stan as well. If anyone is interested in doing this, please let us know and we’d be happy to work with you on it.

Stan, like Python, is completely free and open-source.

P.S. Because Stan is open-source, it of course would also be possible for people to translate Stan into Python, or to take whatever features they like from Stan and incorporate them into a Python package. That’s fine too. But we think it would make sense in addition for users to be able to run Stan directly from Python, in the same way that it can be run from R.

29 Comments

  1. zmk says:

    Wouldn’t IPython’s rmagic be enough for people to do that?

  2. c. says:

    You should post something to the PyMC mailing list. I believe they’re planning to incorporate Hamiltonian samplers into the next version. It might make more sense to have Stan live in that framework than as a separate fragmented MCMC tool. Even if not, it’s the most likely place to find Python programmers with the right background knowledge.

    • I could never get PyMC installed. It balks about not finding a Fortran lib, and I never had the patience to sort it out. So I’d like something a little simpler to install if possible.

      It’s too large a project to try to integrate with PyMC’s modeling language to combine the two.

      We really just want a wrapper around Stan that lets you get data into and out of Stan from Python as easily as possible.

  3. OneEyedMan says:

    RSPython might be another avenue to try.

  4. Jordan says:

    If it runs in R, that might mean it runs in Sage, which is how math people like me often use Python in the first place.

    • It’d be great if someone who understood Sage could integrate Stan. Sage is GPL-ed, so I’d rather not have that be the only way to access Stan in Python.

      Sage sounds ambitious, with the home page saying “Mission: Creating a viable free open source alternative to Magma, Maple, Mathematica and Matlab.” This also puts it squarely into the sights of R and Julia and Octave.

  5. Thomas Wiecki says:

    PyMC3 development is currently underway. It’s using Theano for building the computation graph and doing automatic diff for HMC.

    Is the idea to (i) have a wrapper for the C++ library, or (ii) have a Python interface where you specify a model as a string, write it to file, and call into Stan (similar to what rjags and BRugs do)?

    • Thanks for the pointer. I’ve glanced through the docs, but never used it. It certainly has an array of impressive functionality. Basic HMC and auto-diff aren’t that hard to get off the ground (it only took us a few weeks to implement both), but making them efficient and flexible in the models they can handle is a whole lot of work. If Theano’s doing HMC, they should include Matt’s no-U-turn sampler. And they might be interested in how we handled constrained variables and vectorizations and cumulative distributions for truncation.

      Option (i) is how Jiqiang implemented RStan, but I’ve often wondered if that was the right decision. Option (i), wrapping at the memory level, is tighter, and should be faster. But it means rewriting the command-line control software in each integration and also figuring out all the data transport to and fro. Option (ii) seems less efficient, but it means everyone uses the same underlying code. It then just requires file-based I/O, which we already have (though arguably, it should be in a more standard format like JSON, as someone just suggested on our mailing list).
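The file-based round trip in option (ii) could look roughly like the sketch below. It assumes JSON for the input data, which is only the suggestion mentioned above, not a format Stan is stated to read. The sampler is a stand-in Python script, not Stan's command line, and `run_through_files`, `STAND_IN_SAMPLER`, and the "data path, then output path" argument convention are all hypothetical names and choices made for illustration.

```python
import csv
import json
import subprocess
import sys
import tempfile
from pathlib import Path

# Stand-in for an external sampler executable (NOT Stan's real command line):
# it reads a JSON data file and writes one CSV row per "draw".
STAND_IN_SAMPLER = """
import csv, json, sys
with open(sys.argv[1]) as f:
    data = json.load(f)
mean = sum(data["y"]) / data["N"]
with open(sys.argv[2], "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["draw", "mu"])
    for i in range(3):
        writer.writerow([i, mean])
"""


def run_through_files(data, command):
    """Write `data` as JSON, run `command` on it, and read back its CSV output.

    `command` is a list of program arguments; the data path and the output
    path are appended to it. That calling convention is an assumption of
    this sketch, not a documented interface.
    """
    with tempfile.TemporaryDirectory() as tmp:
        data_path = Path(tmp) / "data.json"
        out_path = Path(tmp) / "samples.csv"
        data_path.write_text(json.dumps(data))
        # check=True raises CalledProcessError if the external program fails.
        subprocess.run(command + [str(data_path), str(out_path)], check=True)
        with open(out_path, newline="") as f:
            return list(csv.DictReader(f))


draws = run_through_files(
    {"N": 4, "y": [1.0, 2.0, 3.0, 6.0]},
    [sys.executable, "-c", STAND_IN_SAMPLER],
)
```

Every value comes back from the CSV as a string, so the caller still has to convert columns to numbers. That conversion, plus the disk traffic, is the overhead option (i) avoids.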

  6. [...] is looking for someone to write a Python wrapper for STAN. In the comments, Bob Carpenter has some suggestions for [...]

  7. Kyle Gorman says:

    I’ve found that the quickest way to go from a C(++) API to a Python (or whatever) wrapper is Swig. Here’s a real simple example:

    https://github.com/kylebgorman/swipe

    How much Swig boilerplate you write depends on how different you want the Python API to be from the C(++) API. If they’re basically identical, you hardly have to do anything but point SWIG at the header files.

  8. MD says:

    Consider using Boost.Python: http://www.boost.org/libs/python/doc/

    You can also use Py++ for automatic Boost.Python code generation:
    http://www.ohloh.net/p/pygccxml

    Here’s a video from PyConZA 2012 on it: “Our hybrid programming journey with Python and C++”
    http://www.youtube.com/watch?v=bXWOv5SVatA

    Also relevant is the comparison of using Boost.Python, Py++, SWIG, and Pybindgen:
    http://stackoverflow.com/questions/456884/extending-python-to-swig-not-to-swig-or-cython/456949#456949

    • Yikes, that’s a lot of options. There aren’t nearly so many options with R.

      Any hints as to which ones will support dynamic compilation and linking? The way Stan works is that a user writes a model and calls a function from within Python to translate it to C++; the generated C++ code then needs to be compiled and dynamically linked from Python.

      And which ones have a BSD-type license instead of GPL or more restrictive?

      A Boost option sounds attractive because we already use Boost, so it doesn’t add any license or code dependencies we don’t already have (technically it adds a code dependency to the new code in Boost we haven’t used before, but we have to use something!).

  9. Ely Spears says:

    I am very interested in contributing to a Python STAN implementation. I may even be able to wrangle some other man hours of company time for this (my company loves Python tools for stats and we have many folks who make a routine of contributing to these kinds of things). I would require some sort of well-defined tasks, though. A vague, diffuse to-do list works less well for us. Let me know!

  10. Ismail Sunni says:

    Hi, I’m new to Python, but I’m interested in joining this open source project. Please feel free to let me know if I can do something to contribute to this PySTAN.

  11. [...] through my email inbox for a while now. Stan, it is. The project has reached the point where the developers are soliciting Python integration volunteers, so I decided it is time to check it [...]

    • Dougal says:

      Repeating a note I made on the stan-users list, as I catch up on my RSS feeds: Abraham’s approach runs the command-line interface of Stan, meaning that the initial data needs to be written out in an rdump-like text format, and results read from a CSV. This is fine and convenient for small models, but going through the filesystem can really kill performance if you have a lot of data and/or are doing this repeatedly. Also, Abraham doesn’t show how to write data from python to the appropriate format; it’s not hard, but I have some code to do so here.

      If going through the filesystem is too slow, before an actual PyStan option is available, one hacky option is to interface with R and use RStan. @zmk’s mention of IPython’s rmagic is great for interactive work; I wrote a few little wrappers to help out with doing it through rpy2 (i.e., the same approach, but not needing to be in IPython).
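As a rough illustration of writing Python data in an rdump-like text format, here is a minimal sketch. It is not the code linked in the comment above. `to_rdump` is a hypothetical helper that handles only numeric scalars, flat lists, and rectangular lists of rows, using R's `c(...)` and `structure(c(...), .Dim = c(...))` forms. Whether a given Stan version accepts every detail of this output is an assumption to check against its documentation.

```python
def _fmt(x):
    # repr() keeps full floating-point precision; ints print without a decimal point.
    # Special values such as inf and nan are not handled in this sketch.
    return repr(x)


def to_rdump(data):
    """Render a dict of numbers, lists, and lists of rows as rdump-like text."""
    lines = []
    for name, value in data.items():
        if isinstance(value, (int, float)):
            # Scalar: name <- value
            lines.append(f"{name} <- {_fmt(value)}")
        elif value and isinstance(value[0], (list, tuple)):
            # Matrix given as a list of rows. R stores matrices column by
            # column, so the values are flattened in column-major order.
            n_rows, n_cols = len(value), len(value[0])
            flat = [value[r][c] for c in range(n_cols) for r in range(n_rows)]
            body = ", ".join(_fmt(v) for v in flat)
            lines.append(
                f"{name} <- structure(c({body}), .Dim = c({n_rows}, {n_cols}))"
            )
        else:
            # Flat vector: name <- c(v1, v2, ...)
            body = ", ".join(_fmt(v) for v in value)
            lines.append(f"{name} <- c({body})")
    return "\n".join(lines) + "\n"


dump_text = to_rdump({"N": 3, "y": [1.5, 2.0, 3.25], "X": [[1, 2], [3, 4]]})
```

The column-major flattening is the easy part to get wrong: the rows `[[1, 2], [3, 4]]` are written as `c(1, 3, 2, 4)`, not `c(1, 2, 3, 4)`.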

  12. Ian Langmore says:

    I have some questions about STAN. Is it meant to be just a nice MCMC sampler, or will it be a complete Bayesian inference package? In other words, will you simply “define posterior and get samples”, or will you be able to define priors, error models, and then get posterior density estimates, credibility intervals, and predict things, e.g. P[Y=1| X=x, data]? Also, if the likelihood and prior are conjugate then does STAN automatically detect this and produce/sample-from an exact expression for the posterior (with no MCMC needed)? If STAN is just an MCMC sampler, then my feeling is that it should be part of PyMC. If it is much more, then it should be part of statsmodels [1].

    By the way, the newest anaconda distribution [2] seems to allow a quick installation of pymc using pip install pymc.

    [1] statsmodels.sourceforge.net
    [2] https://store.continuum.io/cshop/anaconda
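To make the conjugacy question concrete, here is a minimal sketch of an exact posterior that needs no MCMC: a Beta prior on a success probability combined with a binomial likelihood. Nothing here is Stan or PyMC functionality, and the function names are hypothetical. It only shows the closed-form update that a conjugacy-detecting package would apply.

```python
from fractions import Fraction


def beta_binomial_update(a, b, successes, trials):
    """Return the Beta posterior parameters for a Beta(a, b) prior.

    With a binomial likelihood, the posterior is
    Beta(a + successes, b + failures).
    """
    failures = trials - successes
    return a + successes, b + failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution, kept exact with Fraction."""
    return Fraction(a, a + b)


# Uniform Beta(1, 1) prior, then 7 successes observed in 10 trials.
post_a, post_b = beta_binomial_update(1, 1, successes=7, trials=10)
posterior_mean = beta_mean(post_a, post_b)  # Beta(8, 4) has mean 2/3
```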

  13. Sorry for coming to this a bit late, but I have just released a code called xdress (http://bit.ly/xdress-code) which is meant for automatic wrapper generation of C++ classes and functions to Python, delivers nice views into STL containers (vectors -> numpy arrays, etc), is based off of Cython, and has a very malleable type system. One of my friends (Chris J-S) encouraged me to polish this up and release it at PyCon 2013. Later he mentioned that STAN might be a possible use case. I’d love for people to hammer on this code so let me know if it works or what went wrong.
