PyStan!

Stan is written in C++ and can be run from the command line and from R. We’d like for Python users to be able to run Stan as well. If anyone is interested in doing this, please let us know and we’d be happy to work with you on it.

Stan, like Python, is completely free and open-source.

P.S. Because Stan is open-source, it of course would also be possible for people to translate Stan into Python, or to take whatever features they like from Stan and incorporate them into a Python package. That’s fine too. But we think it would make sense in addition for users to be able to run Stan directly from Python, in the same way that it can be run from R.

29 thoughts on “PyStan!”

    • Stan’s written in C++. RStan is a wrapper of the C++ library that exposes Stan functionality in R.

      We want to do something similar for Python. Ideally without making it a combination shot involving R in the middle. For one thing, I’d rather not go down the GPL route for Python.

  1. You should post something to the PyMC mailing list. I believe they’re planning to incorporate Hamiltonian samplers into the next version. It might make more sense to have Stan live in that framework than as a separate fragmented MCMC tool. Even if not, it’s the most likely place to find Python programmers with the right background knowledge.

    • I could never get PyMC installed. It balks about not finding a Fortran lib, and I never had the patience to sort it out. So I’d like something a little simpler to install if possible.

      It’s too large a project to try to integrate with PyMC’s modeling language to combine the two.

      We really just want a wrapper around Stan that lets you get data into and out of Stan from Python as easily as possible.

    • It’d be great if someone who understood Sage could integrate Stan. Sage is GPL-ed, so I’d rather not have that be the only way to access Stan in Python.

      Sage sounds ambitious, with the home page saying “Mission: Creating a viable free open source alternative to Magma, Maple, Mathematica and Matlab.” This also puts it squarely into the sights of R and Julia and Octave.

  2. PyMC3 development is currently underway. It’s using Theano for building the computation graph and doing automatic diff for HMC.

    Is the idea to (i) have a wrapper for the C++ library, or (ii) have a Python interface where you specify a model as a string, write it to a file, and call into Stan (similar to what rjags and brugs do)?

    • Thanks for the pointer. I’ve glanced through the docs, but have never used it. It certainly has an impressive array of functionality. Basic HMC and auto-diff aren’t that hard to get off the ground (it only took us a few weeks to implement both), but making them efficient and flexible in the models they can handle is a whole lot of work. If Theano’s doing HMC, they should include Matt’s no-U-turn sampler. And they might be interested in how we handled constrained variables and vectorizations and cumulative distributions for truncation.

      Option (i) is how Jiqiang implemented RStan, but I’ve often wondered if that was the right decision. Option (i), wrapping at the memory level, is tighter, and should be faster. But it means rewriting the command-line control software in each integration and also figuring out all the data transport to and fro. Option (ii) seems less efficient, but it means everyone uses the same underlying code. It then just requires file based I/O, which we already have (though arguably, it should be in a more standard format like JSON, as someone just suggested on our mailing list).
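Option (ii) can be sketched roughly as follows: the model lives in a Python string, is written to a file next to the data, and an external program is then called on those files. This is only an illustration under stated assumptions. The executable name `stan-run` and its argument order are hypothetical placeholders, not Stan's actual command line, and JSON is used for the data only because it was suggested above as a possible format.

```python
import json
import subprocess
import tempfile
from pathlib import Path

MODEL = """
data { int<lower=0> N; real y[N]; }
parameters { real mu; }
model { y ~ normal(mu, 1); }
"""


def prepare_run(model_code, data, workdir, executable="stan-run"):
    """Write the model and data to files and return the command to run.

    `executable` and the argument layout are hypothetical placeholders.
    """
    workdir = Path(workdir)
    model_path = workdir / "model.stan"
    data_path = workdir / "data.json"
    output_path = workdir / "samples.csv"
    model_path.write_text(model_code)
    data_path.write_text(json.dumps(data))
    return [executable, str(model_path), str(data_path), str(output_path)]


def run(command, runner=subprocess.run):
    """Call the external program; `runner` can be swapped out for testing."""
    return runner(command, check=True)


# Build the files and the command without actually launching anything.
with tempfile.TemporaryDirectory() as tmp:
    command = prepare_run(MODEL, {"N": 3, "y": [1.2, 0.7, 1.9]}, tmp)
    print(command)
```

Everything that touches the sampler goes through one command list, which is the point of option (ii): each language interface only has to write files and read them back.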

  3. Pingback: Questions for the STAN team | Good Morning, Economics

  4. I’ve found that the quickest way to go from a C(++) API to a Python (or whatever) wrapper is Swig. Here’s a real simple example:

    https://github.com/kylebgorman/swipe

    How much Swig boilerplate you write depends on how different you want the Python API to be from the C(++) API. If they’re basically identical, you hardly have to do anything but point SWIG at the header files.

  5. Consider using Boost.Python: http://www.boost.org/libs/python/doc/

    You can also use Py++ for automatic Boost.Python code generation:
    http://www.ohloh.net/p/pygccxml

    Here’s a video from PyConZA 2012 on it: “Our hybrid programming journey with Python and C++”
    http://www.youtube.com/watch?v=bXWOv5SVatA

    Also relevant is the comparison of using Boost.Python, Py++, SWIG, and Pybindgen:
    http://stackoverflow.com/questions/456884/extending-python-to-swig-not-to-swig-or-cython/456949#456949

    • Yikes, that’s a lot of options. There aren’t nearly so many options with R.

      Any hints as to which ones will support dynamic compilation and linking? The way Stan works is that a user writes a model and calls a function from within Python to translate it to C++; the generated C++ code then needs to be compiled and dynamically linked from Python.

      And which ones have a BSD-type license instead of GPL or more restrictive?

      A Boost option sounds attractive because we already use Boost, so it doesn’t add any license or code dependencies we don’t already have (technically it adds a code dependency to the new code in Boost we haven’t used before, but we have to use something!).

      • > There aren’t nearly so many options with R.

        I’m assuming humorous intent here :-)

        > dynamic compilation and linking [support]

        I’m afraid I can’t be of much help on this one, outside of my use cases so far.

        Apparently, there are some dynamic-linking-specific features available — http://wiki.python.org/moin/boost.python/CrossExtensionModuleDependencies — but, again, this is outside my area of expertise.

        Perhaps looking at what others have done would be of more help:
        http://www.boost.org/libs/python/doc/projects.html

        It sounds a little like you’re talking about JITting, perhaps Cling (based on Clang) would be worth looking into:
        http://root.cern.ch/drupal/content/cling

        Note, however, that at this point Cling is not yet officially supported on Windows:
        “Windows is not supported platform yet, but there is some work being done by external contributors.”
        http://root.cern.ch/drupal/content/cling-build-instructions

        Cling’s licensing:
        * “the license is the same as LLVM/Clang’s”
        http://root.cern.ch/phpBB3/viewtopic.php?f=21&t=14740

        * “LLVM was released under the University of Illinois Open Source License, a BSD-style license.”
        http://en.wikipedia.org/wiki/LLVM

        > which ones have a BSD-type license instead of GPL or more restrictive?

        Boost.Python and Py++ are both licensed under the Boost Software License (which is even less restrictive than BSD):
        http://www.boost.org/users/license.html
        http://www.boost.org/libs/python/doc/news.html // 19 November 2004 – 1.32 release “Updated to use the Boost Software License.”
        http://www.ohloh.net/p/pygccxml // “Licenses: Boost Software License”

        PyBindGen: GNU LGPL v2.1 // https://launchpad.net/pybindgen
        SWIG: GPL3, but see http://www.swig.org/legal.html and http://stackoverflow.com/questions/4272414/can-an-lgpl-library-use-gpl-code-to-produce-bindings

        HTH! :-)

      • That is possible from a running python session, but you can only import the compiled
        function once because of limitations with the way python works with dynamic libraries.
        If you want to recompile and reimport the callback you need to start a new python session.

        Ipython solves this problem for cython by generating a new name for each iteration of the module,
        so a module can be edited, compiled and reloaded repeatedly:

        http://bit.ly/cython_magic

        My two cents: I’ve used both boost::python and cython for C++ libraries
        and I find cython to be quite a bit simpler.
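The renaming trick described in this comment can be illustrated with a small sketch. It uses a pure-Python source string as a stand-in for a compiled extension, since the idea is the same: each revision gets its own module name, so nothing already imported has to be replaced. The helper `import_fresh` and the content-hash naming scheme are assumptions made for illustration, not a description of how IPython implements it.

```python
import hashlib
import importlib.util
import sys
import tempfile
from pathlib import Path


def import_fresh(source, build_dir):
    """Import `source` under a module name derived from its contents.

    Identical source maps to the same name and is imported only once;
    edited source gets a new name and therefore a new module object.
    """
    digest = hashlib.sha1(source.encode("utf-8")).hexdigest()[:12]
    name = f"_generated_{digest}"
    if name in sys.modules:
        return sys.modules[name]
    path = Path(build_dir) / f"{name}.py"
    path.write_text(source)
    spec = importlib.util.spec_from_file_location(name, str(path))
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as build_dir:
    first = import_fresh("def f():\n    return 1\n", build_dir)
    second = import_fresh("def f():\n    return 2\n", build_dir)
    # Both revisions stay loaded side by side under different names.
    print(first.f(), second.f(), first.__name__ != second.__name__)
```

For a real compiled model, the step that writes the `.py` file would be replaced by code generation and compilation into a shared library with the same unique name.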

      • Hi–I’ve contributed to core numpy/scipy before, and I’ve done some limited wrapping of C/C++ code for python.

        These things change fairly quickly in the python world. For example, SWIG vs. Cython is a talk at PyCon in the next couple of weeks. The general feeling among many numerics python developers is that cython is the best option, but I don’t know about the state of cython and c++ templates. If it’s for a small program, then some pythonistas I know will use ctypes instead of cython. AFAIK, Boost python isn’t being used or updated by many groups. SWIG is something else I haven’t heard much about.

        For the most up-to-date information, and the most applicable to your situation, I would (strongly) recommend asking the numpy, cython, and scikits-learn mailing lists for their opinion. Those are the python developers with the most experience linking together Fortran/C/C++ numerics code with python. They are also very wary of GPL licenses, so they can help you avoid tools which might require that.
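For a sense of what the ctypes route mentioned above looks like for a small C interface, here is a minimal sketch that calls `cos` from the system C math library. It assumes a POSIX-like system where `ctypes.util.find_library("m")` finds the library or where the symbol is already loaded into the Python process; it says nothing about how well ctypes would cope with templated C++ code.

```python
import ctypes
import ctypes.util
import math

# Locate the C math library. If no path is found, CDLL(None) falls back to
# the symbols already loaded in the current process (POSIX only).
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C signature: double cos(double).
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double


def c_cos(x):
    """Call the C library's cos through ctypes."""
    return libm.cos(x)


print(c_cos(1.0), math.cos(1.0))
```

No compilation step or generated wrapper code is involved, which is why it appeals for small programs; the cost is that every signature has to be declared by hand.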

  6. I am very interested in contributing to a Python STAN implementation. I may even be able to wrangle some other man hours of company time for this (my company loves Python tools for stats and we have many folks who make a routine of contributing to these kinds of things). I would require some sort of well-defined tasks, though. A vague, diffuse to-do list works less well for us. Let me know!

  7. Pingback: Stan in IPython: getting starting | Healthy Algorithms

    • Repeating a note I made on the stan-users list, as I catch up on my RSS feeds: Abraham’s approach runs the command-line interface of Stan, meaning that the initial data needs to be written out in an rdump-like text format, and results read from a CSV. This is fine and convenient for small models, but going through the filesystem can really kill performance if you have a lot of data and/or are doing this repeatedly. Also, Abraham doesn’t show how to write data from python to the appropriate format; it’s not hard, but I have some code to do so here.

      If going through the filesystem is too slow, before an actual PyStan option is available, one hacky option is to interface with R and use Rstan. @zmk’s mention of ipython’s rmagic is great for interactive work; I wrote a few little wrappers to help out with doing it through rpy2 (ie the same approach, but not needing to be in ipython).
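The file round trip described in this comment can be sketched as below. This is not the code the comment refers to. It is a minimal illustration that writes scalars and flat lists in R's dump syntax (`name <- value`, `name <- c(...)`) and reads a CSV of draws back in. The helper names are made up, and it assumes the CSV has a single header row and that comment lines start with `#`; what Stan's reader accepts beyond these simple cases should be checked against its documentation.

```python
import csv


def _format_number(value):
    """Format an int or float; ints keep no decimal point, floats do."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError(f"unsupported value in this sketch: {value!r}")
    return repr(value)


def to_rdump(data):
    """Render a dict of scalars and flat lists as rdump-like text."""
    lines = []
    for name, value in data.items():
        if isinstance(value, (list, tuple)):
            body = ", ".join(_format_number(v) for v in value)
            lines.append(f"{name} <- c({body})")
        else:
            lines.append(f"{name} <- {_format_number(value)}")
    return "\n".join(lines) + "\n"


def read_samples(text):
    """Parse CSV text into {column name: list of floats}, skipping '#' lines."""
    rows = [line for line in text.splitlines()
            if line.strip() and not line.startswith("#")]
    reader = csv.reader(rows)
    header = next(reader)
    columns = {name: [] for name in header}
    for row in reader:
        for name, cell in zip(header, row):
            columns[name].append(float(cell))
    return columns


print(to_rdump({"N": 3, "y": [1.5, 2.0, 3.0]}))
print(read_samples("# comment\nlp__,mu\n-1.5,0.2\n-1.0,0.3\n"))
```

Matrices and higher-dimensional arrays would need R's `structure(..., .Dim = ...)` form, which this sketch leaves out.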

  8. I have some questions about STAN. Is it meant to be just a nice MCMC sampler, or will it be a complete Bayesian inference package? In other words, will you simply “define posterior and get samples”, or will you be able to define priors, error models, and then get posterior density estimates, credibility intervals, and predict things, e.g. P[Y=1| X=x, data]? Also, if the likelihood and prior are conjugate, does STAN automatically detect this and produce/sample-from an exact expression for the posterior (with no MCMC needed)? If STAN is just an MCMC sampler, then my feeling is that it should be part of PyMC. If it is much more, then it should be part of statsmodels [1].

    By the way, the newest anaconda distribution [2] seems to allow a quick installation of pymc using pip install pymc.

    [1] statsmodels.sourceforge.net
    [2] https://store.continuum.io/cshop/anaconda

  9. Sorry for coming to this a bit late, but I have just released a code called xdress (http://bit.ly/xdress-code) which is meant for automatic wrapper generation of C++ classes and functions to Python, delivers nice views into STL containers (vectors -> numpy arrays, etc), is based off of Cython, and has a very malleable type system. One of my friends (Chris J-S) encouraged me to polish this up and release it at PyCon 2013. Later he mentioned that STAN might be a possible use case. I’d love for people to hammer on this code so let me know if it works or what went wrong.
