Some insider stuff on the Stan refactor

From the stan-dev list, Bob wrote [he has since added brms based on the comments; the packages marked (*) aren’t developed or maintained by the stan-dev team, so we only know what we hear from their authors]:

The bigger picture is this; as you can see, the stan-dev/stan repo really spans three logical layers:

                      stan
          ----------------------------------
  math <- language <- algorithms <- services <- pystan
                                             <- rstan   <- rstanarm
                                                        <- rethinking (*)
                                                        <- brms (*)
                                             <- cmdstan <- statastan
                                                        <- matlabstan
                                                        <- stan.jl

What we are trying to do with the services refactor is make a clean services layer between the core interfaces (pystan, rstan, cmdstan) and everything beneath them, so that the interfaces don't have to know anything below the services layer. Ideally, pystan, rstan, and cmdstan wouldn't make any calls other than ones into the stan::services namespace. The services layer, on the other hand, is almost certainly going to need to know about things below the algorithms level, in language and math.
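To make the layering concrete, here is a minimal C++ sketch of the dependency direction Bob describes. Every name in it (`Model`, `algorithms::run_sampler`, `services::sample`, `interface_sample`) is hypothetical and not Stan's actual API; the "sampler" just returns placeholder values. The point is only that the interface calls into services, and services is the one layer that reaches down into algorithms.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a compiled model (the language and math layers).
struct Model {
  double mean = 0.0;
};

// Algorithms layer: assumes its inputs were already checked by the caller.
namespace algorithms {
inline std::vector<double> run_sampler(const Model& model, int num_samples) {
  assert(num_samples > 0);  // precondition guaranteed by the services layer
  // Placeholder output: a real sampler would generate random draws here.
  return std::vector<double>(static_cast<std::size_t>(num_samples), model.mean);
}
}  // namespace algorithms

// Services layer: the only namespace the interfaces are meant to call.
namespace services {
inline std::vector<double> sample(const Model& model, int num_samples) {
  if (num_samples <= 0) {
    throw std::invalid_argument("num_samples must be positive");
  }
  return algorithms::run_sampler(model, num_samples);
}
}  // namespace services

// Interface layer (think CmdStan): calls services, never algorithms directly.
inline std::vector<double> interface_sample(const Model& model, int n) {
  return services::sample(model, n);
}
```

With this split, swapping or reorganizing code in the algorithms layer only requires changes in services, not in each of the interfaces.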

And Daniel followed up with:

This clarified a lot of things. I think this is what we should do:

  1. Split algorithms and services into their own repos. (Language too, but that's a given.)
  2. Each "route" to calling an algorithm should live in the "algorithms" repo. That is, each algorithm should expose a simple function for calling it directly. It'll be a C++ API, but not one that the interfaces use directly.
  3. In "services," we'll have a config object with validation and only a handful of functions for pystan, rstan, and cmdstan to call. The config object needs to be simple and safe, but I think the pseudocode Bob and I created (which looks really close to what Michael's config object would be if it were safe) will suffice.
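Points 2 and 3 can be sketched together in C++: a validated config object in services, and a direct "route" function in algorithms that trusts its inputs. This is a hypothetical illustration, not the pseudocode Daniel mentions. The field names `num_warmup`, `num_samples`, and `adapt_delta` echo familiar Stan sampler settings, but the struct, the function names, and the placeholder return value (an iteration count) are all assumptions.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical configuration for one kind of run; field names are illustrative.
struct SampleConfig {
  int num_warmup = 1000;
  int num_samples = 1000;
  double adapt_delta = 0.8;
};

// Algorithms layer: a direct "route" that trusts its (already validated) inputs.
namespace algorithms {
inline int run_sampler_route(int num_warmup, int num_samples, double adapt_delta) {
  assert(num_warmup >= 0 && num_samples > 0);
  assert(adapt_delta > 0.0 && adapt_delta < 1.0);
  // Placeholder: report how many iterations a real sampler would run.
  return num_warmup + num_samples;
}
}  // namespace algorithms

// Services layer: validate the config once, then dispatch to a route.
namespace services {
inline void validate(const SampleConfig& cfg) {
  if (cfg.num_warmup < 0) {
    throw std::invalid_argument("num_warmup must be non-negative");
  }
  if (cfg.num_samples <= 0) {
    throw std::invalid_argument("num_samples must be positive");
  }
  // Written as a negation so that NaN is rejected too.
  if (!(cfg.adapt_delta > 0.0 && cfg.adapt_delta < 1.0)) {
    throw std::invalid_argument("adapt_delta must lie in (0, 1)");
  }
}

// One of the "handful of calls" the interfaces would make.
inline int sample(const SampleConfig& cfg) {
  validate(cfg);
  return algorithms::run_sampler_route(cfg.num_warmup, cfg.num_samples,
                                       cfg.adapt_delta);
}
}  // namespace services
```

The design choice here is that validation happens in exactly one place, so every interface gets identical checks and the algorithm routes can stay simple.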

I don't really know what they're talking about, but I thought it might interest those of you who don’t usually see software development from the inside.

8 thoughts on “Some insider stuff on the Stan refactor”

  1. I’d also add _brms_ as one of the third-level packages (alongside _rstanarm_). It takes a different approach from _rstanarm_; it has a little more overhead, but it is extremely flexible and really shows off the power of Stan.

  2. I added brms, with asterisks to indicate that brms and rethinking are not packages that we (the Stan dev team) wrote or maintain. If whoever wrote brms (or someone else) wants to write a blog post about what it does and mail it to the stan-users list, you can send it to me ([email protected]) and I can post it here on Andrew’s blog.

    Since we’re now several months past this email exchange, here is where things stand on the topics Daniel raised:

    1. We haven’t done the split yet, but still plan to go ahead with it.

    2. The “routes,” as Daniel calls them, are mostly coded and integrated with PyStan and CmdStan. This has already removed hundreds of lines of redundant, error-prone code.

    3. We’re going to leave config up to the interfaces rather than trying to handle that on the Stan C++ side.

    • Folks would probably benefit from a hitchhiker’s guide to the Stan universe (maybe there already is one?)

      I hadn’t done anything in Stan for a while and tried some things out this afternoon.

      I learned:

      “+=” is not a replacement for “~”. (That took a while: I looked at https://cran.r-project.org/web/packages/rstan/vignettes/rstan.html, then looked for an explanation of “+=” in the Reference Manual, Version 2.10.0, then for “target +=”, and finally the 47 entry for “target” explained it.)

      You can use the brms package and others to write Stan code that you can use and learn from (that did not take as long).

      So a hitchhiker’s guide would have value.

      On the other hand, I would not want to slow the expansion and improvement in the Stan universe.

      • I’m not sure what you mean by a hitchhiker’s guide, but if you want to see what’s changed in terms of deprecation, I added an appendix to the Stan 2.10 manual that should list everything that’s ever been deprecated (like increment_log_prob(u); being replaced with target += u;). Some behaviors that were never specified have become fixed, like the values taken on by undefined variables. So far, the only real backward-compatibility break we’ve had from documented behavior (at least that I can recall) is the removal of direct access to lp__.

        What we really need to do is add a generic += operator. I just didn’t have time for 2.10. It’ll get in soon, if not for 2.11 then for 2.12.
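In the modeling language, `target` is the accumulated log density, and both the deprecated `increment_log_prob(u);` and its replacement `target += u;` add `u` to that same total. A sampling statement such as `y ~ normal(mu, sigma)` also adds to it, although Stan may drop constant terms that don't affect sampling. The C++ sketch below mimics only that bookkeeping; the `Target` struct and the helper names are hypothetical and are not Stan's generated code.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical accumulator mirroring the role of `target`: a running log density.
struct Target {
  double value = 0.0;
  // Mirrors `target += u;`
  Target& operator+=(double u) {
    value += u;
    return *this;
  }
};

// Mirrors the deprecated `increment_log_prob(u);`, which adds to the same total.
inline void increment_log_prob(Target& target, double u) { target += u; }

// Normal log density with all constant terms included.
inline double normal_log_density(double y, double mu, double sigma) {
  assert(sigma > 0.0);
  const double pi = 3.14159265358979323846;
  const double z = (y - mu) / sigma;
  return -0.5 * std::log(2.0 * pi) - std::log(sigma) - 0.5 * z * z;
}
```

Both spellings leave the accumulator in the same state. The generic `+=` Bob mentions would presumably extend the compound-assignment syntax to ordinary variables, which this sketch does not model.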

        • Thanks for letting me know about the appendix.

          > a hitchhiker’s guide

          To address the challenge an occasional user faces in working out where to go to do what they want, and especially how long they might end up being there.

          Paul Buerkner’s brms package really helped me sort that out by quickly generating Stan models close to what I want to do, which ran without noticeable problems. Now I am fairly sure I can do what I want in a few days rather than possibly weeks (which would have made it not worth doing).

          I only found out about brms from Shravan’s comment, and I had not anticipated that it generates Stan model code I can learn from and modify if needed for what I really want.

          (Maybe brms is a good way for people to learn about Stan?)
