Stan Weekly Roundup, 18 August 2017

Summer? What summer? Stan 2.17 is coming and there’s work to be done.

  • Sebastian Weber has been making huge strides in adding MPI parallel autodiff to the math library (with design maturing for Stan itself and the library interfaces). There are ongoing discussions on the Discourse forum and prototypes for a function to add to the Stan library that manages to solve the ship-data-once-and-reuse problem that had been vexing the design up until last week.

  • Bob Carpenter rolled out a new language design document on the forums; have a look if you want to see what may be next for Stan.

  • Jonah Gabry also added summary functions to access all the diagnostics in RStan; this should be a big step forward for RStan users.

  • Ben Goodrich has been fixing issues in Stan itself—he fixed the covariance constrain/unconstrain, which wasn’t causing a problem until you tried to round trip through the RStan functions to constrain/unconstrain. He also fixed some more of the multiple translation unit issues that crop up with header-only libraries.

  • Mitzi Morris has been wrestling with Boost 1.64 so that we can keep up with Dirk Eddelbuettel’s proposal to upgrade the Boost headers (CRAN BH package). This has also involved a ton of work from Ben and Daniel Lee.

  • Aki Vehtari has been porting his classes (teaching, not object-oriented) to Python, as has Michael Betancourt.

  • Michael has been gearing up for a lot of teaching, including a really promising looking weeklong intro to Stan in Boston in late September aimed at grad students in physics. Looks like he has also wrangled some pre-StanCon teaching commitments. More details on that to come.

  • Jonah has also been refactoring the loo package and will have an (upgraded or new, couldn’t quite hear which) Pareto smoothed importance sampler. Aki is working on evaluations simultaneously.

  • Rob Trangucci has re-revised the GP chapter of the manual after a lot of feedback. He’s been working intensively with Jonah, Aki, and Dan Simpson on priors. This stuff’s important if you want Bayesian GP inference. Rob’s en route to Ann Arbor to start the Ph.D. program in stats.

  • Sean Talts has shaved a few hours off our continuous integration builds and is looking into more hardware (we have more people contributing). Travis is also starting to choke on a lot of our tests.

  • Sean has also been upgrading our makefiles to C++ standards with help from Daniel Lee.

  • Bob has also been working on adding access to all the variable declarations and constraints from the C++ model class (shapes statically, sizes at runtime).

33 thoughts on “Stan Weekly Roundup, 18 August 2017”

  1. Only developers are allowed to post on the Stan 3 thread…

    These seem like big, breaking changes, especially if they don’t allow data/parameters blocks anymore. I recommend some kind of tool be released that easily lets people convert their code, if you go this route.

    I strongly agree with Dan Simpson’s point about “real(0, ) replacing real<lower=0>”. I’m not sure what this really achieves.

    I’m in favor of keeping blocks around. I should be able to write this
    data int N;
    data vector[N] x;
    data int(0,1)[N] y;
    as:
    data {
    int N;
    vector[N] x;
    int(0,1)[N] y;
    }
    and similarly in the other examples, so I don’t need to re-write most of my existing code.

    Honestly, I think there’s something nice about the blocks and would rather continue to use some of them, like the data block and the parameters blocks. They help make clear what’s going on.

    • My first reaction is to ask for what a more complicated model would look like, for example something that marginalizes discrete parameters. What does the HMM forward algorithm look like in this proposal? What does imputation of discrete values in generated quantities look like? I wonder how I could cleanly think about variable scope.
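
      For reference, marginalizing the hidden states in today’s syntax looks roughly like the manual’s HMM forward-algorithm example (a sketch; the priors here are just placeholders):

      data {
        int<lower=1> K;              // number of hidden states
        int<lower=1> T;              // number of observations
        real y[T];
      }
      parameters {
        simplex[K] theta[K];         // rows of the transition matrix
        ordered[K] mu;               // state-dependent means
        real<lower=0> sigma;
      }
      model {
        real gamma[T, K];            // forward log probabilities
        mu ~ normal(0, 10);
        sigma ~ cauchy(0, 2.5);
        for (k in 1:K)
          gamma[1, k] = normal_lpdf(y[1] | mu[k], sigma);
        for (t in 2:T) {
          for (k in 1:K) {
            real acc[K];
            for (j in 1:K)
              acc[j] = gamma[t - 1, j] + log(theta[j, k])
                       + normal_lpdf(y[t] | mu[k], sigma);
            gamma[t, k] = log_sum_exp(acc);
          }
        }
        target += log_sum_exp(gamma[T]);   // marginalize the final state
      }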

    • The proposal explains how it will achieve backward compatibility, so the old code will all still work.

      real(0, ) was to replace real<lower=0>. I am not happy with either.

      I think the settings are off; that thread should be commentable.

        • of course, should be [0,inf) but you could be a little relaxed there when using inf, I don’t think anyone will actually be confused by notation suggesting that an infinite interval is closed

        • It actually is the closed infinite interval in floating point. This was really tricky to think about with IEEE floating point, which adds plus and minus infinity. The bounds would be open if we had arbitrary precision. A floating point number is converted from $(-\infty, \infty)$ to $(0, \infty)$ by applying the $\exp()$ function. Unfortunately, exponentiation overflows to infinity for even modestly large inputs (greater than about 710 in double precision). We could rethink what we’re doing and reject—that usually happens anyway when the positive-constrained variable is plugged into something that really does need a positive number.
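
          Concretely, this is the standard lower-bound transform (shown for lower bound 0) and the log Jacobian term Stan adds:

          $y \in (-\infty, \infty), \quad x = \exp(y) \in (0, \infty), \quad \log\left|\frac{dx}{dy}\right| = y.$

          In IEEE double precision, $\exp(y)$ overflows to $+\infty$ once $y$ exceeds roughly 709.8, which is how the nominally open interval picks up a reachable endpoint.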

  2. Why people might like model blocks:

    They make it much easier to understand other people’s models, because it is clear where to look for different types of variables (and for the statistical or generative model, in the model block). I think it is a big disadvantage of the BUGS language that it is not clear where to look for what (says someone who has spent little time with the BUGS language).
    Quickly understanding other people’s models, e.g. those in the Stan manual, seems very important, because they are often the starting point for models tailored to one’s own problems.
    Some have commented along the lines that maybe the model blocks are nice just because one got used to them. My impression is that they are truly useful, also for those who are not used to them.

    • I’m glad you like them the way they are now. Your points are exactly why I defined them as I did in the first place. I found BUGS models very hard to read. The current Stan layout follows the inferential process from known data to inferred parameters.

      What I really feared initially is that everyone would hate having to declare what was data and what was a parameter. That is still problematic for turning models around into properly generative models.

      The problem we’ve been running into is that the model layout doesn’t follow the generative process. That’s tied up with why the declarations and definitions can’t be combined (the parameters haven’t been declared when you see the data). The other issue we have is that users are often overdefining things and putting them in the transformed parameter block when they should be in transformed data or generated quantities, and people have complained that learning all the block structures is too complex. I’m still not sure I’ll be able to solve the inference of transformed parameter vs. generated quantity without a huge amount of work.

      What I’m really looking for is a way to modularly define things like non-centered parameterizations, which wind up getting spread all over a Stan program, though they’re really just about providing a prior on some collection of parameters.
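
      To make that concrete, here’s the standard non-centered setup (an eight-schools-style sketch; names are just illustrative), where one modeling idea ends up split across three blocks:

      data {
        int<lower=0> J;
        vector[J] y;
        vector<lower=0>[J] sigma;
      }
      parameters {
        real mu;
        real<lower=0> tau;
        vector[J] theta_std;                      // standardized effects
      }
      transformed parameters {
        vector[J] theta = mu + tau * theta_std;   // the actual effects
      }
      model {
        theta_std ~ normal(0, 1);                 // implies theta ~ normal(mu, tau)
        y ~ normal(theta, sigma);
      }

      The prior on theta is really one idea, but it gets smeared across the parameters, transformed parameters, and model blocks.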

  3. In terms of updating Stan, is there anything on the horizon for not needing to compile a Stan program? Playing around with a small model with only a few hundred units of data will take something like 20 seconds to compile and then 0.05 seconds to sample. If we had the opportunity to treat a Stan program as interpreted rather than compiled, then even if sampling were drastically slowed down, say from 0.05 seconds to 1 second, the round trip would be quite an improvement (20.05 seconds -> 1 second). This would also ease installation pains around C++ compiler woes, since Stan could be released in binary form.

    • That’s probably driven by the reliance on C++ more than anything. C++ is slow to compile. I imagine it would be a huge PITA to make Stan target some other language. It’s certainly annoying for some small programs, but the fact that it is in C++ is probably a huge time saver with any complex model or large amount of data.

      • C++ compiled with full optimization is what’s slow to compile. The culprit is primarily template metaprograms (not just generics), which include our autodiff, densities, and all the linear algebra. The C++ preprocessing is really slow, and then optimization at level 3 takes a lot more time.

        You can turn down the compiler optimization level and cut the compile time in roughly half (it eliminates all the time spent on optimization). I always do this while developing in CmdStan, but I’m not even sure we provide users that option any more in RStan.

        It’s technically feasible, but would be a ton of work to build an interpreter for Stan.

        • Bob, I have looked a bit on the wiki and in the code etc, but there wasn’t an obvious place to find this info.

          Where’s a good place to look for how to get Stan to build you a library rather than an executable? Suppose I have a Stan file foo.stan and I want to build a shared library that lets me evaluate lp and get the gradient… so I can call it, say, from Julia. Where do I look to find that kind of info? I assume this is what rstan is doing under the hood. I’d like to be able to do it too.

  4. I’m not terribly opposed to it; seems like it would clean up some of my models actually.

    Some questions (since the Discourse thread is not commentable):

    1) Currently, things in parameters, trans. parameters, and gen. quantities are saved. Under this proposal, would anything with ‘param’ and anything in the ‘global scope’ be saved? So in functions, defining param real theta = log(whatever); return theta would save theta to output, correct? In essence, because the function is returning a *type*, param real, which is the same as just having param real theta declared/defined in the model code itself?

    2) If I wanted to define a useful constant, say real b = sqrt(2/pi()), I would now just use data real b = sqrt(2/pi()) ?

    3) If I wanted to compute a quantity, but didn’t want to save it, would I just scope it? E.g., {vector yhat = X*beta; y ~ normal(yhat, sigma)} ? That could get fairly irritating after a while, tbh. There are often multiple lines of calculations that I throw into the model block so that their outputs aren’t saved, but nevertheless used. I’d have to basically scope the entirety of the chunk that uses those calculations, correct?

    4) Will target += *_lp?f still be usable? E.g., target += normal_lpdf(y|yhat,sigma)? Some of my mixture models programmatically build those statements up, and the ~ syntax isn’t usable, whereas target += log_sum_exp() is perfect.

    5) Will this new syntax allow for vectorized bounds on parameters?

    6) Will this new syntax ease handling of missing data?

    • 1) I think we want to move to more flexible monitoring in the end. The original decision to save everything came from a feeling that we should be monitoring convergence of everything. RStan (and I think only RStan) lets you control what gets saved to memory (which helps, given that R stores the entire sample of thousands of draws across multiple chains in memory).

      2) No, you’d just write real b = sqrt(2 / pi());. Everything marked with data has to be read in externally. In other words, it’d work just like Stan currently does, which would put that variable declaration in transformed data (the new plan could infer that because there are no parameter dependencies on the right-hand side).

      3) Yes, just like now (there’s a minimal sketch after this list). And yes, everything that uses a variable needs to be in the scope of its definition. This whole arrangement of what gets saved and what goes where is what’s confusing users and what I was trying to make clearer (apparently not too successfully) with the new proposal.

      4) Yes, it would be.

      5) That’s independent of the new syntax. It’s on our to-do list for Stan itself, but I’d say 90% of the effort in Stan these days is going into training materials, interfaces, and new algorithms—not many hands available to add more features to the language itself. Basically, just me and Mitzi.

      6) Not that I can see.
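
      To illustrate 3), a minimal sketch in current syntax (the variable names are hypothetical): the inner braces make yhat local, so it is used by the likelihood but never saved to the output.

      data {
        int<lower=0> N;
        matrix[N, 2] X;
        vector[N] y;
      }
      parameters {
        vector[2] beta;
        real<lower=0> sigma;
      }
      model {
        {
          vector[N] yhat = X * beta;   // local: computed and used, never saved
          y ~ normal(yhat, sigma);
        }
      }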

  5. I like the more concise code. The overhead is noticeable for short programs, which is what makes rstanarm and brms so attractive. Something like yasnippet and stan-mode.el makes for concise typing; as long as the expanded code isn’t much bigger, comprehension isn’t hurt, I think.

    I find the (0, ) notation a bit confusing. It does make me want to see options for [0, ) and (0, ), though (which you could almost get in the original with <lower=0> (parse that!)). Which raises the question: is there good reason to have both [0, ) and (0, )?

    One thing I mildly (but only mildly) like is getting rid of the <> in the constraints: when first seeing Stan programs, it was hard not to see those as less-than and greater-than. That isn’t a big concern; I’ve gotten used to it.

    Your first example looks a bit less structured than the current Stan–too easy to write old-fashioned BASIC (OFB?) code. Do you have any plans for blocks of some sort?

    As has been noted, the current blocks (inherited philosophically from MCSim?) do help clarify a program. It’s easy to explain to people that there are parameters and data, and it’s easy to explain the purpose of the various blocks.

    Your rewritten second version of a New Stan program brings back the basic ordering, but, as a reader, I have no clues as to the reason for that structure, and, as a writer, I have no reason to follow it. While I’m not a Python programmer, I gather that their formatting through indentation is forced, so every program will of necessity share the same basic structure. I could see doing something like that in Stan, but I’m not sure I’d prefer it, and I don’t see it in your proposal.

    My major comment has to do with a feature you haven’t proposed changing yet: the programming of ODEs. I used MCSim a bit before Stan was around, and I liked that ODEs seemed to be full-fledged entities, describable in the Dynamics {} (aka Model {}) block. That made more sense to me, for the description of the dynamic part of the model was integrated more naturally into the description of the rest of the model. Incidentally, I see MCSim now has a Jacobian {} block, which may be a nice reminder to think about whether one is needed.

    • No, the constraints are open for real numbers, but you can hit the boundaries due to overflow and underflow. The reason we can’t use square brackets is that they’re ambiguous with array dimensions.

      Everyone seems to have missed the part of the proposal where I said we’d allow blocks to scope a bunch of variables to maintain backward compatibility. So yes, I was imagining that’d stay.

      Given that the overall reaction has been largely negative, I don’t think this proposal has much of a shot going forward. Andrew Gelman and Michael Betancourt have been silent (they’ve been lobbying for compound declaration and sampling statements and for stressing the generative story respectively).

      Python uses newlines and indentation rather than semicolons and braces, so yes, its formatting is forced in that sense. But I’m not proposing Python-style syntax at all (because I don’t like it). R is on the fence—sometimes newlines indicate ends of expressions and sometimes they don’t.

      Not sure what you mean by making ODE dynamics more part of the language. The ODE systems are defined as functions in the language. I don’t know any simpler way to do it. Stan lets you have an arbitrary number of ODEs, so it wouldn’t make sense to have a single dynamics block.
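
      For reference, a system and its use look roughly like the manual’s harmonic oscillator example (a sketch; the data shapes and priors are just placeholders):

      functions {
        // the ODE system is an ordinary Stan function
        real[] sho(real t, real[] y, real[] theta,
                   real[] x_r, int[] x_i) {
          real dydt[2];
          dydt[1] = y[2];
          dydt[2] = -y[1] - theta[1] * y[2];
          return dydt;
        }
      }
      data {
        int<lower=1> T;
        real y0[2];             // initial state
        real t0;                // initial time
        real ts[T];             // solution times
        real y_obs[T, 2];       // noisy observations
      }
      transformed data {
        real x_r[0];            // no real-valued data for the system
        int x_i[0];             // no integer data for the system
      }
      parameters {
        real<lower=0> theta[1]; // damping
        real<lower=0> sigma;
      }
      model {
        real y_hat[T, 2] = integrate_ode_rk45(sho, y0, t0, ts, theta, x_r, x_i);
        theta ~ normal(0, 1);
        sigma ~ cauchy(0, 1);
        for (t in 1:T)
          y_obs[t] ~ normal(y_hat[t], sigma);
      }

      A second system would just be a second function, which is why a single dynamics block doesn’t fit.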

      We will be introducing lambdas with binding eventually, which can make some of this easier. It’d look something like this, where variables like theta are bound in from the environment (static lexical scoping):

      ode_integrate({vector x}.[-theta * x[1], -beta * x[2] + x[1]]', ...);
      

      I should’ve asked Frederic about updates to MCSim when I saw him last week! I didn’t even know they were still developing it.

      • I don’t think the reaction has been *that* negative. Most of what I’ve seen has had people really liking big wodges of the proposal (getting rid of some of the blocks, modularising things like non-centering, declaring parameters in more natural places when necessary), just a bit wary of what looks like a serious reduction in the amount of structure.

        • I think most of the negativity is just that — the Stan file structure provided… well, structure. It is structure at the expense of brevity and flexibility.

          Honestly, if the TEMPLATE of the proposed syntax file were just:

          /********
          * Data **
          ********/

          /*********
          * T. Data*
          *********/

          and so on, would anyone really have a major beef with it? :p

          What I mean is: People complain about the unstructuredness of the new syntax due to [optional] model blocks, but in reality, people can still just use it and teach it as though those things existed:
          Specify data, transform data, specify parameters, transform parameters, specify model, specify model-derived quantities. Add the comments in yourself, make your own structure, and it would be fine.

          There is a nice didacticity (is that a word? the extent to which something is didactic?) to the Stan file that helped me understand Bayes/Stan when starting out, but as you advance in Stan and Bayes, you realize pretty quickly that separation of knowns and unknowns into blocks isn’t theoretically meaningful, and you have to do all sorts of quirky tricks for optimal sampling and more advanced models.

        • The problem is that we really don’t want to split the community and have two distinct ways to write Stan programs. Saving the blocks for backwards compatibility is one thing, but continuing to teach it? I’d be opposed to that if we made the change.

          Former linguist hat on. By “is that a word?” people usually mean “is it in the dictionary?” Maybe, maybe not (I’m not even going to bother looking). But dictionaries don’t define wordhood, speakers do. Language is highly productive morphologically, even English, and not every “word” is going to be listed in the dictionary (even the OED). Here’s an old blog post listing dozens of words derived from “author”.

          The separation of knowns and unknowns is fundamental and not going away. In both cases, new and old syntax, parameters are the unknowns and data are the knowns, and everything else is a transform of either parameters or data or generated after the fact in the generated quantities block. Nobody’s proposing to replace that—it’s fundamental to Bayesian stats.

        • If the reaction isn’t overwhelmingly positive among our developers, no way this is going to happen. Just too much work.

          I took your (Dan’s) remarks to be largely negative.

          The main proposal is to lose the blocks, free up order, and let Stan infer more of the types. There’s also compound declare-sample, but that’s really independent of all this. On that one, Andrew and Michael, our two “loudest” team members, have staked out opposite sides of the debate (Andrew for [he suggested the feature in the first place] and Michael against [he thinks it breaks some kind of abstraction which I don’t understand]).

          Overall, the whole project has gotten a whole lot more political, which is inevitable with at least a dozen core developers with varying notions of where the project should go.

        • Sorry Bob. Those damn years in Norway always make me sound more negative than I feel… I’d put myself in the “I don’t quite understand the benefits of the proposal” camp (although I have a better idea now, but need some time to digest all the new information) rather than the “hard no” camp.

        • It’s the years on the planet making me more curmudgeonly. I used to be a happy-go-lucky youngster! Now I spend all my time fretting about Stan from the actual software and design side, through the politics to the funding, right down to install issues we can never seem to shake and persistent misconceptions we just can’t seem to explain well enough to avoid.

        • I’d also say the political aspects of it are kind of inevitable for this type of change. The problem is that as much as the Stan language was designed based on the core team’s views on how to specify models at that particular moment in time, it has also changed how people who use it think of their models and construct their models.

          So changing the language is, by proxy, going to change how people have to think of their models going forward. Which is the sort of thing that people have *a lot* of feelings about!

          The changes you proposed were fairly strong, but I think that’s a good thing (consulting the “stakeholders” is really important, as is responding to concerns, but it’s just as important to avoid the “horse designed by committee” problem). I just think we all need to wrap our heads around what the changes you proposed actually mean. Especially now that we have a lot more information about “why they are as they are” and “where they came from” etc.

        • Politics is inevitable by definition as soon as more than one person is involved in a decision. I like Wikipedia’s lead:

          Politics (from Greek πολιτικά, Politiká, “affairs of the cities”) is the process of making decisions applying to all members of each group.

          I’m just not very good at it. And I don’t like it. Nobody ever tells you that whenever a project is mildly successful it becomes a management job (same way nobody ever tells you that being an academic is like running a small business and a small school at the same time).

          I really just wanted to run this by people. I’m not at all committed to these ideas myself. They just seemed to be a logical conclusion from some of the discussions I’ve had with Andrew particularly, and from what I’ve seen confusing people on our forums.

          The Stan language blocks for data, parameters, and model were designed by me when the core dev team was just me and Matt Hoffman working in a modest subversion repo that spanned what is now three repos: math, stan, and cmdstan. That’s the size project I’m most comfortable with.

          The derived quantities blocks (transformed data, transformed parameters) were added soon thereafter and the generated quantities block after that, and they all made it into version 1.0. The team grew pretty quickly initially, so Daniel Lee, Michael Betancourt, and Ben Goodrich were probably around for those discussions and Matt had probably already left. We didn’t use to list the team in the manual, and I don’t feel like fishing through Git for the history.

          I very much like getting feedback on what I’m going to do before I do it. Saves a lot of false starts that way.

          The motivation for the block structure was to delineate scope of what seemed like very different kinds of variables. Basically it was to make it easy for me to read a program and figure out what it was doing. I find BUGS programs challenging to read (though they are easy to write if all goes well) because they introduce a logic puzzle of type inference (is this variable a scalar or matrix—oh, there it is with a Wishart prior, it must be a matrix).

          I’m very aware of the danger of having everything designed by committee. I think it’s why almost every big budget movie is terrible. My fave story on this front is from Chris Manning about U. Sydney linguistics—nobody liked their curriculum, and there were three competing replacement proposals. I think Chris said just about everyone thought all three of the new proposals were better than the existing curriculum, but everyone was so dug in on their own version they couldn’t compromise, so the old curriculum just kept chugging along. I want to try to avoid that problem. That’s why we put one person in charge of each segment of Stan, so at least we’d have a way to break deadlocks.

      • +1 for compound declaration and sampling statements, and I’m fairly agnostic to all the other changes. Larger models tend to have a large transformed parameters block, and I have to copy-paste the parameters block down into the model block to make sure I’ve added priors to everything and then delete the pasted copy, especially after testing and expanding the model. If compound statements are definitely out, then how about an error for any parameters that do not have explicit priors set on them, with potentially something like “~ default();” or “~ improper();” for the default prior?

    • I didn’t answer all this the first time around.

      No, the current blocks weren’t inspired by MCSim. I didn’t even find out about MCSim until after we wrote Stan.

      I think what looks natural for constraints depends on where you come from. Postfix angle brackets are used in C++ and Java for templates and generics respectively, so it seemed like a natural way to include qualifications. All of our real intervals are technically open, but numerical underflow or overflow can make them closed. The integer intervals are closed.

  6. I largely agree with Daniel, I think.

    As for what I meant by “making ODE dynamics more part of the language,” see https://www.gnu.org/software/mcsim/mcsim.html#Dynamics-section and the Dynamics {} section of https://www.gnu.org/software/mcsim/mcsim.html#perc_002emodel. The “dt()” operator defines the derivatives in line with the rest of the calculations. It’s probably just a matter of getting used to Stan’s approach, but I tend to see it as defining the model near the end in the model{} section and referring back to the ODE in the functions{} section. In MCSim, I see those both as a description of the model, and they occur in the same block. I think I’ve got to get some experience with the Stan approach.

    • Thanks for the link. I don’t think we’re going to be building “dt” into Stan’s syntax any time soon! For one thing, it seems to assume there’s only a single ODE in every program, though maybe there’s some way of scoping them (like multiple sets of state declarations).
