Transitioning to Stan

Posted on April 14, 2014 10:50 AM by Andrew

Kevin Cartier writes:

I’ve been happily using R for a number of years now and recently came across Stan. Looks big and powerful, so I’d like to pick an appropriate project and try it out. I wondered if you could point me to a link or document that goes into the motivation for this tool (aside from the Stan user doc)? What I’d like to understand is, at what point might you look at an emergent R project and advise, “You know, that thing you’re trying to do would be a whole lot easier/simpler/more straightforward to implement with Stan.” (or words to that effect).

My reply: For my collaborators in political science, Stan has been most useful for models where the data set is not huge (e.g., we might have 10,000 data points or 50,000 data points but not 10 million) but where the model is somewhat complex (for example, a model with latent time series structure). The point is that the model has enough parameters and uncertainty that you’ll want to do full Bayes (rather than some sort of point estimate). At that point, Stan is a winner compared to programming one’s own Monte Carlo algorithm.

We (the Stan team) should really prepare a document with a bunch of examples where Stan is a win, in one way or another. But of course preparing such a document takes work, which we’d rather spend on improving Stan (or on blogging…)

24 thoughts on “Transitioning to Stan”

Robert Grant on April 14, 2014 1:52 PM at 1:52 pm said:

I’m pretty much at the point of forgetting BUGS/JAGS language, having been using Stan exclusively for about 6 months. It’s not any harder to code and produces a game-changing speed-up in many settings, especially if you want to <a href="http://robertgrantstats.wordpress.com/2014/01/31/need-to-do-a-simulation-study-on-a-bayesian-model-use-stan/"do some kind of simulation study, where you can repeatedly run the compiled program for your model. There’s only enough space in my brain for a few languages, and R and Stata constantly jostle for space in data processing, regression models etc. Once my BUGS/JAGS cortex becomes run down, there’s little point in me trying to preserve it… so I would suggest taking the leap. rstan is a great interface too.

If you want a collection of use cases but don’t have the time, you could always organise a Stan conference!
- K? O'Rourke on April 14, 2014 2:47 PM at 2:47 pm said:
  
  Robert: The link is broken but then I noticed one of your posts I wanted to comment on it and here as the issue is often discussed here – _forcing_ things that need to be corrected – to actually be corrected.
  
  It is this link http://robertgrantstats.wordpress.com/2014/04/08/including-trials-reporting-medians-in-meta-analysis/ and specifically the Hozo et al paper you pointed to.
  
  In my thesis http://statmodeling.stat.columbia.edu/wp-content/uploads/2010/06/ThesisReprint.pdf I made these comments after communicating to one of the authors
  
  “In the given reference, the example purported to demonstrate the value of such a
  method that was based on using medians and ranges of within patient change scores, but these
  were actually unavailable to the authors. Instead the authors “imputed” values using information
  on group mean scores over various treatment applications on different days, even when considerable
  drop out occurred [private communication].
  Furthermore, the non-parametric conversions they used were based on inequalities between
  sample medians and ranges and SAMPLE means and standard deviations which are only indirectly
  relevant – it is population means and standard deviations that are of inferential relevance.”
  
  So authors don’t make corrections their papers (it was keep on misleading folks.
  
  As for how best to include trials that report incomplete stats (or just not the stats you want) in a meta-analysis these days, just do ABC as in this case the posterior from ABC is actually the one you want (you only have the summaries you have) and if you just want the likelihood divide the posterior by the prior.
  - Bill Jefferys on April 14, 2014 4:36 PM at 4:36 pm said:
    
    K?
    
    The correct link is here:
    
    http://robertgrantstats.wordpress.com/2014/01/31/need-to-do-a-simulation-study-on-a-bayesian-model-use-stan/
    
    There were several extra characters at the end of what Robert gave.
    
    Bill
    - Robert Grant on April 15, 2014 10:30 AM at 10:30 am said:
      
      Bill’s got the link right. I think I forgot to close the href tag with a > and then there’s no going back!
      
      Keith – I think you are right with the most general approach to use ABC to get the pseudo-data and hence the stats you do want, but speed and tuning worry me a bit. Also I think there are a number of scenarios that crop up a lot which could be pre-programmed. I’m happy to consider other options though. I haven’t ruled out ABC and I’d like to come back to it. If you’re going to JSM this year, look me up, I’ll be presenting on it and we can thrash out the different approaches.
      
      Regarding Hozo et al, that’s an interesting insight. I like their inequality approach but it doesn’t interest me hugely in practical terms because there’s still a lot of uncertainty floating around – how could there be any less!
anon on April 14, 2014 2:11 PM at 2:11 pm said:

I vote that the Stan team spend their time (1) making it easier to specify models with discrete parameters and (2)figuring out a way to handle missing data as super easily as JAGS. With those two changes, Stan would be a no-brainer first Bayesian model language to learn. Right now, I consider it a bit of a tossup.
- Andrew on April 15, 2014 6:25 AM at 6:25 am said:
  
  Finite mixture models: very soon.
  
  Discrete missing data: this will take a bit longer.
  - Carlos on April 15, 2014 10:24 AM at 10:24 am said:
    
    Andrew, is there some work being done related to spatial modeling using STAN?
    - Anonymous on April 15, 2014 12:50 PM at 12:50 pm said:
      
      there are some examples on the google discussion group.
      the 2cent wrap up is, using the built in distributions via ‘~’ is a little slow.
      to speed things up, you want to increment the log posterior manually
      this requires you to write down the formula and hard code it via stan’s syntax
    - Andrew on April 15, 2014 2:55 PM at 2:55 pm said:
      
      It should be easier to do this soon using functions, which Bob is currently implementing in R.
- Krzysztof Sakrejda on April 15, 2014 9:17 AM at 9:17 am said:
  
  I’ve been really hoping that the write-your-own posterior version of discrete missing data would become available soon but it does look like it will take a while as they are lacking a sufficiently interested party and the changes required are pretty deep (https://groups.google.com/forum/?fromgroups#!topic/stan-users/HL0TgFxlvLU). To me it looks like the workable alternative is to alternate Stan steps and discrete steps from R. That alternative is already workable but it should soon get much better as the Stan group does seem to be actively working on making it possible to restart a sampler without having to re-do the warm-up phase (https://groups.google.com/forum/?fromgroups#!topic/stan-dev/gnLLwEI4zN4). That alone would make it pretty fast.
- Bob Carpenter on April 15, 2014 2:03 PM at 2:03 pm said:
  
  No time frame for either (1) or (2).
  
  The first problem with (1) is that we don’t see a way to evaluate the probability of a discrete parameter in the posterior conditioned on everything else. We can do it by brute force evaluating the entire log probability function, but that would be very inefficient in most cases (though it wouldn’t need gradients). Options would include somehow marking up the relevant part of the model for the discrete parameter (something we’re thinking about in other contexts, too) or designing a language to provide the conditional, which would still be a lot of work compared to BUGS/JAGS.
  
  The second problem, which is related, is that we don’t know how to do the conjugacy analysis to do it efficiently. So we’d have to do something like slice sampling everywhere, I think.
  
  The third problem is that we don’t know how it interacts with HMC or how to adapt the slice sampler.
  
  Having said all of this about discrete parameters, marginalizing them out where possible leads to much more efficient sampling. I agree that it’s more complex on the modeling side. We’re going to try wrapping up popular use cases in functions, but we’re not sure how far we can get.
  
  As to missing data (2), that’s a whole different can of worms. We don’t have anything like N/A in the basic language — it’s just C++ doubles. So it’s more of an R concept than a Stan concept. But assuming we could pass in something like markers for missingness, there are issues with how to do the I/O and define the log probability function with the ad-hoc indexing based on run-time missingness. One alternative might be to define a more limited language that’d make it easier.
  - Phillip M. on April 16, 2014 1:20 AM at 1:20 am said:
    
    I’ve just been reading about Stan, and am looking to dive into its base code myself as this is a really interesting project. I see the scalability problem is nothing trivial here. And of course, being c++, you have invested a lot of developing time, though i would think performance should be rather spectacular. I’m surprised you’re hitting ceilings in the tens of thousands, though I don’t know what you’re benchmarking stan on and what the processing time even looks like.
    
    From what I could tell the work seems to be entirely done in memory – is that correct? What are your ideal goals about what hardward stan should be optimized for that weren’t evident on the stan landing site? Looks like a fun project.
John Mashey on April 14, 2014 4:54 PM at 4:54 pm said:

“We (the Stan team) *should* really prepare a document with a bunch of examples where Stan is a win, in one way or another. But of course preparing such a document takes work, which we’d rather spend on improving Stan (or on blogging…)”

Yes, because doing that generates more usage/feedback, which is often the most valuable single thing in helping guide improvements that are truly useful beyond the team. There were numerous examples of this inside Bell Labs in the 1970s, as per the Small is Beautiful talk here, but certainly things like C, awk and I think S, among others.
Rahul on April 15, 2014 2:30 AM at 2:30 am said:

The BUGS software homepage has this warning; what exactly does it mean? Can someone elaborate?

….MCMC is inherently less robust than analytic statistical methods.
- Dan Simpson on April 15, 2014 4:21 AM at 4:21 am said:
  
  It means that you’ve got to check carefully that your MCMC method has worked for your problem (convergence, mixing, etc). If you have analytic expressions for posteriors, you don’t have this problem (also semi-analytic results, like a posterior with a 1D integral that you need to compute numerically).
  
  It’s the boilerplate warning on any complicated numerical method for solving difficult problems. (And it’s nice to see it explicitly written down!)
  - Corey on April 15, 2014 8:19 AM at 8:19 am said:
    
    Hmm… what other warnings should be explicitly written down?
    
    “Newton’s method is inherently less robust than analytically solving for the root.”
    
    “Using Excel is inherently less robust than, well, a good lot of things.”
    
    “A black-box approach to statistics is inherently less robust than understanding WTF you’re doing.”
    - Robert Grant on April 15, 2014 10:39 AM at 10:39 am said:
      
      “Quitting Spreadsheets Now Greatly Reduces Serious Risks to Your Research”
Noah on April 15, 2014 11:51 PM at 11:51 pm said:

Any plans for Stan to handle an auto-regressive (time series) type model?
- Andrew on April 16, 2014 2:09 AM at 2:09 am said:
  
  Noah:
  
  Yes, we can already handle such models (at least some of them). Just write down the joint density and go with it. Soon we will have functions implemented and this will make it easier to use such models as building blocks in more complicated systems. But you can already specify an AR model, either by directly writing y[t] given y[t-1] in a loop, or by constructing the covariance matrix and writing the multivariate normal.
  - Chen on October 14, 2014 6:25 AM at 6:25 am said:
    
    But I have no idea how to code when there are missing values in time series… Is it possible? I think I need to specify missing values as parameters, but then how can I keep the serial nature of the time-series data???
    - Andrew on October 14, 2014 6:51 AM at 6:51 am said:
      
      Chen:
      
      This is discussed in the chapter on “Missing Data & Partially Known Parameters” in the Stan manual (that’s chapter 7 in the manual for Stan Version 2.4.0).
    - Chen on October 14, 2014 7:49 AM at 7:49 am said:
      
      Ah, thank you very much, Andrew! OK, I am going to create a mixed vector that contains known and unknown values in a transformed parameters block, and use it in time series analysis.
Rob on April 16, 2014 7:54 PM at 7:54 pm said:

Is there a good tutorial specifically for installing Stan and RStan for a Windows PC? I currently use use gcc etc in Cygwin to compile source code, but I doubt that this approach will work for Rtools and Stan. My perception that the installation process for Stan on a PC being a convoluted one is what keeps me using JAGS for the time being. (My work computer is a multicore PC which is powerful enough for most things I wish to do, and is faster than the cluster machine running Linux which I tend to use only when memory becomes a serious issue.)
- K? O'Rourke on April 17, 2014 11:14 AM at 11:14 am said:
  
  Rob:
  
  I just installed Stan on a new PC about a week ago and it went much smother than a year or so ago which involved a few emails to Stan group.
  
  Suggest you read the online install documents and if somethings goes wrong, email for assistance.

Comments are closed.