One simple trick to make Stan run faster

Posted on March 3, 2015 9:02 AM by Andrew

Did you know that Stan automatically runs in parallel (and caches compiled models) from R if you do this:

~~source("http://mc-stan.org/rstan/stan.R")~~

P.S. This capability is now built into the current version of rstan, which you can install from CRAN.
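For readers on a current rstan, the setup looks roughly like the sketch below. It assumes your installed version exposes parallel chains through the `mc.cores` option and model caching through `rstan_options(auto_write = TRUE)`; check the message rstan prints when it loads to confirm. The variable name `n_cores` is only for illustration.

```r
# Use one core per chain, up to the number of cores R can detect.
n_cores <- parallel::detectCores()
if (is.na(n_cores)) n_cores <- 1L  # detectCores() can return NA on some systems
options(mc.cores = n_cores)

# Cache compiled models on disk so unchanged models are not recompiled.
# Guarded so the snippet still runs when rstan is not installed.
if (requireNamespace("rstan", quietly = TRUE)) {
  rstan::rstan_options(auto_write = TRUE)
}
```

As the comments below point out, one chain per core is a sensible default on a multicore laptop with plenty of RAM, but it may not suit a cluster or a model that is large relative to memory.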

19 thoughts on “One simple trick to make Stan run faster”

Rahul on March 3, 2015 at 9:12 am said:

Can someone elaborate as to why the trick works? Sounds like an R idiosyncrasy.

- Andrew on March 3, 2015 at 9:45 am said:
  
  I’ve been told that my laptop has 4 processors. Ben’s code automatically sends one chain to each processor.
  
  - Rahul on March 3, 2015 at 9:52 am said:
    
    Interesting. So is that the equivalent of xargs -P 4 in bash?
    
    I guess the risk is ending up with a totally unresponsive GUI? Otherwise why not embed this trick in a default wrapper?
    
John Hall on March 3, 2015 at 9:46 am said:

@Rahul, the source command loads in the R code. It’s used when you keep functions in separate documents. In this case, the function is located at an email address. From what I can tell, the function is basically a version of the stan function set up to always run in parallel. Not sure how well it would work on Windows machines.

- Rahul on March 3, 2015 at 9:53 am said:
  
  Thx. So then source sounds like the equivalent of the standard Linux shell’s source command?
  
  - Corey on March 3, 2015 at 10:21 am said:
    
    Yep. A few of R’s commands mimic shell commands; for example, R’s “ls” function lists objects in the workspace much as the shell command lists files in the local directory.
    
- Corey on March 3, 2015 at 10:26 am said:
  
  This version of the stan function uses “mclapply”, and on Windows, mclapply just calls (the standard non-parallelized function) “lapply” (unless you try to set the mc.cores argument to a value greater than one, in which case it throws an error).
  
  (That address is a URL, not an email address…)
  
gwern on March 3, 2015 at 11:30 am said:

If it’s so useful, why isn’t Stan doing this by default? Could use https://stat.ethz.ch/R-manual/R-devel/library/parallel/html/detectCores.html to detect at runtime the amount of parallelism available.

- Andrew on March 3, 2015 at 11:57 am said:
  
  Gwern:
  
  Stan is doing it by default! You just source that code to set up the stan() function. The only issue is it can’t be inside the rstan package because of CRAN restrictions.
  
  - David J. Harris on March 3, 2015 at 4:44 pm said:
    
    What aspect of this would violate CRAN policies? I’ve seen lots of packages that “suggest” the `parallel` package and only use it if available.
    
    Are the CRAN violations Windows-specific? If not, could the parallel functionality still be added for Mac and Linux users?
    
    Thanks
    
Daniel Lakeland on March 3, 2015 at 4:08 pm said:

I’d like to point out that this is a TERRIBLE way to get the functionality. Specifically, there could be anything in that .R file on the server, so for example it might contain code to maliciously delete everything in your home directory, or whatever.

Even if today there’s nothing wrong with the code, tomorrow some script-kiddie could find a vulnerability and replace that code on the server with malicious code that deletes files or collects personal data.

Never source something from a URL. Go and get the contents, put the file in your own directory, verify that it seems reasonable, and then source your local copy!
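A minimal sketch of that workflow in R follows. The helper name `fetch_for_review` is hypothetical and not part of rstan or base R; it only wraps `download.file()` so that fetching, reading, and running stay separate steps.

```r
# Hypothetical helper (not part of rstan): download a script for review
# instead of executing it straight from the network.
fetch_for_review <- function(url, dest = file.path(tempdir(), basename(url))) {
  utils::download.file(url, destfile = dest, quiet = TRUE)
  dest  # return the local path so it can be inspected and then sourced
}

# Typical use, one step at a time:
# local_copy <- fetch_for_review("http://mc-stan.org/rstan/stan.R")
# file.show(local_copy)  # read the code and check that it looks reasonable
# source(local_copy)     # run it only after you are satisfied
```

Keeping the reviewed copy also means later runs use exactly the code you inspected, even if the file on the server changes.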

- Rahul on March 4, 2015 at 10:28 am said:
  
  It is interesting that R actually allows a source cmd to execute with a remote file over the net.
  
  I wonder if a Linux shell’s source command will source a remote script.sh over the net. I’ve never tried it.
  
  - martin on March 4, 2015 at 10:51 am said:
    
    Yes, the shell will allow one to do all sorts of things. I’ve seen people using curl to fetch some script and pipe it directly to sh as a means of running installers. Not recommended.
    
Ben Goodrich on March 3, 2015 at 6:28 pm said:

The parallel thing doesn’t violate CRAN restrictions; the other part of that function which silently writes the compiled model to the disk does. However, we have been given permission to write if the user specifies a non-default value for one of the options(). We might implement some options this week to facilitate doing chains in parallel too, but the logic of that function is essentially only right for a person working on a multicore laptop with lots of RAM relative to the model, which happens to be the only way Andrew uses Stan. To force it to always run that way would make it not work on clusters or for people with little RAM relative to the model.

- Hadley Wickham on March 4, 2015 at 5:25 pm said:
  
  You should look into how Rcpp::cppFunction() works – it seems like the same problem.
  
Maciej on March 5, 2015 at 3:30 am said:

Did you remove the code from the server? The link redirects to the Stan webpage.

- Andrew on March 6, 2015 at 7:14 pm said:
  
  It’s back.
  
Chris on September 22, 2015 at 9:06 pm said:

The link appears to have been removed since the website got a makeover?
Any idea where I can find this again?

- Andrew on September 22, 2015 at 9:25 pm said:
  
  Chris:
  
  See P.S.
  

Statistical Modeling, Causal Inference, and Social Science
