Not sure what you mean by side effects, but I think `apply(iris, 1, print)` is a sufficient counter-example to demonstrate that apply does allow side effects. Even more clear is to consider what `apply(iris, 1, function (x) {assign(“foo”, x, envir = .GlobalEnv)})` would do.

]]>I know just enough R to do what I need to do, and no more. Every once in awhile I learn some more R because I need to learn it to do something that I couldn’t do before.

I guess that my attitude toward R is similar to many scientists’ attitude toward statistics!

]]>Many of the R classes I have been part of have stated at some point that lapply/sapply (or more generally the *apply family) are more efficient than for loops. Simple examples demonstrate that this is not the case (at least, not in any meaningful sense). This is an unfortunate pattern because there ARE very good reasons to avoid the usage of for loops when *apply will suffice. Specifically, lapply/sapply are substantially more readable and do not allow side effects.

The correct statement one can make from an efficiency standpoint is to avoid loops (including the *apply family) when “Vectorized” code will do. This has more to do with the overhead in calling C/Fortran code (as all R code eventually does). http://www.noamross.net/blog/2014/4/16/vectorization-in-r–why.html seems to do a good job summarizing the content/ideas around Vectorization.

]]>Most code using *apply functions is no faster than loops and I believe some include loops internally.

So that aspect of R code basically avoids loops for aesthetic reasons and avoiding side effects.

]]>This will help you get some intuition about where in your code to look for speed up opportunities. If a for loop does a 100 iterations a second, there is probably not a lot of overhead wasted in the looping itself. However, if you are iterating 100,000 times a second, you probably have a prime candidate to try to push that loop down into the C code through either vectorization or a foreign language interface (e.g. Rcpp).

]]>Stan is like C++ in that loops are compiled down to very efficient code. This means intuitions developed from interpreted languages (e.g., R, Python, BUGS, JAGS) will lead users to think they need to vectorize in these situations.

Stan is like interpreted languages in that the automatic differentiation is more efficient when rolled into vectorized calls, especially if expressions can be reduced to vectorized log density functions or matrix arithmetic. The reason isn’t that loops are slow per se. Rather, looped expressions involving derivatives create a large expression graph as the log density is evaluated. The reverse pass of reverse-mode autodiff then walks over that graph from the result down to the parameters propagating the chain rule. That code is essentially interpreted code, albeit more efficient interpreted code than you find in R because of much tighter static data typing. Vectorized expressions further allow us to avoid recomputing. For example, an evaluation of `normal_lpdf(y | mu, sigma)`

will only evaluate `log(sigma)`

once, even if `y`

is a vector. The other efficiency boost from pushing things into matrix arithmetic comes from a combination of compound matrix derivatives (Stan extracts underlying values and operates on those using Eigen’s efficient matrix lib, which is itself much faster than looping even for pure `double`

code).

A further twist is that there are also statistical efficiency concerns with sampling. There is some latitude in parameterizations, priors, etc., when defining a model, that can have a big impact on the statistical efficiency (mixing) of the sampler (or optimizer).

There’s a chapter in the manual on efficiency that covers both types in more detail.

]]>Trying to avoid loops is about a change of thinking about how to write code.

Loops involve some amount of housekeeping, e.g., the loop variable.

Having to use a variable for indexing increases the risk of making a mistake (different kinds of mistakes can be made with the alternative approaches, but I think they are less likely, at least for the kind of code I write).

I found that avoiding loops completely involved lots of head scratching to start with, but over time the benefits have made the investment worthwhile.

http://shape-of-code.coding-guidelines.com/2012/07/27/my-no-loops-in-r-hair-shirt/

On the speed note, I’d just add an aside, that for loops and vectorized formulations (e.g. *apply) are similar in performance, so long as the vectorize computation is happening in R. This is really the key in optimizing R code. What you want to do is find a way to express your computation so that the performance critical part of it is being evaluated in C.

Consider the below. The first expression is vectorized, but actually slower than the for loop. One still might prefer the first expression to the second based on the conciseness of the code. On the other hand, the third expression vectorize the code and executes it down in C, which in general will net you about a ~100 times speed up.

system.time({v <- replicate(1e6, function(x) rnorm(1))})

user system elapsed

2.189 0.053 2.242

system.time({

v <- rep(NA,1e6)

for (n in 1:1e6) v[n] <- rnorm(1)

})

user system elapsed

1.725 0.391 2.115

system.time({v <- rnorm(1e6)})

user system elapsed

0.062 0.001 0.063

N <- 1e5

rep(N, 1) # 1e5

rep(1, N) # 1 1 1 1 1 1 1 1 1 [etc]

> microbenchmark(a <- rep(1e5, 1), for (n in 1:1e5) a[n] <- 1)

Unit: nanoseconds

expr min lq mean median uq max neval cld

a <- rep(1e+05, 1) 303 541 3714.97 2356 6187.5 27574 100 a

for (n in 1:1e+05) a[n] <- 1 5587133 7129368 25695275.88 14099214 40116061.5 114521706 100 b

…some stuff…

t2=Sys.time()

t2-t1

you can directly subtract them to get time differences

]]>Here is the source for rnorm, it is a wrapper function that calls some compiled code called C_rnorm:

rnorm <- function(n, mean=0, sd=1) .Call(C_rnorm, n, mean, sd)

You can see that function is in the stats library and there should be a C function somewhere called “rnorm”

> stats:::C_rnorm$name

[1] “rnorm”

Here it is, without any argument for number of results to return (“n”):

double rnorm(double mu, double sigma)

{

if (ISNAN(mu) || !R_FINITE(sigma) || sigma < 0.)

ML_ERR_return_NAN;

if (sigma == 0. || !R_FINITE(mu))

return mu; /* includes mu = +/- Inf with finite sigma */

else

return mu + sigma * norm_rand();

}

https://github.com/wch/r-source/blob/af7f52f70101960861e5d995d3a4bec010bc89e6/src/nmath/rnorm.c

Then that function is calling norm_rand which also looks like it returns one result at a time:

https://github.com/wch/r-source/blob/af7f52f70101960861e5d995d3a4bec010bc89e6/src/nmath/snorm.c

It seems norm_rand includes a couple different algorithms, we can see what is being used with RNGkind:

> RNGkind()[2]

[1] “Inversion”

So we want to check the “Inversion” method:

case INVERSION:

#define BIG 134217728 /* 2^27 */

/* unif_rand() alone is not of high enough precision */

u1 = unif_rand();

u1 = (int)(BIG*u1) + unif_rand();

return qnorm5(u1/BIG, 0.0, 1.0, 1, 0);

https://github.com/wch/r-source/blob/af7f52f70101960861e5d995d3a4bec010bc89e6/src/nmath/snorm.c

To see what that is doing we can look at unif_rand and qnorm5 which also look like they deal with one value at a time:

https://github.com/wch/r-source/blob/af7f52f70101960861e5d995d3a4bec010bc89e6/src/nmath/standalone/sunif.c

https://github.com/wch/r-source/blob/af7f52f70101960861e5d995d3a4bec010bc89e6/src/nmath/qnorm.c

I really expected to find some loops in the source code, but there are none…

]]>