Speed of Simulations

1 Speed of simulations
2 Benchmarking the speed of the methods
3 Summary

1 Speed of simulations

Because simulations can take a long time, they provide good examples for comparing speed of R code and developing good programming techniques.

1.1 Loops versus apply versus …

We will compare different methods of simulating 100,000 realizations of the sample mean from samples of size 9 from an exponential distribution with rate, λ = 1.

N <- 100000
n <- 9

For reference, the method suggested in Simulation studies using R is

set.seed(123)
system.time(resRepl <- replicate(N, mean(rexp(n))))


   user  system elapsed 
  3.630   0.000   3.671

We see that on the machine I am using it takes a few seconds to produce the result, which is

str(resRepl)

 num [1:100000] 0.704 0.584 1.505 1.083 1.489 ...

1.1.1 Simulating in a loop

Programmers who have used compiled languages like C, C++, Fortran, or Pascal, and even byte-compiled languages like Java or C# can quickly adapt to using for loops in R. The syntax in R is a bit different from those languages but the ideas of looping are familiar.

Part of the folklore regarding R programming is that you should avoid writing for loops but the situation is more subtle than that. You should avoid writing for loops carelessly.

If you are going to produce a vector or a matrix (or, in general, an array) and you preallocate the structure then use replacement operations, looping is not bad.

set.seed(123)
resLoop1 <- numeric(N)        # preallocate the result of correct size
system.time(for (i in 1:N) resLoop1[i] <- mean(rexp(n)))


>    user  system elapsed 
  3.880   0.000   3.904

We should, of course, check that the results are as expected

all.equal(resRepl, resLoop1)

[1] TRUE

The thing not to do in a loop is to grow the structure. R allows you to assign a value beyond the end of a vector and it will automatically extend the size of the vector when this happens. However, this operation is not without cost.

set.seed(123)
resLoop2 <- numeric()                   # empty numeric vector
system.time(for (i in 1:N) resLoop2[i] <- mean(rexp(n)))


>    user  system elapsed 
 42.480   0.050  43.018

This produces the same results

all.equal(resLoop2, resRepl)

[1] TRUE

but takes much longer.

Another idiom used by some is to initialize the result to NULL and concatenate each freshly simulated result onto the existing results. Again, this involves a considerable amount of recopying of results and is slow.

set.seed(123)
resLoop3 <- NULL                        # empty list
system.time(for (i in 1:N) resLoop3 <- c(resLoop3, mean(rexp(n))))


>    user  system elapsed 
 40.930   0.060  41.073

str(resLoop3)

 num [1:100000] 0.704 0.584 1.505 1.083 1.489 ...

1.2 Using arrays

At one time it was important to take into consideration the overhead of calls to R functions and avoid putting too many functions inside a loop. That is less of an urgent consideration on modern computers. Although the interpreter overhead is still present on modern computers it is not as big an issue.

Nevertheless it is good to see how such a simulation could be done by creating all the random numbers from the exponential distribution in one call and then reducing each subsample. We arrange the n*N values into an n× N matrix. (We could make it N× n but there is a slight advantage in having subsamples in columns rather than rows.)

set.seed(123)
str(MM <- matrix(rexp(n*N), nr = n))


 num [1:9, 1:100000] 0.8435 0.5766 1.3291 0.0316 0.0562 ...

1.2.1 Reducing the array using the `colMeans` function

At this point we can make use of a very fast, built-in function called colMeans to evaluate the means of the columns. So that timing comparisons with other methods are fair, we regenerate the matrix before taking the column means.

set.seed(123)
system.time(resColMeans <- colMeans(matrix(rexp(n*N), nr = n)))


   user  system elapsed 
  0.150   0.000   0.143

As we can see, this method is the clear winner for speed and we can check with

all.equal(resColMeans, resRepl)

[1] TRUE

to see that it does produce the same set of simulated values. (This, by the way, is another reason to use put subsamples in the columns and to use colMeans to reduce the matrix. If the matrix had been configured as N × n and reduced by rowMeans then the results would not correspond to those from replicate.)

1.2.2 Reducing the array using `apply`

The colMeans function is quite fast but only useful if you want the sample means from each of the subsamples. If we wanted the sample median instead of the sample mean, we would be stuck.

The apply function is a more general way of reducing an array because it takes the name of the R function to apply. Of course, this means there will be repeated calls to the R function being applied, with the overhead of those calls, but that is the price to be paid for generality. The call is of the form

resApply <- apply(MM, 2, mean)
all.equal(resApply, resRepl)


[1] TRUE

The value 2 for the second argument indicates that the mean function should be applied to the 2nd dimension (i.e. the columns) of the array.

To get a fair comparative timing we generate the random numbers within the call to apply

set.seed(123)
system.time(resApply <- apply(matrix(rexp(n * N), nr = n), 2, mean))


   user  system elapsed 
  4.000   0.000   4.058

We can see that this method's speed is comparable to the replicate method.

1.3 Using `sapply` , `lapply` and `vapply`

The apply function is one of a family of functions that apply other functions to the elements of some structure. The most general of these is lapply, which applies a function to the elements of a list (or any other vector structure, including numeric vectors).

A characteristic of lapply is that it always returns a list. The sapply function is like lapply except that it tries to "simplify" the list that is returned.

In our case we just want to repeat the operation of simulating n random values and determining their mean and do that N times so the function we want to apply doesn't use its argument. Nevertheless, we must give it an argument, even though we don't use it, so that the sapply function stays happy.

set.seed(123)
system.time(resSapply <- sapply(1:N, function(i) mean(rexp(n))))


   user  system elapsed 
  3.580   0.000   3.665

An alternative is to unlist the result of lapply.

set.seed(123)
str(unlist(lapply(1:N, function(i) mean(rexp(n)))))


 num [1:100000] 0.704 0.584 1.505 1.083 1.489 ...

for which the timing is

system.time(unlist(lapply(1:N, function(i) mean(rexp(n)))))

   user  system elapsed 
  3.430   0.000   3.428

A slight variant on sapply is vapply which can be used if you know you will be creating a vector. It takes a third argument which is a template vector that indicates the desired type of the response. It doesn't need to be as large are the response, it just needs to be the right type of vector (technically, I should say "mode" instead of "type" but you get the idea). Typical values are 1 for a numeric vector, 1L for an integer vector, and the empty string, "", for character strings but more complex structures can be used. See the examples in the help page ?vapply

set.seed(123)
str(vapply(1:N, function(i) mean(rexp(n)), 1))


 num [1:100000] 0.704 0.584 1.505 1.083 1.489 ...

system.time(vapply(1:N, function(i) mean(rexp(n)), 1))

   user  system elapsed 
  3.230   0.000   3.253

Notice that the timings for the sapply and lapply methods are similar to those for replicate. This is not a coincidence. The replicate function

replicate

function (n, expr, simplify = TRUE) 
sapply(integer(n), eval.parent(substitute(function(...) expr)), 
    simplify = simplify)
<environment: namespace:base>

ends up being a call to sapply. There are a few subtleties in there about setting up the counter vector and changing the expression into an anonymous function but basically it comes down to a call to sapply.

We won't bother discussing how the expression is converted to an anonymous function but it is interesting to consider why the number of replications, what we call N but is called n here, is converted to a vector of length n by integer(n), whereas we used 1:N. The integer(n) call creates an integer vector of length n filled with zeros, whereas 1:N creates an integer vector of length N counting from 1 to N — most of the time. Because we are not actually using the values it doesn't matter what the contents of the vector are as long as it has the correct length.

As for that "most of the time" comment, the exception is when n=0. The construction 1:0 produces a vector of length 2 whereas integer(0) produces a vector of length 0. It may seem bizarre to consider what the result should be when you replicate evaluation of an expression 0 times but the defensive programmer is always cautious of the "edge cases". So the construction 1:n is dangerous because it doesn't behave as expected when n=0. If you want to get an integer vector containing the values from 1 up to n with correct behaviour for n=0 use seq_len(n)

2 Benchmarking the speed of the methods

We have used system.time to assess the execution time of a single evaluation of a sample by each of the proposed methods. There are many different factors that can affect the overall execution time and we really should replicate the timings to get a better handle on the overall speed. The rbenchmark package provides a versatile function, called benchmark, to replicate timings of expressions and create a table of results.

First we load the package

library(rbenchmark)

(if this produces an error you may need to install the package first).

We should decide which methods we wish to compare and what size of samples to use. The replicate, sapply, etc. methods have taken 3 to 4 seconds for 100,000 realizations on this computer. To have the benchmark test run in a reasonable length of time we will cut the number of realizations to 1000 and run 100 replications of each method. As for the methods, we will eliminate the methods based on loops without preallocation of the results, as they are clearly not competitive.

To make identification easier, we create functions for each of the simulation methods

fRepl <- function(n, N)
    replicate(N, mean(rexp(n)))
fLoopPre <- function(n, N) {
    ans <- numeric(N)
    for(i in seq_len(N)) ans[i] <- mean(rexp(n))
    ans
}
fColMeans <- function(n, N)
    colMeans(matrix(rexp(n * N), nr=n))
fApply <- function(n, N)
    apply(matrix(rexp(n * N), nr=n), 2, mean)
fSapply <- function(n, N)
    sapply(integer(N), function(...) mean(rexp(n)))
fLapply <- function(n, N)
    unlist(lapply(integer(N), function(...) mean(rexp(n))))
fVapply <- function(n, N)
    vapply(integer(N), function(...) mean(rexp(n)), 1)
N <- 1000

benchmark(fColMeans(n,N),
          fVapply(n, N),
          fRepl(n, N),
          fLoopPre(n, N),
          fApply(n, N),
          fSapply(n, N),
          fLapply(n, N),
          columns = c("test", "elapsed", "relative", "user.self", "sys.self"),
          order = "elapsed")


+ + + + + + +              test elapsed relative user.self sys.self
1 fColMeans(n, N)   0.135  1.00000      0.13        0
2   fVapply(n, N)   3.458 25.61481      3.40        0
7   fLapply(n, N)   3.460 25.62963      3.44        0
6   fSapply(n, N)   3.619 26.80741      3.55        0
3     fRepl(n, N)   3.628 26.87407      3.57        0
5    fApply(n, N)   3.853 28.54074      3.80        0
4  fLoopPre(n, N)   3.876 28.71111      3.79        0

So we see that there is very little difference between the various looping or "apply"ing methods and, on this machine, the colMeans function is about 20 to 30 times faster than using a loop or an apply function.

3 Summary

Using replicate for simulations is conceptually the simplest approach and is competitive with other methods based on looping or various functions from the apply family.
For the particular case of evaluating sample means, the colMeans function can be an order of magnitude faster. It is worth remembering for that one special, but frequent, case.
Using for loops is not, by itself, a bad practice. But you should avoid "growing", within the loop, the object containing the result.

Author: Douglas Bates

Date: 2010-08-25 Wed

HTML generated by org-mode 7.01h in emacs 23