4  For loops - can they be parallelized?

Let’s go back to our example with:

xs <- list(1:25, 26:50, 51:75, 76:100)

where we used:

ys <- lapply(xs, slow_sum)

to calculate the slow_sum() of each element in xs.

Someone who is not familiar with R and its lapply() might choose to solve this problem using a for-loop, e.g.

ys <- list()
for (ii in 1:length(xs)) {
  ys[[ii]] <- slow_sum(xs[[ii]])
}

By the way, can you spot the potential problem with this solution?

When using 1:length(xs), there is a risk it becomes 1:0 (= c(1, 0)), which happens if xs is an empty list. To avoid this, it’s safer to use seq_len(length(xs)), or the shortcut seq_along(xs). Both return an empty index vector if length(xs) == 0.
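
To see the difference, here is a minimal sketch using a hypothetical empty list:

xs_empty <- list()
1:length(xs_empty)    # c(1, 0) - a for-loop over this runs its body twice!
seq_along(xs_empty)   # integer(0) - a for-loop over this never runs its body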

4.1 Parallelize a for-loop using futures

Since we are already familiar with the future assignment operator (%<-%), we can parallelize the above for-loop as follows:

library(future)
library(listenv)
plan(multisession)

ys <- listenv()
for (ii in seq_along(xs)) {
  ys[[ii]] %<-% slow_sum(xs[[ii]])
}
ys <- as.list(ys)

Without going into the technical details, we cannot use a regular list for ys when doing this. Instead, we need to use a list environment from the listenv package. Then, at the very end, we coerce it to a regular list using as.list(). This gives identical results, but the for loop is run in parallel.

This works, but if length(xs) is large, it is not as efficient as parallel map-reduce solutions such as future_lapply() and future_map().

4.2 Replace a for-loop with a map-reduce call

Regardless of sequential or parallel processing, I recommend always striving toward using map-reduce functions instead of for-loops. It is not always possible, but in many cases it is.

If we look at the above for-loop, we see that each iteration is independent of the others, e.g. the result from the second iteration is independent of the result obtained in the first iteration, i.e. the value of ys[[2]] depends only on xs[[2]] and the function slow_sum(), but not on ys[[1]] or any other elements of xs. We will return to this later, but when a for-loop algorithm has this property, we can turn it into a map-reduce call, and we already know that we can parallelize those.

As a rule of thumb, the first step towards replacing a for-loop with a map-reduce call is to get rid of the auxiliary index variable, which in our example is ii. All we use ii for is to extract element xs[[ii]] and then to assign the result to ys[[ii]]. We can make this explicit by rewriting the for-loop as:

ys <- list()
for (ii in seq_along(xs)) {
  x <- xs[[ii]]
  y <- slow_sum(x)
  ys[[ii]] <- y
}

This form helps us see how ii is used and that there are no dependencies between iterations.

Next, we can replace the above iterate-over-indices approach with an iterate-over-elements approach, as in:

ys <- list()
for (x in xs) {
  ys <- c(ys, slow_sum(x))
}

Note how we got rid of the iteration index ii. This might look like syntactic sugar, but it’s an important move, as we will see soon. First, by getting rid of the index variable, the code becomes “cleaner”. By “cleaner” we often mean that it is more readable (to the experienced reader). Second, cleaner code is often also less error prone. For example, without the index variable, there is no risk of making a mistake with it, e.g. accidentally writing ii + 1 where it should be ii.

The above code reads as “for each element x in the list xs, do …”. Note how similar this is to “for each element x in the list xs, apply function …”, which is how a map-reduce call reads.

Since we got rid of ii, we can replace the latter for-loop with an lapply() call as in:

ys <- lapply(xs, slow_sum)

Digest that for a while, and note how concise and clear it is compared to the more explicit for-loop that we started out with:

ys <- list()
for (ii in 1:length(xs)) {
  ys[[ii]] <- slow_sum(xs[[ii]])
}

One way to think about it is that:

  • an lapply() call communicates what is done, whereas

  • a for-loop communicates how it is done.

After working with R for a while, and seeing a lot of lapply() calls, you will get used to this form and prefer it to the more verbose syntax that comes from using a for-loop. I’ve worked with R for more than 20 years, and to me an lapply() call is much easier to read and carries much less mental load than a for-loop. It is only if I try to think about what happens under the hood of lapply() that it can become overwhelming. I can imagine there is some kind of for loop running inside, but if you think about it, it does not matter how it is implemented internally.

Finally, we now have an lapply() call, and we already know how to parallelize that; all we have to do is replace lapply() with future_lapply().
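
For the record, here is a minimal sketch of that last step (assuming the future.apply package is installed, and that slow_sum() and xs are defined as before):

library(future.apply)
plan(multisession)

ys <- future_lapply(xs, slow_sum)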

Conclusion: To parallelize a for-loop, convert it first to a map-reduce call, and then replace that with the corresponding future_ version.

4.3 How do I know if my for-loop can be rewritten as a map-reduce call?

Not all for-loops can be parallelized. We already touched upon one rule of thumb above: if the different iterations are independent of each other, we can get rid of the iteration index variable (ii), and then we can basically always parallelize the code.

Another “smell test” for being able to parallelize a for-loop comes from asking ourselves the following question:

Q. Does it matter in what order we “process” the different elements in xs, i.e. in what order we call slow_sum() on each of them?

If the answer is “No, it doesn’t matter”, then we can parallelize the for loop.

For example, if we go back to our original for-loop:

ys <- list()
for (ii in 1:length(xs)) {
  ys[[ii]] <- slow_sum(xs[[ii]])
}

we note that we could iterate over the elements in xs in reverse order:

ys <- list()
for (ii in length(xs):1) {
  ys[[ii]] <- slow_sum(xs[[ii]])
}

and still get the identical result. We could even iterate over the elements in a random order, e.g.

ys <- list()
for (ii in sample(seq_along(xs), replace = FALSE)) {
  ys[[ii]] <- slow_sum(xs[[ii]])
}

In all three cases, ys holds identical values in the same order.
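
If you want to convince yourself of this, here is a small sanity-check sketch; the variable names ys_forward, ys_reverse, and ys_random are made up for this illustration:

ys_forward <- ys_reverse <- ys_random <- list()
for (ii in seq_along(xs))         ys_forward[[ii]] <- slow_sum(xs[[ii]])
for (ii in rev(seq_along(xs)))    ys_reverse[[ii]] <- slow_sum(xs[[ii]])
for (ii in sample(seq_along(xs))) ys_random[[ii]]  <- slow_sum(xs[[ii]])
identical(ys_forward, ys_reverse) && identical(ys_forward, ys_random)  # TRUE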

By the way, a similar smell test for an lapply() call is to check whether we get the same results if we reverse the input and then reverse the output, as in:

ys_r <- lapply(rev(xs), slow_sum)
ys <- rev(ys_r)

If you think about it, lapply() could very well be implemented such that it processes the elements in reverse order. If you read the documentation, help("lapply"), there is nothing saying in what order the elements are processed. This is intentional, because we should not use this function under the assumption that the elements are processed in a certain order. This is what future_lapply() and friends rely on, and it is why it is valid to parallelize map-reduce calls.

Conclusion: If we have an algorithm that allows us to reverse the order of the processing, or process the elements in a random order, we can most likely also run the iterations concurrently.

4.3.1 Not all algorithms can be parallelized

It is only for some algorithms that the order of the processing does not matter. We referred to such algorithms as embarrassingly parallelizable algorithms. As the term suggests, such an algorithm can be parallelized, i.e. it is valid to process the elements concurrently. For other algorithms, it is essential that the elements are processed in a given order. Often, the algorithm is such that the input data in one iteration depends on the output of the previous iteration. We cannot use parallelization for such algorithms.

Q: What about the following algorithm - can it be reversed?

ys <- list()
ys[[1]] <- slow_sum(xs[[1]])
for (ii in 2:length(xs)) {
  x <- xs[[ii]]
  y <- ys[[ii - 1]]   # depends on the result of the previous iteration
  ys[[ii]] <- slow_sum(x + y)
}

Without going into details, another criterion is that slow_sum() does not have so-called side effects. This is how mathematical functions work, and it is one reason why we refer to R as a functional language. As a rule of thumb, most function calls in R do not have side effects, and when they do, you often already know it.
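
As a hypothetical illustration (the function and the counter variable below are made up for this example), here is a function that does have a side effect:

total_calls <- 0
slow_sum_counted <- function(x) {
  total_calls <<- total_calls + 1   # side effect: modifies a variable outside the function
  sum(x)
}

Each call updates total_calls in the R session where it runs. With, say, plan(multisession), the workers are separate R processes, so each worker would update its own copy of total_calls and the main R session would never see those updates. That is the kind of behavior that makes a function unsafe to parallelize naively.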