8 foreach() is not a for-loop

For-loops are special in that the code in their body can assign values to variables that live outside of the loop. For example,

xs <- list()
ys <- list()
last_idx <- 0
for (idx in 1:3) {
  xs[[idx]] <- letters[idx]
  ys[[idx]] <- LETTERS[idx]
  last_idx <- idx
}

assigns to both xs and ys. Similarly, last_idx is updated in every iteration, so when the for-loop completes, it holds:

last_idx
[1] 3

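In contrast, we cannot do the same with map-reduce calls, such as lapply(). They evaluate a function for each element, and any regular assignment made inside that function is local to that function call. The only way to get something out of such a call is to return it.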
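The following minimal base-R sketch shows why. The function passed to lapply() is evaluated in a fresh, local environment for each element. A regular assignment with <- inside it therefore creates a local variable that disappears when the call returns, and only the return value reaches the caller.

```r
xs <- list()
last_idx <- 0

res <- lapply(1:3, function(idx) {
  xs[[idx]] <- letters[idx]  # modifies a local copy of 'xs' inside this call
  last_idx <- idx            # creates a local 'last_idx' inside this call
  idx                        # only the return value is collected by lapply()
})

str(xs)      # still an empty list
last_idx     # still 0
unlist(res)  # the returned values: 1 2 3
```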

8.1 Super assignment (<<-) is not a solution

Warning: using “super” assignments (<<-), as in:

xs <- list()
ys <- list()
last_idx <- 0
void <- lapply(1:3, function(idx) {
  xs[[idx]] <<- letters[idx]
  ys[[idx]] <<- LETTERS[idx]
  last_idx <<- idx
})

or, similarly, assign(..., envir = parent.frame()), is considered bad practice for many reasons. Please do not use such hacks! They will come back and bite you if you try; trust me.
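One concrete way <<- can bite is that it does not target the global environment as such. Instead, it assigns to the first enclosing environment in which the variable already exists, and falls back to the global environment only if no such variable is found. The same code can therefore behave differently once it is moved inside a function. Here is a minimal base-R sketch, where run() is a hypothetical wrapper function:

```r
last_idx <- 0

run <- function() {
  last_idx <- -1  # a local variable of run()
  void <- lapply(1:3, function(idx) {
    # '<<-' searches the enclosing environments, starting with run()'s frame,
    # and assigns to the first 'last_idx' it finds there
    last_idx <<- idx
  })
  last_idx
}

run()     # 3: run()'s local variable was updated
last_idx  # 0: the global variable was never touched
```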

Previously, I said that any lapply() call can be replaced with future_lapply() so that it runs in parallel. What would happen if we went ahead and used the above <<- hack? Let us try:

library(future.apply)
plan(multisession)

xs <- list()
ys <- list()
last_idx <- 0
void <- future_lapply(1:3, function(idx) {
  xs[[idx]] <<- letters[idx]
  ys[[idx]] <<- LETTERS[idx]
  last_idx <<- idx
})

If we check xs, ys, and last_idx afterward,

str(xs)
 list()
str(ys)
 list()
last_idx
[1] 0

we find that they are empty and zero.

Q. Why is that?

The reason is that the expressions:

xs[[idx]] <<- letters[idx]
ys[[idx]] <<- LETTERS[idx]
last_idx <<- idx

are evaluated in another R process. The assignments to xs, ys, and last_idx are therefore made in the global environment of that R process, which is not the same as the global environment of our main R session. In our main R session, the only assignments to xs, ys, and last_idx were our initial:

xs <- list()
ys <- list()
last_idx <- 0

assignments, which is why they still hold their initial values.

Now, assume for a moment that it were indeed possible to use <<- to assign to the main R session from parallel processes as well. If so, what value should last_idx have at the very end? That would depend on the order in which the parallel tasks complete. For instance, imagine that the first iteration (idx = 1) were very slow and therefore finished last. Would you then expect last_idx to be 1 or 3?
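The ordering problem can be mimicked sequentially. In this sketch, completion_order is a made-up order in which three parallel tasks might finish. With "assign as you go", the last task to finish wins, so the final value depends on that order. If each result is instead stored at the position of its input element, the order of completion no longer matters. This is how map-reduce functions such as future_lapply() return their results: in the order of the input.

```r
completion_order <- c(2L, 3L, 1L)  # hypothetical: task 1 is slowest and finishes last

# "Assign as you go": the final value depends on which task finished last
last_idx <- 0L
for (idx in completion_order) last_idx <- idx
last_idx  # 1, not 3

# "Return results": each result lands in the slot of its own input element
res <- vector("list", length(completion_order))
for (idx in completion_order) res[[idx]] <- idx
res[[length(res)]]  # 3, regardless of completion order
```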

Conclusion: It is not possible, and it does not make sense, to assign to the global environment when running in parallel!

8.2 Return instead of assign in map-reduce calls

The solution for map-reduce functions, such as lapply(), is to return all results and split them up afterward, e.g.

res <- lapply(1:3, function(idx) {
  list(x = letters[idx], y = LETTERS[idx], idx = idx)
})
xs <- lapply(res, `[[`, "x")
ys <- lapply(res, `[[`, "y")
last_idx <- res[[length(res)]][["idx"]]
rm(res)

str(xs)
List of 3
 $ : chr "a"
 $ : chr "b"
 $ : chr "c"
str(ys)
List of 3
 $ : chr "A"
 $ : chr "B"
 $ : chr "C"
last_idx
[1] 3

This strategy works in parallel too:

library(future.apply)
plan(multisession)

res <- future_lapply(1:3, function(idx) {
  list(x = letters[idx], y = LETTERS[idx], idx = idx)
})
xs <- lapply(res, `[[`, "x")
ys <- lapply(res, `[[`, "y")
last_idx <- res[[length(res)]][["idx"]]
rm(res)

str(xs)
List of 3
 $ : chr "a"
 $ : chr "b"
 $ : chr "c"
str(ys)
List of 3
 $ : chr "A"
 $ : chr "B"
 $ : chr "C"
last_idx
[1] 3
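If atomic vectors are more convenient than lists of length-one elements, the same returned results can be reduced in other ways. The following base-R sketch uses illustrative names (idxs, df). The same post-processing would apply unchanged to the result of future_lapply(), because only the reduce step differs.

```r
res <- lapply(1:3, function(idx) {
  list(x = letters[idx], y = LETTERS[idx], idx = idx)
})

# Reduce to atomic vectors; vapply() checks that each extracted value
# is a single string or a single integer, respectively
xs   <- vapply(res, function(r) r[["x"]], character(1))
ys   <- vapply(res, function(r) r[["y"]], character(1))
idxs <- vapply(res, function(r) r[["idx"]], integer(1))
last_idx <- idxs[length(idxs)]

# Alternatively, reduce everything into one data frame with one row per element
df <- do.call(rbind, lapply(res, as.data.frame))
```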

8.3 foreach() is a map-reduce function

The main thing to understand is that foreach() does not work like a for-loop. If you were to try, say,

library(doFuture)
registerDoFuture()
plan(multisession)

xs <- list()
ys <- list()
last_idx <- 0
void <- foreach(idx = 1:3, .export = c("xs", "ys")) %dopar% {
  xs[[idx]] <- letters[idx]
  ys[[idx]] <- LETTERS[idx]
  last_idx <- idx
}

you’ll find that:

str(xs)
 list()
str(ys)
 list()
last_idx
[1] 0

This is because foreach() is a map-reduce function. It is only its name and the %dopar% operator that make it visually resemble a for-loop, although it isn’t one. To clarify this further: if it were not for the %dopar% operator, the original creator would probably have designed foreach() to take a function, just like lapply(), e.g.

void <- foreach(idx = 1:3, function(idx) {
  ...
})

Had that been the case, it would be clear that foreach() is just another map-reduce function, like lapply() and map() of the purrr package.

To conclude, we should always use foreach() as a map-reduce function, e.g.

library(doFuture)
plan(multisession)

res <- foreach(idx = 1:3) %dofuture% {
  list(x = letters[idx], y = LETTERS[idx], idx = idx)
}
xs <- lapply(res, `[[`, "x")
ys <- lapply(res, `[[`, "y")
last_idx <- res[[length(res)]][["idx"]]
rm(res)

str(xs)
List of 3
 $ : chr "a"
 $ : chr "b"
 $ : chr "c"
str(ys)
List of 3
 $ : chr "A"
 $ : chr "B"
 $ : chr "C"
last_idx
[1] 3