8 foreach() is not a for-loop
For-loops are special in the way they can assign values to objects outside of the for-loop. For example,
xs <- list()
ys <- list()
last_idx <- 0
for (idx in 1:3) {
  xs[[idx]] <- letters[idx]
  ys[[idx]] <- LETTERS[idx]
  last_idx <- idx
}
assigns to both xs and ys. We also see that last_idx is updated in every iteration, and, when the for-loop completes, it holds:
last_idx
[1] 3
In contrast, we cannot do the same with map-reduce calls, such as lapply(), because the function they apply can return results, but it cannot assign to objects outside of itself.
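The difference can be seen in a minimal base-R sketch. It reuses the xs and last_idx names from the for-loop example; the assignments inside the function only create local copies, so the objects in the calling environment are left as they were.

```r
xs <- list()
last_idx <- 0

res <- lapply(1:3, function(idx) {
  # These assignments create local copies inside the function's own
  # environment; the objects in the calling environment are untouched
  xs[[idx]] <- letters[idx]
  last_idx <- idx
  # The return value is the only thing that reaches the caller
  letters[idx]
})

length(xs)  # still 0
last_idx    # still 0
str(res)    # the returned values: "a", "b", "c"
```

Only res carries information back from the iterations, which is why map-reduce functions are used by collecting their return values.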
8.1 Super assignment (<<-) is not a solution
Warning, using “super” assignments (<<-), as in:
xs <- list()
ys <- list()
last_idx <- 0
void <- lapply(1:3, function(idx) {
  xs[[idx]] <<- letters[idx]
  ys[[idx]] <<- LETTERS[idx]
  last_idx <<- idx
})
or, similarly, assign(..., envir = parent.frame()), is considered bad practice for many reasons. Please do not use such hacks! They will come back to bite you if you try, trust me.
Previously, I said that any lapply() call can be replaced with a future_lapply() call such that it can run in parallel. What would happen if we went ahead and used the above <<- hack? Let us try:
library(future.apply)
plan(multisession)
xs <- list()
ys <- list()
last_idx <- 0
void <- future_lapply(1:3, function(idx) {
  xs[[idx]] <<- letters[idx]
  ys[[idx]] <<- LETTERS[idx]
  last_idx <<- idx
})
If we check xs, ys, and last_idx afterward:
str(xs)
 list()
str(ys)
 list()
last_idx
[1] 0
we find that xs and ys are still empty and last_idx is still zero.
Q. Why is that?
The reason is that the expressions:
xs[[idx]] <<- letters[idx]
ys[[idx]] <<- LETTERS[idx]
last_idx <<- idx
are evaluated in another R process. The assignments to xs, ys, and last_idx are made in the global environment of that R process, which is not the same as the global environment of our main R session. In our main R session, the only assignments to xs, ys, and last_idx were our initial:
xs <- list()
ys <- list()
last_idx <- 0
assignments, which is why they are unchanged.
Now, assume for a moment that it were indeed possible to use <<- to assign to the main R session from parallel processes as well. If so, what value should last_idx have at the very end? That would depend on the order in which the parallel tasks complete. For instance, imagine the first iteration (idx = 1) is very slow and therefore finishes last. Would you then expect last_idx to be 1 or 3?
Conclusion: It is not possible, and it does not make sense, to assign to the global environment when running in parallel!
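The order dependence can be imitated sequentially. The sketch below is only a simulation, not real parallel code: last_completed() is a hypothetical helper, introduced here for illustration, that pretends tasks finish in a given order and records the index that was written last.

```r
# Hypothetical helper (not from any package): walk through the indices in
# the order the tasks are assumed to finish and keep the last one written
last_completed <- function(completion_order) {
  last_idx <- 0
  for (idx in completion_order) {
    last_idx <- idx
  }
  last_idx
}

in_order   <- last_completed(c(1, 2, 3))  # tasks finish in iteration order
slow_first <- last_completed(c(2, 3, 1))  # the slow first task finishes last

in_order    # 3
slow_first  # 1
```

The same three tasks give two different answers, which is the ambiguity a shared last_idx would suffer from if parallel workers could write to it.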
8.2 Return instead of assign in map-reduce calls
The solution for map-reduce functions, such as lapply(), is to return all results and split afterward, e.g.
res <- lapply(1:3, function(idx) {
  list(x = letters[idx], y = LETTERS[idx], idx = idx)
})
xs <- lapply(res, `[[`, "x")
ys <- lapply(res, `[[`, "y")
last_idx <- res[[length(res)]][["idx"]]
rm(res)
str(xs)
List of 3
 $ : chr "a"
 $ : chr "b"
 $ : chr "c"
str(ys)
List of 3
 $ : chr "A"
 $ : chr "B"
 $ : chr "C"
last_idx
[1] 3
This strategy works in parallel too:
library(future.apply)
plan(multisession)
res <- future_lapply(1:3, function(idx) {
  list(x = letters[idx], y = LETTERS[idx], idx = idx)
})
xs <- lapply(res, `[[`, "x")
ys <- lapply(res, `[[`, "y")
last_idx <- res[[length(res)]][["idx"]]
rm(res)
str(xs)
List of 3
 $ : chr "a"
 $ : chr "b"
 $ : chr "c"
str(ys)
List of 3
 $ : chr "A"
 $ : chr "B"
 $ : chr "C"
last_idx
[1] 3
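As a variation on the "return, then split" step, the per-iteration results can also be stacked into a single data frame. This is a minimal sketch in base R, using lapply() so that it runs without extra packages; the name df is introduced here for illustration only.

```r
res <- lapply(1:3, function(idx) {
  list(x = letters[idx], y = LETTERS[idx], idx = idx)
})

# Convert each per-iteration list to a one-row data frame and stack the rows;
# stringsAsFactors = FALSE keeps the letters as character on older R versions
df <- do.call(rbind, lapply(res, as.data.frame, stringsAsFactors = FALSE))

df$x                         # all x values in iteration order
df$y                         # all y values in iteration order
last_idx <- df$idx[nrow(df)] # idx of the last element
```

Because the rows follow the order of the input rather than the order in which tasks finish, last_idx is well defined here.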
8.3 foreach() is a map-reduce function
The main thing to understand is that foreach() does not work like a for-loop. If you try, say,
library(doFuture)
registerDoFuture()
plan(multisession)
xs <- list()
ys <- list()
last_idx <- 0
void <- foreach(idx = 1:3, .export = c("xs", "ys")) %dopar% {
  xs[[idx]] <- letters[idx]
  ys[[idx]] <- LETTERS[idx]
  last_idx <- idx
}
you’ll find that:
str(xs)
 list()
str(ys)
 list()
last_idx
[1] 0
This is because foreach() is a map-reduce function. It is only its name and the %dopar% operator that make it visually resemble a for-loop, although it is not one. To clarify this further: if it were not for the %dopar% operator, the original creator would probably have designed foreach() to take a function, just like lapply(), e.g.
void <- foreach(idx = 1:3, function(idx) {
  ...
})
If that had been the case, it would be clear that foreach() is just another map-reduce function, just like lapply() and map() of the purrr package.
To conclude, we should always use foreach() as a map-reduce function, e.g.
library(doFuture)
plan(multisession)
res <- foreach(idx = 1:3) %dofuture% {
  list(x = letters[idx], y = LETTERS[idx], idx = idx)
}
xs <- lapply(res, `[[`, "x")
ys <- lapply(res, `[[`, "y")
last_idx <- res[[length(res)]][["idx"]]
rm(res)
str(xs)
List of 3
 $ : chr "a"
 $ : chr "b"
 $ : chr "c"
str(ys)
List of 3
 $ : chr "A"
 $ : chr "B"
 $ : chr "C"
last_idx
[1] 3
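If atomic vectors are preferred over lists, the split step can be done with vapply(), which also checks the type and length of each extracted element. The sketch below assumes res has the same shape as in the foreach() example; it builds res with lapply() as a stand-in so that it runs without any packages, and the name idxs is introduced here for illustration.

```r
# Stand-in for the foreach() call: a list with one element per iteration
res <- lapply(1:3, function(idx) {
  list(x = letters[idx], y = LETTERS[idx], idx = idx)
})

# Extract each field as a typed vector; vapply() fails loudly if an
# element is not a single value of the declared type
xs   <- vapply(res, function(r) r[["x"]], character(1))
ys   <- vapply(res, function(r) r[["y"]], character(1))
idxs <- vapply(res, function(r) r[["idx"]], integer(1))

last_idx <- idxs[length(idxs)]
```

Here xs and ys are character vectors rather than lists, and last_idx is taken from the last element of the input order.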