2 How do you do two things at the same time in R?
Imagine we have a very slow function called slow_sum() that takes a numeric vector as input, calculates the sum, and returns it as a numeric scalar:
slow_sum <- function(x) {
  sum <- 0
  for (value in x) {
    Sys.sleep(1.0)  ## one-second slowdown per value
    sum <- sum + value
  }
  sum
}
For example, we can calculate the sum of \(1, 2, ..., 10\) as:
y <- slow_sum(1:10)
y
[1] 55
The problem is that this takes more than 10 seconds to complete, e.g.
tic()
y <- slow_sum(1:10)
toc()
Time difference of 10 secs
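The tic() and toc() helpers used here are not defined in this section. A minimal sketch of what they might look like, assuming they simply wrap Sys.time() and report the time elapsed since the last tic(). The `timer` environment and the rounding to one decimal are assumptions for illustration, not the original helpers:

```r
## Hypothetical stand-ins for tic() and toc(); the originals are not shown
timer <- new.env()

tic <- function() {
  timer$start <- Sys.time()  ## remember when timing started
  invisible(timer$start)
}

toc <- function() {
  ## elapsed time since the most recent tic(), always reported in seconds
  elapsed <- difftime(Sys.time(), timer$start, units = "secs")
  print(round(elapsed, digits = 1))
  invisible(elapsed)
}

tic()
Sys.sleep(0.5)
elapsed <- toc()
```

Because this toc() does not reset the start time, calling it several times reports the cumulative time since the same tic().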
This becomes costly if we want to repeat it two or more times:
tic()
y1 <- slow_sum(1:10)
y2 <- slow_sum(11:20)
toc()
Time difference of 20.1 secs
Wouldn’t it be great if we could run these two tasks concurrently?
If they could run at the same time, we would finish both in about the same time it takes to call the function once. It turns out we can use the future package for this. Here is how we can do it with a minimal tweak.
library(future) ## defines %<-%
plan(multisession) ## set them to run in parallel
y1 %<-% slow_sum(1:10)
y2 %<-% slow_sum(11:20)
y1
[1] 55
y2
[1] 155
The %<-% assignment operator works by launching slow_sum(1:10) in the background, preparing to assign the result to y1 when it is done, and then returning immediately. The same happens for the second expression. This means that both of these future assignments complete almost instantly:
tic()
y1 %<-% slow_sum(1:10)
y2 %<-% slow_sum(11:20)
toc()
Time difference of 1 secs
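The future assignment operator can be thought of as shorthand for creating a future explicitly and collecting its value later. A sketch using future() and value() from the same package, assuming a two-worker multisession plan. The `delay` argument is a hypothetical addition to slow_sum(), used here only to keep the example quick:

```r
library(future)
plan(multisession, workers = 2)  ## assumption: two background R sessions

## Same idea as slow_sum() above, with a hypothetical, shorter delay
slow_sum <- function(x, delay = 0.1) {
  sum <- 0
  for (value in x) {
    Sys.sleep(delay)
    sum <- sum + value
  }
  sum
}

## future() launches the expression in the background and returns at once
f1 <- future(slow_sum(1:10))
f2 <- future(slow_sum(11:20))

## value() blocks until the corresponding result is available
y1 <- value(f1)
y2 <- value(f2)
print(c(y1, y2))

plan(sequential)  ## shut down the background workers
```

With the explicit form, the point where we wait is visible in the code as the call to value(), whereas with %<-% the waiting happens implicitly the first time the variable is used.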
What happens next is that whenever we try to “use” the value of y1 or y2, R will automatically wait for the result to become available. This is where we might have to wait:
y1
[1] 55
toc()
Time difference of 9.9 secs
In other words, we have to wait for y1 to complete, but since both future expressions ran in parallel, y2 completes at about the same time, and we do not have to spend time waiting for its result:
y2
[1] 155
toc()
Time difference of 11.1 secs
So, all in all, we completed both tasks in about the same amount of time as a single one.
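Since using a future's value blocks until the result is ready, it can be useful to check first whether the result is available. A sketch using resolved() from the future package, which reports whether a future has finished without waiting for it. The explicit future() form, the two-worker plan, the shortened `delay` argument, and the polling interval are all assumptions made for illustration:

```r
library(future)
plan(multisession, workers = 2)  ## assumption: two background R sessions

## Hypothetical quick variant of slow_sum() to keep the example short
slow_sum <- function(x, delay = 0.1) {
  sum <- 0
  for (value in x) {
    Sys.sleep(delay)
    sum <- sum + value
  }
  sum
}

f <- future(slow_sum(1:5))

## Poll without blocking; other work could be done inside this loop
polls <- 0
while (!resolved(f)) {
  polls <- polls + 1
  Sys.sleep(0.05)
}

y <- value(f)  ## returns right away because the future is already resolved
print(y)

plan(sequential)  ## shut down the background workers
```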