Futureverse - A Unifying Parallelization Framework in R for Everyone
STATS/BIODS 352: Topics in Computing for Data Science, Bridging Methodology and Practice, Stanford University, 2023
Introduction
In these lectures, I will cover how to parallelize R code using the Futureverse, which at its core consists of the future package (Bengtsson 2021). Other packages in the Futureverse build on the future package to provide more powerful features, e.g. future.apply, furrr, and doFuture.
The Futureverse builds upon, enhances, and unifies established parallelization frameworks in R, e.g. parallel and foreach. You can think of it as a user-friendly, unifying wrapper on top of many existing, lower-level alternatives, each of which comes with its own unique functions and settings. By using the Futureverse, there are fewer things you have to worry about, and your code will be less cluttered by special parallelization instructions.
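As a minimal sketch of what "less cluttered" can look like, assuming the future and future.apply packages are installed, the example below runs the same computation with base R's lapply() and with future.apply's future_lapply(). The helper slow_sqrt() is hypothetical and exists only for illustration, and the choice of two multisession workers is an arbitrary assumption.

```r
library(future)        # plan(), multisession, sequential
library(future.apply)  # future_lapply(), a parallel counterpart to lapply()

# Hypothetical slow function, used only for illustration
slow_sqrt <- function(x) {
  Sys.sleep(0.5)  # pretend each call takes a while
  sqrt(x)
}

# Sequential, base-R version
y_seq <- lapply(1:4, slow_sqrt)

# Choose how futures are resolved: here, two background R sessions
plan(multisession, workers = 2)

# Same call shape as lapply(); only the function name changes
y_par <- future_lapply(1:4, slow_sqrt)

# Switch back to sequential processing when done
plan(sequential)
```

The parallel backend is selected with plan(), separately from the code that does the work, so switching to another backend, for example back to plan(sequential), requires no change to the future_lapply() call itself.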
The future package was introduced in 2015 and is now a stable and well-established solution for parallelization in R. For example, it is among the top 0.9% most downloaded R packages, and hundreds of R packages use it for their parallelization needs.
Take-home messages of this lecture set
By following these two lectures, you will learn that:
parallelization does not have to be hard
there are things you cannot parallelize
Futureverse simplifies parallelization in R (disclaimer!)
foreach() is not the same as a for-loop
forked parallel processing is neat, but we should use it with great caution
You will also learn:
a bit about the “future” concept for parallel programming
why it is called “futures”
that many programming languages support futures, e.g. R, Python, Julia, and C++
about common mistakes to avoid
From this, I hope that you will think of parallelization as being less magical, especially if you have never used it before.
Disclaimer
I am the creator and lead maintainer of the Futureverse ecosystem. I chose to use it to teach parallelization in R because I think it is the simplest way to parallelize tasks in R.