Futureverse - A Unifying Parallelization Framework in R for Everyone

STATS/BIODS 352: Topics in Computing for Data Science, Bridging Methodology and Practice, Stanford University, 2023

Author

Henrik Bengtsson

Published

2023-05-08

Abstract
A future is a programming construct designed for concurrent and asynchronous evaluation of code, making it particularly useful for parallel processing. The future package implements the Future API for programming with futures in R. This minimal API provides sufficient constructs for implementing parallel versions of well-established, high-level map-reduce APIs. The future ecosystem supports exception handling, output and condition relaying, parallel random number generation, and automatic identification of globals, all of which lower the threshold to parallelize code. The Future API bridges parallel frontends with parallel backends, following the philosophy that end-users are the ones who choose the parallel backend while the developer focuses on what to parallelize. A variety of backends exist, and third-party contributions meeting the specifications, which ensure that the same code works on all backends, are automatically supported. The lectures focus on R, but programmers from other languages will also find the material useful.

Introduction

In these lectures, I will cover how to parallelize R code using the Futureverse, which at its core consists of the future package (Bengtsson 2021). Other packages in the Futureverse build on the future package to provide more powerful features, e.g. future.apply, furrr, and doFuture.

The Futureverse builds upon, enhances, and unifies established parallelization frameworks in R, e.g. parallel and foreach. You can think of it as a user-friendly, unifying wrapper on top of many of the existing, more low-level alternatives that each come with their own unique functions and settings. By using Futureverse, there are fewer things you have to worry about, and your code will be less cluttered by special parallelization instructions.
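To make the idea of a thin, unifying layer concrete, here is a minimal sketch using only the future package. The function `slow_sqrt()` and the input vector `xs` are made up for illustration, and the choice of `multisession` with two workers is an assumption; any other supported backend could be selected in its place without touching the rest of the code.

```r
library(future)

# Hypothetical stand-in for a slow computation (not from the lectures)
slow_sqrt <- function(x) {
  Sys.sleep(0.1)
  sqrt(x)
}

# The end-user chooses the backend; here, two background R sessions
plan(multisession, workers = 2)

# The developer's code: create one future per element, then collect the values.
# The globals 'slow_sqrt' and 'x' are identified and exported automatically.
xs <- c(1, 4, 9, 16)
fs <- lapply(xs, function(x) future(slow_sqrt(x)))
ys <- vapply(fs, value, FUN.VALUE = numeric(1))

# Return to sequential processing and shut down the background workers
plan(sequential)
```

Replacing the `plan(multisession, workers = 2)` line with `plan(sequential)` should give the same `ys`, only without any parallelism, which is the sense in which the parallelization instructions stay out of the way of the actual computation.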

The future package was introduced in 2015 and is now a stable and well-established solution for parallelization in R. For example, it is among the top 0.9% most downloaded R packages, and hundreds of R packages use it for their parallelization needs.

Take-home messages of this lecture set

By following these two lectures, you will learn that:

  • parallelization does not have to be hard

  • there are things you cannot parallelize

  • Futureverse simplifies parallelization in R (disclaimer!)

  • foreach() is not the same as a for-loop

  • forked parallel processing is neat, but we should use it with great caution

You will also learn:

  • a bit about the “future” concept for parallel programming

  • why it is called “futures”

  • that many programming languages support futures, e.g. R, Python, Julia, and C++

  • about common mistakes to avoid

From this, I hope that you will think of parallelization as being less like magic, especially if you have never used it before.

Disclaimer

I am the creator and lead maintainer of the Futureverse ecosystem. I choose to use it to teach parallelization in R because I think it is the simplest way to parallelize tasks in R.