Part 1 An Overview of The Futureverse

1.1 Why do we parallelize?

Parallel & distributed processing can be used to:

  • speed up processing (wall time)
  • lower memory footprint (per machine)
  • avoid data transfers (compute where data lives)
  • other reasons, e.g. asynchronous UI/UX

1.2 The future package (the core of it all)

The hex logo for the ‘future’ package, adapted from an original design by Dan LaBar
  • A simple, unifying solution for parallel APIs
  • “Write once, run anywhere”
  • 100% cross-platform, e.g. Linux, macOS, and MS Windows
  • Easy to install (< 0.5 MiB total); install.packages("future")
  • Well tested, lots of CPU mileage, used in production
  • Things should “just work”
  • Design goal: keep as minimal as possible
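
The "write once, run anywhere" idea can be sketched as follows; `slow_sqrt()` is a hypothetical stand-in for an expensive computation. The code that creates and collects futures is identical under both plans; only the `plan()` call changes:

library(future)

## Hypothetical stand-in for an expensive computation
slow_sqrt <- function(x) {
  Sys.sleep(1)
  sqrt(x)
}

plan(sequential)            # evaluate futures in the current R session
f <- future(slow_sqrt(4))
value(f)                    # 2

plan(multisession)          # same code, now evaluated in a background R session
f <- future(slow_sqrt(4))
value(f)                    # 2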

1.3 Quick intro: Evaluate R in the background

1.3.1 Sequentially

x <- 7
y <- slow(x)           # ~1 minute
z <- another(x)        # ~0.5 minute
                       # all done in ~1.5 minutes

1.3.2 In parallel

library(future)
plan(multisession)     # run things in parallel

x <- 7
f <- future(slow(x))   # ~1 minute (in background)
z <- another(x)        # ~0.5 minute (in current R session)
y <- value(f)          # get background results
                       # all done in ~1 minute

1.4 Quick intro: Parallel base-R apply

1.4.1 Sequentially

x <- 1:20
y <- lapply(x, slow)          # ~20 minutes

1.4.2 In parallel

library(future.apply)
plan(multisession, workers = 4)

x <- 1:20
y <- future_lapply(x, slow)   # ~5 minutes

1.5 Quick intro: Parallel tidyverse apply

1.5.1 Sequentially

library(purrr)

x <- 1:20
y <- map(x, slow)          # ~20 minutes

1.5.2 In parallel

library(furrr)
plan(multisession, workers = 4)

x <- 1:20
y <- future_map(x, slow)   # ~5 minutes

1.6 Quick intro: Parallel foreach

1.6.1 Sequentially

library(foreach)

x <- 1:20
y <- foreach(z = x) %do% slow(z)     # ~20 minutes

Comment: Technically, we want to use y <- foreach(z = x) %do% local({ slow(z) }) here, so that each iteration is evaluated in a local environment, just as %dopar% does.

1.6.2 In parallel

library(doFuture)
registerDoFuture()
plan(multisession, workers = 4)

x <- 1:20
y <- foreach(z = x) %dopar% slow(z)  # ~5 minutes

1.7 What is the Futureverse?

  • A Unifying Parallelization Framework in R for Everyone

  • Requires only minimal changes to parallelize existing R code

  • “Write once, parallelize anywhere”

  • Same code regardless of operating system and parallel backend

  • Lower the bar to get started with parallelization

  • Fewer decisions for the developer to make

  • Stay with your favorite coding style

  • Worry-free: globals, packages, output, warnings, errors just work

  • Statistically sound: Built-in parallel random number generation (RNG)

  • Correctness and reproducibility are of the highest priority

  • “Future proof”: Support any new parallel backends to come
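
As a minimal sketch of the built-in parallel RNG, the map-reduce functions take a `future.seed` argument that sets up statistically sound, reproducible RNG streams for the workers:

library(future.apply)
plan(multisession, workers = 4)

## future.seed = <integer> gives parallel RNG streams that are
## statistically sound and reproducible
y1 <- future_lapply(1:4, function(z) rnorm(1), future.seed = 42)
y2 <- future_lapply(1:4, function(z) rnorm(1), future.seed = 42)
identical(y1, y2)   # TRUE - same results regardless of backend

Without a declared seed, drawing random numbers in parallel risks correlated or non-reproducible streams; `future.seed` avoids both.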

1.7.1 Packages part of the Futureverse

Core API:

  • future

Map-reduce API:

  • future.apply (parallel versions of base-R apply functions)
  • furrr (parallel versions of purrr's map functions)
  • doFuture (use foreach's %dopar% via futures)

Parallel backends:

  • sequential, multisession, multicore, and cluster (built into future)
  • future.callr (background R sessions via callr)
  • future.batchtools (HPC job schedulers, e.g. Slurm and SGE, via batchtools)

Additional packages:

  • progressr (progress updates, also in parallel)
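
As a sketch of how progressr composes with the map-reduce APIs, `slow()` is the same placeholder function used in the earlier examples:

library(future.apply)
library(progressr)
plan(multisession, workers = 4)

x <- 1:20
with_progress({
  p <- progressor(along = x)          # one progress step per element
  y <- future_lapply(x, function(z) {
    p()                               # signal progress, also from workers
    slow(z)
  })
})

Progress updates are signaled from the parallel workers back to the main R session, so the progress bar advances as tasks complete on any backend.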

The first CRAN release was on 2015-06-19, but the initial seed toward building the framework was planted back in 2005. It all grew out of collaborative, real-world research needs of large-scale scientific computations in Genomics and Bioinformatics on all operating systems.

1.7.2 Who is it for?

  • Everyone using R

  • Users with some experience in R, but no need to be an advanced R developer

  • Anyone who wishes to run many slow, repetitive tasks

  • Any developer who wants to support parallel processing without having to worry about the details or maintain parallel code

  • Anyone who wishes to set up an asynchronous Shiny app

1.7.3 Who is using it?

1.7.4 What about its quality and stability?

1.7.6 How to stay up-to-date