Part 1 An Overview of The Futureverse
1.1 Why do we parallelize?
Parallel & distributed processing can be used to:
- speed up processing (wall time)
- lower memory footprint (per machine)
- avoid data transfers (compute where data lives)
- other reasons, e.g. asynchronous UI/UX
1.2 The future package (the core of it all)
- A simple, unifying solution for parallel APIs
- “Write once, run anywhere”
- 100% cross platform, e.g. Linux, macOS, MS Windows
- Easy to install (< 0.5 MiB total):
install.packages("future")
- Well tested, lots of CPU mileage, used in production
- Things should “just work”
- Design goal: keep as minimal as possible
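As a rough illustration of the points above, here is a minimal sketch of the core API, assuming the future package is installed. The function `slow_square()` is a hypothetical stand-in for a slow task; `plan()`, `future()`, and `value()` are the package's own functions.

```r
library(future)

# Choose how futures are resolved: here, two background R sessions on the
# local machine. Changing this one line is enough to switch backend.
plan(multisession, workers = 2)

# Hypothetical stand-in for a slow task
slow_square <- function(x) {
  Sys.sleep(0.5)
  x^2
}

# Create two futures; future() does not wait for the result, so the two
# tasks can run at the same time on the two workers
f1 <- future(slow_square(3))
f2 <- future(slow_square(4))

# value() blocks until the result is available and then returns it
v1 <- value(f1)
v2 <- value(f2)
print(c(v1, v2))

# Shut down the background workers
plan(sequential)
```

Note that `slow_square()` is never exported by hand; the package identifies it as a global and ships it to the worker.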
1.6 Quick intro: Parallel foreach
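A parallel foreach loop could look roughly as follows. This is a minimal sketch, assuming the foreach and doFuture packages are installed; the variable names `xs` and `ys` are made up for illustration.

```r
library(foreach)
library(doFuture)
library(future)

# Tell foreach to hand %dopar% iterations over to the future framework
registerDoFuture()

# Resolve futures in two background R sessions on the local machine
plan(multisession, workers = 2)

xs <- 1:4
ys <- foreach(x = xs, .combine = c) %dopar% {
  sqrt(x)
}
print(ys)

# Shut down the background workers
plan(sequential)
```

The loop body is ordinary foreach code; only the two setup lines, `registerDoFuture()` and `plan()`, refer to the parallel backend.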
1.7 What is the Futureverse?
A Unifying Parallelization Framework in R for Everyone
- Require only minimal changes to parallelize existing R code
- “Write once, Parallelize anywhere”
- Same code regardless of operating system and parallel backend
- Lower the bar to get started with parallelization
- Fewer decisions for the developer to make
- Stay with your favorite coding style
- Worry-free: globals, packages, output, warnings, errors just work
- Statistically sound: Built-in parallel random number generation (RNG)
- Correctness and reproducibility of highest priority
- “Future proof”: Support any new parallel backends to come
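The "worry-free" and "statistically sound" points can be illustrated with a small sketch, assuming only the future package. The names `a` and `scale_by()` are hypothetical; the `seed = TRUE` argument of `future()` is part of the package.

```r
library(future)
plan(sequential)  # the same code works unchanged with other backends

# Globals: 'a' and 'scale_by' are identified and made available to the
# future automatically, with no manual export step
a <- 10
scale_by <- function(x) a * x
y <- value(future(scale_by(2)))

# Warnings raised inside a future are relayed when its value is collected,
# so ordinary condition handling keeps working
w <- tryCatch(
  value(future({ warning("careful"); 1 })),
  warning = function(cond) conditionMessage(cond)
)

# Parallel RNG: seed = TRUE gives the future its own L'Ecuyer-CMRG seed,
# derived from the current RNG state, so fixing the seed beforehand makes
# the random draw reproducible
set.seed(42)
r1 <- value(future(rnorm(3), seed = TRUE))
set.seed(42)
r2 <- value(future(rnorm(3), seed = TRUE))
print(identical(r1, r2))
```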
1.7.1 Packages part of the Futureverse
Core API:
- future
Map-reduce API:
- doFuture (used with foreach, plyr, and BiocParallel)
Parallel backends:
- parallel (local, MPI, (remote))
- future.callr (local)
- future.batchtools (HPC job schedulers)
- more to come
Additional packages:
- progressr (progress updates, also in parallel)
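To make the split between API and backend concrete, here is a minimal sketch in which the same code runs under two different backends. It assumes only the future package; `worker_pids()` is a hypothetical helper, and the commented-out `plan()` calls additionally assume that future.callr and future.batchtools are installed and, for the latter, that a Slurm scheduler is configured.

```r
library(future)

# Hypothetical helper: create two futures and report which process
# evaluated each of them
worker_pids <- function() {
  fs <- lapply(1:2, function(i) future(Sys.getpid()))
  unlist(lapply(fs, value))
}

# Backend 1: evaluate in the current R process
plan(sequential)
pids_seq <- worker_pids()

# Backend 2: evaluate in background R sessions on the local machine
plan(multisession, workers = 2)
pids_par <- worker_pids()

# Other backends plug in the same way:
# plan(future.callr::callr)
# plan(future.batchtools::batchtools_slurm)

plan(sequential)

print(pids_seq == Sys.getpid())  # evaluated in this process
print(pids_par == Sys.getpid())  # evaluated elsewhere
```

The body of `worker_pids()` never mentions a backend; only the `plan()` call changes.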
The first CRAN release of the future package was on 2015-06-19, but the seed of the framework was planted back in 2005. It grew out of collaborative, real-world research needs for large-scale scientific computing in genomics and bioinformatics, across all operating systems.
1.7.2 Who is it for?
- Everyone using R
- Users with some experience in R; there is no need to be an advanced R developer
- Anyone who wishes to run many slow, repetitive tasks
- Any developer who wants to support parallel processing without having to worry about the details or maintain parallel code
- Anyone who wishes to set up an asynchronous Shiny app
1.7.4 What about its quality and stability?
- The future package has been on CRAN since 2015 (exactly 7 years ago)
- The API is stable and rarely changes
- Very few breaking changes since the start
- 225 CRAN packages rely on it (https://www.futureverse.org/statistics.html)
- Among the top 1% most downloaded packages on CRAN (https://www.futureverse.org/statistics.html)
- Every release is well tested (https://www.futureverse.org/quality.html)
- I try to work closely with package developers, e.g. on deprecations or issues with design patterns
1.7.5 Support
Please use:
- Website: https://www.futureverse.org
- Package help pages: https://future.futureverse.org, https://future.apply.futureverse.org, …
- Discussions, questions and answers: https://github.com/HenrikBengtsson/future/discussions
- Bug reports: https://github.com/HenrikBengtsson/future/issues
1.7.6 How to stay up-to-date
Blog: https://www.futureverse.org/blog.html (feed on https://www.jottr.org/)