Tutorial Overview
Abstract
Objectives
Preparing for this tutorial
Hello and practicalities
Practicalities
Agenda
Polls
Poll #1: How much do you know about parallelization in R?
Poll #2: What is your main operating system when using R?
Poll #3: How do you run R?
Poll #4: Do you have access to a cluster?
1 An Overview of The Futureverse
1.1 Why do we parallelize?
1.2 The future package (the core of it all)
1.3 Quick intro: Evaluate R in the background
1.3.1 Sequentially
1.3.2 In parallel
1.4 Quick intro: Parallel base-R apply
1.4.1 Sequentially
1.4.2 In parallel
1.5 Quick intro: Parallel tidyverse apply
1.5.1 Sequentially
1.5.2 In parallel
1.6 Quick intro: Parallel foreach
1.6.1 Sequentially
1.6.2 In parallel
1.7 What is the Futureverse?
1.7.1 Packages part of the Futureverse
1.7.2 Who is it for?
1.7.3 Who is using it?
1.7.4 What about its quality and stability?
1.7.5 Support
1.7.6 How to stay up-to-date
2 The core Future API
2.1 Three atomic building blocks
2.1.1 Mental model: The Future API decouples a regular R assignment into two parts
2.1.2 Keep doing other things while waiting
2.1.3 Evaluate several things in parallel
2.2 Choosing parallel backend
2.2.1 sequential (default)
2.2.2 multisession: in parallel on local computer
2.2.3 cluster: in parallel on multiple computers
2.2.4 There are other parallel backends and more to come
2.3 Motto and design philosophy
2.4 Demo: ggplot2 remotely
3 Map-reduce APIs
3.1 Parallel alternatives to base-R apply functions
3.1.1 Example: base::lapply(X, FUN)
3.1.2 Example: base::vapply(X, FUN, FUN.VALUE)
3.1.3 Example: base::mapply(X, Y, FUN)
3.2 Parallel alternatives to purrr functions
4 Errors and output
4.1 Business as usual: Exception handling (“dealing with errors”)
4.1.1 Example setup
4.1.2 Exception handling works the same with map-reduce functions
4.2 Business as usual: Warnings
4.3 Business as usual: Messages
4.4 Business as usual: Standard output
4.5 Summary: All types of output are relayed
4.6 Odds and ends
4.6.1 What about standard error (stderr)?
5 Reporting on progress updates
5.1 Basic progress updates
5.2 Progress updates in parallel
5.3 Customizing how progress is reported
5.4 Demo: Mandelbrot sets
6 Quick summary and comparison to other parallel frameworks
6.1 Feature comparisons
6.2 Parallel-backend comparisons
7 Random numbers and reproducibility
7.1 Why is this important?
7.2 Futures use proper parallel RNG
7.2.1 What happens if you forget to declare seed = TRUE?
7.2.2 Random numbers with parallel map-reduce functions
Open discussion
8 Appendix
8.1 Appendix: Exception handling by other parallel map-reduce APIs
8.2 Appendix: Condition handling by other parallel map-reduce APIs
8.3 Appendix: Standard output by other parallel map-reduce APIs
8.4 Appendix: Not everything can be parallelized
8.4.1 Example: R connections cannot be exported to parallel workers
8.4.2 Example: xml2 objects cannot be exported
8.5 Appendix: Careful with forked parallelization
8.6 Appendix: Missing globals
8.6.1 Example: glue::glue() - object not found
8.6.2 Example: do.call()
8.7 Appendix: Don’t assign to global environment
8.8 Appendix: foreach() is not a for-loop
8.9 Appendix: Debugging
Tutorial: An Introduction to Futureverse for Parallel Processing in R
Open discussion