Parallel purrr with furrr
The furrr package brings parallel processing to your tidyverse workflows. If you already use purrr, switching to furrr requires minimal changes but can give you massive speedups on CPU-intensive tasks.
Why furrr?
The purrr package gives you elegant functional iteration. But by default, purrr runs sequentially. Each iteration waits for the previous one to finish. When you have hundreds or thousands of items to process, this adds up.
furrr replaces purrr’s sequential mapping functions with parallel versions. You swap map() for future_map(), and your code runs across multiple cores automatically.
The magic comes from the future package, which handles the parallelism details. furrr translates your purrr-style code into future-based parallel execution.
Setup
Install both packages from CRAN:
install.packages(c("furrr", "future"))
Load them together:
library(furrr)
library(future)
plan(multisession) # Use multiple R sessions
The plan() function controls how futures are resolved. multisession creates separate R processes on your machine. For a quick test on a single machine, this is usually the right choice.
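If you want to inspect or change the plan, a couple of related calls from the future API are worth knowing (a minimal sketch):

```r
library(future)

availableCores()                # how many cores future can see on this machine
plan(multisession, workers = 2) # cap the number of background R processes
plan(sequential)                # switch back to ordinary sequential execution
```

plan(sequential) is handy when debugging: your code runs in the main session, where browser() and print() behave normally.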
Basic Parallel Mapping
Here’s the simplest example - transforming a numeric vector:
library(purrr) # map() below comes from purrr
library(furrr)
library(future)
plan(multisession)
# Sequential (standard purrr)
slow_function <- function(x) {
  Sys.sleep(0.1) # Simulate work
  x * 2
}
# This takes 1 second (10 * 0.1)
system.time(result <- map(1:10, slow_function))
# This takes ~0.3 seconds on a 4-core machine (ceiling(10 / 4) * 0.1, plus overhead)
system.time(result <- future_map(1:10, slow_function))
The interface is identical to purrr. Change the function name, get parallelism.
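furrr mirrors the rest of the purrr family as well. For example, iterating over two vectors in lockstep works just like purrr::map2():

```r
library(furrr)
library(future)
plan(multisession)

# future_map2_dbl() pairs elements of the two inputs: 1*4, 2*5, 3*6
future_map2_dbl(1:3, 4:6, ~.x * .y) # c(4, 10, 18)
```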
Different Output Types
Just like purrr, furrr provides variants for different output types:
- future_map() - list output
- future_map_lgl() - logical vector
- future_map_int() - integer vector
- future_map_dbl() - double vector
- future_map_chr() - character vector
- future_map_dfr() - row-bound data frame
- future_map_dfc() - column-bound data frame
Example with data frames:
library(tidyverse)
library(furrr)
library(future)
plan(multisession)
# Apply transformation to each group in parallel
results <- iris %>%
  split(.$Species) %>%
  future_map_dfr(~.x %>% mutate(sepal_area = Sepal.Length * Sepal.Width))
Progress Bars
Parallel code can feel slow if you don’t see progress. The progressr package integrates with furrr:
library(furrr)
library(future)
library(progressr)
plan(multisession)
handlers("txtprogressbar") # handler names are quoted strings
# Now you see a progress bar - note that you must create a progressor
# and call it inside the mapped function to signal progress
with_progress({
  p <- progressor(steps = 100)
  results <- future_map(1:100, ~{
    p() # signal one completed step
    .x^2
  }, .options = furrr_options(seed = TRUE))
})
The .options argument also lets you set a random seed for reproducibility.
Error Handling
furrr works with purrr’s safety functions. Wrap your function with safely() or possibly():
# safely() returns a list with $result and $error
# (plain x / y never errors in R - dividing by zero gives Inf - so stop() explicitly)
risky_divide <- function(x, y) {
  if (y == 0) stop("division by zero")
  x / y
}
safe_divide <- safely(risky_divide, otherwise = NA)
results <- future_map(1:10, ~safe_divide(.x, sample(0:1, 1)),
  .options = furrr_options(seed = TRUE))
# Extract results and flag failures separately
successes <- map_dbl(results, "result")
failed <- map(results, "error") %>% map_lgl(Negate(is.null))
The possibly() variant is simpler - it just returns a default value on error:
safe_log <- possibly(log, otherwise = NA_real_)
future_map(list(1, "a", 2), safe_log) # NA for log("a"), which throws an error
# Note: log(-1) only warns and returns NaN, so possibly() leaves it untouched
Performance Tips
Chunking
For very large iterables, process in chunks:
future_map(1:10000, slow_function,
  .options = furrr_options(chunk_size = 100))
Limiting Workers
Don’t use more workers than you have cores:
plan(multisession, workers = 4)
Seed Setting
Always set a seed for reproducible results:
future_map(1:100, ~rnorm(1), .options = furrr_options(seed = TRUE))
Common Pitfalls
Shared State
Each worker is a separate R process. furrr detects and copies most globals to the workers automatically, but the copy is one-way: changes made on a worker never reach your main session, and detection can miss objects referenced non-standardly.
# This doesn't work as expected
counter <- 0
future_map(1:10, ~{counter <<- counter + 1})
counter # Still 0 - each worker incremented its own private copy
Pass data explicitly as function arguments instead.
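One way to do that, sketched here with a hypothetical lookup table, is to hand the table to the worker function as a named argument, so furrr serializes it to every worker:

```r
library(furrr)
library(future)
plan(multisession)

lookup <- c(a = 1, b = 2)

# Extra arguments after the function are forwarded to every call
future_map_dbl(c("a", "b"), function(key, tbl) tbl[[key]], tbl = lookup)
# c(1, 2)
```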
Side Effects
Writing to files or modifying global state from inside future_map() can cause race conditions. Return the data and write after mapping:
# Bad: every worker races to overwrite the same file
future_map(df$path, ~write.csv(read.csv(.x), "output.csv"))
# Good: read in parallel, then write sequentially to distinct files
future_map(df$path, read.csv) %>%
  iwalk(~write.csv(.x, paste0("output_", .y, ".csv")))
Small Tasks
Parallelism adds overhead. If each iteration takes less than 10 milliseconds, sequential purrr might actually be faster.
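A quick sanity check is to time a trivial task both ways; exact numbers depend on your machine, but the pattern is typical:

```r
library(purrr)
library(furrr)
library(future)
plan(multisession)

# Per-item work is tiny, so serialization and scheduling dominate
system.time(map(1:1000, ~.x + 1))        # effectively instant
system.time(future_map(1:1000, ~.x + 1)) # often slower than sequential
```

If the parallel version loses here, batching work into larger chunks (see the Chunking tip above) or staying sequential are both reasonable fixes.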
Conclusion
furrr makes parallel processing accessible to tidyverse users. The learning curve is minimal if you already know purrr. The speedup can be dramatic for CPU-bound tasks.
Start with future_map() replacing your map() calls. Add progress bars with progressr. Use safely/possibly for error handling. Profile with microbenchmark to verify you’re actually gaining speed.
See Also
- Parallel Computing in R covers mclapply, parLapply, and the foundations furrr builds on
- Functional Programming with purrr teaches the sequential mapping functions
- The apply Family: apply, lapply, sapply, tapply covers base R iteration