Parallel Computing in R
Parallel computing lets you use multiple CPU cores at once to speed up computationally intensive tasks. R is single-threaded by default, but several packages let you distribute work across cores. This guide covers the main approaches.
The parallel Package
R comes with the parallel package installed. It provides functions for parallel computing without requiring additional dependencies. The package offers two main approaches: forking (available on Unix-like systems) and socket clusters (cross-platform).
The key functions live in the parallel namespace: after library(parallel) you can call mclapply() or parLapply() directly, or qualify them as parallel::mclapply() without attaching the package.
mclapply: Fork-Based Parallelism
The mclapply() function is a parallelized version of lapply(). It uses forking, which creates child processes that inherit a copy of the parent’s memory. This makes it fast to set up because the environment doesn’t need to be explicitly copied to each worker.
```r
library(parallel)

# Create a simple function that takes time
slow_square <- function(x) {
  Sys.sleep(0.1)  # Simulate slow computation
  x ^ 2
}

# Sequential version
system.time({
  result <- lapply(1:10, slow_square)
})

# Parallel version using mclapply
system.time({
  result <- mclapply(1:10, slow_square, mc.cores = 4)
})
```
The mc.cores argument controls how many worker processes to use. Its default is getOption("mc.cores", 2L), so unless you set that option it uses two cores, not all of them. Setting it to 1 forces sequential execution, which is useful for debugging.
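If you do want every core, query the machine explicitly; detectCores() ships with the parallel package. A minimal sketch (Unix-only, because it uses mclapply()):

```r
library(parallel)

# How many cores does this machine report?
n_cores <- detectCores()

# A common convention: leave one core free for the OS
result <- mclapply(1:10, sqrt, mc.cores = max(1, n_cores - 1))
```

Note that detectCores() may count hyperthreads rather than physical cores, so treat it as an upper bound.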
There’s a catch: mclapply() cannot run in parallel on Windows, because Windows doesn’t support forking. Requesting mc.cores > 1 there is an error; with mc.cores = 1 it simply falls back to a regular lapply().
parLapply: Socket Cluster Parallelism
The parLapply() function uses socket clusters instead of forking. It creates separate R processes that communicate over network sockets. This works on all operating systems, including Windows.
```r
library(parallel)

# Create a cluster with 4 workers
cl <- makePSOCKcluster(4)

# Define a function (must exist in each worker's environment)
slow_square <- function(x) {
  Sys.sleep(0.1)
  x ^ 2
}

# Export the function to all workers
clusterExport(cl, "slow_square")

# Run in parallel
system.time({
  result <- parLapply(cl, 1:10, slow_square)
})

# Clean up
stopCluster(cl)
```
The extra steps (creating the cluster, exporting variables) make this more verbose than mclapply(), but it works everywhere.
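Variables aren’t the only thing workers lack: packages attached in your session are not attached on the workers. clusterEvalQ() evaluates an expression on every worker and is the usual way to load packages there (a sketch; stats is just an illustrative package):

```r
library(parallel)

cl <- makePSOCKcluster(2)

# Attach a package on every worker; in practice this is whatever
# package your worker function needs
clusterEvalQ(cl, library(stats))

# Workers can now call functions from that package
result <- parLapply(cl, 1:4, function(x) rnorm(1, mean = x))

stopCluster(cl)
```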
mclapply vs parLapply
The main differences:
| Feature | mclapply | parLapply |
|---|---|---|
| Platform | Unix/Linux/macOS only | All platforms |
| Setup | Simple one-liner | Requires cluster setup |
| Memory | Shares parent memory (copy-on-write) | Each worker has separate memory |
| Overhead | Low startup cost (copy-on-write memory) | Pays for cluster startup and data serialization |
| Debugging | Harder | Easier to inspect workers |
For quick scripts on Unix systems, mclapply() is convenient. For production code that needs to run on multiple machines, parLapply() or higher-level abstractions are safer.
The future and furrr Packages
The future package provides a unified interface for parallel computing. Instead of calling specific parallel functions, you declare whether code should run sequentially or in parallel, and the future ecosystem handles the rest.
```r
library(future)
library(furrr)

# Start four background R sessions
plan(multisession, workers = 4)

# furrr reimplements purrr functions in parallel
slow_square <- function(x) {
  Sys.sleep(0.1)
  x ^ 2
}

# This looks like regular purrr code
result <- future_map(1:10, slow_square)
```
The furrr package provides parallel versions of purrr functions: future_map(), future_map2(), future_pmap(), future_walk(), and more. If you already use purrr, switching to furrr requires minimal code changes.
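For example, future_map2() iterates over two inputs in lockstep, just like purrr::map2() (a small sketch):

```r
library(future)
library(furrr)

plan(multisession, workers = 2)

# Add pairs of numbers in parallel, purrr-style
sums <- future_map2(1:3, 4:6, ~ .x + .y)

plan(sequential)  # shut the workers down
```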
The future system has several plans:

- sequential: run on the main R process (the default)
- multisession: run in separate background R sessions (like PSOCK clusters)
- multicore: run in forked processes (like mclapply; not available on Windows)
- cluster: run on a socket cluster you create yourself
You can switch plans dynamically. This makes it easy to develop with sequential (easier to debug) and deploy with multisession (faster).
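Because the mapping code never mentions the backend, switching is a single plan() call (a sketch):

```r
library(future)
library(furrr)

# Develop sequentially: errors and browser() behave normally
plan(sequential)
result <- future_map(1:4, sqrt)

# Deploy in parallel: the mapping code itself is unchanged
plan(multisession, workers = 4)
result <- future_map(1:4, sqrt)

plan(sequential)  # shut the background sessions down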
Common Pitfalls
Shared State
Parallel workers don’t share memory. If your function relies on global variables or modifies objects in place, it won’t work as expected. Pass everything as arguments.
```r
# This fails in parallel
global_counter <- 0

increment <- function(x) {
  global_counter <<- global_counter + 1
  x + 1
}

# Each worker increments its own copy of global_counter;
# the counter in the main session is never updated
```
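The fix is to make the function pure and compute any aggregate from the returned values after the parallel call (a sketch, Unix-only because of mclapply()):

```r
library(parallel)

increment <- function(x) x + 1  # no shared state

result <- mclapply(1:10, increment, mc.cores = 2)

# Derive the "count" from the results instead of mutating a global
n_processed <- length(result)
```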
Nested Parallelization
Don’t call parallel functions inside other parallel functions. Each outer worker spawns its own set of inner workers, which oversubscribes the CPU and can exhaust memory or crash the session. If you need nested parallelism, use mc.cores = 1 for the inner call.
Random Numbers
Parallel workers can end up with identical or correlated random number streams unless you seed them properly. With mclapply(), call RNGkind("L'Ecuyer-CMRG") first so each child gets an independent stream; on socket clusters, use clusterSetRNGStream(). With furrr, pass .options = furrr_options(seed = TRUE) to future_map() for reproducible, parallel-safe streams.
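A sketch of both seeding approaches (Unix-only for the mclapply() half):

```r
library(parallel)

# Fork-based: L'Ecuyer streams give each child independent draws
RNGkind("L'Ecuyer-CMRG")
set.seed(42)
fork_draws <- mclapply(1:4, function(i) rnorm(1), mc.cores = 2)

# Socket-based: seed every worker's stream explicitly
cl <- makePSOCKcluster(2)
clusterSetRNGStream(cl, iseed = 42)
sock_draws <- parLapply(cl, 1:4, function(i) rnorm(1))
stopCluster(cl)
```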
Side Effects
Parallel functions return values. If your function prints, writes to disk, or modifies global state, you need to collect those side effects and handle them after the parallel call completes.
Performance Considerations
Parallel computing adds overhead. Creating processes and moving data between them takes time. For very fast operations, the overhead exceeds the time saved. The benefit shows up when:
- Each iteration takes at least 10-50ms
- You have many iterations
- The operation is CPU-bound
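You can see the overhead directly by timing a trivial operation both ways; on most machines the parallel version is no faster, and often slower (a sketch, Unix-only because of mclapply()):

```r
library(parallel)

fast_op <- function(x) x ^ 2  # far too cheap to parallelize

system.time(lapply(1:1e5, fast_op))                  # sequential
system.time(mclapply(1:1e5, fast_op, mc.cores = 4))  # pays fork + collect costs
```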
Parallel computing doesn’t help with I/O-bound operations (reading files, downloading data). For I/O, consider asynchronous approaches or proper batching.
Conclusion
The parallel package gives you two tools: mclapply() for quick parallel jobs on Unix systems, and parLapply() for cross-platform compatibility. The future and furrr packages build on these foundations with a cleaner API that integrates well with the tidyverse.
Start with mclapply() if you’re on Linux or macOS and want a quick speedup. Move to furrr for production code that needs to work reliably across environments.
See Also
- The apply Family: apply, lapply, sapply, tapply covers sequential iteration
- Profiling and Optimizing R Code helps you find what to optimize
- Memory Management in R explains how R handles memory