Functional Programming with purrr
Functional programming (FP) transforms how you write R code. Instead of writing loops that mutate objects, you create small functions that transform data and compose them together. The purrr package, part of the tidyverse, provides a consistent toolkit for functional programming in R.
This guide covers the core purrr functions, shows how they compare to base R alternatives, and demonstrates real-world patterns that will make your code more readable and maintainable.
Why use purrr?
Base R provides the apply family (apply(), lapply(), sapply(), tapply()), but these functions have inconsistent APIs and return types. purrr offers:
- Consistent syntax: every function follows the same pattern
- Type safety: map() variants guarantee output types (integer, character, data frame)
- Error handling: built-in wrappers for graceful failure management
- Composition: functions work smoothly with other tidyverse tools
The map() family
The core of purrr is map(), which applies a function to each element of a vector or list.
library(purrr)
# Apply a function to each element
numbers <- list(1:3, 4:6, 7:9)
map(numbers, sum)
# [[1]]
# [1] 6
# [[2]]
# [1] 15
# [[3]]
# [1] 24
Type-specific map() variants
purrr provides type-specific variants that return specific data types, eliminating the need for type conversion:
# map_chr returns character vector
names <- list(c("apple", "banana"), c("cherry", "date"))
map_chr(names, ~ paste(.x, collapse = "-"))
# [1] "apple-banana" "cherry-date"
# map_dbl returns numeric vector
map_dbl(list(1:3, 4:6, 7:9), mean)
# [1] 2 5 8
# map_int returns integer vector
map_int(list(c(1, 2), c(3, 4)), length)
# [1] 2 2
# map_lgl returns logical vector
map_lgl(list(1, 0, 1), ~ .x > 0)
# [1] TRUE FALSE TRUE
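# map_df returns a data frame (row-binds the results; requires dplyr)
# Each call must return a data frame (or named vector), not a bare number
map_df(list(mtcars[1:3, ], mtcars[4:6, ]), ~ data.frame(n_rows = nrow(.x)))
#   n_rows
# 1      3
# 2      3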
Anonymous functions and shortcuts
Use the tilde syntax for inline anonymous functions:
# Full anonymous function
map(numbers, function(x) x * 2)
# Tilde shortcut (purrr style)
map(numbers, ~ .x * 2)
# Named component extraction
# Ages use the L suffix so they are stored as integers, as map_int() expects
people <- list(
  list(name = "Alice", age = 30L),
  list(name = "Bob", age = 25L),
  list(name = "Carol", age = 35L)
)
map_chr(people, ~ .x$name)
# [1] "Alice" "Bob" "Carol"
map_int(people, ~ .x$age)
# [1] 30 25 35
Working with multiple arguments: map2() and pmap()
When you need to iterate over two or more vectors in parallel, use map2() for two arguments or pmap() for multiple arguments:
# map2: iterate over two vectors
x <- c(10, 20, 30)
y <- c(1, 2, 3)
map2_dbl(x, y, ~ .x / .y)
# [1] 10 10 10
# pmap: iterate over multiple vectors
df <- data.frame(
  a = c(1, 2, 3),
  b = c(10, 20, 30),
  c = c(100, 200, 300)
)
# Multiplication binds tighter than addition: a + (b * c)
pmap_dbl(df, ~ ..1 + ..2 * ..3)
# [1] 1001 4002 9003
The ..1, ..2, etc. syntax in pmap refers to positional arguments from the data frame or list.
Handling errors: safely() and possibly()
Production code needs graceful error handling. purrr provides wrappers that transform errors into manageable outputs:
# safely: returns a list with result and error components
# sqrt(-1) only warns and returns NaN, so a non-numeric input is used to trigger a real error
safe_sqrt <- safely(sqrt, otherwise = NA_real_)
results <- map(list(4, 9, "a", 16), safe_sqrt)
str(results)
# List of 4
# $ :List of 2
# ..$ result: num 2
# ..$ error : NULL
# $ :List of 2
# ..$ result: num 3
# ..$ error : NULL
# $ :List of 2
# ..$ result: num NA
# ..$ error :List of 2
# .. ..$ message: chr "non-numeric argument to mathematical function" ...
# $ :List of 2
# ..$ result: num 4
# ..$ error : NULL
# Extract the results (NA marks the call that failed)
map_dbl(results, "result")
# [1] 2 3 NA 4
# possibly: simpler, returns a default value on error
possibly_sqrt <- possibly(sqrt, otherwise = NA_real_)
map_dbl(list(4, 9, "a", 16), possibly_sqrt)
# [1] 2 3 NA 4
Walking: walk() for side effects
Use walk() when you want to perform side effects (printing, saving files, sending emails) without caring about the return value:
# Print each element (side effect)
walk(list("a", "b", "c"), ~ cat(.x, "\n"))
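# Save multiple plots to files
library(ggplot2)
plots <- list(
  ggplot(mtcars, aes(mpg, wt)) + geom_point(),
  # cyl is numeric, so convert it to a factor to get one box per cylinder count
  ggplot(mtcars, aes(factor(cyl), mpg)) + geom_boxplot()
)
walk2(plots, c("scatter.png", "box.png"), ~ ggsave(filename = .y, plot = .x))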
# Create multiple directories
walk(c("output/figures", "output/data", "output/reports"),
     ~ dir.create(.x, recursive = TRUE, showWarnings = FALSE))
Reducing and accumulating
reduce() collapses a list into a single value by repeatedly applying a binary function; accumulate() does the same but keeps every intermediate result:
# reduce: combine into a single value
numbers <- list(c(1, 2), c(3, 4), c(5, 6))
reduce(numbers, c)
# [1] 1 2 3 4 5 6
reduce(list(1, 2, 3, 4), `+`)
# [1] 10
# accumulate: show all intermediate results
accumulate(list(1, 2, 3, 4), `+`)
# [1] 1 3 6 10
# Practical example: join multiple data frames
df1 <- data.frame(id = 1:2, x = c("a", "b"))
df2 <- data.frame(id = 2:3, y = c("c", "d"))
df3 <- data.frame(id = 3:4, z = c("e", "f"))
# full_join keeps ids from every data frame; left_join would keep only the ids in df1
reduce(list(df1, df2, df3), dplyr::full_join, by = "id")
# id x y z
# 1 1 a <NA> <NA>
# 2 2 b c <NA>
# 3 3 <NA> d e
# 4 4 <NA> <NA> f
Filtering and selecting: keep() and discard()
Extract elements based on conditions without explicit loops:
x <- list(1, 2, 3, 4, 5, 6)
keep(x, ~ .x %% 2 == 0)
# [[1]]
# [1] 2
# [[2]]
# [1] 4
# [[3]]
# [1] 6
discard(x, ~ .x %% 2 == 0)
# [[1]]
# [1] 1
# [[2]]
# [1] 3
# [[3]]
# [1] 5
# compact: remove NULL and empty elements
y <- list(1, NULL, 2, character(0), 3, list())
compact(y)
# [[1]]
# [1] 1
# [[2]]
# [1] 2
# [[3]]
# [1] 3
purrr vs base R
Here’s a practical comparison showing why purrr is often preferred:
# Base R approach
lapply(mtcars, class)
# purrr approach
map(mtcars, class)
# Base R: sapply() simplifies when it can, so the return type depends on the results
sapply(mtcars, mean)
# Returns a named numeric vector here, but can return a list or matrix for other inputs
# purrr: explicit type
map_dbl(mtcars, mean) # Fails if not numeric
map_chr(mtcars, class) # Always returns character
# Base R: error handling is manual, with one tryCatch() per element
result <- lapply(1:3, function(x) {
  tryCatch(sqrt(x), error = function(e) NA)
})
# purrr: built-in error handling
map(1:3, safely(sqrt))
When to use purrr
Use purrr when you need:
- Consistent, predictable behavior across different data types
- Type-safe iteration that fails loudly on type mismatches
- Error handling without verbose tryCatch blocks
- Readable code that clearly expresses intent
Stick with base R when:
- Working in environments without tidyverse
- Performance is critical (base R can be faster for simple operations)
- Existing codebase uses apply functions
The purrr package transforms iterative code into elegant, functional expressions. Start replacing your loops with map functions, and your R code will become more concise and maintainable.
Map variants
purrr provides type-safe map variants that guarantee the return type: map_dbl() returns a double vector, map_int() returns integer, map_chr() returns character, map_lgl() returns logical. If the function returns the wrong type, the variant throws an informative error instead of silently coercing. Use typed variants whenever the output type is known: they make type errors visible immediately rather than downstream, where the wrong type causes an unexpected failure.
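As a minimal sketch of that behavior, the snippet below runs map_dbl() once with a function that returns doubles and once with a function that returns a string. The names values, means, and type_error are illustrative only, and tryCatch() is used just to capture the error so the script keeps running.

```r
library(purrr)

values <- list(1:3, 4:6, 7:9)

# mean() returns one double per element, so map_dbl() succeeds
means <- map_dbl(values, mean)

# A function that returns a string cannot satisfy map_dbl();
# the error is caught here only so it can be shown as a value
type_error <- tryCatch(
  map_dbl(values, function(v) "not a number"),
  error = function(e) "map_dbl() raised an error"
)

means
type_error
```

The failure surfaces at the call that produced the wrong type, not several steps later.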
Iterating over multiple inputs
map2(.x, .y, .f) iterates over two lists or vectors in parallel, passing .x[[i]] and .y[[i]] to .f. pmap(.l, .f) generalizes to any number of parallel inputs, where .l is a list of vectors or lists, or a data frame (each column becomes one argument and each row one call). A data frame of parameters combined with pmap() is an idiomatic way to run a function once per row of settings; to cover all combinations, build the data frame with expand.grid() or tidyr::expand_grid() first.
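The parameter-grid pattern might look like the sketch below. The function power() and the column names base and exponent are hypothetical; expand.grid() is used only to generate every combination before pmap_dbl() runs one call per row, matching columns to arguments by name.

```r
library(purrr)

# Hypothetical function whose argument names match the column names below
power <- function(base, exponent) base ^ exponent

# Every combination of the two settings: 2 bases x 3 exponents = 6 rows
params <- expand.grid(base = c(2, 10), exponent = c(1, 2, 3))

# One call per row; columns are passed as named arguments
params$result <- pmap_dbl(params, power)

params
```

Because arguments are matched by name, the column order in the data frame does not matter as long as the names agree with the function signature.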
Handling failures gracefully
safely() and possibly() are the two tools for non-aborting iteration. safely(.f) returns a list with result and error slots; possibly(.f, otherwise) returns the result on success and otherwise on failure. possibly() is simpler when you just want to substitute a default value. safely() is better when you need to inspect failures after the fact. After map(x, safely(f)), use purrr::transpose() (superseded by list_transpose() in purrr 1.0.0) to reorganize from a list of {result, error} pairs into one list of results and one list of errors.
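A small sketch of the inspect-afterwards workflow, assuming transpose() is available (it still exists in purrr 1.0.0, although superseded). The input list and the names outcomes, by_slot, failed, and good are illustrative.

```r
library(purrr)

inputs <- list(4, "oops", 16)

# Each element becomes list(result = ..., error = ...)
outcomes <- map(inputs, safely(sqrt))

# Turn the list of pairs inside out: one list of results, one list of errors
by_slot <- transpose(outcomes)

# An element failed if its error slot is not NULL
failed <- map_lgl(by_slot$error, function(e) !is.null(e))

# Keep only the results of the successful calls
good <- unlist(by_slot$result[!failed])

failed
good
```

The failed vector can also be used to pull out the offending inputs (inputs[failed]) for logging or a retry.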
Working with data frames
map_dfr() (map, then bind the results by rows) is a common way to apply a function to a list and combine the results into a single data frame; map_dfc() binds by columns. Both are convenience wrappers for map() followed by dplyr::bind_rows() or dplyr::bind_cols(), so they require dplyr. Since purrr 1.0.0 they are superseded in favor of map() followed by list_rbind() or list_cbind(). When the function returns tibbles, map_dfr() reconciles column types across results, and when some results have extra or missing columns, the gaps are filled with NA.
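The NA-filling behavior can be seen in a short sketch like the one below, which assumes dplyr is installed (map_dfr() relies on it). The records list is made-up data in which only the second record has a score field.

```r
library(purrr)

# Made-up records; the first one has no score field
records <- list(
  list(id = 1, name = "a"),
  list(id = 2, name = "b", score = 9.5)
)

# Convert each record to a one-row data frame and bind the rows;
# the missing score in the first record is filled with NA
combined <- map_dfr(records, as.data.frame)

combined
```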
First-class functions
In R, functions are first-class objects: they can be stored in variables, passed as arguments, and returned from other functions. This enables higher-order functions, that is, functions that take functions as arguments or return functions.
Reduce(f, list, accumulate = FALSE) applies a binary function cumulatively. Filter(predicate, list) keeps elements satisfying a condition. Map(f, ...) applies a function to parallel lists. These base R higher-order functions work on any list or vector.
purrr provides a more ergonomic API: reduce(list, f), keep(list, predicate), map(list, f). The purrr functions handle edge cases (empty inputs, single-element inputs) consistently and provide better error messages.
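To make the correspondence concrete, here is a side-by-side sketch of the base R and purrr spellings on the same input. All variable names are illustrative; the last line shows one way to make the empty-input case explicit by supplying .init.

```r
library(purrr)

x <- list(1, 2, 3, 4)

# Reduce / reduce: note the swapped argument order
base_sum  <- Reduce(`+`, x)
purrr_sum <- reduce(x, `+`)

# Filter / keep
base_even  <- Filter(function(v) v %% 2 == 0, x)
purrr_even <- keep(x, ~ .x %% 2 == 0)

# Map / map2 over two parallel inputs
base_prod  <- Map(function(a, b) a * b, 1:3, 4:6)
purrr_prod <- map2(1:3, 4:6, ~ .x * .y)

# With .init, reducing an empty list returns the initial value
empty_sum <- reduce(list(), `+`, .init = 0)
```

In purrr the data comes first and the function second, which is what lets these calls sit naturally in a pipeline.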
Function factories
A function factory is a function that returns a function. Factory parameters become variables in the returned function’s closure:
power_function <- function(exp) {
  function(x) x ^ exp
}
square <- power_function(2)
cube <- power_function(3)
square(4) # 16
cube(3) # 27
Function factories are the right pattern when you need many similar functions that differ only in their parameters. scales::label_number(), scales::label_dollar(), and similar formatters in the scales package are function factories that return formatting functions.
purrr::partial(f, arg = value) creates a new function with some arguments pre-filled (partial application). partial(round, digits = 2) creates a function that rounds to two decimal places. This is more concise than a factory when you just need to fix some arguments.
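A brief sketch of partial application next to the hand-written wrapper it replaces. It uses mean() with na.rm pre-filled instead of the round() example from the text; the names mean_no_na and mean_no_na_manual are illustrative.

```r
library(purrr)

# Pre-fill na.rm = TRUE; every other argument is passed through
mean_no_na <- partial(mean, na.rm = TRUE)

# The equivalent wrapper written by hand
mean_no_na_manual <- function(x, ...) mean(x, na.rm = TRUE, ...)

mean_no_na(c(1, NA, 3))
mean_no_na_manual(c(1, NA, 3))
```

Both calls return the same value; partial() simply saves writing the wrapper.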
Function composition
purrr::compose(f, g) creates a function equivalent to function(x) f(g(x)), applying g first, then f. compose(log, abs) applies abs then log. For multiple functions, compose(f, g, h) applies in right-to-left order.
The pipe |> (or %>%) achieves composition for a specific input. purrr::compose() instead creates a reusable function rather than a pipeline tied to one dataset.
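The ordering rule can be checked with a small sketch. It builds the same composition twice: once in the default right-to-left order, and once left-to-right using the .dir argument (available in purrr 0.3.0 and later). The names log_abs and log_abs_forward are illustrative.

```r
library(purrr)

# Default order is right to left: abs() runs first, then log()
log_abs <- compose(log, abs)

# The same pipeline written left to right
log_abs_forward <- compose(abs, log, .dir = "forward")

# Both compute log(abs(x)); for x = -exp(2) that is 2, up to rounding
log_abs(-exp(2))
log_abs_forward(-exp(2))
```

Reversing the order without .dir would call log() on a negative number first, which yields NaN with a warning.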
walk for side effects
walk(x, f) is map() for side effects: it applies f to each element and returns x invisibly. Use it when you want the effects (printing, writing files, sending emails) but not the return values.
walk2(x, y, f) and pwalk(list, f) are the two-input and multi-input variants. pwalk(params_df, ~ write_csv(.x, .y)) iterates over the rows of a data frame to write multiple files: .x is the first column (here a list-column holding the data) and .y is the second (the file path).
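A self-contained sketch of the write-many-files pattern, using base write.csv() and a temporary directory so it does not depend on readr. The list jobs, its element names x and file, and the output file names are all made up for illustration.

```r
library(purrr)

# Write into a scratch directory under tempdir()
out_dir <- file.path(tempdir(), "purrr_pwalk_demo")
dir.create(out_dir, showWarnings = FALSE)

# Parallel inputs: the data to write and where to write it
jobs <- list(
  x = list(head(mtcars, 3), head(iris, 3)),
  file = file.path(out_dir, c("mtcars.csv", "iris.csv"))
)

# One write.csv() call per pair; arguments are matched by name
pwalk(jobs, function(x, file) write.csv(x, file, row.names = FALSE))

file.exists(jobs$file)
```

Because pwalk() returns its input invisibly, jobs can be piped onward, for example to log which files were written.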
imap and indexed iteration
imap(x, f) iterates with both the value and the index (name or position). For a named list: imap(results, ~ message(.y, ": ", .x)) prints each name and value. For an unnamed vector: imap(scores, ~ paste("Item", .y, "scored", .x)) uses numeric indices.
iwalk(x, f) is the side-effect variant. imap_chr(x, ~ paste(.y, "=", .x)) returns a character vector where each element combines the name and value.
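The two indexing modes can be compared in a short sketch. The vectors below are made-up data: the first is named, so .y is the name, and the second is unnamed, so .y is the position.

```r
library(purrr)

# Named input: .y is the element name
scores <- c(alice = 90, bob = 85)
labelled <- imap_chr(scores, ~ paste(.y, "=", .x))

# Unnamed input: .y is the integer position
numbered <- imap_chr(c(90, 85), ~ paste("Item", .y, "scored", .x))

labelled
numbered
```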
Memoization
memoise::memoise(f) caches function results. The first call computes the result; subsequent calls with the same arguments return the cached value. Caching is automatic, with no manual cache management.
memoise::memoise(f, cache = cachem::cache_mem(max_size = 1e8)) limits the cache size in bytes (this form assumes memoise 2.0 or later, which uses cachem caches). memoise::forget(memoized_fn) clears the cache. memoise::is.memoised(f) checks whether a function is memoized.
Memoization is appropriate for pure functions (same inputs always produce the same output) that are called repeatedly with the same arguments. Slow computations are natural candidates; API calls and database queries can also benefit, provided a cached and possibly stale response is acceptable. Do not memoize functions with side effects.
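The caching behavior can be observed with a sketch like the one below, assuming the memoise package is installed. The function slow_square() and the counter call_count are hypothetical; the counter is a deliberate side effect added only to show how often the underlying function really runs.

```r
library(memoise)

# Counter used only to observe real executions (demonstration side effect)
call_count <- 0
slow_square <- function(x) {
  call_count <<- call_count + 1
  x ^ 2
}

fast_square <- memoise(slow_square)

fast_square(4)   # computed
fast_square(4)   # served from the cache
calls_after_repeat <- call_count

forget(fast_square)  # clear the cache
fast_square(4)       # computed again
calls_after_forget <- call_count

is.memoised(fast_square)
```

The counter increases only when the cache misses, which is also why functions with meaningful side effects should not be memoized: the side effect is skipped on every cache hit.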