Functional Programming with purrr

· 6 min read · Updated March 29, 2026 · intermediate
r purrr tidyverse functional-programming

If you find yourself writing the same for loop pattern across your R code, purrr deserves a place in your workflow. The purrr package brings a consistent, composable set of tools for applying functions to vectors and lists. Instead of managing loop variables and pre-allocating results, you express what you want to happen to each element, and purrr handles the iteration.

This tutorial covers the core map family, shows how its functions chain with the pipe operator, and introduces the error-handling utilities that make purrr production-ready.

Getting Started with map()

The map() function applies a function to every element of a vector or list and always returns a list:

library(purrr)

x <- list(1:3, 4:6, 7:9)
map(x, sum)
# [[1]]
# [1] 6
# [[2]]
# [1] 15
# [[3]]
# [1] 24

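The first argument is your input. The second is the function to apply. You can pass a named function, an anonymous function defined with ~, or a string or integer that extracts an element by name or position. Extraction is most useful on a list of records:

people <- list(
  list(name = "Alice", age = 30),
  list(name = "Bob", age = 25)
)

# Extract the "name" element from each record
map(people, "name")
# [[1]]
# [1] "Alice"
# [[2]]
# [1] "Bob"

# Extract by position: the second element of each record
map(people, 2)
# [[1]]
# [1] 30
# [[2]]
# [1] 25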

The ~ formula syntax creates anonymous functions on the fly. Inside the formula, .x refers to the current element:

map_dbl(x, ~ mean(.x))
# [1] 2 5 8
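As a small sketch of how the formula shorthand relates to the alternatives, the three calls below should produce the same result. The variable names (by_name, by_formula, by_lambda, with_na) are illustrative only.

```r
library(purrr)

x <- list(1:3, 4:6, 7:9)

# Three equivalent ways to take the mean of each element
by_name    <- map_dbl(x, mean)                  # existing function passed by name
by_formula <- map_dbl(x, ~ mean(.x))            # purrr formula shorthand
by_lambda  <- map_dbl(x, function(v) mean(v))   # ordinary anonymous function

# Extra arguments can be written inside the anonymous function
with_na <- list(c(1, NA, 3), c(4, 5, 6))
map_dbl(with_na, ~ mean(.x, na.rm = TRUE))
# [1] 2 5
```

On R 4.1 or later, the base shorthand \(v) mean(v) is another way to write the third form.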

Typed Maps: Predictable Output Every Time

Base R's sapply() tries to simplify its output, so depending on the input it may return a vector, a matrix, or a list, and you have to check which. purrr’s typed variants eliminate this guesswork. Each typed map expects a specific output type and throws an error if the function returns something incompatible.

Function     Returns
map_lgl()    logical vector
map_int()    integer vector
map_dbl()    numeric vector
map_chr()    character vector

temps <- list(c(72, 75, 68), c(81, 79, 85), c(62, 58, 65))

# map() would return a list of numeric vectors
# map_dbl() returns one clean numeric vector
map_dbl(temps, mean)
# [1] 71.66667 81.66667 61.66667

Use typed maps when you know what type you expect. The error-on-mismatch behavior catches bugs early rather than letting incompatible types propagate through your pipeline.
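The contrast with sapply() can be sketched as follows. The example assumes the temps list from above and uses tryCatch() only to record that an error occurred; the exact error message depends on the purrr version, so it is not reproduced here.

```r
library(purrr)

temps <- list(c(72, 75, 68), c(81, 79, 85), c(62, 58, 65))

# sapply() changes its return type with the input: an empty list in, a list out
class(sapply(list(), length))
class(sapply(temps, length))

# map_int() returns an integer vector in both cases
empty_counts <- map_int(list(), length)
counts <- map_int(temps, length)

# range() returns two values per element, so map_dbl() refuses to guess
outcome <- tryCatch(
  map_dbl(temps, range),
  error = function(e) "map_dbl() raised an error"
)
```

The empty-input case is the one that tends to bite in practice: code that assumes a vector breaks when sapply() quietly hands back a list.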

Mapping Over Two Inputs with map2()

When you need to apply a function to two inputs in parallel, map2() does exactly that. It pairs elements from two vectors or lists and processes them together:

x <- c(10, 20, 30)
y <- c(1, 2, 3)

map2_dbl(x, y, ~ .x / .y)
# [1] 10 10 10

Inside the formula, .x refers to elements from the first input and .y from the second. This is particularly useful for operations that need two datasets aligned element-wise.
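A slightly larger sketch, using made-up scores and weights, shows map2() passing a named function positionally and recycling a length-1 input. The data and variable names are hypothetical.

```r
library(purrr)

# Hypothetical scores and per-score weights for two students
scores  <- list(c(80, 90), c(70, 100, 85))
weights <- list(c(0.5, 0.5), c(0.2, 0.3, 0.5))

# weighted.mean(x, w) receives one element from each list per call
weighted <- map2_dbl(scores, weights, weighted.mean)

# A length-1 second input is recycled against every element of the first
doubled <- map2_dbl(c(10, 20, 30), 2, ~ .x * .y)
doubled
# [1] 20 40 60
```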

Handling Three or More Inputs with pmap()

For operations that need three or more inputs, pmap() takes a list of vectors and iterates over them in parallel. Each vector in the list supplies one argument to the function:

df <- data.frame(
  x = 1:3,
  y = c(10, 20, 30),
  z = c(100, 200, 300)
)

pmap_dbl(df, sum)
# [1] 111 222 333

pmap() is especially natural with data frames, where each row represents one observation and each column represents one variable. If the list is named, its elements are matched to the function's arguments by name; if it is unnamed, they are matched by position.
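To illustrate matching by name, the sketch below deliberately lists the inputs in a different order from the function's arguments. The describe() helper and the params list are hypothetical, introduced only for this example.

```r
library(purrr)

# Inputs listed in a different order from the function's arguments
params <- list(
  sd   = c(1, 2, 3),
  n    = c(2, 3, 4),
  mean = c(0, 10, 100)
)

# Hypothetical helper: argument order is n, mean, sd
describe <- function(n, mean, sd) {
  paste0(n, " draws from N(", mean, ", ", sd, ")")
}

# Because params is named, each element reaches the argument of the same name
labels <- pmap_chr(params, describe)
labels
# [1] "2 draws from N(0, 1)"   "3 draws from N(10, 2)"  "4 draws from N(100, 3)"
```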

Side Effects with walk() and iwalk()

Not every operation produces a return value. Sometimes you want to print something, save a file, or generate a plot. walk() calls a function for its side effects and returns the original input invisibly, making it safe to use mid-pipe:

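# Hypothetical data frames to export, keyed by report name
reports <- list(report_a = head(mtcars), report_b = head(iris), report_c = head(cars))

# Save each data frame to its own CSV file.
# walk() returns its input (the names) invisibly, which is useful for chaining
names(reports) %>%
  walk(~ write.csv(reports[[.x]], paste0(.x, ".csv")))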

iwalk() adds an index or name as a second argument. This comes in handy when you need the name alongside the value:

configs <- list(api_key = "abc123", endpoint = "https://example.com", timeout = 30)

iwalk(configs, ~ message(.y, " = ", .x))
# api_key = abc123
# endpoint = https://example.com
# timeout = 30

The pronoun .y carries the name, .x carries the value.
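Two behaviors described above can be sketched together: walk() passing its input through a pipe, and iwalk() falling back to positions when the input has no names. The steps vector is made up for illustration.

```r
library(purrr)

# walk() hands its input on unchanged, so a pipeline can continue after it
doubled <- c(1, 2, 3) %>%
  walk(~ message("processing ", .x)) %>%
  map_dbl(~ .x * 2)

# With an unnamed input, iwalk() passes the position as .y
steps <- c("load", "clean", "model")
returned <- iwalk(steps, ~ message(.y, ". ", .x))
# 1. load
# 2. clean
# 3. model
```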

Filtering with keep() and discard()

The keep() and discard() functions filter a list or vector based on a predicate. keep() retains elements where the predicate is TRUE; discard() removes them:

x <- list(a = 1, b = NULL, c = 3, d = integer(0))

keep(x, ~ length(.x) > 0)
# $a
# [1] 1
# $c
# [1] 3

discard(x, ~ length(.x) > 0)
# $b
# NULL
# $d
# integer(0)

compact() is a shorthand for discarding empty elements (anything with length zero):

compact(x)
# $a
# [1] 1
# $c
# [1] 3

keep() and discard() also work on atomic vectors, not just lists.
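A short sketch of filtering atomic vectors, with both a formula predicate and an existing function as the predicate. The readings vector is hypothetical.

```r
library(purrr)

# Atomic vectors are filtered element by element, just like lists
evens <- keep(1:10, ~ .x %% 2 == 0)
evens
# [1]  2  4  6  8 10

# Any function that returns a single TRUE or FALSE works as a predicate
readings <- c(a = 1.5, b = NA, c = 3.2)
complete <- discard(readings, is.na)
complete
#   a   c 
# 1.5 3.2 
```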

Folding with reduce() and accumulate()

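reduce() takes a list and applies a two-argument function cumulatively until only one value remains. It is useful for collapsing a list into a single object:

library(dplyr)  # provides full_join()

# Each data frame shares an "id" key so the joins have a column to match on
list_of_dfs <- list(
  data.frame(id = 1:3, x = 1:3),
  data.frame(id = 1:3, y = 4:6),
  data.frame(id = 1:3, z = 7:9)
)

reduce(list_of_dfs, ~ full_join(.x, .y, by = "id"))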

accumulate() is the same operation but returns every intermediate result:

1:5 %>% accumulate(`+`)
# [1]  1  3  6 10 15

Both accept an .init argument if you need to start with a seed value rather than the first element of the list.
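The effect of .init can be sketched with simple addition. The seed values here are arbitrary.

```r
library(purrr)

# Without .init, the first element seeds the fold
reduce(1:4, `+`)
# [1] 10

# With .init, the fold starts from the seed instead
reduce(1:4, `+`, .init = 100)
# [1] 110

# .init also gives an empty input a well-defined result
empty_total <- reduce(list(), `+`, .init = 0)

# accumulate() includes the seed as the first element of its output
running <- accumulate(1:3, `+`, .init = 100)
running
# [1] 100 101 103 106
```

Supplying .init is worth considering whenever the input might be empty, since the result is then the seed itself.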

Error Handling with safely() and possibly()

Production code needs to handle failures gracefully. safely() wraps a function so it never throws an error — instead, it returns a list with result and error elements:

safe_read <- safely(read.csv)

result <- safe_read("possibly_missing.csv")
result$result  # NULL if file didn't exist
result$error   # NULL if successful, error object otherwise

possibly() is simpler: it returns a default value when the function fails:

# Assumes library(readr) for read_file() and a character vector of paths in list_files
map_chr(list_files, possibly(read_file, otherwise = "FILE NOT FOUND"))

Use safely() when you need to inspect the error afterwards, and possibly() when a fallback value is all you need.
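Both wrappers are most useful inside a map call, where one bad element would otherwise abort the whole iteration. The sketch below uses log() on a deliberately mixed input list; the inputs and variable names are illustrative.

```r
library(purrr)

inputs <- list(10, "a", 100)   # log("a") fails with a non-numeric argument error

# safely(): keep both outcomes and decide later what to do with them
safe_log <- safely(log)
attempts <- map(inputs, safe_log)
succeeded <- map_lgl(attempts, ~ is.null(.x$error))
values <- map_dbl(attempts[succeeded], "result")

# possibly(): substitute a fallback and move on
fallback <- map_dbl(inputs, possibly(log, otherwise = NA_real_))
```

The safely() version keeps the error objects in attempts for logging, while the possibly() version keeps the output aligned with the input at the cost of discarding the reason for each failure.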

Accessing Nested Data with pluck() and chuck()

When working with deeply nested lists, pluck() extracts elements by path:

api_response <- list(
  data = list(
    users = list(
      list(name = "Alice", age = 30),
      list(name = "Bob", age = 25)
    )
  )
)

pluck(api_response, "data", "users", 1, "name")
# [1] "Alice"

pluck() silently returns NULL if the path doesn’t exist, or the value of its .default argument if you supply one. chuck() follows the same path but throws an error instead; use it when missing data is genuinely a bug rather than an expected condition.
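The difference on a missing path can be sketched with the same api_response structure, asking for a third user that does not exist. tryCatch() is used only to record that chuck() raised an error; the message text itself is not reproduced.

```r
library(purrr)

api_response <- list(
  data = list(
    users = list(
      list(name = "Alice", age = 30),
      list(name = "Bob", age = 25)
    )
  )
)

# There is no third user, so the path does not exist
missing_name <- pluck(api_response, "data", "users", 3, "name")

# .default supplies a fallback in place of NULL
with_default <- pluck(api_response, "data", "users", 3, "name", .default = NA_character_)

# chuck() turns the same missing path into an error
chuck_outcome <- tryCatch(
  chuck(api_response, "data", "users", 3, "name"),
  error = function(e) "chuck() raised an error"
)
```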

Putting It Together: A Real Pipeline

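Here is how these functions compose in a realistic data pipeline. You have a list of URLs, and you want to fetch each one, parse the JSON, extract a field, and keep only the successful results:

library(purrr)
library(httr)

# Wrap the fetch so that a failed request yields an error element instead of stopping
fetch_and_parse <- safely(function(url) {
  resp <- GET(url)
  content(resp, as = "parsed")
})

# urls is assumed to be a character vector of endpoints whose JSON includes a "username" field
results <- urls %>%
  map(fetch_and_parse) %>%   # list(result = ..., error = ...) per URL
  map("result") %>%          # failures become NULL
  compact() %>%              # drop the failures
  map_chr("username")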

The pipeline reads cleanly from top to bottom: fetch each URL, pull out the result, drop failures, extract the usernames. This kind of composition is where purrr shines.
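To try the shape of this pipeline without network access, here is a sketch that swaps the HTTP call for a stand-in. The fake_fetch() function and the example.com URLs are hypothetical; the stand-in fails for any URL containing "broken" so that the failure path is exercised.

```r
library(purrr)

# Hypothetical stand-in for an HTTP fetch: fails for one made-up URL
fake_fetch <- function(url) {
  if (grepl("broken", url)) stop("request failed: ", url)
  list(username = paste0("user_", basename(url)))
}

urls <- c(
  "https://example.com/1",
  "https://example.com/broken",
  "https://example.com/3"
)

usernames <- urls %>%
  map(safely(fake_fetch)) %>%   # each element is list(result = ..., error = ...)
  map("result") %>%             # failures become NULL
  compact() %>%                 # drop the NULLs
  map_chr("username")

usernames
# [1] "user_1" "user_3"
```

Note that compact() shortens the output, so usernames no longer lines up position by position with urls. If that alignment matters, possibly() with an NA fallback may be a better fit.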

See Also