
purrr::pmap

Overview

pmap() is the multi-input variant of the purrr mapping family. Instead of iterating over a single vector like map(), it takes a list (or data frame) in which each element is a vector; names are optional but let arguments be matched by name. On each iteration, pmap() pulls one element from each vector and passes them as arguments to your function.

The “p” stands for “parallel”, but this is not parallel computation. It means that all inputs advance together, one step per iteration.
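As a minimal sketch of that stepping behavior (the input vectors and the summing function are illustrative only), each call receives the i-th element of every input:

```r
library(purrr)

# Three illustrative input vectors of equal length
inputs <- list(
  a = c(1, 2, 3),
  b = c(10, 20, 30),
  c = c(100, 200, 300)
)

# Call 1 receives (1, 10, 100), call 2 receives (2, 20, 200), and so on
sums <- pmap_dbl(inputs, function(a, b, c) a + b + c)
sums
# [1] 111 222 333
```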

Installation

library(purrr)

Loading purrr makes pmap() and its typed variants available. The function accepts a list of equal-length vectors as .l and a function .f to apply row-wise, with ... forwarding extra arguments. The typed variants enforce a specific return type, catching mismatches early when you know your output will be character, numeric, or logical.

Signature

pmap(.l, .f, ...)
pmap_lgl(.l, .f, ...)
pmap_int(.l, .f, ...)
pmap_dbl(.l, .f, ...)
pmap_chr(.l, .f, ...)
pmap_dfr(.l, .f, ...)
pmap_df(.l, .f, ...)

Parameters

Parameter    Description
.l           A list or data frame. Each element should be a vector of the same length.
.f           A function, formula, or atom.
...          Additional arguments passed to .f for each call.

Return value

pmap() returns a list. Type-specific variants return atomic vectors. A data frame variant (pmap_dfr()) row-binds results into a tibble.
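The difference between the plain and typed variants can be sketched as follows. The inputs are made up for illustration, and try() is used only so that the deliberate type mismatch does not stop the script:

```r
library(purrr)

inputs <- list(x = c(1, 2, 3), y = c(4, 5, 6))

# pmap() always returns a list, one element per iteration
as_list <- pmap(inputs, function(x, y) x * y)

# pmap_dbl() returns a double vector instead
as_dbl <- pmap_dbl(inputs, function(x, y) x * y)
as_dbl
# [1]  4 10 18

# A typed variant fails when a result has the wrong type;
# try() captures the error so the script keeps running
bad <- try(pmap_int(inputs, function(x, y) "oops"), silent = TRUE)
inherits(bad, "try-error")
# [1] TRUE
```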

Basic usage

Row-wise iteration with a data frame

The most common use case is treating each row of a data frame as a set of arguments:

library(tibble)  # provides tibble(); purrr does not export it

params <- tibble(
  x = c(1, 2, 3),
  y = c(10, 20, 30)
)

pmap_chr(params, ~ paste0(..1, "x", ..2))
# [1] "1x10" "2x20" "3x30"

The formula shorthand uses ..1, ..2, and so on as positional placeholders for the columns of the input list, which works but becomes hard to read when there are more than two or three inputs. For better clarity, you can write a named function with descriptive parameter names that match the column names exactly, making the code self-documenting and less fragile to column reordering.

Named arguments are cleaner than positional placeholders:

pmap_chr(params, function(x, y) paste0(x, "x", y))
# [1] "1x10" "2x20" "3x30"

The named function approach is the recommended style when the input list has clear, stable column names. When your workflow involves configuration data — such as database connection parameters or server addresses — you can structure the inputs as a named list where each element is a vector of settings, and pmap() will call your function once per configuration profile, producing a list of structured results for each environment.

Using a named list

configs <- list(
  host = c("db1.example.com", "db2.example.com"),
  port = c(5432, 5433),
  db   = c("app_prod", "app_staging")
)

pmap(configs, function(host, port, db) {
  list(host = host, port = port, db = db)
})
# [[1]]
# [[1]]$host
# [1] "db1.example.com"
# ...

Building configuration objects from a parameter list is a clean pattern for infrastructure scripts that need to connect to multiple environments. When your inputs come from separate vectors stored in different variables rather than a pre-constructed list or data frame, you can assemble them on the fly with list() and still benefit from pmap()’s row-wise iteration, passing extra fixed arguments through the dots.

Multiple inputs with additional arguments

urls <- c("https://api.example.com", "https://api.test.com")
keys <- c("abc123", "def456")

# sep = "/" is a fixed extra argument, forwarded to paste() on every call
pmap_chr(list(urls, keys), paste, sep = "/")
# [1] "https://api.example.com/abc123" "https://api.test.com/def456"

The on-the-fly list construction pattern scales naturally to real-world tasks like building REST API URLs, where you have a base URL, multiple endpoints, resource identifiers, and authentication keys that must be combined row by row into well-formed request strings.

Common use cases

Building URLs from parameters

base_url <- "https://api.example.com"

requests <- tibble(
  endpoint = c("/users", "/products", "/orders"),
  id       = c(42, 17, 99),
  key      = c("key_a", "key_b", "key_c")
)

pmap_chr(requests, function(endpoint, id, key) {
  paste0(base_url, endpoint, "/", id, "?key=", key)
})
# [1] "https://api.example.com/users/42?key=key_a"
# [2] "https://api.example.com/products/17?key=key_b"
# [3] "https://api.example.com/orders/99?key=key_c"

Building URLs from structured parameter tables is a pattern you will encounter often in data engineering pipelines. A different analytical use case is fitting a model to each row of a specification table, where one column holds the formula as a string and another column holds the dataset. This lets you run dozens of regressions in a single pmap() call and collect the fitted model objects for later comparison.

Row-wise model fitting

models <- tibble(
  formula = c("mpg ~ wt", "mpg ~ wt + cyl", "mpg ~ wt * cyl"),
  data    = list(mtcars, mtcars, mtcars)
)

pmap(models, function(formula, data) {
  lm(as.formula(formula), data = data)
})
# [[1]]
# Call:
# lm(formula = as.formula(formula), data = data)
# ...
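One way to work with the fitted models afterwards is sketched below. It uses base R's summary() rather than a helper package, and it rebuilds the specification as a plain list so that it runs on its own; the names fits and r2 are illustrative.

```r
library(purrr)

# Same specification as above, kept self-contained
formulas <- c("mpg ~ wt", "mpg ~ wt + cyl", "mpg ~ wt * cyl")
spec <- list(formula = formulas, data = list(mtcars, mtcars, mtcars))

fits <- pmap(spec, function(formula, data) {
  lm(as.formula(formula), data = data)
})

# Pull one number out of each model: R-squared from summary()
r2 <- map_dbl(fits, function(fit) summary(fit)$r.squared)
names(r2) <- formulas
round(r2, 3)
```

Because the three models are nested, R-squared should not decrease from one formula to the next, which makes for a quick sanity check on the result.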

Model fitting with pmap() turns a specification table into a list of model objects, which you can then pass to broom::tidy() or broom::glance() for summary extraction. For data transformation tasks, you can store the name of the column, the transformation to apply, and the dataset in a single table and use pmap() with a switch() statement to dispatch the correct operation for each row.

Applying multiple transformations

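library(dplyr)  # re-exports tibble()

# Each row names an iris column, the operation to apply, and the values
operations <- tibble(
  col   = c("Sepal.Length", "Petal.Length", "Sepal.Width"),
  op    = c("log", "exp", "sqrt"),
  input = list(iris$Sepal.Length, iris$Petal.Length, iris$Sepal.Width)
)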

pmap(operations, function(col, op, input) {
  switch(op,
    log  = log(input),
    exp  = exp(input),
    sqrt = sqrt(input)
  )
})

Dispatching transformations by name is a flexible pattern when the operations vary by row. Once you are comfortable with pmap(), it helps to understand when to use it versus the simpler map2(). map2() has a cleaner syntax for the two-input case, while pmap() handles any number of inputs uniformly and is the better choice when you have three or more parallel vectors or when the number of inputs might vary across runs.

pmap vs map2

map2() handles exactly two inputs. pmap() handles any number:

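# Example inputs
first_names <- c("Ada", "Grace")
last_names  <- c("Lovelace", "Hopper")
x <- 1:2; y <- 3:4; z <- 5:6

# map2: exactly two inputs
map2_chr(first_names, last_names, ~ paste(.x, .y))

# pmap: any number of inputs
pmap_chr(list(first = first_names, last = last_names), ~ paste(..1, ..2))
pmap_int(list(a = x, b = y, c = z), function(a, b, c) a + b + c)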

When you find yourself nesting map() calls or using map2() with flatten(), pmap() is usually the cleaner answer. Another common pattern is collecting row-wise results into a data frame rather than a list. The pmap_dfr() variant automatically row-binds the output of each function call, which is ideal when each iteration produces a single-row tibble with consistent column names.

pmap_dfr: row-binding results

When your function returns a tibble or named list, use pmap_dfr() to row-bind results automatically:

results <- tibble(
  name = c("Alice", "Bob"),
  score = c(85, 92)
)

pmap_dfr(results, function(name, score) {
  tibble(
    name     = name,
    score    = score,
    grade    = ifelse(score >= 90, "A", "B"),
    passed   = score >= 70
  )
})
# # A tibble: 2 × 4
#   name  score grade passed
#   <chr> <dbl> <chr> <lgl>
# 1 Alice    85 B     TRUE
# 2 Bob      92 A     TRUE

Row-binding results with pmap_dfr() turns a list of one-row tibbles into a single tidy data frame, which integrates naturally with the rest of the tidyverse. When debugging, the first thing to check is that all elements of the input list have the same length — pmap() raises an error for mismatched lengths unless the shorter vector has exactly one element, in which case it recycles that single value across all iterations.

Gotchas

List elements must have the same length, with one exception: pmap() recycles an element of length 1 to match the others. Any other mismatch is an error:

pmap(list(x = c(1, 2), y = c(3, 4)), ~ ..1 + ..2)  # OK
pmap(list(x = c(1, 2), y = 10), ~ ..1 + ..2)        # OK (y recycled)
pmap(list(x = c(1, 2), y = c(3, 4, 5)), ~ ..1 + ..2)
# Error: can't recycle x (size 2) to match y (size 3); exact wording varies by purrr version

The length-mismatch error is a helpful guardrail that prevents silent data corruption from misaligned vectors. A separate design consideration is whether to use positional placeholders like ..1 and ..2 or named function arguments. Positional placeholders are concise but fragile — if someone reorders the columns in your input list, the mapping silently shifts to the wrong variables. Named arguments are self-documenting and survive column reordering, making them the safer choice for production code.

Positional placeholders are fragile. If the argument order in your list changes, ..1, ..2 break. Named function arguments are more reliable.
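A small sketch of the failure mode, using made-up inputs: reordering the list changes what ..1 and ..2 refer to, while a function with named arguments is unaffected.

```r
library(purrr)

original  <- list(name = c("a", "b"), n = c(1, 2))
reordered <- original[c("n", "name")]  # same data, different order

# Positional: the meaning of ..1 and ..2 follows the list order
pmap_chr(original,  ~ paste(..1, ..2))
# [1] "a 1" "b 2"
pmap_chr(reordered, ~ paste(..1, ..2))
# [1] "1 a" "2 b"

# Named: arguments are matched by name, so order is irrelevant
label <- function(name, n) paste(name, n)
identical(pmap_chr(original, label), pmap_chr(reordered, label))
# [1] TRUE
```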

.l can be a data frame. Column names become argument names when passed to a function:

params <- tibble(x = 1:3, y = 10:12)
pmap_int(params, function(x, y) x + y)
# [1] 11 13 15

pmap() iterates over multiple lists in parallel, passing the ith element of each list to the function as a separate argument. The lists must all have the same length; unlike map2(), which is limited to two lists, pmap() handles any number. A common pattern is to pass a data frame as .l: since a data frame is a list of equal-length columns, pmap(df, f) calls f(col1 = df$col1[i], col2 = df$col2[i], ...) for each row. This is an alternative to rowwise() + mutate() for row-wise operations that return complex objects.
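As a rough illustration of the row-wise pattern, pmap() can be called inside mutate() to build a list column. The specs table and its column names are hypothetical, and dplyr is assumed to be installed:

```r
library(purrr)
library(dplyr)

# Hypothetical specification table: one row per set of rnorm() arguments
specs <- tibble(size = c(3L, 5L), mu = c(0, 10), sigma = c(1, 2))

# Each row supplies one set of arguments; the results go into a list
# column because their lengths differ from row to row
out <- mutate(specs, draws = pmap(
  list(size, mu, sigma),
  function(size, mu, sigma) rnorm(size, mean = mu, sd = sigma)
))

lengths(out$draws)
# [1] 3 5
```

The list column keeps each result next to the parameters that produced it, which is harder to arrange when the results are returned as a separate list.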

pmap_* variants enforce a return type just like map_dbl() and map_chr(). Use pmap_chr() when building strings from row data, pmap_dbl() for scalar numeric computations, and pmap_dfr() or list_rbind(pmap(...)) when each call returns a data frame and you want to row-bind the results.
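The list_rbind(pmap(...)) form might look like the sketch below. It assumes purrr 1.0.0 or later, where list_rbind() is available, and the scores data is made up for illustration:

```r
library(purrr)

scores <- list(name = c("Alice", "Bob"), score = c(85, 92))

# Each call returns a one-row data frame
rows <- pmap(scores, function(name, score) {
  data.frame(name = name, score = score, passed = score >= 70)
})

# Row-bind the list of data frames into a single data frame
combined <- list_rbind(rows)
nrow(combined)
# [1] 2
```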

See also