purrr::walk
Overview
walk() is purrr’s function for calling a function for its side effects rather than its return value. While map() transforms data and returns results, walk() calls the function on each element, discards the result, and returns the input unchanged. This makes walk() a natural fit in a pipeline (%>% or |>) where you need to print, plot, or export data without breaking the chain.
walk() vs map()
The core difference:
library(purrr)
# map() returns the results of calling toupper() on each element
result <- c("apple", "banana") |> map(toupper)
result
#> [[1]]
#> [1] "APPLE"
#>
#> [[2]]
#> [1] "BANANA"
# walk() calls print() for its side effect, returns input invisibly
c("apple", "banana") |> walk(print)
#> [1] "apple"
#> [1] "banana"
Here print() is called once per element, its return value is discarded, and the original vector passes through unchanged.
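Because the input comes back unchanged, another step can follow walk() in the same pipe. A minimal sketch, assuming only purrr is installed; the name `shout` is illustrative and not part of the original example:

```r
library(purrr)

# walk() prints each element, then hands the original vector on to toupper()
shout <- c("apple", "banana") |>
  walk(print) |>
  toupper()

shout
```

The transformation is applied to the untouched input, not to anything print() returned.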
Basic Usage
Printing
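Inspect intermediate results in a pipeline without disrupting it. walk() iterates over list elements, and a data frame is a list of columns, so split the data into a list of data frames first:
library(dplyr)
mtcars |>
  filter(cyl > 4) |>
  split(~cyl) |>
  # print a preview of each group; the list passes through unchanged
  walk(\(df) print(head(df, 3))) |>
  bind_rows() |>
  group_by(cyl) |>
  summarise(avg_mpg = mean(mpg))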
Saving files
Save plots or data in a pipeline:
library(ggplot2)
mtcars |>
  split(~cyl) |>
  walk(\(df) {
    ggsave(
      paste0("plot_", unique(df$cyl), ".png"),
      plot = ggplot(df, aes(x = wt, y = mpg)) + geom_point()
    )
  })
Each group gets its own PNG file. The original data frames pass through walk() unchanged, so the pipeline can continue.
Saving CSVs with walk2()
When you have two parallel vectors — data and filenames — use walk2():
library(readr)
library(tibble) # provides tibble()
df1 <- tibble(x = 1:5, y = rnorm(5))
df2 <- tibble(x = 1:5, y = rnorm(5))
df3 <- tibble(x = 1:5, y = rnorm(5))
list(df1, df2, df3) |>
  set_names(c("alpha", "beta", "gamma")) |>
  walk2(
    c("alpha.csv", "beta.csv", "gamma.csv"),
    \(df, path) write_csv(df, path)
  )
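When the list is already named, the paths can be derived from the names instead of being typed out a second time. The following is a sketch under stated assumptions: the `datasets` list is hypothetical, files go to a temporary directory, and base `write.csv()` stands in for `write_csv()` to keep the example dependency-free.

```r
library(purrr)

# Hypothetical example data; any named list of data frames works
datasets <- list(
  alpha = data.frame(x = 1:3),
  beta  = data.frame(x = 4:6)
)

# Build one path per list name, inside a temporary directory
paths <- file.path(tempdir(), paste0(names(datasets), ".csv"))

# walk2() pairs each data frame with its path, element by element
walk2(datasets, paths, \(df, path) write.csv(df, path, row.names = FALSE))

file.exists(paths)
```

Deriving the paths from the names keeps the two vectors the same length and in the same order, which walk2() relies on.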
Saving CSVs with pwalk()
When you have a tibble of metadata:
files <- tibble(
  df = list(df1, df2, df3),
  path = c("alpha.csv", "beta.csv", "gamma.csv"),
  desc = c("First dataset", "Second dataset", "Third dataset")
)
files |> pwalk(\(df, path, desc) {
  message("Saving: ", desc)
  write_csv(df, path)
})
#> Saving: First dataset
#> Saving: Second dataset
#> Saving: Third dataset
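pwalk() matches named columns to the function's arguments by name, so the column order does not have to match the argument order. A small sketch with a hypothetical `jobs` table; `log_lines` is an illustrative accumulator that makes the side effect visible:

```r
library(purrr)

# Hypothetical job table; column order differs from the argument order below
jobs <- data.frame(
  label = c("first", "second"),
  n     = c(2, 3)
)

log_lines <- character()

pwalk(jobs, \(n, label) {
  # <<- appends to log_lines in the enclosing environment (the side effect)
  log_lines <<- c(log_lines, paste(label, "has", n, "rows"))
})

log_lines
```

Every column must correspond to an argument; a column the function does not accept would cause an "unused argument" error unless the function also takes `...`.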
Real-World Example: Export Multiple Sheets
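With openxlsx, save each data frame to a separate sheet in one workbook. Create the workbook once, add one sheet per data frame with iwalk(), and save once at the end:
library(openxlsx)
wb <- createWorkbook()
list(
  "Motor Trend Cars" = mtcars,
  "Fisher Iris" = iris,
  "Plant Growth" = PlantGrowth
) |>
  iwalk(\(df, sheet_name) {
    # the workbook object is modified in place
    addWorksheet(wb, sheet_name)
    writeData(wb, sheet_name, df)
  })
# "datasets.xlsx" is an example filename
saveWorkbook(wb, "datasets.xlsx", overwrite = TRUE)
iwalk() feeds both the element (as df) and its name (as sheet_name) into the function. Like walk(), it is called for side effects and returns its input invisibly, whereas imap() would return a list of results.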
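The element-and-name pattern can be seen in isolation with iwalk(), the side-effect counterpart of imap(). A minimal sketch with hypothetical data (`scores` and `out` are illustrative names):

```r
library(purrr)

# Hypothetical named list
scores <- list(math = c(90, 80), art = c(70, 95))

# The function receives the element first and its name second
out <- iwalk(scores, \(values, subject) {
  message(subject, ": mean ", mean(values))
})

# Like walk(), iwalk() returns its input unchanged
identical(out, scores)
```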
Plotting in a Pipeline
Generate and save exploratory plots per group:
iris |>
  split(~Species) |>
  set_names(\(x) paste0("plot_", x, ".png")) |>
  # iwalk() passes each data frame and its list name (the file path)
  iwalk(\(df, path) {
    p <- ggplot(df, aes(x = Sepal.Length, y = Sepal.Width)) +
      geom_point() +
      ggtitle(as.character(unique(df$Species)))
    ggsave(path, plot = p, width = 5, height = 4)
  })
The pipe flows cleanly from data preparation through to plot generation and saving, with no intermediate objects cluttering the workspace.
Why Not Use map() for Side Effects?
You can use map() for side effects, but it builds and returns a list of the function’s return values, which is wasteful if you don’t use them and signals the wrong intent to future readers:
# Works, but builds a list of return values (print() returns its argument invisibly)
c("a", "b") |> map(print)
#> [1] "a"
#> [1] "b"
#> [[1]]
#> [1] "a"
#>
#> [[2]]
#> [1] "b"
# walk() returns the input — cleaner, signals intent
c("a", "b") |> walk(print)
#> [1] "a"
#> [1] "b"
The Invisible Return
walk() returns its input invisibly, so the return value is not auto-printed at the console. This is intentional: it lets you place walk() in a pipeline without generating any output beyond the side effect itself:
x <- c("apple", "banana") |> walk(print) # print() output appears; the input is assigned to x
#> [1] "apple"
#> [1] "banana"
x
#> [1] "apple" "banana"
To confirm that the return value is identical to the input:
y <- c("a", "b")
identical(y, y |> walk(print))
#> [1] "a"
#> [1] "b"
#> [1] TRUE
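The invisibility itself can be checked with base R's withVisible(), which reports whether a value would auto-print at the console. A short sketch; `res` is an illustrative name:

```r
library(purrr)

# withVisible() returns the value plus a flag saying whether it would auto-print
res <- withVisible(walk(1:3, \(i) NULL))

res$visible  # FALSE: walk() marks its return value as invisible
res$value    # the original input, 1:3
```

Wrapping the call in parentheses, as in `(walk(x, f))`, forces the value to print if you do want to see it.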
Combining walk() with safely() and quietly()
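safely() catches errors so that one failure does not stop the loop, but walk() discards the function's return value, so the captured error is lost. To report each failure and carry on, handle the error inside the function with tryCatch():
files <- c("data1.csv", "data2.csv", "nonexistent.csv")
files |>
  walk(\(f) {
    tryCatch(
      # the result is discarded; this only checks that each file can be read
      read_csv(f) |> mutate(source = f),
      error = \(e) warning("Failed to read ", f, ": ", e$message)
    )
  })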
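For functions that print output or emit messages or warnings you want to suppress, wrap them with quietly(), which captures all three instead of showing them (inputs and some_function are placeholders):
inputs |> walk(quietly(some_function))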
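As a rough illustration of how purrr's error adverbs pair with walk() and map(): possibly() lets walk() continue past failures, while safely() with map() keeps the error objects for later inspection. The `report()` function and `inputs` vector below are hypothetical stand-ins for a real side-effect function and its inputs.

```r
library(purrr)

# Hypothetical side-effect function that fails on negative input
report <- function(x) {
  if (x < 0) stop("negative value: ", x)
  message("ok: ", x)
}

inputs <- c(1, -2, 3)

# possibly() replaces an error with a default value, so walk() keeps going
walk(inputs, possibly(report, otherwise = NULL))

# safely() returns list(result, error) per call; map() keeps those lists
results <- map(inputs, safely(report))

# Pick out the inputs whose call produced an error
failed <- inputs[map_lgl(results, \(r) !is.null(r$error))]
failed
```

Use the walk() form when failures can be ignored, and the map() form when you need to know which inputs failed and why.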
See Also
- /reference/tidyverse/purrr-map/ — transform elements and return results
- /reference/tidyverse/purrr_map2/ — map over two inputs in parallel
- /guides/purrr-functional-programming/ — functional programming patterns with purrr including walk