tidyr::nest()
Overview
nest() creates a list-column: a column that holds data frames instead of individual values. It is the inverse of unnest(). Nesting is implicitly a summarising operation — you get one row for each unique combination of the non-nested columns.
Nested data is useful when each row represents a grouping unit and you want to store the related rows for each group inside that row. Once nested, you can apply any transformation to each group using mutate() and map().
nest() is part of tidyr. In tidyr 1.0 and later, the function uses a new syntax with name-variable pairs like nest(data = c(col1, col2)).
Signature
nest(.data, ..., .by = NULL, .key = NULL, .names_sep = NULL)
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| .data | tibble / data frame | (required) | Input data. |
| ... | tidyselect name-value pairs | (none) | Columns to nest, specified as name = c(col1, col2). The right-hand side can be any tidyselect expression. Columns not selected remain in the outer data frame as identifiers. |
| .by | tidyselect | NULL | Columns to nest by; these stay in the outer data frame. Alternative to group_by(). |
| .key | character | NULL | Name of the new nested column. Only used when ... is not specified; if NULL, "data" is used. |
| .names_sep | character | NULL | If non-NULL, strips the outer column name plus this separator from the start of the inner names. |
Basic Usage
Using .by
The .by argument, added in tidyr 1.3.0, lets you nest by columns without calling group_by() first:
library(tidyr)
df <- tibble(
  group = c("A", "A", "B", "B", "B"),
  x = c(1, 2, 3, 4, 5),
  y = c(10, 20, 30, 40, 50)
)
nested <- df %>%
  nest(.by = group)
nested
# # A tibble: 2 x 2
# group data
# <chr> <list>
# 1 A <tibble [2 x 2]>
# 2 B <tibble [3 x 2]>
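This produces the same nesting as df %>% group_by(group) %>% nest() without the explicit group_by() call. The group_by() version returns a grouped data frame; the .by version returns an ungrouped one.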
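The difference between the two routes can be checked directly. The following is a minimal sketch that rebuilds the example tibble; the names via_by and via_group_by are illustrative only.

```r
library(tidyr)
library(dplyr)

df <- tibble(
  group = c("A", "A", "B", "B", "B"),
  x = c(1, 2, 3, 4, 5),
  y = c(10, 20, 30, 40, 50)
)

via_by <- df %>% nest(.by = group)
via_group_by <- df %>% group_by(group) %>% nest()

# Only the group_by() route carries grouping metadata
is_grouped_df(via_by)
is_grouped_df(via_group_by)

# Dropping the groups leaves an ordinary tibble with the same columns
via_group_by %>% ungroup()
```

If later steps should not operate group-wise, calling ungroup() after the group_by() route avoids surprises.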
Nesting specific columns
Use name = c(col1, col2) syntax to name the columns to nest. Columns not named stay as identifiers:
df <- tibble(
  id = c(1, 1, 2, 2),
  year = c(2020, 2021, 2020, 2021),
  value = c(100, 110, 200, 220)
)
df %>% nest(data = c(year, value))
# # A tibble: 2 x 2
# id data
# <dbl> <list>
# 1 1 <tibble [2 x 2]>
# 2 2 <tibble [2 x 2]>
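Because the right-hand side is a tidyselect expression, the same columns can be selected in other ways. The sketch below shows two alternatives for the example above; the variable names by_name, by_negation, and by_range are illustrative only.

```r
library(tidyr)

df <- tibble(
  id = c(1, 1, 2, 2),
  year = c(2020, 2021, 2020, 2021),
  value = c(100, 110, 200, 220)
)

by_name <- df %>% nest(data = c(year, value))
by_negation <- df %>% nest(data = !id)      # every column except id
by_range <- df %>% nest(data = year:value)  # a contiguous range of columns

# All three keep id outside and put year and value inside
names(by_negation)
names(by_range$data[[1]])
```

Negation is convenient when there are many columns to nest and only a few identifiers.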
Nesting all columns
With neither ... nor .by, nest() nests all columns into a single row, naming the list-column with .key ("data" by default):
df <- tibble(
  group = c("A", "A", "B"),
  x = c(1, 2, 3)
)
df %>% nest()
# # A tibble: 1 x 1
# data
# <list>
# 1 <tibble [3 x 2]>
Using .names_sep
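When inner column names start with the name of the new nested column, .names_sep strips that prefix and the separator from the inner names:
df <- tibble(
  id = c(1, 1),
  a_x = c(1, 2),
  a_y = c(3, 4)
)
# Without .names_sep: inner names stay as a_x, a_y
df %>% nest(a = starts_with("a"), .by = id)
# With .names_sep: the "a_" prefix is stripped, so inner names become x, y
df %>% nest(a = starts_with("a"), .by = id, .names_sep = "_")
# The prefix must match the nested column's name: with data = starts_with("a"),
# nest() looks for "data_" and leaves a_x, a_y unchanged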
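The names_sep argument of unnest() works in the opposite direction, so the two calls can be paired to round-trip column names. This is a minimal sketch; it assumes the nested column is named a so that its name matches the a_ prefix, and the names nested and restored are illustrative only.

```r
library(tidyr)

df <- tibble(
  id = c(1, 1),
  a_x = c(1, 2),
  a_y = c(3, 4)
)

# Nest under the name "a" and strip the "a_" prefix from the inner names
nested <- df %>% nest(a = c(a_x, a_y), .names_sep = "_")
names(nested$a[[1]])

# Unnest with the same separator to put the prefix back
restored <- nested %>% unnest(a, names_sep = "_")
names(restored)
```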
Working with Nested Data
Fitting models per group
Nest then use mutate() with map() from purrr to fit a separate model per group:
library(dplyr)  # mutate()
library(purrr)  # map()
nested_models <- nested %>%
  mutate(model = map(data, ~ lm(x ~ y, data = .x)))
nested_models
# # A tibble: 2 x 3
# group data model
# <chr> <list> <list>
# 1 A <tibble [2 x 2]> <lm>
# 2 B <tibble [3 x 2]> <lm>
Extract coefficients:
nested_models %>%
  mutate(coef = map(model, coef))
# # A tibble: 2 x 4
# group data model coef
# <chr> <list> <list> <list>
# 1 A <tibble [2 x 2]> <lm> <dbl [2]>
# 2 B <tibble [3 x 2]> <lm> <dbl [2]>
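When each model yields a fixed set of coefficients, it is often easier to pull them into plain numeric columns with map_dbl() rather than keeping them in a list-column. The following is a sketch that rebuilds the example data; the column names intercept and slope are illustrative only.

```r
library(tidyr)
library(dplyr)
library(purrr)

df <- tibble(
  group = c("A", "A", "B", "B", "B"),
  x = c(1, 2, 3, 4, 5),
  y = c(10, 20, 30, 40, 50)
)

fits <- df %>%
  nest(.by = group) %>%
  mutate(
    model = map(data, ~ lm(x ~ y, data = .x)),
    # coef() returns a named numeric vector; pick entries by name
    intercept = map_dbl(model, ~ coef(.x)[["(Intercept)"]]),
    slope = map_dbl(model, ~ coef(.x)[["y"]])
  )

fits %>% select(group, intercept, slope)
```

In this toy data x is exactly y / 10 in both groups, so both slopes come out at about 0.1.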
Summarising per group
nested %>%
  mutate(
    mean_x = map_dbl(data, ~ mean(.x$x)),
    sum_y = map_dbl(data, ~ sum(.x$y))
  )
# # A tibble: 2 x 4
# group data mean_x sum_y
# <chr> <list> <dbl> <dbl>
# 1 A <tibble [2 x 2]> 1.5 30
# 2 B <tibble [3 x 2]> 4 120
Unnesting
unnest() expands the list-column back into regular columns:
nested %>%
  unnest(data)
# # A tibble: 5 x 3
# group x y
# <chr> <dbl> <dbl>
# 1 A 1 10
# 2 A 2 20
# 3 B 3 30
# 4 B 4 40
# 5 B 5 50
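nest() collects rows by key in order of first appearance and places the key columns first, so a nest/unnest round trip can change row and column order when the input was not already sorted by key. Keep an explicit ordering column and use dplyr::arrange() if you need the original order back.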
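The effect is easiest to see with keys that are interleaved in the input. The sketch below uses a hypothetical id column as the ordering column; the names round_trip and restored are illustrative only.

```r
library(tidyr)
library(dplyr)

df <- tibble(
  id = 1:4,
  group = c("A", "B", "A", "B"),
  x = c(10, 20, 30, 40)
)

round_trip <- df %>%
  nest(.by = group) %>%
  unnest(data)

# Rows are now collected by group, and group has moved to the front
round_trip$id
names(round_trip)

# Restore the original row and column order
restored <- round_trip %>%
  arrange(id) %>%
  select(all_of(names(df)))
```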
Common Use Cases
Time series per entity
# as.Date() is base R, so no date package is needed here
measurements <- tibble(
  station = c(rep("A", 3), rep("B", 3)),
  date = rep(as.Date(c("2021-01-01", "2021-01-02", "2021-01-03")), 2),
  temp = c(5.1, 5.8, 6.2, 4.2, 4.5, 4.9)
)
measurements %>%
  nest(.by = station) %>%
  mutate(models = map(data, ~ lm(temp ~ date, data = .x)))
Multiple measurements per respondent
survey <- tibble(
  respondent = c(1, 2, 3),
  q1 = c("agree", "disagree", "neutral"),
  q2 = c("agree", "agree", "disagree"),
  q3 = c("neutral", "agree", "agree")
)
survey %>%
  nest(responses = c(q1, q2, q3))
# # A tibble: 3 x 2
# respondent responses
# <dbl> <list>
# 1 1 <tibble [1 x 3]>
# 2 2 <tibble [1 x 3]>
# 3 3 <tibble [1 x 3]>
Gotchas
Columns left out of ... become the keys. df %>% nest(data = c(x, y)) keeps every column not named in ... in the outer data frame and returns one row per unique combination of those columns. If they identify each row uniquely (a row id, for example), every nested tibble holds a single row. Use .by to state the keys explicitly:
# df has columns group, x, y (see Basic Usage)
df %>% nest(data = c(x, y)) # the remaining column, group, becomes the key
df %>% nest(.by = group) # same result, key stated explicitly
df %>% group_by(group) %>% nest() # same nesting, but the result stays grouped
The old nest(x, y, z) syntax is deprecated. If you see code using nest(col1, col2, col3) without the = sign, it is the pre-1.0 syntax. Convert it to nest(data = c(col1, col2, col3)).
Nested data frames do not print their contents. Each list-column cell shows a summary such as <tibble [2 x 2]> instead of values. Use str() to inspect structure, or pull individual elements:
nested <- df %>% nest(.by = group)
str(nested)
# Abbreviated output:
# tibble [2 x 2]
# $ group: chr [1:2] "A" "B"
# $ data :List of 2
# ..$ : tibble [2 x 2]
# ..$ : tibble [3 x 2]
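Pulling individual elements can be done with ordinary list indexing or with purrr::pluck(). The following is a minimal sketch on the group/x/y example; the by_group lookup list is an illustrative convenience, not part of tidyr.

```r
library(tidyr)
library(purrr)

df <- tibble(
  group = c("A", "A", "B", "B", "B"),
  x = c(1, 2, 3, 4, 5),
  y = c(10, 20, 30, 40, 50)
)
nested <- df %>% nest(.by = group)

# The list-column is a plain list, so [[ ]] returns one inner tibble
first_group <- nested$data[[1]]

# pluck() does the same by column name and position
second_group <- pluck(nested, "data", 2)

# Naming the list by key allows lookup by group label
by_group <- setNames(nested$data, nested$group)
by_group[["B"]]
```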
.key is only used when ... is absent. If you use nest(data = c(...)), the name "data" comes from your assignment, not from .key. The .key argument only applies in the nest(.by = x) form.
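A short sketch of the two naming routes, using a small illustrative tibble; the column name rows is arbitrary.

```r
library(tidyr)

df <- tibble(
  group = c("A", "A", "B"),
  x = c(1, 2, 3)
)

# With only .by, the list-column name comes from .key
with_key <- df %>% nest(.by = group, .key = "rows")
names(with_key)

# With ..., the name comes from the left-hand side of the pair
with_dots <- df %>% nest(rows = x)
names(with_dots)
```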
See Also
- /reference/tidyverse/tidyr_pivot_longer/: converts wide data to long format (a related reshaping; the inverse of nest() is unnest())
- /reference/tidyverse/tidyr_pivot_wider/: spreads key-value pairs across columns (related reshaping)
- /reference/tidyverse/dplyr-mutate/: add and transform columns, often used with map() on nested data