tidyr::nest()

Overview

nest() creates a list-column: a column that holds data frames instead of individual values. It is the inverse of unnest(). Nesting is implicitly a summarising operation: you get one row for each unique combination of the non-nested columns.

Nested data is useful when each row represents a grouping unit and you want to store the related rows for each group inside that row. Once nested, you can apply any transformation to each group using mutate() and map().

nest() is part of tidyr. In tidyr 1.0 and later, the function uses a new syntax with name-variable pairs like nest(data = c(col1, col2)).
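
As a minimal sketch of the name-variable pair syntax, the following nests two sets of columns from the built-in iris data set into two separate list-columns. The names petal and sepal are illustrative choices, not anything nest() requires.

```r
library(tidyr)

# Each name = selection pair creates one list-column.
# Species is not selected, so it stays outside as the identifier.
iris_nested <- nest(
  iris,
  petal = starts_with("Petal"),
  sepal = starts_with("Sepal")
)

names(iris_nested)  # identifier first, then the two list-columns
nrow(iris_nested)   # one row per species
```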

Signature

nest(.data, ..., .by = NULL, .key = NULL, .names_sep = NULL)

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| .data | tibble / data frame | - | Input data. |
| ... | - | - | Columns to nest, specified as name = c(col1, col2). The right-hand side can be any tidyselect expression. Columns not selected remain as identifiers. |
| .by | - | NULL | Columns to nest by; these stay in the outer data frame. Alternative to group_by(). |
| .key | character | NULL (uses "data") | Name of the new nested column. Only used when ... is not specified. |
| .names_sep | character | NULL | If non-NULL, strips the outer column name from inner names using that separator. |

Basic usage

Using .by

The .by argument, added in tidyr 1.3.0, lets you nest by columns without calling group_by() first:

library(tidyr)

df <- tibble(
  group = c("A", "A", "B", "B", "B"),
  x     = c(1, 2, 3, 4, 5),
  y     = c(10, 20, 30, 40, 50)
)

nested <- df %>%
  nest(.by = group)

nested
# # A tibble: 2 x 2
#   group data
#   <chr> <list>
# 1 A     <tibble [2 x 2]>
# 2 B     <tibble [3 x 2]>

This produces the same nesting as df %>% group_by(group) %>% nest() without the explicit group_by() call. The group_by() version returns a grouped data frame, while the .by version does not.
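
The comparison between the two routes can be checked directly. This is a small sketch using the same example data; the object names via_by and via_group are made up for illustration.

```r
library(tidyr)
library(dplyr)

df <- tibble(
  group = c("A", "A", "B", "B", "B"),
  x     = c(1, 2, 3, 4, 5),
  y     = c(10, 20, 30, 40, 50)
)

via_by    <- df %>% nest(.by = group)
via_group <- df %>% group_by(group) %>% nest()

# Both routes produce one inner tibble per group with the same row counts
vapply(via_by$data, nrow, integer(1))
vapply(via_group$data, nrow, integer(1))

# Only the group_by() route leaves grouping on the result
is_grouped_df(via_by)
is_grouped_df(via_group)

# ungroup() drops the grouping when it is no longer wanted
via_group_flat <- ungroup(via_group)
```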

After nesting, each group becomes a single row whose list-column holds the original rows for that group, so per-group results can be stored alongside the raw data.

Nesting specific columns

Use name = c(col1, col2) syntax to name the columns to nest. Columns not named stay as identifiers:

df <- tibble(
  id    = c(1, 1, 2, 2),
  year  = c(2020, 2021, 2020, 2021),
  value = c(100, 110, 200, 220)
)

df %>% nest(data = c(year, value))
# # A tibble: 2 x 2
#      id data
#   <dbl> <list>
# 1     1 <tibble [2 x 2]>
# 2     2 <tibble [2 x 2]>
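
To see what ended up inside the list-column, one inner tibble can be pulled out with [[ ]]. A short sketch on the same data (first_group is a hypothetical name):

```r
library(tidyr)

df <- tibble(
  id    = c(1, 1, 2, 2),
  year  = c(2020, 2021, 2020, 2021),
  value = c(100, 110, 200, 220)
)

nested <- df %>% nest(data = c(year, value))

# The list-column is a list of tibbles, so [[ ]] extracts a single one
first_group <- nested$data[[1]]

names(first_group)  # only the nested columns; id stays in the outer frame
nrow(first_group)   # the rows that had id == 1
```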

Nesting all columns

With neither ... nor .by, nest() nests all columns using the .key name:

df <- tibble(
  group = c("A", "A", "B"),
  x     = c(1, 2, 3)
)

df %>% nest()
# # A tibble: 1 x 1
#   data
#   <list>
# 1 <tibble [3 x 2]>

Using .names_sep

When inner column names start with the name of the new list-column, .names_sep strips that outer name and the separator from the inner names:

df <- tibble(
  id = c(1, 1),
  a_x = c(1, 2),
  a_y = c(3, 4)
)

# Without .names_sep: inner names stay as a_x, a_y
df %>% nest(a = starts_with("a"), .by = id)

# With .names_sep: the "a_" prefix is stripped, so inner names become x, y
df %>% nest(a = starts_with("a"), .by = id, .names_sep = "_")
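
The effect on the inner names can be verified with names(). In this sketch the list-column is called a, chosen so that the outer name plus the separator matches the a_ prefix of the inner columns; kept and stripped are hypothetical object names.

```r
library(tidyr)

df <- tibble(
  id  = c(1, 1),
  a_x = c(1, 2),
  a_y = c(3, 4)
)

# The list-column is named "a", so the prefix eligible for stripping is "a_"
kept     <- df %>% nest(a = starts_with("a"), .by = id)
stripped <- df %>% nest(a = starts_with("a"), .by = id, .names_sep = "_")

names(kept$a[[1]])      # inner names unchanged
names(stripped$a[[1]])  # inner names with the "a_" prefix removed
```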

Working with nested data

Fitting models per group

Nest then use mutate() with map() from purrr to fit a separate model per group:

library(dplyr)  # mutate()
library(purrr)  # map()

nested_models <- nested %>%
  mutate(model = map(data, ~ lm(x ~ y, data = .x)))

nested_models
# # A tibble: 2 x 3
#   group data             model
#   <chr> <list>          <list>
# 1 A     <tibble [2 x 2]> <lm>
# 2 B     <tibble [3 x 2]> <lm>

Extract coefficients:

nested_models %>%
  mutate(coef = map(model, coef))
# # A tibble: 2 x 4
#   group data             model  coef
#   <chr> <list>          <list> <list>
# 1 A     <tibble [2 x 2]> <lm>   <dbl [2]>
# 2 B     <tibble [3 x 2]> <lm>   <dbl [2]>
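
When only a single coefficient is needed, a typed map function can return it as a plain numeric column instead of a list-column. This is a sketch on the same example data; the column name slope is an illustrative choice.

```r
library(tidyr)
library(dplyr)
library(purrr)

df <- tibble(
  group = c("A", "A", "B", "B", "B"),
  x     = c(1, 2, 3, 4, 5),
  y     = c(10, 20, 30, 40, 50)
)

slopes <- df %>%
  nest(.by = group) %>%
  mutate(
    model = map(data, ~ lm(x ~ y, data = .x)),
    # coef() returns a named numeric vector; [["y"]] picks the slope term
    slope = map_dbl(model, ~ coef(.x)[["y"]])
  )

slopes$slope
```

map_dbl() fails if any element does not yield exactly one number, which makes it a useful guard when a model might not estimate the term.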

Summarising per group

nested %>%
  mutate(
    mean_x = map_dbl(data, ~ mean(.x$x)),
    sum_y  = map_dbl(data, ~ sum(.x$y))
  )
# # A tibble: 2 x 4
#   group data              mean_x  sum_y
#   <chr> <list>           <dbl>  <dbl>
# 1 A     <tibble [2 x 2]>   1.5     30
# 2 B     <tibble [3 x 2]>   4       120
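
Per-group summaries can also drive filtering of the nested rows. The following sketch counts the rows in each inner tibble and keeps only the larger groups; the threshold of 3 and the names n_rows and large_groups are arbitrary illustrations.

```r
library(tidyr)
library(dplyr)
library(purrr)

df <- tibble(
  group = c("A", "A", "B", "B", "B"),
  x     = c(1, 2, 3, 4, 5),
  y     = c(10, 20, 30, 40, 50)
)

large_groups <- df %>%
  nest(.by = group) %>%
  # map_int() returns an integer vector: one row count per inner tibble
  mutate(n_rows = map_int(data, nrow)) %>%
  filter(n_rows >= 3)

large_groups$group
```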

Unnesting

unnest() expands the list-column back into regular columns:

nested %>%
  unnest(data)
# # A tibble: 5 x 3
#   group     x     y
#   <chr> <dbl> <dbl>
# 1 A         1    10
# 2 A         2    20
# 3 B         3    30
# 4 B         4    40
# 5 B         5    50

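Row and column ordering may differ from the data as it was before nesting: rows come back grouped by the nesting columns, and those columns are placed first. If the original order matters, keep a column that records it and use dplyr::arrange() to restore it.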
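
A small sketch of the ordering effect, using data in which the groups are interleaved. The row_id column is a hypothetical helper added here only so the original order can be recovered.

```r
library(tidyr)
library(dplyr)

interleaved <- tibble(
  row_id = 1:4,
  group  = c("A", "B", "A", "B"),
  x      = c(1, 2, 3, 4)
)

round_trip <- interleaved %>%
  nest(.by = group) %>%
  unnest(data)

# Rows are now grouped by `group`, and `group` has moved to the front
round_trip$row_id
names(round_trip)

# Restore the original row and column order
restored <- round_trip %>%
  arrange(row_id) %>%
  select(row_id, group, x)
```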

Common use cases

Time series per entity

library(tidyr)
library(dplyr)  # mutate()
library(purrr)  # map()

measurements <- tibble(
  station = c(rep("A", 3), rep("B", 3)),
  date = rep(as.Date(c("2021-01-01", "2021-01-02", "2021-01-03")), 2),
  temp = c(5.1, 5.8, 6.2, 4.2, 4.5, 4.9)
)

measurements %>%
  nest(.by = station) %>%
  mutate(models = map(data, ~ lm(temp ~ date, data = .x)))
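
To turn the fitted models into something comparable across stations, the slope of each fit can be extracted as a number. This sketch assumes the Date column enters the model as a count of days, so the slope is read as a temperature change per day; the names station_trends and trend are illustrative.

```r
library(tidyr)
library(dplyr)
library(purrr)

measurements <- tibble(
  station = c(rep("A", 3), rep("B", 3)),
  date = rep(as.Date(c("2021-01-01", "2021-01-02", "2021-01-03")), 2),
  temp = c(5.1, 5.8, 6.2, 4.2, 4.5, 4.9)
)

station_trends <- measurements %>%
  nest(.by = station) %>%
  mutate(
    models = map(data, ~ lm(temp ~ date, data = .x)),
    # The coefficient named "date" is the fitted change in temp per day
    trend  = map_dbl(models, ~ coef(.x)[["date"]])
  )

station_trends %>% select(station, trend)
```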

Multiple measurements per respondent

survey <- tibble(
  respondent = c(1, 2, 3),
  q1 = c("agree", "disagree", "neutral"),
  q2 = c("agree", "agree", "disagree"),
  q3 = c("neutral", "agree", "agree")
)

survey %>%
  nest(responses = c(q1, q2, q3))
# # A tibble: 3 x 2
#   respondent responses
#        <dbl> <list>
# 1          1 <tibble [1 x 3]>
# 2          2 <tibble [1 x 3]>
# 3          3 <tibble [1 x 3]>

Gotchas

Columns left out of ... define the groups. df %>% nest(data = c(x, y)) groups by every column that is not nested, so if the remaining columns identify each row uniquely (for example, a row id), each row in the input becomes one row in the output with a single-row tibble inside. Use .by or group_by() to state the grouping columns explicitly:

# Grouping is implied by the columns that are not nested (here: group)
df %>% nest(data = c(x, y))        # one row per value of group

# Grouping stated explicitly
df %>% nest(.by = group)           # multiple rows per group
df %>% group_by(group) %>% nest()  # same nesting, but the result stays grouped
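
The following sketch shows how the columns left outside the nest determine the number of output rows. The data frame and its row_id column are made up for illustration.

```r
library(tidyr)

df <- tibble(
  row_id = 1:3,
  group  = c("A", "A", "B"),
  x      = c(1, 2, 3)
)

# row_id is left outside, so it is part of the key: one group per input row
per_row <- df %>% nest(data = x)
nrow(per_row)

# Nesting row_id as well leaves only `group` as the key
per_group <- df %>% nest(data = c(row_id, x))
nrow(per_group)

# .by names the key explicitly and nests everything else
explicit <- df %>% nest(.by = group)
nrow(explicit)
```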

The old nest(x, y, z) syntax is deprecated. If you see code using nest(col1, col2, col3) without the = sign, it is the pre-1.0 syntax. Convert it to nest(data = c(col1, col2, col3)).

Nested data frames do not print their contents. The list-column shows a summary such as <tibble [2 x 2]> instead of the values. Use str() to inspect structure, or pull individual elements:

nested <- df %>% nest(.by = group)
str(nested)
# tibble [2 x 2]
#  $ group: chr [1:2] "A" "B"
#  $ data :List of 2
#   ..$ : tibble [2 x 2]
#   ..$ : tibble [3 x 2]
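
Pulling an individual element can be done with base subsetting or with purrr::pluck(). A brief sketch, assuming the group/x/y example data from earlier on this page:

```r
library(tidyr)
library(purrr)

df <- tibble(
  group = c("A", "A", "B", "B", "B"),
  x     = c(1, 2, 3, 4, 5),
  y     = c(10, 20, 30, 40, 50)
)

nested <- df %>% nest(.by = group)

# Base R: take the list-column, then its first element (group A)
group_a <- nested$data[[1]]

# purrr: walk the same path by name and position (group B)
group_b <- pluck(nested, "data", 2)

group_a
group_b
```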

.key is only used when ... is absent. If you use nest(data = c(...)), the name "data" comes from your assignment, not from .key. The .key argument applies only when ... is empty, as in nest(.by = x) or nest() on a grouped data frame.
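
The two ways of naming the list-column can be compared side by side. In this sketch the name rows is an arbitrary illustration.

```r
library(tidyr)

df <- tibble(
  group = c("A", "A", "B"),
  x     = c(1, 2, 3),
  y     = c(10, 20, 30)
)

# No ... supplied: the list-column name comes from .key
by_key <- df %>% nest(.by = group, .key = "rows")

# ... supplied: the list-column name comes from the left-hand side of the pair
by_name <- df %>% nest(rows = c(x, y))

names(by_key)
names(by_name)
```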

See also