rguides

tidyr::pivot_longer()

Overview

pivot_longer() transforms wide data into long format. It takes two or more columns and gathers them into key-value pairs: column names land in a new column, and their values land in another. This is the inverse of pivot_wider().

The function is useful when your data has columns that represent values rather than variables, a common structure in survey data, repeated measurements, and any dataset where the column header encodes information.

Signature

pivot_longer(
  data,
  cols,
  names_to = "name",
  values_to = "value",
  names_prefix = NULL,
  names_sep = NULL,
  names_pattern = NULL,
  names_ptypes = NULL,
  values_ptypes = NULL,
  values_drop_na = FALSE,
  ...
)

Parameters

ParameterTypeDefaultDescription
datatibble / data frame,Input data.
colstidy-select,Columns to pivot into longer format.
names_tocharacter"name"Name of the column that will hold the old column names.
values_tocharacter"value"Name of the column that will hold the cell values.
names_prefixcharacterNULLStrip this prefix from column names before pivoting.
names_sepcharacterNULLSeparator between multiple names_to columns.
names_patterncharacterNULLRegex pattern to extract column name parts for multiple names_to.
names_ptypesnamed listNULLTypes for the new name columns.
values_ptypesnamed listNULLTypes for the value columns.
values_drop_nalogicalFALSEIf TRUE, drop rows where all value columns are NA.

Basic usage

Simple pivot

The most common case: you have columns that each represent a category, and you want to stack them.

library(tidyr)

df <- tibble(
  id = c("A", "B"),
  Jan = c(100, 200),
  Feb = c(150, 250),
  Mar = c(120, 220)
)

pivot_longer(df, c(Jan, Feb, Mar), names_to = "month", values_to = "sales")
# # A tibble: 6 × 3
#   id    month sales
#   <chr> <chr> <dbl>
# 1 A     Jan    100
# 2 A     Feb    150
# 3 A     Mar    120
# 4 B     Jan    200
# 5 B     Feb    250
# 6 B     Mar    220

When you pivot a wide dataset to long format with pivot_longer(), each original column contributes one row per observation, making the data easier to use with grouped operations and faceted plots in ggplot2. Use this approach when you need to prepare data for further analysis in a tidy workflow.

Pivot all columns except identifier

Use the ! operator to pivot all columns except the identifier:

df <- tibble(
  id = c("A", "B"),
  Jan = c(100, 200),
  Feb = c(150, 250)
)

pivot_longer(df, !id, names_to = "month", values_to = "sales")
# # A tibble: 4 × 3
#   id    month sales
#   <chr> <chr> <dbl>
# 1 A     Jan    100
# 2 A     Feb    150
# 3 B     Jan    200
# 4 B     Feb    250

The names_to argument receives the old column names and values_to receives the cell values, creating a tidy representation where each variable forms a column and each observation forms a row. This pattern is common in real-world data analysis pipelines. Reshaping data correctly is often the most important preprocessing step in any analysis.

Strip prefixes from column names

Use names_prefix to remove a common prefix from column names before pivoting:

df <- tibble(
  id = c(1, 2),
  product_2023 = c(300, 450),
  product_2024 = c(320, 480)
)

pivot_longer(
  df,
  c(product_2023, product_2024),
  names_to = "year",
  values_to = "revenue",
  names_prefix = "product_"
)
# # A tibble: 4 × 3
#   id    year  revenue
#   <dbl> <chr>   <dbl>
# 1     1 2023      300
# 2     1 2024      320
# 3     2 2023      450
# 4     2 2024      480

Multiple name columns

When column names encode multiple variables, split them into separate columns with names_sep or names_pattern.

When you pivot a wide dataset to long format with pivot_longer(), each original column contributes one row per observation, making the data easier to use with grouped operations and faceted plots in ggplot2. This additional context makes the transformation pattern clearer and easier to adapt to your own data analysis needs.

Using names_sep

When column names follow a consistent separator, names_sep splits on that character:

df <- tibble(
  id = c(1, 2),
  temp_2021 = c(22.1, 18.5),
  temp_2022 = c(23.4, 19.2)
)

pivot_longer(
  df,
  c(temp_2021, temp_2022),
  names_to = c("var", "year"),
  values_to = "reading",
  names_sep = "_"
)
# # A tibble: 4 × 4
#   id    var   year  reading
#   <dbl> <chr> <chr>   <dbl>
# 1     1 temp  2021     22.1
# 2     1 temp  2022     23.4
# 3     2 temp  2021     18.5
# 4     2 temp  2022     19.2

The names_to argument receives the old column names and values_to receives the cell values, creating a tidy representation where each variable forms a column and each observation forms a row. Applying this technique correctly saves time in data preparation. Reshaping data correctly is often the most important preprocessing step in any analysis.

Using names_pattern

For irregular column name structures, use a regex with capture groups:

df <- tibble(
  id = c(1, 2),
  Alice_score = c(85, 92),
  Bob_score = c(78, 88)
)

pivot_longer(
  df,
  c(Alice_score, Bob_score),
  names_to = "student",
  values_to = "score",
  names_pattern = "(.+)_.+"
)
# # A tibble: 4 × 3
#   id    student score
#   <dbl> <chr>   <dbl>
# 1     1 Alice      85
# 2     1 Bob        78
# 3     2 Alice      92
# 4     2 Bob        88

For two capture groups:

When you pivot a wide dataset to long format with pivot_longer(), each original column contributes one row per observation, making the data easier to use with grouped operations and faceted plots in ggplot2. This additional context makes the transformation pattern clearer and easier to adapt to your own data analysis needs.

df <- tibble(
  site = c("A", "B"),
  alpha_2021 = c(10, 20),
  beta_2021 = c(15, 25)
)

pivot_longer(
  df,
  c(alpha_2021, beta_2021),
  names_to = c("site_code", "year"),
  values_to = "measurement",
  names_pattern = "(.+)_(.+)"
)
# # A tibble: 4 × 4
#   site  site_code year  measurement
#   <chr> <chr>     <chr>        <dbl>
# 1 A     alpha     2021            10
# 2 A     beta      2021            15
# 3 B     alpha     2021            20
# 4 B     beta      2021            25

The names_to argument receives the old column names and values_to receives the cell values, creating a tidy representation where each variable forms a column and each observation forms a row. Knowing when to use each variant improves your data cleaning efficiency. Reshaping data correctly is often the most important preprocessing step in any analysis.

Multiple value columns

By default, all pivoted columns share the same values_to column. To split values across multiple columns, provide a named vector with .value as a placeholder: This additional context makes the transformation pattern clearer and easier to adapt to your own data analysis needs.

df <- tibble(
  id = c(1, 2),
  group_low_n = c(10, 20),
  group_high_n = c(5, 8),
  group_low_mean = c(2.1, 3.4),
  group_high_mean = c(5.6, 6.2)
)

pivot_longer(
  df,
  c(group_low_n, group_high_n, group_low_mean, group_high_mean),
  names_to = c("group", ".value"),
  values_to = c("n", "mean"),
  names_pattern = "group_(.+)_(.+)"
)
# # A tibble: 4 × 4
#   id    group     n     mean
#   <dbl> <chr> <dbl>   <dbl>
# 1     1 low       10     2.1
# 2     1 high       5     5.6
# 3     2 low       20     3.4
# 4     2 high       8     6.2

The .value placeholder tells pivot_longer to create separate value columns from the matched groups.

Gotchas and edge cases

When you pivot a wide dataset to long format with pivot_longer(), each original column contributes one row per observation, making the data easier to use with grouped operations and faceted plots in ggplot2. This additional context makes the transformation pattern clearer and easier to adapt to your own data analysis needs.

values_drop_na

Setting values_drop_na = TRUE removes rows where the pivoted values are all NA. This is useful after a join or for sparse data: This additional context makes the transformation pattern clearer and easier to adapt to your own data analysis needs.

df <- tibble(
  id = c(1, 2),
  x = c(1, NA),
  y = c(NA, 2)
)

pivot_longer(df, c(x, y), names_to = "var", values_to = "val", values_drop_na = TRUE)
# # A tibble: 2 × 3
#   id    var     val
#   <dbl> <chr> <dbl>
# 1     1 x         1
# 2     2 y         2

The names_to argument receives the old column names and values_to receives the cell values, creating a tidy representation where each variable forms a column and each observation forms a row. Use this approach when you need to prepare data for further analysis in a tidy workflow.

NA in original data

If a cell in the original data is NA, that NA propagates to the pivoted row:

df <- tibble(id = c("A", "B"), x = c(1, NA), y = c(NA, 4))

pivot_longer(df, c(x, y), names_to = "col", values_to = "val")
# # A tibble: 4 × 3
#   id    col     val
#   <chr> <chr> <dbl>
# 1 A     x         1
# 2 A     y        NA
# 3 B     x        NA
# 4 B     y         4

When you pivot a wide dataset to long format with pivot_longer(), each original column contributes one row per observation, making the data easier to use with grouped operations and faceted plots in ggplot2. This pattern is common in real-world data analysis pipelines. This technique is fundamental to reshaping data for tidy analysis.

Specifying column types

Use names_ptypes and values_ptypes to enforce types on the new columns:

df <- tibble(
  id = c(1, 2),
  x_2021 = c("1", "2"),
  x_2022 = c("3", "4")
)

pivot_longer(
  df,
  c(x_2021, x_2022),
  names_to = "year",
  values_to = "val",
  names_ptypes = list(year = factor()),
  values_ptypes = list(val = double())
)

pivot_longer() is the replacement for gather() from older tidyr. The cols argument uses tidy-select syntax, starts_with("q") selects columns starting with “q”. The names_to column holds the original column names as a character vector; values_to holds the values. When column names encode multiple variables (e.g., type_year), use names_sep or names_pattern to split them into multiple names_to columns.

See also