tidyr::pivot_longer()
Overview
pivot_longer() transforms wide data into long format. It takes two or more columns and gathers them into key-value pairs: column names land in a new column, and their values land in another. This is the inverse of pivot_wider().
The function is useful when your data has columns that represent values rather than variables, a common structure in survey data, repeated measurements, and any dataset where the column header encodes information.
Signature
pivot_longer(
data,
cols,
names_to = "name",
values_to = "value",
names_prefix = NULL,
names_sep = NULL,
names_pattern = NULL,
names_ptypes = NULL,
values_ptypes = NULL,
values_drop_na = FALSE,
...
)
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
data | tibble / data frame | , | Input data. |
cols | tidy-select | , | Columns to pivot into longer format. |
names_to | character | "name" | Name of the column that will hold the old column names. |
values_to | character | "value" | Name of the column that will hold the cell values. |
names_prefix | character | NULL | Strip this prefix from column names before pivoting. |
names_sep | character | NULL | Separator between multiple names_to columns. |
names_pattern | character | NULL | Regex pattern to extract column name parts for multiple names_to. |
names_ptypes | named list | NULL | Types for the new name columns. |
values_ptypes | named list | NULL | Types for the value columns. |
values_drop_na | logical | FALSE | If TRUE, drop rows where all value columns are NA. |
Basic usage
Simple pivot
The most common case: you have columns that each represent a category, and you want to stack them.
library(tidyr)
df <- tibble(
id = c("A", "B"),
Jan = c(100, 200),
Feb = c(150, 250),
Mar = c(120, 220)
)
pivot_longer(df, c(Jan, Feb, Mar), names_to = "month", values_to = "sales")
# # A tibble: 6 × 3
# id month sales
# <chr> <chr> <dbl>
# 1 A Jan 100
# 2 A Feb 150
# 3 A Mar 120
# 4 B Jan 200
# 5 B Feb 250
# 6 B Mar 220
When you pivot a wide dataset to long format with pivot_longer(), each original column contributes one row per observation, making the data easier to use with grouped operations and faceted plots in ggplot2. Use this approach when you need to prepare data for further analysis in a tidy workflow.
Pivot all columns except identifier
Use the ! operator to pivot all columns except the identifier:
df <- tibble(
id = c("A", "B"),
Jan = c(100, 200),
Feb = c(150, 250)
)
pivot_longer(df, !id, names_to = "month", values_to = "sales")
# # A tibble: 4 × 3
# id month sales
# <chr> <chr> <dbl>
# 1 A Jan 100
# 2 A Feb 150
# 3 B Jan 200
# 4 B Feb 250
The names_to argument receives the old column names and values_to receives the cell values, creating a tidy representation where each variable forms a column and each observation forms a row. This pattern is common in real-world data analysis pipelines. Reshaping data correctly is often the most important preprocessing step in any analysis.
Strip prefixes from column names
Use names_prefix to remove a common prefix from column names before pivoting:
df <- tibble(
id = c(1, 2),
product_2023 = c(300, 450),
product_2024 = c(320, 480)
)
pivot_longer(
df,
c(product_2023, product_2024),
names_to = "year",
values_to = "revenue",
names_prefix = "product_"
)
# # A tibble: 4 × 3
# id year revenue
# <dbl> <chr> <dbl>
# 1 1 2023 300
# 2 1 2024 320
# 3 2 2023 450
# 4 2 2024 480
Multiple name columns
When column names encode multiple variables, split them into separate columns with names_sep or names_pattern.
When you pivot a wide dataset to long format with pivot_longer(), each original column contributes one row per observation, making the data easier to use with grouped operations and faceted plots in ggplot2. This additional context makes the transformation pattern clearer and easier to adapt to your own data analysis needs.
Using names_sep
When column names follow a consistent separator, names_sep splits on that character:
df <- tibble(
id = c(1, 2),
temp_2021 = c(22.1, 18.5),
temp_2022 = c(23.4, 19.2)
)
pivot_longer(
df,
c(temp_2021, temp_2022),
names_to = c("var", "year"),
values_to = "reading",
names_sep = "_"
)
# # A tibble: 4 × 4
# id var year reading
# <dbl> <chr> <chr> <dbl>
# 1 1 temp 2021 22.1
# 2 1 temp 2022 23.4
# 3 2 temp 2021 18.5
# 4 2 temp 2022 19.2
The names_to argument receives the old column names and values_to receives the cell values, creating a tidy representation where each variable forms a column and each observation forms a row. Applying this technique correctly saves time in data preparation. Reshaping data correctly is often the most important preprocessing step in any analysis.
Using names_pattern
For irregular column name structures, use a regex with capture groups:
df <- tibble(
id = c(1, 2),
Alice_score = c(85, 92),
Bob_score = c(78, 88)
)
pivot_longer(
df,
c(Alice_score, Bob_score),
names_to = "student",
values_to = "score",
names_pattern = "(.+)_.+"
)
# # A tibble: 4 × 3
# id student score
# <dbl> <chr> <dbl>
# 1 1 Alice 85
# 2 1 Bob 78
# 3 2 Alice 92
# 4 2 Bob 88
For two capture groups:
When you pivot a wide dataset to long format with pivot_longer(), each original column contributes one row per observation, making the data easier to use with grouped operations and faceted plots in ggplot2. This additional context makes the transformation pattern clearer and easier to adapt to your own data analysis needs.
df <- tibble(
site = c("A", "B"),
alpha_2021 = c(10, 20),
beta_2021 = c(15, 25)
)
pivot_longer(
df,
c(alpha_2021, beta_2021),
names_to = c("site_code", "year"),
values_to = "measurement",
names_pattern = "(.+)_(.+)"
)
# # A tibble: 4 × 4
# site site_code year measurement
# <chr> <chr> <chr> <dbl>
# 1 A alpha 2021 10
# 2 A beta 2021 15
# 3 B alpha 2021 20
# 4 B beta 2021 25
The names_to argument receives the old column names and values_to receives the cell values, creating a tidy representation where each variable forms a column and each observation forms a row. Knowing when to use each variant improves your data cleaning efficiency. Reshaping data correctly is often the most important preprocessing step in any analysis.
Multiple value columns
By default, all pivoted columns share the same values_to column. To split values across multiple columns, provide a named vector with .value as a placeholder: This additional context makes the transformation pattern clearer and easier to adapt to your own data analysis needs.
df <- tibble(
id = c(1, 2),
group_low_n = c(10, 20),
group_high_n = c(5, 8),
group_low_mean = c(2.1, 3.4),
group_high_mean = c(5.6, 6.2)
)
pivot_longer(
df,
c(group_low_n, group_high_n, group_low_mean, group_high_mean),
names_to = c("group", ".value"),
values_to = c("n", "mean"),
names_pattern = "group_(.+)_(.+)"
)
# # A tibble: 4 × 4
# id group n mean
# <dbl> <chr> <dbl> <dbl>
# 1 1 low 10 2.1
# 2 1 high 5 5.6
# 3 2 low 20 3.4
# 4 2 high 8 6.2
The .value placeholder tells pivot_longer to create separate value columns from the matched groups.
Gotchas and edge cases
When you pivot a wide dataset to long format with pivot_longer(), each original column contributes one row per observation, making the data easier to use with grouped operations and faceted plots in ggplot2. This additional context makes the transformation pattern clearer and easier to adapt to your own data analysis needs.
values_drop_na
Setting values_drop_na = TRUE removes rows where the pivoted values are all NA. This is useful after a join or for sparse data: This additional context makes the transformation pattern clearer and easier to adapt to your own data analysis needs.
df <- tibble(
id = c(1, 2),
x = c(1, NA),
y = c(NA, 2)
)
pivot_longer(df, c(x, y), names_to = "var", values_to = "val", values_drop_na = TRUE)
# # A tibble: 2 × 3
# id var val
# <dbl> <chr> <dbl>
# 1 1 x 1
# 2 2 y 2
The names_to argument receives the old column names and values_to receives the cell values, creating a tidy representation where each variable forms a column and each observation forms a row. Use this approach when you need to prepare data for further analysis in a tidy workflow.
NA in original data
If a cell in the original data is NA, that NA propagates to the pivoted row:
df <- tibble(id = c("A", "B"), x = c(1, NA), y = c(NA, 4))
pivot_longer(df, c(x, y), names_to = "col", values_to = "val")
# # A tibble: 4 × 3
# id col val
# <chr> <chr> <dbl>
# 1 A x 1
# 2 A y NA
# 3 B x NA
# 4 B y 4
When you pivot a wide dataset to long format with pivot_longer(), each original column contributes one row per observation, making the data easier to use with grouped operations and faceted plots in ggplot2. This pattern is common in real-world data analysis pipelines. This technique is fundamental to reshaping data for tidy analysis.
Specifying column types
Use names_ptypes and values_ptypes to enforce types on the new columns:
df <- tibble(
id = c(1, 2),
x_2021 = c("1", "2"),
x_2022 = c("3", "4")
)
pivot_longer(
df,
c(x_2021, x_2022),
names_to = "year",
values_to = "val",
names_ptypes = list(year = factor()),
values_ptypes = list(val = double())
)
pivot_longer() is the replacement for gather() from older tidyr. The cols argument uses tidy-select syntax, starts_with("q") selects columns starting with “q”. The names_to column holds the original column names as a character vector; values_to holds the values. When column names encode multiple variables (e.g., type_year), use names_sep or names_pattern to split them into multiple names_to columns.
See also
- /reference/tidyverse/tidyr-pivot/, the inverse operation (
pivot_wider) for widening data back out - /reference/tidyverse/dplyr-mutate/, often used after
pivot_longerto transform the new columns - /cookbooks/how-to-convert-wide-to-long/ — step-by-step example of reshaping wide data to tidy format