tidyr::pivot_wider()
Overview
pivot_wider() is the inverse of pivot_longer(). It takes a key column and a value column, and spreads them across multiple columns. Where pivot_longer() makes wide data long, pivot_wider() makes long data wide.
It is useful when your data has one observation per row but the variable names are stored in a column rather than as column headers, the opposite of the structure pivot_longer() expects.
Signature
pivot_wider(
data,
names_from = NULL,
values_from = NULL,
id_cols = NULL,
names_prefix = "",
names_sep = "_",
names_glue = NULL,
names_sort = FALSE,
names_repair = "check_unique",
values_fill = NULL,
values_drop_na = FALSE,
...
)
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
data | tibble / data frame | , | Input data. |
names_from | character | NULL | Column containing the values to use as column names. |
values_from | character | NULL | Column containing the values to fill the cells. |
id_cols | tidy-select | NULL | Columns that identify each row. Default: all columns not in names_from or values_from. |
names_prefix | character | "" | Prefix to prepend to column names. |
names_sep | character | "_" | Separator between multiple values in names_from. |
names_glue | character | NULL | Glue expression to construct column names from multiple columns. |
names_sort | logical | FALSE | Sort column names alphabetically. |
names_repair | character | "check_unique" | How to handle duplicate column names. |
values_fill | scalar | NULL | Value to use for missing combinations. |
values_drop_na | logical | FALSE | Drop rows where all value columns are NA. |
Basic usage
Simple spread
The most common case: one column identifies the observation, another provides the values.
library(tidyr)
df <- tibble(
person = c("Alice", "Alice", "Bob", "Bob"),
metric = c("height", "weight", "height", "weight"),
value = c(165, 58, 180, 82)
)
pivot_wider(df, names_from = metric, values_from = value)
# # A tibble: 2 × 3
# person height weight
# <chr> <dbl> <dbl>
# 1 Alice 165 58
# 2 Bob 180 82
Each unique value in metric becomes a column. value fills the cells.
Which columns identify rows?
By default, pivot_wider() uses every column that is not in names_from or values_from as an identifier. In the example above, person is the only id column. You can override this with id_cols: This additional context makes the transformation pattern clearer and easier to adapt to your own data analysis needs.
df2 <- tibble(
site = c("A", "A", "B", "B"),
year = c(2021, 2021, 2022, 2022),
metric = c("temp", "rain", "temp", "rain"),
value = c(22.1, 4.2, 23.5, 3.8)
)
pivot_wider(df2, id_cols = c(site, year), names_from = metric, values_from = value)
# # A tibble: 2 × 4
# site year temp rain
# <chr> <dbl> <dbl> <dbl>
# 1 A 2021 22.1 4.2
# 2 B 2022 23.5 3.8
Pivoting to wide format creates a new column for each unique value in the names_from column, which is useful for creating comparison tables, correlation matrices, or any presentation where side-by-side values matter. Use this approach when you need to prepare data for further analysis in a tidy workflow.
Multiple values columns
You can spread multiple value columns by passing a vector to values_from:
df <- tibble(
person = c("Alice", "Bob", "Carol"),
measure = c("temperature", "temperature", "temperature"),
morning = c(36.8, 37.1, 36.5),
evening = c(37.0, 37.3, 36.8)
)
pivot_wider(df, names_from = measure, values_from = c(morning, evening))
# # A tibble: 3 × 3
# person morning_temperature evening_temperature
# <chr> <dbl> <dbl>
# 1 Alice 36.8 37.0
# 2 Bob 37.1 37.3
# 3 Carol 36.5 36.8
When you provide multiple value columns, pivot_wider() constructs column names by combining names_from value with each value column name, separated by _ (or names_sep).
The values_fill argument provides a default value for missing combinations after pivoting, preventing NA values from appearing where a particular cross-tabulation cell had no data in the original long format. This additional context makes the transformation pattern clearer and easier to adapt to your own data analysis needs.
Filling missing values
When a combination does not exist in the input, the cell is NA:
df <- tibble(
person = c("Alice", "Alice", "Bob"),
metric = c("height", "weight", "height"),
value = c(165, 58, 180)
)
pivot_wider(df, names_from = metric, values_from = value)
# # A tibble: 2 × 3
# person height weight
# <chr> <dbl> <dbl>
# 1 Alice 165 58
# 2 Bob 180 NA
Use values_fill to replace NA with a specific value:
Pivoting to wide format creates a new column for each unique value in the names_from column, which is useful for creating comparison tables, correlation matrices, or any presentation where side-by-side values matter. This additional context makes the transformation pattern clearer and easier to adapt to your own data analysis needs.
pivot_wider(df, names_from = metric, values_from = value, values_fill = 0)
# # A tibble: 2 × 3
# person height weight
# <chr> <dbl> <dbl>
# 1 Alice 165 58
# 2 Bob 180 0
Custom column names
The values_fill argument provides a default value for missing combinations after pivoting, preventing NA values from appearing where a particular cross-tabulation cell had no data in the original long format. This additional context makes the transformation pattern clearer and easier to adapt to your own data analysis needs.
names_prefix
Prepend a string to every column name:
pivot_wider(df, names_from = metric, values_from = value, names_prefix = "obs_")
# # A tibble: 2 × 3
# person obs_height obs_weight
# <chr> <dbl> <dbl>
# 1 Alice 165 58
# 2 Bob 180 NA
Pivoting to wide format creates a new column for each unique value in the names_from column, which is useful for creating comparison tables, correlation matrices, or any presentation where side-by-side values matter. This pattern is common in real-world data analysis pipelines.
names_sep
When names_from has multiple columns, names_sep is the separator between their values:
df <- tibble(
id = c(1, 1, 2, 2),
loc = c("north", "south", "north", "south"),
value = c(10, 20, 15, 25)
)
pivot_wider(df, names_from = loc, values_from = value, names_sep = "_")
# # A tibble: 2 × 3
# id north south
# <dbl> <dbl> <dbl>
# 1 1 10 20
# 2 2 15 25
The values_fill argument provides a default value for missing combinations after pivoting, preventing NA values from appearing where a particular cross-tabulation cell had no data in the original long format. Applying this technique correctly saves time in data preparation. Wide format tables are often easier to read and interpret than long format listings.
names_glue
For full control over the name construction, use a glue expression:
pivot_wider(
df,
names_from = loc,
values_from = value,
names_glue = "{loc}__{.value}"
)
# # A tibble: 2 × 3
# id north__value south__value
# <dbl> <dbl> <dbl>
# 1 1 10 20
# 2 2 15 25
Gotchas and edge cases
Pivoting to wide format creates a new column for each unique value in the names_from column, which is useful for creating comparison tables, correlation matrices, or any presentation where side-by-side values matter. This additional context makes the transformation pattern clearer and easier to adapt to your own data analysis needs.
Duplicate column names
If the same column name would be produced for different source rows, pivot_wider() creates a list column (a column of vectors) rather than overwriting: This additional context makes the transformation pattern clearer and easier to adapt to your own data analysis needs.
df <- tibble(
person = c("Alice", "Alice"),
metric = c("height", "height"),
value = c(165, 170)
)
pivot_wider(df, names_from = metric, values_from = value)
# Warning: Values are not uniquely identified; output has 2 rows
# # A tibble: 1 × 2
# person height
# <chr> <list>
# 1 Alice <int [2]>
The list column stores both values. Use dplyr::unnest() if you want one row per value.
The values_fill argument provides a default value for missing combinations after pivoting, preventing NA values from appearing where a particular cross-tabulation cell had no data in the original long format. This additional context makes the transformation pattern clearer and easier to adapt to your own data analysis needs.
Sorting column names
By default, columns appear in the order they are encountered. Set names_sort = TRUE to sort alphabetically:
df <- tibble(
person = c("Alice", "Bob"),
z_metric = c(10, 20),
a_metric = c(5, 15)
)
pivot_wider(pivot_longer(df, names_to = "metric", values_to = "value"),
names_from = metric, values_from = value, names_sort = TRUE)
Pivoting to wide format creates a new column for each unique value in the names_from column, which is useful for creating comparison tables, correlation matrices, or any presentation where side-by-side values matter. Knowing when to use each variant improves your data cleaning efficiency. This approach is standard when preparing reports or tables from tidy data.
names_repair options
If column names are invalid or duplicated, use names_repair:
pivot_wider(df, names_from = metric, values_from = value, names_repair = "unique")
# Creates unique names like "height...1", "height...2"
pivot_wider(df, names_from = metric, values_from = value, names_repair = "minimal")
# Silently renames to avoid duplicates (use with care)
pivot_wider() takes the names_from column values and turns them into new column names. When multiple rows match the same key-name combination, you must supply a values_fn (e.g., sum, mean) to aggregate; otherwise it throws an error. Use names_prefix to prepend a string to all generated column names, which prevents names starting with digits.
See also
- /reference/tidyverse/tidyr-pivot-longer/, the inverse operation, converting wide data to long format
- /reference/tidyverse/dplyr-mutate/, often used after
pivot_widerto transform the new columns - /cookbooks/how-to-convert-wide-to-long/ — step-by-step examples converting between wide and long formats