Pivoting Data with tidyr
Most R functions that analyze data expect it in a specific shape. ggplot2 wants long format, where each row is a single observation. Statistical models often want wide format, where each column is a variable and each row is a subject. The problem is that real data arrives in whatever shape the person who created it thought was sensible. tidyr gives you two functions that handle this mismatch cleanly: pivot_longer() converts wide data into long format, and pivot_wider() does the reverse.
Wide vs Long Format
The difference comes down to how you organize observations and variables across rows and columns.
Wide format stores related measurements as separate columns. A clinical trial might look like this in wide format: one row per patient, with columns for treatment_a_result, treatment_b_result, and treatment_c_result. This is readable when you’re eyeballing the data, and it matches how spreadsheets usually look.
Long format stores each measurement as its own row. The same clinical trial becomes: one row per patient per treatment, with a treatment column and a result column. This is verbose but consistent. Every measurement follows the same structure, which is exactly what ggplot2 and most statistical models need.
Neither format is universally better. Wide data takes fewer rows and shows values directly. Long data is repetitive but predictable. You will constantly switch between them depending on what operation comes next.
Reshaping with pivot_longer
pivot_longer() gathers columns into rows. You tell it which columns to gather, what to call the new key column, and what to call the new value column.
library(tidyr)
library(dplyr)
# A simple wide dataset: measurement results across three time points
wide <- tibble(
subject = c("Alice", "Bob", "Carol"),
baseline = c(100, 95, 102),
week_4 = c(92, 88, 95),
week_8 = c(85, 82, 88)
)
wide
## # A tibble: 3 × 4
## subject baseline week_4 week_8
## <chr> <dbl> <dbl> <dbl>
## 1 Alice 100 92 85
## 2 Bob 95 88 82
## 3 Carol 102 95 88
long <- wide |>
pivot_longer(
cols = c(baseline, week_4, week_8),
names_to = "timepoint",
values_to = "result"
)
long
## # A tibble: 9 × 3
## subject timepoint result
## <chr> <chr> <dbl>
## 1 Alice baseline 100
## 2 Alice week_4 92
## 3 Alice week_8 85
## 4 Bob baseline 95
## 5 Bob week_4 88
## 6 Bob week_8 82
## 7 Carol baseline 102
## 8 Carol week_4 95
## 9 Carol week_8 88
The subject column was left untouched because it is not part of cols. That column acts as an identifier. Everything in cols gets gathered into two new columns.
You can use tidyselect helpers to avoid typing column names. starts_with("week"), contains("result"), or everything() except the identifier columns all work.
Splitting Column Names with names_sep
When your column names contain multiple pieces of information, you can split them. Suppose your columns are result_week_4 and result_week_8. The underscore separates the metric name from the timepoint, and you want both pieces in separate columns.
compound <- tibble(
subject = c("Alice", "Bob"),
result_week_4 = c(92, 88),
result_week_8 = c(85, 82)
)
compound |>
pivot_longer(
cols = starts_with("result_"),
names_sep = "_",
names_to = c("metric", "timepoint")
)
```r
```r
## # A tibble: 4 × 4
## subject metric timepoint value
## <chr> <chr> <chr> <dbl>
## 1 Alice result week_4 92
## 2 Alice result week_8 85
## 3 Bob result week_4 88
## 4 Bob result week_8 82
The names_sep argument tells tidyr where to break the column name. You provide a character vector for names_to with the same length as the number of pieces. Each piece lands in its own column.
Using .value for Selective Column Naming
The special value .value in names_to tells tidyr something different: the column name itself is a variable name, and its values should become column names in the output. This is how you selectively promote values from column names.
With the iris dataset, each measurement column name like Sepal.Length already contains the variable name embedded in the dot. You want those variable names to become column headers.
iris |>
tibble::rownames_to_column("Species") |>
pivot_longer(
cols = !Species,
names_to = ".value",
names_sep = "\\."
)
```r
That is not quite right. When `.value` is used, the column names get split by `names_sep` (or `names_pattern`), and each unique prefix becomes a column. The `values_to` column receives the actual values. In the iris case, the dot in `Sepal.Length` acts as the separator, so `names_sep = "\\."` would work correctly.
The key insight is that `.value` is not magic. It is a signal that the column name structure encodes variable names. You still need `names_sep` or `names_pattern` to tell tidyr how to extract them.
## Reshaping with pivot_wider
`pivot_wider()` is the inverse operation. You specify which column holds the new column names and which column holds the values.
```r
long |>
pivot_wider(
names_from = timepoint,
values_from = result
)
```r
```r
## # A tibble: 3 × 4
## subject baseline week_4 week_8
## <chr> <dbl> <dbl> <dbl>
## 1 Alice 100 92 85
## 2 Bob 95 88 82
## 3 Carol 102 95 88
That gets you back to the original wide format.
Custom Column Names with names_glue
When names_from contains multiple columns, you can combine them with names_glue. Suppose your long data has separate metric and unit columns, and you want the new column names to combine both.
combined <- tibble(
subject = c("Alice", "Alice", "Bob", "Bob"),
metric = c("weight", "height", "weight", "height"),
unit = c("kg", "cm", "kg", "cm"),
value = c(62, 170, 78, 175)
)
combined |>
pivot_wider(
names_from = c(metric, unit),
names_glue = "{metric}_{unit}",
values_from = value
)
```r
```r
## # A tibble: 2 × 3
## subject weight_kg height_cm
## <chr> <dbl> <dbl>
## 1 Alice 62 170
## 2 Bob 78 175
Handling Missing Values
When pivoting wider, rows that do not have a matching value create NA entries in the new columns. You can fill those with values_fill.
incomplete <- tibble(
subject = c("Alice", "Bob", "Carol"),
timepoint = c("week_4", "week_4", "week_8"),
result = c(92, NA, 88)
)
incomplete |>
pivot_wider(
names_from = timepoint,
values_from = result,
values_fill = 0
)
```r
```r
## # A tibble: 3 × 3
## subject week_4 week_8
## <chr> <dbl> <dbl>
## 1 Alice 92 0
## 2 Bob 0 0
## 3 Carol 0 88
Bob’s missing week_8 value got filled with 0. You can also pass a named list to values_fill for per-column control.
Survey Data: Separating Multiple Values per Row
A common real-world scenario is survey data where each row has multiple responses labeled by question and metric. You might have columns like q1_stress, q1_satisfaction, q2_stress, q2_satisfaction. The question number and the metric are both embedded in the column name.
survey <- tibble(
respondent_id = c(1, 2, 3),
q1_stress = c(3, 1, 4),
q1_satisfaction = c(4, 2, 5),
q2_stress = c(2, 3, 3),
q2_satisfaction = c(3, 4, 4)
)
survey |>
pivot_longer(
cols = starts_with("q"),
names_to = c("question", ".value"),
names_sep = "_",
values_to = c("stress", "satisfaction")
)
```r
The `".value"` in `names_to` tells tidyr that the second piece of the column name (`stress`, `satisfaction`) defines separate columns. The first piece (`q1`, `q2`) goes into the `question` column. The result has one row per respondent per question, with separate `stress` and `satisfaction` columns.
This is a clean way to separate concerns that were jumbled together in the original column names.
## Common Mistakes
**Forgetting id_cols in pivot_wider.** If your long data has duplicate identifier combinations, `pivot_wider` silently keeps only the last value. Always check that your identifier columns uniquely identify each row before widening.
**Trying to pivot inside ggplot2 aesthetics.** You cannot reshape data inside an `aes()` call. Pre-pivot your data first, then pass the reshaped result to `ggplot()`.
**Not using names_sep for compound column names.** If your column names contain multiple pieces of information but you only provide a single `names_to` value, tidyr stores the entire original name as one value. That is usually not what you want.
**values_drop_na applies only to pivot_longer.** This flag removes rows that would have been created from `NA` values. It has no effect in `pivot_wider`, where you use `values_fill` instead.
## Why This Matters for ggplot2
ggplot2 works best when each aesthetic mapping corresponds to a column in your data. If you want lines colored by group, the group identifier must be a column, not buried in a column name. If you want facets, the faceting variable must be a regular column. Long format gives you all of this for free.
With long data, a single `geom_line()` call handles multiple series automatically:
```r
# Long data: one aesthetic mapping handles grouping
ggplot(long, aes(x = timepoint, y = result, color = subject)) +
geom_line()
```r
With wide data, you end up repeating geometries or manually specifying group aesthetics for each series. It works, but it is verbose and fragile.
## Conclusion
tidyr's pivot functions solve the most common data reshaping problem: converting between the format your data arrives in and the format your analysis tool expects. `pivot_longer()` gathers columns into rows, creating long data. `pivot_wider()` spreads rows into columns, creating wide data. Both handle the messy details like missing values, compound column names, and multiple value columns.
The skill is knowing which direction to pivot. Most plotting and modeling functions want long data. Most human-readable summaries prefer wide data. Once you internalize that distinction, pivoting becomes second nature.
## See Also
- [Data Frames and Tibbles](/tutorials/data-frames-and-tibbles/) — tibbles, rownames, and column operations that pair well with pivoting
- [ggplot2 Facets and Themes](/tutorials/ggplot2-facets-and-themes/) — facet_wrap and facet_grid require long data, making pivoting a prerequisite for many visualizations
- [Reshaping Data with tidyr](/tutorials/tidyr-reshaping/) — complete reference covering unite, separate, and the older gather/spread functions
## Written
- File: sites/rguides/src/content/tutorials/tidyr-pivoting.md
- Words: ~950
- Read time: 5 min
- Topics covered: wide vs long format, pivot_longer, pivot_wider, names_sep, names_glue, .value, values_fill, values_drop_na, ggplot2 integration
- Verified via: tidyr.tidyverse.org pivot_longer and pivot_wider reference pages
- Unverified items: none