Reshaping Data with tidyr

March 7, 2026 · 4 min read · Updated March 7, 2026 · beginner

tidyr pivot reshaping tidyverse data-wrangling

One of the most common data wrangling challenges is switching between wide and long data formats. Wide data has multiple columns representing different time points or categories, while long data has each observation in its own row. The tidyr package provides two powerful functions for this: pivot_longer() and pivot_wider().

Why Reshape Your Data?

Different data formats serve different purposes. Wide format is great for human reading and reporting:

# Wide format - easy to read
wide_data <- data.frame(
  id = c(1, 2, 3),
  name = c("Alice", "Bob", "Carol"),
  score_2023 = c(85, 92, 78),
  score_2024 = c(88, 95, 82),
  score_2025 = c(91, 98, 86)
)
print(wide_data)

Long format is required for most statistical modeling and visualization in R:

# Long format - each row is one observation
long_data <- data.frame(
  id = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  name = c("Alice", "Alice", "Alice", "Bob", "Bob", "Bob", "Carol", "Carol", "Carol"),
  year = c(2023, 2024, 2025, 2023, 2024, 2025, 2023, 2024, 2025),
  score = c(85, 88, 91, 92, 95, 98, 78, 82, 86)
)
print(long_data)

Most ggplot2 visualizations and dplyr operations work best with long data.

Converting Wide to Long with pivot_longer()

The pivot_longer() function transforms wide data into long format. You specify which columns to gather and what to name the new columns.

Basic Usage

library(tidyr)
library(dplyr)

# Starting with wide data
wide <- data.frame(
  id = 1:3,
  name = c("Alice", "Bob", "Carol"),
  score_2023 = c(85, 92, 78),
  score_2024 = c(88, 95, 82),
  score_2025 = c(91, 98, 86)
)

# Convert to long format
long <- wide %>%
  pivot_longer(
    cols = starts_with("score_"),
    names_to = "year",
    values_to = "score"
  ) %>%
  mutate(year = parse_number(year))

print(long)

The key arguments:

cols: Which columns to gather (use starts_with(), contains(), or column names)
names_to: Name for the new column that holds the old column names
values_to: Name for the new column that holds the values

Advanced pivot_longer()

You can use regular expressions and more complex patterns:

# Example with multiple value columns
wide2 <- data.frame(
  id = 1:3,
  product = c("Widget A", "Widget B", "Widget C"),
  price_q1 = c(100, 150, 200),
  price_q2 = c(105, 155, 210),
  units_q1 = c(50, 30, 20),
  units_q2 = c(55, 35, 22)
)

long2 <- wide2 %>%
  pivot_longer(
    cols = matches("_(q1|q2)$"),
    names_to = c(".value", "quarter"),
    names_sep = "_"
  )

print(long2)

The .value special token tells tidyr to split the column names and use the first part as the column names for the values.

Converting Long to Wide with pivot_wider()

The pivot_wider() function does the opposite: it takes long data and spreads it into wide format.

Basic Usage

# Starting with long data
long <- data.frame(
  id = c(1, 1, 1, 2, 2, 2),
  name = c("Alice", "Alice", "Alice", "Bob", "Bob", "Bob"),
  year = c(2023, 2024, 2025, 2023, 2024, 2025),
  score = c(85, 88, 91, 92, 95, 98)
)

# Convert to wide format
wide <- long %>%
  pivot_wider(
    names_from = year,
    values_from = score,
    names_prefix = "score_"
  )

print(wide)

Key arguments:

names_from: Column containing the new column names
values_from: Column containing the values
names_prefix: Optional prefix for new column names

Handling Missing Values

When pivoting wider, you might encounter missing values. Use values_fill to specify a default:

long_with_na <- data.frame(
  id = c(1, 1, 2, 2),
  measure = c("height", "weight", "height", "weight"),
  value = c(175, 70, 180, NA)
)

wide_filled <- long_with_na %>%
  pivot_wider(
    names_from = measure,
    values_from = value,
    values_fill = 0  # Fill missing with 0
  )

print(wide_filled)

Practical Example: Time Series Analysis

Let’s walk through a real example. Imagine you have sales data in wide format and need to analyze trends:

# Wide sales data
sales_wide <- data.frame(
  product_id = 1:4,
  product_name = c("Widget", "Gadget", "Gizmo", "Thingamajig"),
  jan = c(100, 150, 200, 50),
  feb = c(110, 160, 190, 55),
  mar = c(120, 170, 180, 60),
  apr = c(130, 180, 170, 65)
)

# Convert to long for analysis
sales_long <- sales_wide %>%
  pivot_longer(
    cols = jan:apr,
    names_to = "month",
    values_to = "sales"
  )

# Analyze monthly trends
monthly_summary <- sales_long %>%
  group_by(month) %>%
  summarise(total_sales = sum(sales), .groups = "drop_last")

print(monthly_summary)

Best Practices

Plan your structure first: Know which format you need before reshaping
Use clear column names: After pivoting, clean up the new column names
Check for duplicates: pivot_wider() will fail if you have duplicate identifier-value pairs
Handle NA values explicitly: Use values_fill or values_drop_na as needed

Summary

Function	Use Case	Key Arguments
`pivot_longer()`	Wide to Long	cols, names_to, values_to
`pivot_wider()`	Long to Wide	names_from, values_from

These functions are essential tools in your data wrangling toolkit. Master them, and you’ll be able to transform any messy dataset into the format you need for analysis.