How to Group Data and Summarise with dplyr in R

March 14, 2026 · 2 min read · Updated May 28, 2026 · beginner

r · group-by · summarise · dplyr

To compute summary statistics per group in R, pair dplyr::group_by() with summarise(). Together they answer questions like “what is the average salary per department?” or “how many sales happened each month?”. You split the data into groups, compute a statistic for each group, and combine the results into a tidy summary table, all inside a single pipeline.

library(dplyr)

# Group by department and compute summary statistics
df %>%
  group_by(department) %>%
  summarise(
    mean_salary = mean(salary, na.rm = TRUE),
    median_salary = median(salary, na.rm = TRUE),
    n = n(),
    .groups = "drop"
  )

# Simple count per group
df %>% count(department)
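
To make the pipeline above concrete, here is a minimal sketch run on a small data frame. The data is hypothetical and invented purely for illustration; only the column names department and salary come from the example above.

```r
library(dplyr)

# Hypothetical data, invented for illustration only
df <- data.frame(
  department = c("Sales", "Sales", "Sales", "IT", "IT"),
  salary     = c(50000, 60000, 70000, 70000, 90000)
)

# One row per department, one column per statistic
by_dept <- df %>%
  group_by(department) %>%
  summarise(
    mean_salary   = mean(salary, na.rm = TRUE),
    median_salary = median(salary, na.rm = TRUE),
    n             = n(),
    .groups = "drop"
  )

# count() is shorthand for group_by() followed by summarise(n = n())
counts <- df %>% count(department)

print(by_dept)
print(counts)
```

With this data, IT has a mean and median salary of 80000 across 2 rows, and Sales has a mean and median of 60000 across 3 rows. The n column from count() matches the n column from summarise().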

dplyr::summarise() keeps the grouping and the statistics together in one readable call and returns a tidy data frame (a tibble) with one row per group. List all statistics in one summarise() call rather than calling aggregate() multiple times. For large datasets (millions of rows), data.table’s dt[, .(mean_salary = mean(salary)), by = department] syntax is often substantially faster.

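# Base R equivalent (the formula interface drops rows with NA by default)
aggregate(salary ~ department, data = df, FUN = mean)

# data.table approach (add na.rm = TRUE to mean() if salary has missing values)
library(data.table)
dt <- as.data.table(df)
dt[, .(mean_salary = mean(salary)), by = department]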
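
As a quick sanity check, the sketch below runs all three approaches on the same data and confirms they produce the same group means. The data frame is hypothetical, and keyby is used instead of by in the data.table call only so that the rows come back sorted by department; by keeps groups in order of first appearance, which would make a row-by-row comparison misleading.

```r
library(dplyr)
library(data.table)

# Hypothetical data, invented for illustration only
df <- data.frame(
  department = c("Sales", "Sales", "IT", "IT"),
  salary     = c(50000, 60000, 70000, 90000)
)

# dplyr: groups come back sorted by department
dplyr_res <- df %>%
  group_by(department) %>%
  summarise(mean_salary = mean(salary), .groups = "drop")

# Base R: the result column keeps the name "salary"
base_res <- aggregate(salary ~ department, data = df, FUN = mean)

# data.table: keyby sorts the groups, by would keep first-appearance order
dt <- as.data.table(df)
dt_res <- dt[, .(mean_salary = mean(salary)), keyby = department]

# All three should agree on the per-department means
same_as_base <- isTRUE(all.equal(dplyr_res$mean_salary, base_res$salary))
same_as_dt   <- isTRUE(all.equal(dplyr_res$mean_salary, dt_res$mean_salary))
print(c(base = same_as_base, data.table = same_as_dt))
```

The three results differ mainly in shape: dplyr returns a tibble, aggregate() a plain data frame, and data.table a data.table.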

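Common summary functions include mean(), median(), sum(), min(), max(), and sd(), plus the row-count helpers n() (dplyr) and .N (data.table). Pass na.rm = TRUE to the statistical functions to ignore missing values; without it, a single NA in a group makes that group’s result NA. n() and .N take no arguments and count every row, so use sum(!is.na(x)) to count non-missing values. When you need multiple statistics per group, list them all inside a single summarise() call rather than running aggregate() once per statistic: you get one data frame with a column per statistic, and on grouped data the dplyr version is typically faster as well.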
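
The effect of na.rm is easiest to see side by side. The following sketch uses a hypothetical data frame with one missing salary; the column names mean_default, mean_na_rm, n_rows, and n_salary are chosen here for illustration.

```r
library(dplyr)

# Hypothetical data with one missing salary in the Sales group
df <- data.frame(
  department = c("Sales", "Sales", "Sales", "IT", "IT"),
  salary     = c(50000, 60000, NA, 70000, 90000)
)

na_check <- df %>%
  group_by(department) %>%
  summarise(
    mean_default = mean(salary),                # NA if any salary in the group is missing
    mean_na_rm   = mean(salary, na.rm = TRUE),  # ignores missing salaries
    n_rows       = n(),                         # counts every row, including the NA one
    n_salary     = sum(!is.na(salary)),         # counts only non-missing salaries
    .groups = "drop"
  )

print(na_check)
```

Here the Sales group has a mean_default of NA but a mean_na_rm of 55000, and its n_rows (3) is larger than its n_salary (2). Comparing the two counts is a simple way to spot how many values each mean is actually based on.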
