dplyr::group_by()
Returns: grouped_df · Updated March 13, 2026 · Tidyverse
group_by() and summarise() are dplyr verbs that work together to split data into groups and compute summary statistics. group_by() takes a data frame and returns a grouped tibble (class grouped_df). The rows themselves are unchanged, but subsequent dplyr verbs such as summarise(), mutate() and filter() operate on each group separately. summarise() (or its alias summarize()) then collapses each group into a single row containing the grouping columns and the requested statistics.
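A minimal sketch of that split-then-collapse behaviour, using the same sample data as the Examples below. group_by() changes only the grouping metadata (reported here by group_vars() and n_groups()), and summarise() then returns one row per group.

```r
library(dplyr)

# Same sample data as in the Examples section
df <- data.frame(
  category = c("A", "A", "B", "B", "A", "B"),
  product  = c("X", "Y", "X", "Y", "Z", "X"),
  sales    = c(100, 150, 200, 50, 75, 180)
)

grouped <- df %>% group_by(category)

# Grouping is metadata: rows and columns are unchanged
nrow(grouped)        # still 6 rows
group_vars(grouped)  # "category"
n_groups(grouped)    # 2 groups: A and B

# summarise() collapses each group to a single row
totals <- grouped %>% summarise(total_sales = sum(sales))
totals
```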
Syntax
```r
group_by(.data, ..., .add = FALSE, .drop = group_by_drop_default(.data))
summarise(.data, ..., .groups = NULL)
```
Parameters
group_by()
| Parameter | Type | Default | Description |
|---|---|---|---|
| .data | tibble/data.frame | required | Input data frame |
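| ... | unquoted column names or expressions | none | Columns, or computed expressions, to group by |
| .add | logical | FALSE | If TRUE, add to the existing groups instead of replacing them |
| .drop | logical | group_by_drop_default(.data) | Drop groups formed by factor levels that do not appear in the data; TRUE unless .data was previously grouped with .drop = FALSE |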
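The sketch below illustrates .add and .drop on a small hypothetical data frame (regional and its columns are illustrative names, not part of dplyr). region is a factor with an unused level, which is the case .drop affects.

```r
library(dplyr)

# Hypothetical data: the factor level "West" never occurs in the rows
regional <- data.frame(
  region  = factor(c("North", "North", "South"),
                   levels = c("North", "South", "West")),
  channel = c("web", "store", "web"),
  sales   = c(10, 20, 30)
)

# .add = TRUE appends to the existing grouping
by_both <- regional %>%
  group_by(region) %>%
  group_by(channel, .add = TRUE)
group_vars(by_both)     # "region" "channel"

# Without .add, the second group_by() replaces the first
by_channel <- regional %>%
  group_by(region) %>%
  group_by(channel)
group_vars(by_channel)  # "channel"

# .drop = FALSE keeps the empty "West" group, which gets n = 0
kept <- regional %>%
  group_by(region, .drop = FALSE) %>%
  summarise(n = n())

# Default (.drop = TRUE for a plain data frame): "West" is dropped
dropped <- regional %>%
  group_by(region) %>%
  summarise(n = n())
```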
summarise()
| Parameter | Type | Default | Description |
|---|---|---|---|
| .data | grouped_df/tibble | required | Input (optionally grouped) data frame |
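| ... | name = value pairs | none | Summary computations, each typically returning one value per group |
| .groups | character | NULL | Grouping structure of the result: "drop_last", "drop", "keep" or "rowwise". When NULL, "drop_last" is used if every summary returns one value per group, with a message if the result is still grouped |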
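A short sketch comparing the explicit .groups options on hypothetical two-key data (pairs_df is an illustrative name). group_vars() shows which grouping, if any, survives summarise().

```r
library(dplyr)

# Hypothetical data with two grouping columns
pairs_df <- data.frame(
  category = c("A", "A", "B", "B"),
  product  = c("X", "Y", "X", "Y"),
  sales    = c(100, 150, 200, 50)
)

grouped <- pairs_df %>% group_by(category, product)

# "drop_last": the last grouping level (product) is removed
r_last <- grouped %>% summarise(total = sum(sales), .groups = "drop_last")
group_vars(r_last)  # "category"

# "keep": same grouping as the input
r_keep <- grouped %>% summarise(total = sum(sales), .groups = "keep")
group_vars(r_keep)  # "category" "product"

# "drop": the result is an ungrouped tibble
r_drop <- grouped %>% summarise(total = sum(sales), .groups = "drop")
group_vars(r_drop)  # character(0)
```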
Examples
Basic grouping and summarising
```r
library(dplyr)

# Sample data
df <- data.frame(
  category = c("A", "A", "B", "B", "A", "B"),
  product  = c("X", "Y", "X", "Y", "Z", "X"),
  sales    = c(100, 150, 200, 50, 75, 180)
)

# Group by category and compute summary
df %>%
  group_by(category) %>%
  summarise(total_sales = sum(sales), n = n())
```
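For this sample data the result has one row per category. The sketch below recreates df so it runs on its own and checks the totals against hand arithmetic: A is 100 + 150 + 75 = 325 and B is 200 + 50 + 180 = 430.

```r
library(dplyr)

df <- data.frame(
  category = c("A", "A", "B", "B", "A", "B"),
  product  = c("X", "Y", "X", "Y", "Z", "X"),
  sales    = c(100, 150, 200, 50, 75, 180)
)

result <- df %>%
  group_by(category) %>%
  summarise(total_sales = sum(sales), n = n())

# One row per category, ordered by the grouping column
result$category     # "A" "B"
result$total_sales  # 325 430
result$n            # 3 3
```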
Multiple groupings
```r
# Group by multiple columns
df %>%
  group_by(category, product) %>%
  summarise(avg_sales = mean(sales), .groups = "drop")
```
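With two grouping columns, summarise() returns one row per combination that actually occurs in the data. A self-contained check against the sample data: only the B/X pair has two rows, so its average is (200 + 180) / 2 = 190.

```r
library(dplyr)

df <- data.frame(
  category = c("A", "A", "B", "B", "A", "B"),
  product  = c("X", "Y", "X", "Y", "Z", "X"),
  sales    = c(100, 150, 200, 50, 75, 180)
)

by_pair <- df %>%
  group_by(category, product) %>%
  summarise(avg_sales = mean(sales), .groups = "drop")

# Only observed combinations appear: 5 rows, not 2 x 3 = 6
nrow(by_pair)  # 5
by_pair$avg_sales[by_pair$category == "B" & by_pair$product == "X"]  # 190
```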
Multiple summary functions
```r
df %>%
  group_by(category) %>%
  summarise(
    total  = sum(sales),
    mean   = mean(sales),
    median = median(sales),
    min    = min(sales),
    max    = max(sales),
    sd     = sd(sales)
  )
```
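Summary functions such as sum() and mean() return NA for any group that contains a missing value unless na.rm = TRUE is passed. A minimal sketch with hypothetical data (df_na is an illustrative name):

```r
library(dplyr)

# Hypothetical data with one missing value in group A
df_na <- data.frame(
  category = c("A", "A", "B", "B"),
  sales    = c(100, NA, 200, 50)
)

res_na <- df_na %>%
  group_by(category) %>%
  summarise(
    total_raw = sum(sales),               # NA for group A
    total     = sum(sales, na.rm = TRUE), # 100 for group A
    n_missing = sum(is.na(sales))         # number of NAs per group
  )
res_na
```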
Common Patterns
Counting and proportions
```r
df %>%
  group_by(category) %>%
  summarise(n = n()) %>%
  # summarise() drops the only grouping level, so sum(n) is the total row count
  mutate(pct = n / sum(n) * 100)
```
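dplyr's count() is a shortcut for the same group_by() plus summarise(n = n()) pattern, and its wt argument sums a column instead of counting rows. A sketch comparing the two on the sample data:

```r
library(dplyr)

df <- data.frame(
  category = c("A", "A", "B", "B", "A", "B"),
  product  = c("X", "Y", "X", "Y", "Z", "X"),
  sales    = c(100, 150, 200, 50, 75, 180)
)

# Shortcut: one row per category with a column n
counts <- df %>% count(category)

# The long-hand equivalent
manual <- df %>%
  group_by(category) %>%
  summarise(n = n())

# wt = sales sums sales per category instead of counting rows
weighted <- df %>% count(category, wt = sales)
```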
Using across() for multiple columns
```r
df %>%
  group_by(category) %>%
  summarise(across(where(is.numeric), sum))
```
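across() also accepts a named list of functions; by default the output columns are named "{.col}_{.fn}", for example sales_total. A sketch on hypothetical data with a second numeric column (df_units and units are illustrative names):

```r
library(dplyr)

# Hypothetical data with two numeric columns
df_units <- data.frame(
  category = c("A", "A", "B", "B"),
  sales    = c(100, 150, 200, 60),
  units    = c(1, 3, 4, 2)
)

res <- df_units %>%
  group_by(category) %>%
  summarise(across(where(is.numeric), list(total = sum, avg = mean)))

# Columns: category plus sales_total, sales_avg, units_total, units_avg
names(res)
```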