dplyr::group_by()

group_by(.data, ..., .add = FALSE, .drop = group_by_drop_default(.data))

Returns: grouped_df · Updated March 13, 2026 · Tidyverse


group_by() and summarise() are dplyr verbs that work together to split data into groups and compute summary statistics. group_by() takes a data frame and returns a grouped data frame (a grouped_df tibble). Its rows are unchanged, but it records the grouping columns so that later verbs such as summarise(), mutate(), and filter() operate on each group separately. summarise() (or its alias summarize()) then collapses each group into a single row that holds the requested statistics.
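A minimal sketch of what group_by() changes, using a small made-up data frame: the rows stay the same and only grouping metadata is added. A grouped mutate() then computes per group while keeping every row, whereas summarise() returns one row per group.

```r
library(dplyr)

# Small illustrative data frame (hypothetical values)
df <- data.frame(
  category = c("A", "A", "B"),
  sales = c(100, 150, 200)
)

g <- group_by(df, category)

class(g)      # "grouped_df" "tbl_df" "tbl" "data.frame"
group_vars(g) # "category"
n_groups(g)   # 2

# mutate() on grouped data: sum(sales) is the group total, all 3 rows kept
with_share <- g %>% mutate(share = sales / sum(sales))

# summarise() on grouped data: one row per group
totals <- g %>% summarise(total = sum(sales))
```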

Syntax

group_by(.data, ..., .add = FALSE, .drop = group_by_drop_default(.data))

summarise(.data, ..., .by = NULL, .groups = NULL)

Parameters

group_by()

Parameter	Type	Default	Description
.data	tibble/data.frame	required	Input data frame
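...	unquoted expressions	required	Columns to group by, or expressions that compute new grouping columns
.add	logical	FALSE	If TRUE, add to the existing groups; if FALSE, replace them
.drop	logical	group_by_drop_default(.data)	Drop groups formed by factor levels that do not appear in the data; TRUE unless .data was already grouped with .drop = FALSE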
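The effect of .add and .drop is easiest to see side by side. The sketch below uses a hypothetical data frame whose factor column has an unused level "C", so that .drop has something to drop.

```r
library(dplyr)

# Hypothetical data; category is a factor with an unused level "C"
df <- data.frame(
  category = factor(c("A", "A", "B"), levels = c("A", "B", "C")),
  product = c("X", "Y", "X"),
  sales = c(100, 150, 200)
)

# Default .add = FALSE: the second group_by() replaces the first
replaced <- df %>% group_by(category) %>% group_by(product)
group_vars(replaced)  # "product"

# .add = TRUE: the new column is appended to the existing groups
added <- df %>% group_by(category) %>% group_by(product, .add = TRUE)
group_vars(added)     # "category" "product"

# Default .drop = TRUE: the empty level "C" does not form a group
n_groups(group_by(df, category))  # 2

# .drop = FALSE keeps "C", so summarise() reports it with n = 0
counts <- df %>%
  group_by(category, .drop = FALSE) %>%
  summarise(n = n())
```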

summarise()

Parameter	Type	Default	Description
.data	grouped_df/tibble	required	Input (optionally grouped) data frame
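...	name-value pairs	required	Summary computations, e.g. total = sum(sales)
.by	tidy-select	NULL	Per-operation grouping columns, an alternative to group_by() (dplyr >= 1.1.0)
.groups	character	NULL	Grouping of the result: "drop_last", "drop", "keep", or "rowwise". NULL uses "drop_last" when every result has one row per group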
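To show how .groups shapes the result, this sketch redefines the sample df from the Examples section so it runs on its own. It then summarises over two grouping columns with each explicit setting. Leaving .groups unset behaves like "drop_last" here and also prints a message naming the remaining grouping.

```r
library(dplyr)

df <- data.frame(
  category = c("A", "A", "B", "B", "A", "B"),
  product = c("X", "Y", "X", "Y", "Z", "X"),
  sales = c(100, 150, 200, 50, 75, 180)
)

grouped <- df %>% group_by(category, product)

# "drop_last": peel off the last grouping level (product)
r_last <- grouped %>% summarise(avg = mean(sales), .groups = "drop_last")
group_vars(r_last)  # "category"

# "drop": return an ungrouped tibble
r_drop <- grouped %>% summarise(avg = mean(sales), .groups = "drop")
group_vars(r_drop)  # character(0)

# "keep": keep the same grouping as the input
r_keep <- grouped %>% summarise(avg = mean(sales), .groups = "keep")
group_vars(r_keep)  # "category" "product"
```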

Examples

Basic grouping and summarising

library(dplyr)

# Sample data
df <- data.frame(
  category = c("A", "A", "B", "B", "A", "B"),
  product = c("X", "Y", "X", "Y", "Z", "X"),
  sales = c(100, 150, 200, 50, 75, 180)
)

# Group by category and compute summary
df %>%
  group_by(category) %>%
  summarise(total_sales = sum(sales), n = n())
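dplyr 1.1.0 added a .by argument to summarise() for one-off grouping. The sketch below produces the same summary without group_by() and assumes dplyr >= 1.1.0. Unlike group_by(), .by keeps groups in order of first appearance instead of sorting them, and the result is never grouped.

```r
library(dplyr)

df <- data.frame(
  category = c("A", "A", "B", "B", "A", "B"),
  product = c("X", "Y", "X", "Y", "Z", "X"),
  sales = c(100, 150, 200, 50, 75, 180)
)

# Per-operation grouping: no group_by() call and no grouping left afterwards
by_cat <- df %>%
  summarise(total_sales = sum(sales), n = n(), .by = category)
```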

Multiple groupings

# Group by multiple columns
df %>%
  group_by(category, product) %>%
  summarise(avg_sales = mean(sales), .groups = "drop")

Multiple summary functions

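# Output names are suffixed so they do not shadow the summary functions
df %>%
  group_by(category) %>%
  summarise(
    total_sales = sum(sales),
    mean_sales = mean(sales),
    median_sales = median(sales),
    min_sales = min(sales),
    max_sales = max(sales),
    sd_sales = sd(sales)
  )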
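Each function above returns a single value per group, which is what summarise() expects. When a summary yields several values per group, such as quantile(), dplyr 1.1.0 deprecated that use of summarise() in favour of reframe(). A sketch, assuming dplyr >= 1.1.0:

```r
library(dplyr)

df <- data.frame(
  category = c("A", "A", "B", "B", "A", "B"),
  sales = c(100, 150, 200, 50, 75, 180)
)

# quantile() returns two values per group, so reframe() is used;
# the result has two rows per group and is always ungrouped
quartiles <- df %>%
  group_by(category) %>%
  reframe(
    prob = c(0.25, 0.75),
    value = quantile(sales, c(0.25, 0.75))
  )
```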

Common Patterns

Counting and proportions

df %>%
  group_by(category) %>%
  summarise(n = n()) %>%
  # summarise() dropped the only grouping level, so sum(n) is the grand total
  mutate(pct = n / sum(n) * 100)
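For plain tallies, dplyr's count() combines group_by(), summarise(n = n()), and ungrouping into one call. With its wt argument, it sums a column instead of counting rows. A sketch on the sample df:

```r
library(dplyr)

df <- data.frame(
  category = c("A", "A", "B", "B", "A", "B"),
  product = c("X", "Y", "X", "Y", "Z", "X"),
  sales = c(100, 150, 200, 50, 75, 180)
)

# Row counts per category, then each category's percentage of all rows
shares <- df %>%
  count(category) %>%
  mutate(pct = n / sum(n) * 100)

# wt = sums sales per category; name = sets the output column name
sales_by_cat <- df %>% count(category, wt = sales, name = "total_sales")
```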

Using across() for multiple columns

df %>%
  group_by(category) %>%
  summarise(across(where(is.numeric), sum))
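across() also accepts a named list of functions, producing one column per (column, function) pair named "{.col}_{.fn}" by default. The units column below is a hypothetical addition so that there is more than one numeric column to summarise.

```r
library(dplyr)

df <- data.frame(
  category = c("A", "A", "B", "B", "A", "B"),
  sales = c(100, 150, 200, 50, 75, 180),
  units = c(1, 2, 4, 1, 1, 3)  # hypothetical column for illustration
)

# Grouping columns are excluded from across() automatically
stats <- df %>%
  group_by(category) %>%
  summarise(across(where(is.numeric), list(total = sum, avg = mean)))
```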