dplyr::group_by()

group_by(.data, ..., .add = FALSE, .drop = group_by_drop_default(.data))

Returns grouped_df · Updated May 29, 2026 · Tidyverse

dplyr · group-by · summarise · tidyverse · data-wrangling

group_by() and summarise() are dplyr verbs that work together to split data into groups and compute summary statistics. group_by() takes a data frame and creates a grouped tibble where subsequent operations are performed on each group separately. summarise() (or summarize()) then collapses each group into a single row, computing the requested statistics.

Syntax

group_by(.data, ..., .add = FALSE, .drop = group_by_drop_default(.data))

summarise(.data, ..., .by = NULL, .groups = NULL)

Parameters

group_by()

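Parameter	Type	Default	Description
.data	tibble/data.frame	required	Input data frame
...	unquoted expressions	required	Columns (or computations) to group by
.add	logical	FALSE	Add to existing groups instead of replacing them
.drop	logical	group_by_drop_default(.data)	Drop groups formed by factor levels that do not appear in the data; TRUE unless .data was previously grouped with .drop = FALSE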
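
The effect of .drop only shows up when a grouping column is a factor with unused levels. The following is a minimal sketch; the data frame df_f and its extra level "C" are made up for illustration:

```r
library(dplyr)

# Hypothetical data: "C" is a declared factor level with no rows
df_f <- data.frame(
  category = factor(c("A", "A", "B"), levels = c("A", "B", "C")),
  sales = c(100, 150, 200)
)

# Default (.drop = TRUE for a plain data frame): the empty level "C" is omitted
dropped <- df_f %>%
  group_by(category) %>%
  summarise(n = n())

# .drop = FALSE: the empty level "C" is kept as a group with zero rows
kept <- df_f %>%
  group_by(category, .drop = FALSE) %>%
  summarise(n = n())

nrow(dropped)  # 2
nrow(kept)     # 3
kept$n         # 2 1 0
```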

summarise()

Parameter	Type	Default	Description
.data	grouped_df/tibble	required	Input (optionally grouped) data frame
...	named expressions	required	Summary computations
.groups	character	NULL	Grouping structure of the result: "drop_last", "drop", "keep", or "rowwise". When NULL, behaves like "drop_last" if each group returns one row
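
To make the .groups options concrete, here is a small sketch that summarises the same two-level grouping three ways and inspects what grouping remains. The sample data is hypothetical:

```r
library(dplyr)

# Hypothetical sample data, for illustration only
df <- data.frame(
  category = c("A", "A", "B", "B"),
  product  = c("X", "Y", "X", "Y"),
  sales    = c(100, 150, 200, 50)
)

grouped <- df %>% group_by(category, product)

# "drop_last": the result stays grouped by category only
by_cat <- grouped %>% summarise(total = sum(sales), .groups = "drop_last")
group_vars(by_cat)

# "drop": the result is ungrouped
flat <- grouped %>% summarise(total = sum(sales), .groups = "drop")
group_vars(flat)

# "keep": the result keeps both grouping columns
kept <- grouped %>% summarise(total = sum(sales), .groups = "keep")
group_vars(kept)
```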

Examples

Basic grouping and summarising

The simplest pattern combines group_by() with summarise() to produce per-group aggregates. In the example below, the data is split by category and two summary statistics — total sales and row count — are computed for each group independently:

library(dplyr)

# Sample data
df <- data.frame(
  category = c("A", "A", "B", "B", "A", "B"),
  product = c("X", "Y", "X", "Y", "Z", "X"),
  sales = c(100, 150, 200, 50, 75, 180)
)

# Group by category and compute summary
df %>%
  group_by(category) %>%
  summarise(total_sales = sum(sales), n = n())

Multiple groupings

Grouping by more than one column creates a multi-level grouping structure where summarise() produces one row per unique combination of the grouping variables. The .groups = "drop" argument tells summarise() to remove all grouping after computation, returning an ungrouped tibble:

# Group by multiple columns
df %>%
  group_by(category, product) %>%
  summarise(avg_sales = mean(sales), .groups = "drop")

Multiple summary functions

A single summarise() call can compute many statistics at once, with each summary expression running independently within each group. This avoids the need for multiple separate pipelines when you need several different aggregations from the same grouping structure. The example below calculates six different summary measures in one pass:

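df %>%
  group_by(category) %>%
  summarise(
    # Column names carry a suffix so they do not shadow the functions they call
    total_sales = sum(sales),
    mean_sales = mean(sales),
    median_sales = median(sales),
    min_sales = min(sales),
    max_sales = max(sales),
    sd_sales = sd(sales)  # sample standard deviation
  )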

Common patterns

Counting and proportions

A frequent grouped operation is counting observations and computing their share of the total. The n() function inside summarise() returns the row count for each group; dividing by the total number of rows and multiplying by 100 gives the percentage of rows each group contributes:

df %>%
  group_by(category) %>%
  summarise(
    n = n(),
    pct = n / nrow(df) * 100  # percentage of all rows, not a 0-1 proportion
  )

Using across() for multiple columns

When you need to apply the same summary function to many columns, across() combined with tidy-select helpers like where(is.numeric) applies the function to every matching column in one expression. This is cleaner and less error-prone than writing each column name separately:

df %>%
  group_by(category) %>%
  summarise(across(where(is.numeric), sum))

dplyr::group_by() in practice

group_by() adds grouping metadata to a data frame without changing the data. Subsequent dplyr operations (summarise(), mutate(), filter(), slice()) respect the grouping, computing their results within each group independently.
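
One way to see that only metadata changes is to compare a data frame before and after grouping. This sketch reuses the sample data from the examples above:

```r
library(dplyr)

df <- data.frame(
  category = c("A", "A", "B", "B", "A", "B"),
  product = c("X", "Y", "X", "Y", "Z", "X"),
  sales = c(100, 150, 200, 50, 75, 180)
)

grouped <- df %>% group_by(category)

# The grouping is recorded as metadata on the object
is_grouped_df(grouped)  # TRUE
group_vars(grouped)     # "category"
n_groups(grouped)       # 2

# The rows and values themselves are untouched
nrow(grouped) == nrow(df)
identical(grouped$sales, df$sales)
```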

group_by() with summarise() is the core pattern for grouped aggregation: df |> group_by(category) |> summarise(total = sum(value), n = n()) computes totals and counts per category. The result has one row per unique combination of grouping variables.

group_by() with mutate() adds group-level calculations while preserving all rows: df |> group_by(user_id) |> mutate(session_n = row_number(), total_events = n()) numbers each user’s rows in order and adds the per-user row count to every row.
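
A runnable version of the grouped mutate() pattern might look like the sketch below. The events data frame and its event column are hypothetical:

```r
library(dplyr)

# Hypothetical event log: two rows for user 1, three for user 2
events <- data.frame(
  user_id = c(1, 1, 2, 2, 2),
  event = c("login", "click", "login", "click", "logout")
)

numbered <- events %>%
  group_by(user_id) %>%
  mutate(
    session_n = row_number(),  # position of the row within its user
    total_events = n()         # number of rows for that user
  ) %>%
  ungroup()

numbered$session_n     # 1 2 1 2 3
numbered$total_events  # 2 2 3 3 3
```

All five input rows are kept; only the two new columns are added.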

Call ungroup() after grouped operations when the grouping is no longer needed. A data frame that is still grouped keeps applying that grouping in later operations, even outside the original pipeline, which can produce unexpected results. By default, summarise() drops the last grouping level from its result, but mutate() and filter() leave the grouping intact.
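
The sketch below illustrates how leftover grouping can silently change a result. The column names cat_mean and share are chosen for illustration:

```r
library(dplyr)

df <- data.frame(
  category = c("A", "A", "B", "B", "A", "B"),
  product = c("X", "Y", "X", "Y", "Z", "X"),
  sales = c(100, 150, 200, 50, 75, 180)
)

# mutate() keeps the grouping, so with_mean is still grouped by category
with_mean <- df %>%
  group_by(category) %>%
  mutate(cat_mean = mean(sales))

# Still grouped: sum(sales) is the category total, so shares sum to 1 per group
grouped_share <- with_mean %>%
  mutate(share = sales / sum(sales))

# Ungrouped first: sum(sales) is the overall total, so shares sum to 1 overall
overall_share <- with_mean %>%
  ungroup() %>%
  mutate(share = sales / sum(sales))

sum(grouped_share$share)  # about 2 (1 for each of the two categories)
sum(overall_share$share)  # about 1
```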

group_by(.add = TRUE) adds to existing groups rather than replacing them. group_keys(df) returns the distinct group key combinations. group_split(df) splits a grouped data frame into a list of per-group data frames.
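
These helpers can be tried together on the sample data from the examples. This is a minimal sketch of the three behaviors described above:

```r
library(dplyr)

df <- data.frame(
  category = c("A", "A", "B", "B", "A", "B"),
  product = c("X", "Y", "X", "Y", "Z", "X"),
  sales = c(100, 150, 200, 50, 75, 180)
)

by_cat <- df %>% group_by(category)

# A second group_by() replaces the existing grouping by default
replaced <- by_cat %>% group_by(product)
group_vars(replaced)  # "product"

# With .add = TRUE the new column is appended to the existing groups
added <- by_cat %>% group_by(product, .add = TRUE)
group_vars(added)     # "category" "product"

# One row per group key
keys <- group_keys(by_cat)
keys$category         # "A" "B"

# One data frame per group, in the same order as the keys
pieces <- group_split(by_cat)
length(pieces)        # 2
```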

In short, group_by() does not change the data; it adds metadata that makes subsequent verbs operate per group. summarise() removes one layer of grouping, while mutate() and filter() preserve the grouping structure, so end a pipeline with ungroup() if downstream operations should not be grouped.