dplyr::group_by

group_by(.data, ..., .add = FALSE, .drop = group_by_drop_default(.data))

Returns: grouped_df · Updated April 1, 2026 · Tidyverse


group_by() takes a data frame and adds grouping metadata that tells downstream dplyr verbs how to split operations across subsets of rows. The result is a grouped data frame (grouped_df) that behaves identically to a tibble in most respects, except that functions like summarise(), mutate(), and filter() operate within each group rather than across the entire dataset.
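As a quick illustration of the difference, the sketch below runs the same summarise() call on ungrouped and grouped versions of the built-in mtcars data. The column name avg_mpg is chosen here for illustration only.

```r
library(dplyr)

# Ungrouped: mean() is computed once over all 32 rows
overall <- mtcars %>% summarise(avg_mpg = mean(mpg))

# Grouped: mean() is computed once per distinct value of cyl
per_cyl <- mtcars %>% group_by(cyl) %>% summarise(avg_mpg = mean(mpg))

nrow(overall)
#> [1] 1
nrow(per_cyl)
#> [1] 3
```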

Syntax

group_by(.data, ..., .add = FALSE, .drop = group_by_drop_default(.data))

The companion function ungroup() removes grouping metadata:

ungroup(x, ...)

Parameters

Parameter	Type	Default	Description
`.data`	tibble / data.frame	required	Input data frame (or tibble/lazy data frame from dbplyr).
`...`	unquoted columns or expressions	optional	Variables or computations to group by. Uses data-masking, so column names are written directly without quoting. Computations are evaluated on the ungrouped data.
`.add`	logical	`FALSE`	If `FALSE` (default), replaces any existing grouping. If `TRUE`, appends to existing groups.
`.drop`	logical	varies	If `TRUE`, drops groups for factor levels absent from the data. If `FALSE`, preserves empty factor levels as explicit groups. Default is `TRUE` except when `.data` was previously grouped with `.drop = FALSE`.
`x`	grouped_df	required	A grouped data frame to ungroup.
`...` (in `ungroup`)	unquoted columns	omitted	Specific variables to remove from grouping. If omitted, removes all grouping.

Grouping behavior with `.add`

The .add argument controls whether a new group_by() call replaces or extends existing groups.

With the default .add = FALSE, any prior grouping is discarded and replaced entirely:

library(dplyr)

# Start with one grouping level
by_cyl <- mtcars %>% group_by(cyl)

# This REPLACES the grouping, not adds to it
by_cyl %>% group_by(gear) %>% groups()
#> [[1]]
#> gear

To add a grouping layer without removing the original, set .add = TRUE:

by_cyl <- mtcars %>% group_by(cyl)
by_cyl_gear <- by_cyl %>% group_by(gear, .add = TRUE)

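# summarise() drops the last grouping level (gear), so the result is grouped by cyl only
by_cyl_gear %>% summarise(n = n())
#> # A tibble: 8 × 3
#> # Groups:   cyl [3]
#>     cyl  gear     n
#>   <dbl> <dbl> <int>
#> 1     4     3     1
#> 2     4     4     8
#> 3     4     5     2
#> 4     6     3     2
#> 5     6     4     4
#> 6     6     5     1
#> 7     8     3    12
#> 8     8     5     2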

This matters when building up groupings incrementally in a pipeline.
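One place this tends to come up is inside helper functions that should respect whatever grouping the caller has already set. The sketch below uses a hypothetical helper, count_by_gear(), which is not part of dplyr. It relies on the fact that .add = TRUE simply groups by the new variable when the input is ungrouped.

```r
library(dplyr)

# Hypothetical helper: adds gear to any grouping already present on `data`
count_by_gear <- function(data) {
  data %>%
    group_by(gear, .add = TRUE) %>%
    summarise(n = n(), .groups = "drop")
}

# Ungrouped input: grouped by gear only
nrow(count_by_gear(mtcars))
#> [1] 3

# Input already grouped by cyl: grouped by cyl and gear
nrow(count_by_gear(group_by(mtcars, cyl)))
#> [1] 8
```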

The `.drop` argument and factor levels

When grouping by a factor column, .drop controls whether unused factor levels produce empty groups:

df <- tibble(
  x = factor(c("a", "b"), levels = c("a", "b", "c")),
  y = c(1, 2)
)

# Default (.drop = TRUE): level "c" is absent, so no group is created
df %>% group_by(x) %>% summarise(n = n())
#> # A tibble: 2 × 2
#>   x         n
#>   <fct> <int>
#> 1 a         1
#> 2 b         1

# With .drop = FALSE: level "c" appears as an empty group
df %>% group_by(x, .drop = FALSE) %>% summarise(n = n())
#> # A tibble: 3 × 2
#>   x         n
#>   <fct> <int>
#> 1 a         1
#> 2 b         1
#> 3 c         0

Setting .drop = FALSE is useful when you want to preserve the full factor structure, such as when generating summary tables where all levels must appear regardless of presence in data.
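The setting is stored with the grouping and can be inspected. The sketch below rebuilds the same small tibble and checks the number of groups and the stored default with n_groups() and group_by_drop_default(). The object names kept and dropped are illustrative.

```r
library(dplyr)

df <- tibble(
  x = factor(c("a", "b"), levels = c("a", "b", "c")),
  y = c(1, 2)
)

dropped <- df %>% group_by(x)                  # default: unused level "c" is dropped
kept    <- df %>% group_by(x, .drop = FALSE)   # unused level "c" becomes an empty group

n_groups(dropped)
#> [1] 2
n_groups(kept)
#> [1] 3

# The choice is remembered, so a later group_by() on `kept` defaults to .drop = FALSE
group_by_drop_default(kept)
#> [1] FALSE
```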

How `summarise()` consumes grouping

By default, each call to summarise() drops the last grouping variable (the innermost level) from its result. With nested groupings, the output therefore stays grouped by the remaining variables until you summarise again or ungroup:

mtcars %>%
  group_by(cyl, gear) %>%
  summarise(n = n())
#> # A tibble: 8 × 3
#> # Groups:   cyl [3]
#>     cyl  gear     n
#>   <dbl> <dbl> <int>
#> 1     4     3     1
#> ...

One summarise() removed the gear grouping. Call it again to get an ungrouped result, or use .groups = "drop" in summarise() to drop all grouping at once:

mtcars %>%
  group_by(cyl, gear) %>%
  summarise(n = n(), .groups = "drop")
#> # A tibble: 8 × 3
#>     cyl  gear     n
#>   <dbl> <dbl> <int>
#> 1     4     3     1
#> ...
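To compare the available choices side by side, the sketch below applies three .groups values to the same grouped input and inspects the remaining grouping with group_vars(). The object names are illustrative.

```r
library(dplyr)

by_two <- mtcars %>% group_by(cyl, gear)

dropped_last <- by_two %>% summarise(n = n(), .groups = "drop_last")
kept_all     <- by_two %>% summarise(n = n(), .groups = "keep")
dropped_all  <- by_two %>% summarise(n = n(), .groups = "drop")

group_vars(dropped_last)   # only the last level (gear) is removed
#> [1] "cyl"
group_vars(kept_all)       # grouping is left unchanged
#> [1] "cyl"  "gear"
group_vars(dropped_all)    # result is an ordinary ungrouped tibble
#> character(0)
```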

Removing grouping with `ungroup()`

ungroup() removes grouping metadata entirely. Call it with no arguments to strip all grouping:

mtcars %>%
  group_by(cyl, gear) %>%
  ungroup() %>%
  nrow()
#> [1] 32

To remove only some grouping variables and keep the rest, pass their column names to ungroup():

mtcars %>%
  group_by(cyl, gear, am) %>%
  ungroup(gear) %>%
  groups()
#> [[1]]
#> cyl
#> [[2]]
#> am
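A typical reason to ungroup in the middle of a pipeline is to switch from per-group to whole-table calculations. The sketch below computes each car's share of mpg first within its cyl group and then, after ungroup(), across all rows. The column names share_in_group and share_overall are illustrative.

```r
library(dplyr)

shares <- mtcars %>%
  group_by(cyl) %>%
  mutate(share_in_group = mpg / sum(mpg)) %>%   # sum(mpg) is a per-cyl total here
  ungroup() %>%
  mutate(share_overall = mpg / sum(mpg))        # sum(mpg) is now the grand total

# Per-group shares sum to 1 within each of the 3 cyl groups; overall shares sum to 1 in total
sum(shares$share_in_group)
sum(shares$share_overall)
```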

Common gotchas

Calling group_by() twice does not accumulate. The second call replaces the first unless you explicitly pass .add = TRUE. This trips up many people who assume the calls stack:

# Wrong: gear grouping replaces cyl, not adds to it
mtcars %>% group_by(cyl) %>% group_by(gear) %>% groups()
#> [[1]]
#> gear

# Correct: add a second layer
mtcars %>% group_by(cyl) %>% group_by(gear, .add = TRUE) %>% groups()
#> [[1]]
#> cyl
#> [[2]]
#> gear

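Computations inside group_by() see the ungrouped data. If you write group_by(df, col * 2), the expression is evaluated against the whole data frame, ignoring any existing groups, and the result is added as a new column that then defines the groups. If an expression needs to be evaluated within existing groups, compute it in a separate mutate() step first and then group by the resulting column.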
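The sketch below contrasts the two approaches on mtcars. The column name above_avg is illustrative. In the first pipeline mean(mpg) is the overall mean even though the data is already grouped by cyl. In the second, the mutate() step runs within each cyl group.

```r
library(dplyr)

# Expression inside group_by(): mean(mpg) is computed over all rows
global <- mtcars %>%
  group_by(cyl) %>%
  group_by(above_avg = mpg > mean(mpg), .add = TRUE)

# Separate mutate() step: mean(mpg) is computed within each cyl group
per_group <- mtcars %>%
  group_by(cyl) %>%
  mutate(above_avg = mpg > mean(mpg)) %>%
  group_by(above_avg, .add = TRUE)

# Both end up grouped by cyl and above_avg, but the flags differ
group_vars(global)
#> [1] "cyl"       "above_avg"
identical(global$above_avg, per_group$above_avg)
#> [1] FALSE
```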
