dplyr::group_by
group_by(.data, ..., .add = FALSE, .drop = group_by_drop_default(.data))
grouped_df · Updated April 1, 2026 · Tidyverse
group_by() takes a data frame and adds grouping metadata that tells downstream dplyr verbs how to split operations across subsets of rows. The result is a grouped data frame (grouped_df) that behaves like a tibble in most respects, except that functions like summarise(), mutate(), and filter() operate within each group rather than across the entire dataset.
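As a minimal sketch of what operating "within each group" means, the example below runs the three verbs named above on a small made-up tibble. The names `sales`, `region`, and `amount` are hypothetical and chosen only for illustration.

```r
library(dplyr)

# Hypothetical data, for illustration only
sales <- tibble(
  region = c("east", "east", "west", "west", "west"),
  amount = c(10, 30, 5, 15, 40)
)

by_region <- sales %>% group_by(region)

# summarise(): collapses each group to one row
totals <- by_region %>% summarise(total = sum(amount))

# mutate(): mean(amount) is computed per group, not over all rows
centered <- by_region %>% mutate(diff = amount - mean(amount))

# filter(): keeps the row(s) with the largest amount in each group
top <- by_region %>% filter(amount == max(amount))
```

Here `totals` has one row per region, `centered` keeps all five rows, and `top` keeps one row per region because each group has a unique maximum.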
Syntax
group_by(.data, ..., .add = FALSE, .drop = group_by_drop_default(.data))
The companion function ungroup() removes grouping metadata:
ungroup(x, ...)
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| .data | tibble / data.frame | required | Input data frame (or tibble/lazy data frame from dbplyr). |
| ... | unquoted columns | optional | Variables or computations to group by. Uses data-masking, so column names are written directly without quoting. |
| .add | logical | FALSE | If FALSE (default), replaces any existing grouping. If TRUE, appends to existing groups. |
| .drop | logical | varies | If TRUE, drops groups for factor levels absent from the data. If FALSE, preserves empty factor levels as explicit groups. Default is TRUE except when .data was previously grouped with .drop = FALSE. |
| x | grouped_df | required | A grouped data frame to ungroup (argument of ungroup()). |
| ... (in ungroup) | unquoted columns | omitted | Specific variables to remove from grouping. If omitted, removes all grouping. |
Grouping behavior with .add
The .add argument controls whether a new group_by() call replaces or extends existing groups.
With the default .add = FALSE, any prior grouping is discarded and replaced entirely:
library(dplyr)
# Start with one grouping level
by_cyl <- mtcars %>% group_by(cyl)
# This REPLACES the existing grouping rather than adding to it
by_cyl %>% group_by(gear) %>% groups()
#> [[1]]
#> gear
To add a grouping layer without removing the original, set .add = TRUE:
by_cyl <- mtcars %>% group_by(cyl)
by_cyl_gear <- by_cyl %>% group_by(gear, .add = TRUE)
by_cyl_gear %>% summarise(n = n())
#> # A tibble: 8 × 3
#> # Groups: cyl [3]
#> cyl gear n
#> <dbl> <dbl> <int>
#> 1 4 3 1
#> 2 4 4 8
#> 3 4 5 2
#> 4 6 3 2
#> 5 6 4 4
#> 6 6 5 1
#> 7 8 3 12
#> 8 8 5 2
This matters when building up groupings incrementally in a pipeline.
The .drop argument and factor levels
When grouping by a factor column, .drop controls whether unused factor levels produce empty groups:
df <- tibble(
x = factor(c("a", "b"), levels = c("a", "b", "c")),
y = c(1, 2)
)
# Default (.drop = TRUE): level "c" is absent, so no group is created
df %>% group_by(x) %>% summarise(n = n())
#> # A tibble: 2 × 2
#> x n
#> <fct> <int>
#> 1 a 1
#> 2 b 1
# With .drop = FALSE: level "c" appears as an empty group
df %>% group_by(x, .drop = FALSE) %>% summarise(n = n())
#> # A tibble: 3 × 2
#> x n
#> <fct> <int>
#> 1 a 1
#> 2 b 1
#> 3 c 0
Setting .drop = FALSE is useful when you want to preserve the full factor structure, such as when generating summary tables where all levels must appear regardless of presence in data.
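The default for .drop depends on how the input was grouped before. The sketch below rebuilds the same `df` and assumes dplyr's exported `group_by_drop_default()` helper to show the setting carrying over to a later group_by() call. The names `kept` and `regrouped` are illustrative.

```r
library(dplyr)

df <- tibble(
  x = factor(c("a", "b"), levels = c("a", "b", "c")),
  y = c(1, 2)
)

kept <- df %>% group_by(x, .drop = FALSE)

# The default is TRUE for ungrouped input, FALSE after a .drop = FALSE grouping
group_by_drop_default(df)
group_by_drop_default(kept)

# Regrouping without .drop inherits FALSE, so the empty level "c" is kept
regrouped <- kept %>% group_by(x)
n_groups(regrouped)

# Passing .drop = TRUE explicitly restores the dropping behavior
n_groups(kept %>% group_by(x, .drop = TRUE))
```

If empty groups keep showing up unexpectedly in a long pipeline, an earlier `.drop = FALSE` call is a likely cause.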
How summarise() consumes grouping
By default, each call to summarise() on a data frame with several grouping variables drops the last one (the innermost level) and leaves the result grouped by the rest. For nested groupings, you need multiple summarise() calls to fully ungroup:
mtcars %>%
group_by(cyl, gear) %>%
summarise(n = n())
#> # A tibble: 8 × 3
#> # Groups: cyl [3]
#> cyl gear n
#> <dbl> <dbl> <int>
#> 1 4 3 1
#> ...
One summarise() removed the gear grouping. Call it again to get an ungrouped result, or use .groups = "drop" in summarise() to drop all grouping at once:
mtcars %>%
group_by(cyl, gear) %>%
summarise(n = n(), .groups = "drop")
#> # A tibble: 8 × 3
#> cyl gear n
#> <dbl> <dbl> <int>
#> 1 4 3 1
#> ...
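The text above uses only `.groups = "drop"`. As a sketch, the comparison below also uses the `"drop_last"` and `"keep"` values that summarise() accepts, and inspects the resulting grouping with `group_vars()`. The variable names are illustrative.

```r
library(dplyr)

grouped <- mtcars %>% group_by(cyl, gear)

# "drop_last" matches the default shown above: only gear is dropped
drop_last <- grouped %>% summarise(n = n(), .groups = "drop_last")

# "drop" returns an ungrouped tibble
dropped <- grouped %>% summarise(n = n(), .groups = "drop")

# "keep" preserves both grouping variables
kept <- grouped %>% summarise(n = n(), .groups = "keep")

group_vars(drop_last)  # "cyl"
group_vars(dropped)    # character(0)
group_vars(kept)       # "cyl" "gear"
```

Setting `.groups` explicitly also makes the intended grouping of the result visible to whoever reads the pipeline later.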
Removing grouping with ungroup()
ungroup() removes grouping metadata entirely. Call it with no arguments to strip all grouping:
mtcars %>%
group_by(cyl, gear) %>%
ungroup() %>%
nrow()
#> [1] 32
To remove only some grouping variables and keep the others, pass their unquoted column names to ungroup():
mtcars %>%
group_by(cyl, gear, am) %>%
ungroup(gear) %>%
groups()
#> [[1]]
#> cyl
#> [[2]]
#> am
Common gotchas
Calling group_by() twice does not accumulate. The second call replaces the first unless you explicitly pass .add = TRUE. This trips up many people who assume the calls stack:
# Wrong: the gear grouping replaces cyl instead of adding to it
mtcars %>% group_by(cyl) %>% group_by(gear) %>% groups()
#> [[1]]
#> gear
# Correct: add a second layer
mtcars %>% group_by(cyl) %>% group_by(gear, .add = TRUE) %>% groups()
#> [[1]]
#> cyl
#> [[2]]
#> gear
Computations inside group_by() see the ungrouped data. If you write group_by(df, col * 2), the multiplication happens on the full data frame before grouping. To operate on per-group values, add a mutate() step first.
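A small sketch of this gotcha, using a made-up tibble (`df`, `g`, `v`, and `above` are hypothetical names): the same expression gives different results depending on whether it is evaluated inside group_by() or in a preceding mutate() on grouped data.

```r
library(dplyr)

# Hypothetical data, for illustration only
df <- tibble(
  g = c("a", "a", "b", "b"),
  v = c(1, 3, 10, 30)
)

# Inside group_by(): mean(v) is the overall mean (11), even though
# the input is already grouped by g
in_group_by <- df %>%
  group_by(g) %>%
  group_by(above = v > mean(v), .add = TRUE)

# mutate() first: mean(v) is the per-group mean (2 for "a", 20 for "b")
via_mutate <- df %>%
  group_by(g) %>%
  mutate(above = v > mean(v)) %>%
  group_by(above, .add = TRUE)

in_group_by$above  # FALSE FALSE FALSE  TRUE
via_mutate$above   # FALSE  TRUE FALSE  TRUE
```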
See Also
- dplyr::summarise() — collapses each group to a single row
- dplyr::mutate() — computes per-group transformations
- dplyr::arrange() — orders data, optionally within groups