dplyr::summarise()
summarise(.data, ..., .by = NULL, .groups = NULL)
tibble · Added in v1.0.0 · Updated April 1, 2026 · Tidyverse
summarise() (also spelled summarize()) collapses a tibble or data frame into a single row per group by computing summary statistics; on ungrouped data it returns a single row overall. It is one of the most-used functions in the tidyverse for aggregating data. dplyr ships both spellings as a convenience for UK and US conventions; they are identical.
Syntax
summarise(.data, ..., .by = NULL, .groups = NULL)
summarize(.data, ..., .by = NULL, .groups = NULL) # identical
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| .data | tibble / data.frame | required | Input data frame or tibble |
| ... | name-value expressions | optional | Summary columns defined as name = expression |
| .by | tidy-select | NULL | Columns to group by for this call only (dplyr 1.1.0+). Accepts a bare column name or a selection such as c(cyl, am). |
| .groups | character | NULL | Grouping structure of the result when .data was grouped with group_by(): "drop_last", "drop", "keep", or "rowwise". Cannot be combined with .by. |
The .by parameter is the modern alternative to group_by() for one-off grouped summaries. It accepts bare column names directly: summarise(..., .by = c(cyl, am)) instead of group_by(cyl, am) |> summarise(...). The grouping applies to that call only, so the result is always ungrouped.
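As a rough sketch of that equivalence, the two spellings below compute the same per-group means on the built-in mtcars data. The object names by_arg and by_group are only illustrative, and .by needs dplyr 1.1.0 or later.

```r
library(dplyr)

# One-off grouping with .by
by_arg <- mtcars |>
  summarise(avg_mpg = mean(mpg), .by = c(cyl, am))

# The same summary with an explicit group_by()
by_group <- mtcars |>
  group_by(cyl, am) |>
  summarise(avg_mpg = mean(mpg), .groups = "drop")

# .by keeps groups in order of first appearance, while group_by() sorts
# them, so put the rows in the same order before comparing the values.
all.equal(arrange(by_arg, cyl, am)$avg_mpg, by_group$avg_mpg)

# The .by result carries no grouping.
is_grouped_df(by_arg)
```

The only visible difference between the two results is row order and the class of the output (a plain data frame versus a tibble, because group_by() converts its input).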
Summary functions
Any function that takes a vector and returns a single value works inside summarise(). Common choices:
| Function | What it returns |
|---|---|
| mean(x) | Arithmetic mean |
| median(x) | Median value |
| sum(x) | Sum of values |
| sd(x) | Standard deviation |
| min(x) | Minimum value |
| max(x) | Maximum value |
| n() | Count of rows (no arguments) |
| n_distinct(x) | Count of unique values |
| first(x) | First value in group |
| last(x) | Last value in group |
| nth(x, n) | Nth value in group |
| IQR(x) | Interquartile range |
| quantile(x, probs) | Quantile values |
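These functions do not ignore NA by default. mean(), median(), sum(), sd(), min(), and max() return NA when the input contains a missing value, and quantile() and IQR() raise an error. Pass na.rm = TRUE to the function itself to drop missing values first. n() counts every row, n_distinct() counts NA as a distinct value unless na.rm = TRUE, and first(), last(), and nth() return whatever sits at that position, including NA.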
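One caveat about the last table row: quantile() returns one value per probability, so it yields a single value only when probs has length one. The sketch below assumes dplyr 1.1.0 or later, where reframe() is the verb intended for summaries that return several rows per group. The names probs and quartiles are illustrative.

```r
library(dplyr)

# One probability: one value per group, so summarise() fits.
mtcars |>
  summarise(median_mpg = quantile(mpg, 0.5), .by = cyl)

# Several probabilities: several rows per group, so use reframe().
probs <- c(0.25, 0.5, 0.75)
quartiles <- mtcars |>
  reframe(
    prob = probs,
    mpg = unname(quantile(mpg, probs)),
    .by = cyl
  )

# 3 cylinder groups x 3 probabilities
nrow(quartiles)
```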
Examples
Basic summarise
library(dplyr)
# Overall summary of mtcars (a plain data frame, so the result is one too)
mtcars |> summarise(avg_mpg = mean(mpg), total_hp = sum(hp))
#>    avg_mpg total_hp
#> 1 20.09062     4694
Grouped summarise with .by (dplyr 1.1+)
The .by argument groups data before summarising, so you skip a separate group_by() call.
# Average mpg and total hp per cylinder/gear combination
mtcars |> summarise(
  avg_mpg = mean(mpg),
  total_hp = sum(hp),
  .by = c(cyl, gear)
)
# Eight cyl/gear combinations occur in mtcars; .by keeps them in order of first appearance
#>   cyl gear avg_mpg total_hp
#> 1   6    4  19.750      466
#> 2   4    4  26.925      608
#> ...
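Sorting by group size
summarise() has no sorting argument. To order groups by size, pipe the result into arrange(), or use count() with sort = TRUE.
# Sort output by group size, largest group first
mtcars |> summarise(n = n(), .by = cyl) |> arrange(desc(n))
#>   cyl  n
#> 1   8 14
#> 2   4 11
#> 3   6  7
# Equivalent shortcut
mtcars |> count(cyl, sort = TRUE)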
Multiple summary statistics
# Compute several statistics at once
mtcars |> summarise(
  mean_mpg = mean(mpg),
  median_mpg = median(mpg),
  sd_mpg = sd(mpg),
  min_mpg = min(mpg),
  max_mpg = max(mpg),
  .by = cyl
)
# Three rows, in order of first appearance (6, 4, 8)
#>   cyl mean_mpg median_mpg   sd_mpg min_mpg max_mpg
#> 1   6 19.74286       19.7 1.453567    17.8    21.4
#> ...
across() for multiple columns
Apply the same summary to several columns at once with across(). The first argument selects the columns, here every column whose name starts with "Sepal".
# Mean of all Sepal columns in iris, by species
iris |>
  summarise(across(starts_with("Sepal"), mean, .names = "mean_{.col}"), .by = Species)
#>   Species mean_Sepal.Length mean_Sepal.Width
#> 1  setosa             5.006            3.428
#> ...
Custom column naming with .names
across() supports glue syntax in .names to construct output column names dynamically.
iris |>
  summarise(
    across(
      c(Sepal.Length, Sepal.Width),
      list(mean = mean, sd = sd),
      .names = "{.col}_{.fn}" # {.col} is the column name, {.fn} the name from the list
    ),
    .by = Species
  )
Handling NA values
summarise() has no NA handling of its own. Each summary function decides what to do with missing values, and most return NA unless you pass na.rm = TRUE.
df <- tibble(x = c(1, 2, NA, 4), g = c("a", "a", "b", "b"))
# Default: the NA in group b propagates
df |> summarise(mean_x = mean(x), .by = g)
#> # A tibble: 2 × 2
#>   g     mean_x
#>   <chr>  <dbl>
#> 1 a        1.5
#> 2 b       NA
# Drop missing values inside the summary function
df |> summarise(mean_x = mean(x, na.rm = TRUE), .by = g)
#> # A tibble: 2 × 2
#>   g     mean_x
#>   <chr>  <dbl>
#> 1 a        1.5
#> 2 b        4
There is no global switch; pass na.rm = TRUE to each function that should ignore missing values.
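When several columns need the same treatment, one option is to wrap the call in an anonymous function inside across(). The data frame below is made up for illustration, and the column names mean_x, mean_y, and n_missing_x are arbitrary choices.

```r
library(dplyr)

# Hypothetical data with missing values in two columns
df <- tibble(
  g = c("a", "a", "b", "b"),
  x = c(1, 2, NA, 4),
  y = c(NA, 10, 20, 30)
)

means <- df |>
  summarise(
    # na.rm = TRUE is passed to mean() for every selected column
    across(c(x, y), \(v) mean(v, na.rm = TRUE), .names = "mean_{.col}"),
    # Count how many values were missing in x
    n_missing_x = sum(is.na(x)),
    .by = g
  )

means
```

Reporting the number of missing values alongside the mean makes it visible how much data each summary is actually based on.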
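The .groups argument
When the input was grouped with group_by(), .groups controls the grouping structure of the output. It was added in dplyr 1.0.0 and is still supported; it is not deprecated. The accepted values are "drop_last" (the default when every group returns one row), "drop", "keep", and "rowwise".
# Drop the last grouping level: the result stays grouped by cyl
mtcars |>
  group_by(cyl, am) |>
  summarise(n = n(), .groups = "drop_last")
With .by, the result is always ungrouped, so neither .groups nor a trailing ungroup() is needed. Supplying both .by and .groups is an error.
mtcars |>
  summarise(n = n(), .by = c(cyl, am))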
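To check which grouping a given .groups value leaves on the result, group_vars() can be applied to the output. This is a small sketch on mtcars; the object name grouped is illustrative.

```r
library(dplyr)

grouped <- mtcars |> group_by(cyl, am)

# Drop only the last grouping level
grouped |> summarise(n = n(), .groups = "drop_last") |> group_vars()
#> [1] "cyl"

# Keep every grouping variable
grouped |> summarise(n = n(), .groups = "keep") |> group_vars()
#> [1] "cyl" "am"

# Return an ungrouped tibble
grouped |> summarise(n = n(), .groups = "drop") |> group_vars()
#> character(0)
```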
count() and tally() shortcuts
For simple row counts, dplyr provides two helpers that save a few keystrokes:
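# count() groups by the given columns and counts rows
mtcars |> count(cyl, am)
mtcars |> count(cyl, wt = hp) # weighted count: sum of hp per cyl
# tally() counts rows of data that is already grouped
mtcars |> group_by(cyl) |> tally()
Without wt, both give the same counts as summarise(n = n(), .by = ...); with wt, n is the sum of the weight column instead. count() returns groups sorted by the grouping columns, whereas .by keeps them in order of first appearance.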
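The relationship between count() and summarise() can be checked directly. The following sketch compares the n columns after putting the rows in the same order; the object names are illustrative, and .by assumes dplyr 1.1.0 or later.

```r
library(dplyr)

counted <- mtcars |> count(cyl, am)
summarised <- mtcars |>
  summarise(n = n(), .by = c(cyl, am)) |>
  arrange(cyl, am)

# Same counts once the rows are in the same order
identical(counted$n, summarised$n)

# A weighted count sums the weight column instead of counting rows
weighted <- mtcars |> count(cyl, wt = hp)
by_sum <- mtcars |>
  summarise(n = sum(hp), .by = cyl) |>
  arrange(cyl)

all.equal(weighted$n, by_sum$n)
```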
Common Gotchas
Unquoted column names. dplyr uses tidy evaluation, so column names are unquoted. To pass a column name programmatically from a function argument, use the embrace operator {{ }}:
my_summary <- function(data, col) {
  data |> summarise(result = mean({{ col }}))
}
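The same operator can forward a grouping selection to .by. The helper below, my_grouped_summary(), is a hypothetical extension of the function above rather than part of dplyr, and it assumes dplyr 1.1.0 or later for .by.

```r
library(dplyr)

# Hypothetical helper: `col` is the column to average,
# `by` is a tidy selection forwarded to .by.
my_grouped_summary <- function(data, col, by) {
  data |> summarise(result = mean({{ col }}), .by = {{ by }})
}

# One grouping column
my_grouped_summary(mtcars, mpg, cyl)

# Several grouping columns
my_grouped_summary(mtcars, hp, c(cyl, am))
```

Callers write column names unquoted, exactly as they would in a direct summarise() call.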
Empty ... returns one row with no columns. Calling summarise() with no summary expressions on ungrouped data returns a single row and zero columns. On grouped data it returns one row per group containing only the grouping columns.
mtcars |> summarise()
#> data frame with 0 columns and 1 row
Grouped summarise changes row count. When used on grouped data, summarise collapses each group to a single row. The output has as many rows as there are groups.
NA propagation. Most summary functions return NA when the input contains missing values. Pass na.rm = TRUE to each function that should ignore them; summarise() itself has no argument for this.
.by uses tidy selection. Pass a single bare column name (.by = cyl) or wrap several in c() (.by = c(col1, col2)). Unlike group_by(), .by cannot create a grouping column from an expression, and it cannot be used on data that is already grouped.
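One practical consequence of .by selecting existing columns: a grouping column computed from an expression has to be created first. The sketch below contrasts the two approaches on mtcars. The column name heavy and the 3.0 weight threshold are arbitrary choices for illustration.

```r
library(dplyr)

# group_by() can build the grouping column from an expression
via_group_by <- mtcars |>
  group_by(heavy = wt > 3) |>
  summarise(avg_mpg = mean(mpg))

# .by only selects existing columns, so create the column with mutate() first
via_by <- mtcars |>
  mutate(heavy = wt > 3) |>
  summarise(avg_mpg = mean(mpg), .by = heavy)

# Same values once the rows are in the same order
all.equal(via_group_by$avg_mpg, arrange(via_by, heavy)$avg_mpg)
```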
See Also
- dplyr::group_by() — longer-form grouping, required for multiple chained operations
- dplyr::across() — apply functions across multiple columns in summarise
- dplyr::mutate() — create new columns without collapsing rows
- dplyr::count() — shortcut for counting rows by group