dplyr::summarise()

summarise(.data, ..., .by = NULL, .groups = NULL)
Returns: same type as .data · Added in v1.0.0 · Updated April 1, 2026 · Tidyverse
dplyr summarise tidyverse data-wrangling group-by

summarise() (also spelled summarize()) collapses a tibble or data frame into one row per group by computing summary statistics. On ungrouped input it returns a single row. The result has the same type as the input, so a plain data frame stays a data frame. dplyr ships both the UK and US spellings as a convenience; they are identical.

Syntax

summarise(.data, ..., .by = NULL, .groups = NULL)
summarize(.data, ..., .by = NULL, .groups = NULL)  # identical

Parameters

Parameter  Type                    Default   Description
.data      tibble / data.frame     required  Input data frame or tibble
...        name-value expressions  optional  Summary columns defined as name = expression
.by        tidy-select             NULL      Columns to group by for this call only (dplyr 1.1.0+). Pass one bare name, or c(col1, col2) for several.
.groups    character               NULL      Grouping structure of the result when .data is grouped: "drop_last", "drop", "keep", or "rowwise". Cannot be combined with .by.

The .by parameter is the modern alternative to group_by() for one-off grouped summaries. It accepts bare column names directly: .by = c(cyl, am) instead of group_by(cyl, am) |> summarise(...).
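To make the equivalence concrete, the sketch below computes the same summary both ways on mtcars. One difference is worth knowing: group_by() sorts the output by the grouping keys, while .by keeps groups in order of first appearance and always returns ungrouped data. The object names (by_result, gb_result, by_sorted) are illustrative only.

```r
library(dplyr)

# Per-call grouping with .by: ungrouped result, groups in order of first appearance
by_result <- mtcars |>
  summarise(avg_mpg = mean(mpg), .by = c(cyl, am))

# Persistent grouping with group_by(): result sorted by the grouping keys
gb_result <- mtcars |>
  group_by(cyl, am) |>
  summarise(avg_mpg = mean(mpg), .groups = "drop")

# The .by result carries no grouping
is_grouped_df(by_result)

# After sorting, both approaches hold the same values
by_sorted <- by_result |> arrange(cyl, am)
all.equal(by_sorted$avg_mpg, gb_result$avg_mpg)
```

If row order matters downstream, add an explicit arrange() rather than relying on either default.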

Summary functions

Any function that takes a vector and returns a single value works inside summarise(). Common choices:

Function            What it returns
mean(x)             Arithmetic mean
median(x)           Median value
sum(x)              Sum of values
sd(x)               Standard deviation
min(x)              Minimum value
max(x)              Maximum value
n()                 Count of rows (no arguments)
n_distinct(x)       Count of unique values
first(x)            First value in group
last(x)             Last value in group
nth(x, n)           Nth value in group
IQR(x)              Interquartile range
quantile(x, probs)  Quantile (a single value when probs has length 1)

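None of these drop NA by default. mean(), median(), sum(), sd(), min(), and max() return NA when x contains missing values, and quantile() and IQR() raise an error. Pass na.rm = TRUE to the individual function to drop missing values before computing.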

Examples

Basic summarise

library(dplyr)

# Overall summary of mtcars
mtcars |> summarise(avg_mpg = mean(mpg), total_hp = sum(hp))
#>    avg_mpg total_hp
#> 1 20.09062     4694

Grouped summarise with .by (dplyr 1.1+)

The .by argument groups data before summarising, so you skip a separate group_by() call.

# Average mpg and total hp per cylinder/gear combination
mtcars |> summarise(
  avg_mpg = mean(mpg),
  total_hp = sum(hp),
  .by = c(cyl, gear)
)
# 8 rows: one per cyl/gear combination present in the data,
# in order of first appearance
#>   cyl gear avg_mpg total_hp
#> 1   6    4  19.750      466
#> 2   4    4  26.925      608
#> ...

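Sorting the output

summarise() has no sort argument. With .by, groups appear in order of first appearance, so pipe the result into arrange() to order it:

# Sort output by group size, largest group first
mtcars |>
  summarise(n = n(), .by = cyl) |>
  arrange(desc(n))
#>   cyl  n
#> 1   8 14
#> 2   4 11
#> 3   6  7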

Multiple summary statistics

# Compute several statistics at once
mtcars |> summarise(
  mean_mpg = mean(mpg),
  median_mpg = median(mpg),
  sd_mpg = sd(mpg),
  min_mpg = min(mpg),
  max_mpg = max(mpg),
  .by = cyl
)
# 3 rows, in order of first appearance of cyl (6, 4, 8)
#>   cyl mean_mpg median_mpg   sd_mpg min_mpg max_mpg
#> 1   6 19.74286       19.7 1.453567    17.8    21.4
#> 2   4 26.66364       26.0 4.509828    21.4    33.9
#> ...

across() for multiple columns

Apply the same summary to several columns at once with across().

# Mean of all Sepal columns in iris, by species
iris |>
  summarise(across(starts_with("Sepal"), mean, .names = "mean_{.col}"), .by = Species)
#>      Species mean_Sepal.Length mean_Sepal.Width
#> 1     setosa             5.006            3.428
#> ...

Custom column naming with .names

across() supports glue syntax in .names to construct output column names dynamically.

iris |>
  summarise(
    across(
      c(Sepal.Length, Sepal.Width),
      list(mean = mean, sd = sd),
      .names = "{.col}_{.fn}"
    ),
    .by = Species
  )
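A quick way to see which names a .names template generates is to inspect names() on the result. The sketch below uses the {.col} and {.fn} placeholders; res is an illustrative variable name.

```r
library(dplyr)

res <- iris |>
  summarise(
    across(
      c(Sepal.Length, Sepal.Width),
      list(mean = mean, sd = sd),
      .names = "{.col}_{.fn}"
    ),
    .by = Species
  )

# Grouping column first, then one column per (input column, function) pair:
# "Species", "Sepal.Length_mean", "Sepal.Length_sd",
# "Sepal.Width_mean", "Sepal.Width_sd"
names(res)
```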

Handling NA values

Most summary functions propagate NA: if a group contains a missing value, the result for that group is NA. summarise() has no global switch for this, so pass na.rm = TRUE to each function that should drop missing values.

df <- tibble(x = c(1, 2, NA, 4), g = c("a", "a", "b", "b"))

# Default: the NA in group b propagates; group a has no missing values
df |> summarise(mean_x = mean(x), .by = g)
#> # A tibble: 2 × 2
#>   g     mean_x
#>   <chr>  <dbl>
#> 1 a        1.5
#> 2 b       NA

# Drop missing values inside the summary function
df |> summarise(mean_x = mean(x, na.rm = TRUE), .by = g)
#> # A tibble: 2 × 2
#>   g     mean_x
#>   <chr>  <dbl>
#> 1 a        1.5
#> 2 b        4

The same na.rm = TRUE argument works for sum(), median(), sd(), min(), max(), and the other base summary functions.
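When several columns need the same NA handling, one option is to wrap the summary function in an anonymous function inside across(). The following is a minimal sketch on a made-up tibble; df_na and res_na are illustrative names, and the \(v) lambda shorthand assumes R 4.1 or later.

```r
library(dplyr)

# Hypothetical data with missing values in two columns
df_na <- tibble(
  g = c("a", "a", "b", "b"),
  x = c(1, 2, NA, 4),
  y = c(10, NA, 30, 40)
)

# The lambda forwards na.rm = TRUE to mean() for every selected column
res_na <- df_na |>
  summarise(
    across(c(x, y), \(v) mean(v, na.rm = TRUE)),
    .by = g
  )

res_na$x  # 1.5 4.0
res_na$y  # 10 35
```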

The .groups argument

.groups controls the grouping structure of the result when the input was grouped with group_by(). It was added in dplyr 1.0.0 and is still marked experimental; it is not deprecated. Accepted values are "drop_last", "drop", "keep", and "rowwise".

# Drop only the last grouping level: the result stays grouped by cyl
mtcars |>
  group_by(cyl, am) |>
  summarise(n = n(), .groups = "drop_last")

.groups cannot be combined with .by. A summary computed with .by is always ungrouped, so no ungroup() call is needed:

mtcars |>
  summarise(n = n(), .by = cyl)

count() and tally() shortcuts

For simple row counts, dplyr provides two helpers that save a few keystrokes:

# count() is a specialised summarise for counting rows
mtcars |> count(cyl, am)
mtcars |> count(cyl, wt = hp)   # weighted count

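# tally() counts rows in data that is already grouped
mtcars |> group_by(cyl) |> tally()

Unweighted, both give the same counts as summarise(n = n(), .by = ...), though count() sorts the output by the grouping columns. With wt, they compute sum(wt) per group instead of counting rows.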
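The relationship between count() and summarise() can be checked directly. This sketch compares the two on mtcars, sorting the summarise() result first because .by keeps groups in order of first appearance. All variable names are illustrative.

```r
library(dplyr)

# Unweighted: count() versus an explicit n() summary
counted <- mtcars |> count(cyl, am)
summarised <- mtcars |>
  summarise(n = n(), .by = c(cyl, am)) |>
  arrange(cyl, am)
identical(counted$n, summarised$n)

# Weighted: count(wt = hp) sums hp within each group
weighted <- mtcars |> count(cyl, wt = hp)
summed <- mtcars |>
  summarise(n = sum(hp), .by = cyl) |>
  arrange(cyl)
all.equal(weighted$n, summed$n)
```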

Common Gotchas

Unquoted column names. dplyr uses tidy evaluation, so column names are unquoted. To pass a column name programmatically from a function argument, use the embrace operator {{ }}:

my_summary <- function(data, col) {
  data |> summarise(result = mean({{ col }}))
}
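As a usage sketch, the helper can be called with a bare column name. The second function, my_grouped_summary(), is a hypothetical extension that is not part of dplyr; it assumes the same embrace pattern can forward a grouping column to .by.

```r
library(dplyr)

my_summary <- function(data, col) {
  data |> summarise(result = mean({{ col }}))
}

# The caller passes the column unquoted
my_summary(mtcars, mpg)

# Hypothetical extension: also forward grouping columns to .by
my_grouped_summary <- function(data, col, by) {
  data |> summarise(result = mean({{ col }}), .by = {{ by }})
}

my_grouped_summary(mtcars, mpg, by = cyl)
my_grouped_summary(mtcars, hp, by = c(cyl, am))
```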

Empty ... returns one row with no columns. Calling summarise() with no summary expressions on ungrouped data returns one row and zero columns. On grouped data it returns one row per group containing only the grouping columns.

mtcars |> summarise()
#> data frame with 0 columns and 1 row

Grouped summarise changes row count. When used on grouped data, summarise collapses each group to a single row. The output has as many rows as there are groups.

NA propagation. Most summary functions return NA when the input contains missing values. summarise() has no global na.rm switch, so pass na.rm = TRUE to each function that should ignore them.

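.by uses tidy-select. Pass a single bare name (.by = cyl) or wrap several in c(): .by = c(col1, col2). Column names stored in a character vector must be wrapped in all_of(). Unlike group_by(), .by applies to a single call only and always returns ungrouped output.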
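When the grouping columns are only known at run time as strings, a character vector can be passed through all_of(), which dplyr re-exports from tidyselect. A minimal sketch, where group_cols is a hypothetical variable:

```r
library(dplyr)

# Hypothetical: grouping columns supplied as strings
group_cols <- c("cyl", "am")

res_by <- mtcars |>
  summarise(avg_mpg = mean(mpg), .by = all_of(group_cols))

# One row per observed cyl/am combination
nrow(res_by)
```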

See Also