dplyr::summarise()

summarise(.data, ..., .by = NULL, .groups = NULL)

Returns data frame or tibble (matches input) · Added in v1.0.0 · Updated April 1, 2026 · Tidyverse

summarise() (also spelled summarize()) collapses a tibble or data frame into a single row per group by computing summary statistics. It is one of the most-used functions in the tidyverse for aggregating data. dplyr ships both spellings to accommodate UK and US conventions; they are identical.

Syntax

summarise(.data, ..., .by = NULL, .groups = NULL)
summarize(.data, ..., .by = NULL, .groups = NULL)  # identical

Parameters

Parameter	Type	Default	Description
`.data`	tibble / data.frame	required	Input data frame or tibble
`...`	name-value expressions	optional	Summary columns defined as `name = expression`
`.by`	tidy-select	`NULL`	Columns to group by for this call only (dplyr 1.1+). Wrap multiple columns in `c()`, e.g. `.by = c(cyl, am)`.
`.groups`	character	`NULL`	Grouping structure of the result when the input is grouped: `"drop_last"`, `"drop"`, `"keep"`, or `"rowwise"`. Cannot be combined with `.by`.

The .by parameter is the modern alternative to group_by() for one-off grouped summaries. It accepts bare column names directly, so you can write summarise(..., .by = c(cyl, am)) instead of group_by(cyl, am) |> summarise(...). The result is always ungrouped.
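As a rough illustration of that equivalence, the sketch below computes the same summary both ways on mtcars. The object names by_result and grouped_result are illustrative only. Note that .by keeps groups in order of first appearance while group_by() sorts them, so the comparison sorts by the keys first.

```r
library(dplyr)

# One-off grouping with .by: result is ungrouped, groups in order of appearance
by_result <- mtcars |>
  summarise(avg_mpg = mean(mpg), .by = c(cyl, am))

# Same summary with group_by(); .groups = "drop" makes the result ungrouped too
grouped_result <- mtcars |>
  group_by(cyl, am) |>
  summarise(avg_mpg = mean(mpg), .groups = "drop")

# After sorting by the keys, both hold the same values
all.equal(
  arrange(by_result, cyl, am)$avg_mpg,
  grouped_result$avg_mpg
)
```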

Summary functions

Any function that takes a vector and returns a single value works inside summarise(). Common choices:

Function	What it returns
`mean(x)`	Arithmetic mean
`median(x)`	Median value
`sum(x)`	Sum of values
`sd(x)`	Standard deviation
`min(x)`	Minimum value
`max(x)`	Maximum value
`n()`	Count of rows (no arguments)
`n_distinct(x)`	Count of unique values
`first(x)`	First value in group
`last(x)`	Last value in group
`nth(x, n)`	Nth value in group
`IQR(x)`	Interquartile range
`quantile(x, probs)`	Quantile (pass a single `probs` value so the result has length 1)

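None of these drop NA by default. mean(), median(), sum(), sd(), min(), and max() return NA when the input contains a missing value, and quantile() and IQR() raise an error. Pass na.rm = TRUE to the individual function to drop missing values before computing.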

Examples

Basic summarise

library(dplyr)

# Overall summary of mtcars
mtcars |> summarise(avg_mpg = mean(mpg), total_hp = sum(hp))
#>    avg_mpg total_hp
#> 1 20.09062     4694

Grouped summarise with .by (dplyr 1.1+)

The .by argument groups data before summarising, so you skip a separate group_by() call.

# Average mpg and total hp per cylinder/gear combination
mtcars |> summarise(
  avg_mpg = mean(mpg),
  total_hp = sum(hp),
  .by = c(cyl, gear)
)
#>   cyl gear avg_mpg total_hp
#> 1   6    4  19.750      466
#> 2   4    4  26.925      608
#> ... (8 rows in total, in order of first appearance)

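Sorting the output

summarise() has no sorting argument. With .by, groups appear in order of first appearance, so pipe the result into arrange() to sort it.

# Sort output by group size, largest group first
mtcars |>
  summarise(n = n(), .by = cyl) |>
  arrange(desc(n))
#>   cyl  n
#> 1   8 14
#> 2   4 11
#> 3   6  7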

Multiple summary statistics

# Compute several statistics at once
mtcars |> summarise(
  mean_mpg = mean(mpg),
  median_mpg = median(mpg),
  sd_mpg = sd(mpg),
  min_mpg = min(mpg),
  max_mpg = max(mpg),
  .by = cyl
)
#>   cyl mean_mpg median_mpg   sd_mpg min_mpg max_mpg
#> 1   6 19.74286       19.7 1.453567    17.8    21.4
#> ...

across() for multiple columns

Apply the same summary to several columns at once with across(). The column selection uses tidy-select helpers such as starts_with().

# Mean of all Sepal columns in iris, by species
iris |>
  summarise(across(starts_with("Sepal"), mean, .names = "mean_{.col}"), .by = Species)
#>      Species mean_Sepal.Length mean_Sepal.Width
#> 1     setosa             5.006            3.428
#> ...

Custom column naming with .names

across() supports glue syntax in .names to construct output column names dynamically. Use {.col} for the input column name and {.fn} for the name of the function in the list.

iris |>
  summarise(
    across(
      c(Sepal.Length, Sepal.Width),
      list(mean = mean, sd = sd),
      .names = "{.col}_{.fn}"
    ),
    .by = Species
  )

Handling NA values

summarise() itself does not remove missing values, so any NA in a group propagates through functions such as mean(). Pass na.rm = TRUE to the summary function to drop missing values before computing.

df <- tibble(x = c(1, 2, NA, 4), g = c("a", "a", "b", "b"))

# Default: the NA in group b propagates
df |> summarise(mean_x = mean(x), .by = g)
#> # A tibble: 2 × 2
#>   g     mean_x
#>   <chr>  <dbl>
#> 1 a        1.5
#> 2 b       NA

# Drop missing values inside the summary function
df |> summarise(mean_x = mean(x, na.rm = TRUE), .by = g)
#> # A tibble: 2 × 2
#>   g     mean_x
#>   <chr>  <dbl>
#> 1 a        1.5
#> 2 b        4

summarise() has no global na.rm switch; each summary function needs its own na.rm = TRUE.
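When several columns need the same NA handling, one option is to wrap the summary function in an anonymous function inside across(). The sketch below uses a small made-up tibble (scores, with columns g, x, and y) purely for illustration.

```r
library(dplyr)

# Hypothetical data with missing values in two columns
scores <- tibble(
  g = c("a", "a", "b", "b"),
  x = c(1, 2, NA, 4),
  y = c(NA, 10, 20, 30)
)

# Apply mean(na.rm = TRUE) to x and y within each group
na_safe_means <- scores |>
  summarise(
    across(c(x, y), \(v) mean(v, na.rm = TRUE)),
    .by = g
  )
```

The anonymous function keeps na.rm attached to the summary function itself, which avoids relying on arguments being forwarded through across().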

The .groups argument

When the input was grouped with group_by(), .groups controls the grouping structure of the result. Accepted values are "drop_last" (the default when every group yields one row), "drop", "keep", and "rowwise". It is not deprecated, but it cannot be combined with .by:

# Drop the last grouping level (cyl is the only one here), leaving an ungrouped result
mtcars |>
  group_by(cyl) |>
  summarise(n = n(), .groups = "drop_last")

With .by, the result is always ungrouped, so no ungroup() call is needed:

mtcars |>
  summarise(n = n(), .by = cyl)
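To see what .groups does to a grouped input, the following sketch inspects the grouping of each result with group_vars(). It assumes a dplyr version in which summarise() accepts .groups (1.0.0 or later); the object names are illustrative.

```r
library(dplyr)

grouped <- mtcars |> group_by(cyl, am)

# "drop_last" peels off the last grouping variable (am), leaving cyl
res_drop_last <- grouped |> summarise(n = n(), .groups = "drop_last")
group_vars(res_drop_last)

# "drop" returns an ungrouped result
res_drop <- grouped |> summarise(n = n(), .groups = "drop")
group_vars(res_drop)

# "keep" preserves both grouping variables
res_keep <- grouped |> summarise(n = n(), .groups = "keep")
group_vars(res_keep)
```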

count() and tally() shortcuts

For simple row counts, dplyr provides two helpers that save a few keystrokes:

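# count() is a specialised summarise for counting rows
mtcars |> count(cyl, am)
mtcars |> count(cyl, wt = hp)   # weighted count: sums hp within each cyl

# tally() counts rows within groups that already exist
mtcars |> group_by(cyl) |> tally()

count(cyl, am) gives the same counts as summarise(n = n(), .by = c(cyl, am)), except that count() orders the output by the grouping columns while .by keeps order of first appearance. With wt, the result is sum(wt) per group rather than n().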
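The relationship between count() and summarise() can be checked with a short sketch. The object names are illustrative, and arrange() is added only to line up row order before comparing.

```r
library(dplyr)

counted <- mtcars |> count(cyl, am)

# Comparable summarise() call; arrange() mimics count()'s key ordering
summarised <- mtcars |>
  summarise(n = n(), .by = c(cyl, am)) |>
  arrange(cyl, am)

# Weighted count: wt = hp sums hp within each cyl
weighted <- mtcars |> count(cyl, wt = hp)
summed <- mtcars |>
  summarise(n = sum(hp), .by = cyl) |>
  arrange(cyl)

all.equal(counted$n, summarised$n)
all.equal(weighted$n, summed$n)
```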

Common Gotchas

Unquoted column names. dplyr uses tidy evaluation, so column names are unquoted. To pass a column name programmatically from a function argument, use the embrace operator {{ }}:

my_summary <- function(data, col) {
  data |> summarise(result = mean({{ col }}))
}
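A short usage sketch for the helper above: the column is passed unquoted, and the function works on grouped input as well. The helper is redefined here so the block runs on its own; overall and by_species are illustrative names.

```r
library(dplyr)

# Same helper as above: {{ col }} forwards the caller's column expression
my_summary <- function(data, col) {
  data |> summarise(result = mean({{ col }}))
}

# Mean of mpg over the whole data frame
overall <- my_summary(mtcars, mpg)

# Grouped input gives one row per group
by_species <- iris |>
  group_by(Species) |>
  my_summary(Sepal.Length)
```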

Empty ... returns one row. Calling summarise() with no summary expressions on ungrouped data returns a single row with no columns. On grouped data it returns one row per group containing only the grouping columns:

mtcars |> summarise()
#> data frame with 0 columns and 1 row

Grouped summarise changes row count. When used on grouped data, summarise collapses each group to a single row. The output has as many rows as there are groups.

NA propagation. Most summary functions return NA when the input contains missing values. Pass na.rm = TRUE to each function that should ignore them.

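.by uses tidy selection. Pass a single bare column name (.by = cyl) or wrap several in c() (.by = c(col1, col2)). Unlike group_by(), .by does not take multiple columns as separate arguments, and it cannot be used on a data frame that is already grouped.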
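Because .by uses tidy selection, selection helpers work there too. A minimal sketch, assuming the grouping columns arrive as a character vector (group_cols and n_by_group are hypothetical names):

```r
library(dplyr)

# Hypothetical character vector of grouping columns
group_cols <- c("cyl", "am")

# all_of() turns the character vector into a tidy selection for .by
n_by_group <- mtcars |>
  summarise(n = n(), .by = all_of(group_cols))
```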