dplyr::distinct

distinct(.data, ..., .keep_all = FALSE)
Returns: tibble · Updated April 3, 2026 · Tidyverse
r dplyr tidyverse data-manipulation

distinct() removes duplicate rows from a data frame, keeping only unique combinations of the columns you specify (or of all columns, if you specify none). It is considerably faster than base R’s unique.data.frame(), and it returns an object of the same type as its input, so a tibble in gives a tibble out.

Basic Usage

By default, distinct() checks all columns and returns only rows that are fully unique:

library(dplyr)

df <- tibble(
  x = c(1, 1, 2, 1),
  y = c("a", "a", "b", "a")
)

distinct(df)
# # A tibble: 2 × 2
#       x y
# 1     1 a
# 2     2 b

When two rows share the same values in every column, distinct() keeps only the first one and discards the rest.

Selecting Specific Columns

Pass column names to distinct() to check uniqueness on a subset of columns:

distinct(df, x)
# # A tibble: 2 × 1
#       x
# 1     1
# 2     2

This drops all other columns. If you want to keep the rest of your data, use .keep_all = TRUE.

Keeping All Columns with .keep_all

The .keep_all argument controls whether unmentioned columns are retained:

distinct(df, x, .keep_all = TRUE)
# # A tibble: 2 × 2
#       x y
# 1     1 a
# 2     2 b

Without .keep_all = TRUE, specifying x alone would drop y. With .keep_all = TRUE, you get the first row of each unique x value while keeping all columns. When multiple rows share the same x, only the first occurrence by row order is retained.
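Because the retained row is simply the first one in row order, sorting beforehand is one way to choose which row survives. The sketch below uses a hypothetical scores tibble (not part of the examples above) and assumes you want the highest score per id:

```r
library(dplyr)

# Hypothetical data: two measurements per id, with different scores
scores <- tibble(
  id    = c(1, 1, 2, 2),
  score = c(10, 30, 20, 5)
)

# Keeps the first row per id in the data's current order
first_seen <- distinct(scores, id, .keep_all = TRUE)

# Sort so the highest score per id comes first, then deduplicate
best <- scores %>%
  arrange(id, desc(score)) %>%
  distinct(id, .keep_all = TRUE)
```

Here first_seen holds scores 10 and 20, while best holds 30 and 20.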

Computed Columns

You can create expressions on the fly to determine uniqueness:

distinct(df, diff = abs(x - 1))
# # A tibble: 2 × 1
#    diff
#   <dbl>
# 1     0
# 2     1

This checks uniqueness based on abs(x - 1) and returns the computed values as a new column named diff. The original columns are dropped unless you also use .keep_all = TRUE.
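As a quick check of how a computed column interacts with .keep_all, the following sketch reuses the df tibble from Basic Usage and inspects the column names of each result. The variable names computed_only and with_rest are illustrative only:

```r
library(dplyr)

df <- tibble(
  x = c(1, 1, 2, 1),
  y = c("a", "a", "b", "a")
)

# Only the computed column is returned
computed_only <- distinct(df, diff = abs(x - 1))

# The computed column is returned alongside the original columns
with_rest <- distinct(df, diff = abs(x - 1), .keep_all = TRUE)

names(computed_only)
names(with_rest)
```

Both results have two rows, one per distinct value of diff.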

Using across() for Multiple Columns

When you need to check uniqueness across many columns, across() lets you use select-helper semantics:

df2 <- tibble(
  name = c("Alice", "Alice", "Bob", "Bob"),
  age  = c(25, 25, 30, 30),
  city = c("NY", "NY", "LA", "LA")
)

distinct(df2, across(contains("city")))
# # A tibble: 2 × 1
#   city
# 1 NY
# 2 LA

This is the modern replacement for the superseded distinct_all(), distinct_at(), and distinct_if() scoped variants.
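If you are on dplyr 1.1.0 or later (an assumption about your installed version), pick() is another way to do pure column selection inside distinct(). A minimal sketch using the same df2 data:

```r
library(dplyr)

df2 <- tibble(
  name = c("Alice", "Alice", "Bob", "Bob"),
  age  = c(25, 25, 30, 30),
  city = c("NY", "NY", "LA", "LA")
)

# Select columns with a tidy-select helper; requires dplyr >= 1.1.0
by_city <- distinct(df2, pick(contains("city")))

# pick() also accepts bare column names
by_person <- distinct(df2, pick(name, age))
```

Each result has two rows, since df2 contains two distinct people in two distinct cities.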

Grouped Data Frames

When your data is grouped with group_by(), the grouping variables are always included in the uniqueness check:

df_g <- tibble(
  g = c(1, 1, 2, 2),
  x = c(1, 1, 2, 1)
) %>% group_by(g)

distinct(df_g, x)
# # A tibble: 3 × 2
# # Groups: g [2]
#       g     x
#   <dbl> <dbl>
# 1     1     1
# 2     2     2
# 3     2     1

Notice that g appears in the output even though you did not explicitly mention it — grouping columns are always retained.
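If you want uniqueness on x alone, one option is to drop the grouping first with ungroup(). The sketch below contrasts the two results on the same df_g data:

```r
library(dplyr)

df_g <- tibble(
  g = c(1, 1, 2, 2),
  x = c(1, 1, 2, 1)
) %>% group_by(g)

# Grouped: uniqueness is checked on (g, x), so three rows remain
grouped_result <- distinct(df_g, x)

# Ungrouped: uniqueness is checked on x only, so two rows remain
ungrouped_result <- df_g %>%
  ungroup() %>%
  distinct(x)
```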

Common Pitfalls

Dropped columns by default. distinct(df, col_a) returns only col_a, not the full rows of df deduplicated on col_a, so every other column is lost. Use .keep_all = TRUE if you need them.

NA is a distinct value. A row containing NA in your specified columns is not automatically removed — NA counts as its own value when checking uniqueness.
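To illustrate the NA behavior, here is a small sketch with a hypothetical df_na tibble. It assumes that, if you do want rows with missing values gone, you filter them out yourself before deduplicating:

```r
library(dplyr)

# Hypothetical data with missing values
df_na <- tibble(
  x = c(1, NA, NA, 2),
  y = c("a", "b", "b", NA)
)

# The two identical (NA, "b") rows collapse into one; nothing else is removed
deduped <- distinct(df_na)

# Explicitly drop rows where x is missing, then deduplicate
no_missing_x <- df_na %>%
  filter(!is.na(x)) %>%
  distinct()
```

deduped has three rows and no_missing_x has two.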

Row order, not column order, determines which row is kept. distinct(df, x, y) keeps the first row for each unique (x, y) combination, where “first” means first in the row order of the data. Swapping the arguments to distinct(df, y, x) selects the same rows; at most the column order of the result changes. To control which representative row is kept, sort the data with arrange() before calling distinct().
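One way to see how argument order relates to the retained row is to tag each row and compare the results. The obs tibble and its z column below are hypothetical, added only to label the rows:

```r
library(dplyr)

# Hypothetical data; z labels each row so we can see which one is kept
obs <- tibble(
  x = c(1, 1, 2),
  y = c("a", "a", "b"),
  z = c("first", "second", "third")
)

xy <- distinct(obs, x, y, .keep_all = TRUE)
yx <- distinct(obs, y, x, .keep_all = TRUE)

# Both calls keep the rows labelled "first" and "third"
xy$z
yx$z
```

In both calls the row labelled "second" is dropped, because it comes after an identical (x, y) pair in row order.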

See Also