dplyr::distinct
distinct(.data, ..., .keep_all = FALSE)
tibble · Updated April 3, 2026 · Tidyverse
distinct() removes duplicate rows from a data frame, keeping only unique combinations of the columns you specify. It is considerably faster than base R’s unique.data.frame(), and it returns an object of the same type as its input, so a tibble in gives a tibble out.
Basic Usage
By default, distinct() checks all columns and returns one row for each unique combination of values:
library(dplyr)
df <- tibble(
  x = c(1, 1, 2, 1),
  y = c("a", "a", "b", "a")
)
distinct(df)
# # A tibble: 2 × 2
# x y
# 1 1 a
# 2 2 b
When two rows share the same values in every column, distinct() keeps only the first one and discards the rest.
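To see how many rows a call would discard before committing to it, a quick check along these lines may help. It reuses the df defined above; n_dropped is an illustrative name, not part of dplyr.

```r
library(dplyr)

df <- tibble(
  x = c(1, 1, 2, 1),
  y = c("a", "a", "b", "a")
)

# Rows that distinct() would discard as duplicates
n_dropped <- nrow(df) - nrow(distinct(df))
n_dropped
# 2

# count() shows how often each combination occurs
count(df, x, y)
```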
Selecting Specific Columns
Pass column names to distinct() to check uniqueness on a subset of columns:
distinct(df, x)
# # A tibble: 2 × 1
# x
# 1 1
# 2 2
This drops all other columns. If you want to keep the rest of your data, use .keep_all = TRUE.
Keeping All Columns with .keep_all
The .keep_all argument controls whether unmentioned columns are retained:
distinct(df, x, .keep_all = TRUE)
# # A tibble: 2 × 2
# x y
# 1 1 a
# 2 2 b
Without .keep_all = TRUE, specifying x alone would drop y. With .keep_all = TRUE, you get the first row of each unique x value while keeping all columns. When multiple rows share the same x, only the first occurrence by row order is retained.
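Because the first occurrence wins, sorting the data beforehand is one way to choose which row survives. The sketch below assumes a hypothetical readings table and keeps the largest value per id; the table and the name largest are illustrative only.

```r
library(dplyr)

# Hypothetical data: several readings per id
readings <- tibble(
  id    = c("a", "b", "a", "b"),
  value = c(3, 7, 9, 2)
)

# Sort so the preferred row comes first within each id,
# then keep the first row per id
largest <- readings %>%
  arrange(id, desc(value)) %>%
  distinct(id, .keep_all = TRUE)

largest$value
# 9 7
```

Grouping by id and calling slice_max() is another route to the same result; the arrange() approach is shown here because it makes the "first row wins" rule explicit.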
Computed Columns
You can create expressions on the fly to determine uniqueness:
distinct(df, diff = abs(x - 1))
# # A tibble: 2 × 1
# diff
# <dbl>
# 1 0
# 2 1
This checks uniqueness based on abs(x - 1) and returns the computed column, here named diff, as the only column of the output. Add .keep_all = TRUE to keep the original columns alongside it.
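A small sketch of the computed column combined with .keep_all = TRUE, reusing df from above. The result name res is illustrative; the point is that the output carries the original columns as well as diff.

```r
library(dplyr)

df <- tibble(
  x = c(1, 1, 2, 1),
  y = c("a", "a", "b", "a")
)

# Uniqueness is judged on diff; x and y are carried along
res <- distinct(df, diff = abs(x - 1), .keep_all = TRUE)

# The result has one row per unique diff value and includes x, y, and diff
names(res)
nrow(res)
```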
Using across() for Multiple Columns
When you need to check uniqueness across many columns, across() lets you choose them with tidyselect helpers such as contains() or starts_with():
df2 <- tibble(
  name = c("Alice", "Alice", "Bob", "Bob"),
  age = c(25, 25, 30, 30),
  city = c("NY", "NY", "LA", "LA")
)
distinct(df2, across(contains("city")))
# # A tibble: 2 × 1
# city
# 1 NY
# 2 LA
This is the modern replacement for the superseded scoped variants distinct_all(), distinct_at(), and distinct_if().
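In dplyr 1.1.0 and later, pick() offers the same column selection without applying any function, which may read more naturally here. The sketch below assumes that version or newer is installed and reuses df2; by_city and by_chr are illustrative names.

```r
library(dplyr)

df2 <- tibble(
  name = c("Alice", "Alice", "Bob", "Bob"),
  age  = c(25, 25, 30, 30),
  city = c("NY", "NY", "LA", "LA")
)

# pick() selects columns with tidyselect helpers (assumes dplyr >= 1.1.0)
by_city <- distinct(df2, pick(contains("city")))

# Selecting by type covers the typical distinct_if() use case
by_chr <- distinct(df2, pick(where(is.character)))
names(by_chr)
# "name" "city"
```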
Grouped Data Frames
When your data is grouped with group_by(), the grouping variables are always included in the uniqueness check:
df_g <- tibble(
  g = c(1, 1, 2, 2),
  x = c(1, 1, 2, 1)
) %>% group_by(g)
distinct(df_g, x)
# # A tibble: 3 × 2
# # Groups: g [2]
# g x
# <dbl> <dbl>
# 1 1 1
# 2 2 2
# 3 2 1
Notice that g appears in the output even though you did not explicitly mention it — grouping columns are always retained.
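If uniqueness should be judged on x alone, dropping the grouping first is one option. The comparison below reuses df_g; grouped and flat are illustrative names.

```r
library(dplyr)

df_g <- tibble(
  g = c(1, 1, 2, 2),
  x = c(1, 1, 2, 1)
) %>% group_by(g)

# Grouped: uniqueness is judged on (g, x), so x = 1 appears once per group
grouped <- distinct(df_g, x)
nrow(grouped)
# 3

# Ungrouped: uniqueness is judged on x alone, and g is dropped
flat <- df_g %>% ungroup() %>% distinct(x)
nrow(flat)
# 2
```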
Common Pitfalls
Columns are dropped by default. distinct(df, col_a) returns only col_a, not the full rows of df deduplicated on col_a. Use .keep_all = TRUE if you need the other columns.
NA is a distinct value. A row containing NA in your specified columns is not removed automatically: NA counts as its own value when checking uniqueness, and multiple NA values are treated as equal to one another.
Row order, not argument order, determines which row is kept. distinct(df, x, y) keeps the first row, in data order, for each unique (x, y) combination. Swapping the call to distinct(df, y, x) selects the same rows; at most the column order of the output changes. To control which row survives, sort the data with arrange() before calling distinct().
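The NA and dropped-column behaviour can be checked on a small table. The data frame df_na below is hypothetical and exists only for this sketch.

```r
library(dplyr)

# Hypothetical data with missing values in the key column
df_na <- tibble(
  key = c("a", NA, "a", NA),
  val = c(1, 2, 3, 4)
)

# NA is kept as its own value; the two NA rows collapse into one
keys <- distinct(df_na, key)
nrow(keys)
# 2

# Only key is returned above; .keep_all = TRUE brings val back,
# taking it from the first row of each key
kept <- distinct(df_na, key, .keep_all = TRUE)
kept$val
# 1 2
```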
See Also
- dplyr::filter — subset rows by condition
- dplyr::select — choose columns by name
- dplyr::group_by — group data for grouped operations