How to extract unique values from a vector or data frame in R

· 3 min read · Updated March 14, 2026 · beginner

Extracting unique values is a common data manipulation task in R. Here’s how to do it with base R, dplyr, and data.table.

From a Vector

Base R

The unique() function removes duplicate values:

x <- c(1, 2, 2, 3, 3, 3, 4, 5, 5)
unique(x)
# [1] 1 2 3 4 5

For character vectors:

colors <- c("red", "blue", "green", "red", "yellow", "blue")
unique(colors)
# [1] "red"    "blue"   "green"  "yellow"

Alternative: duplicated()

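duplicated() returns a logical vector that marks every element repeating an earlier one. It identifies duplicates rather than removing them, so negate it and use it as an index to get the unique values: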

x <- c(1, 2, 2, 3, 3, 3, 4, 5, 5)
x[!duplicated(x)]
# [1] 1 2 3 4 5

This is useful when you want control over which occurrence of a duplicate is kept. For x the returned values are the same either way, because only the position of the kept element changes:

# Keep first occurrence only (the default)
x <- c(1, 2, 2, 3, 3, 3, 4, 5, 5)
x[!duplicated(x)]
# [1] 1 2 3 4 5

# Keep last occurrence instead
x[!duplicated(x, fromLast = TRUE)]
# [1] 1 2 3 4 5
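
A case where the two calls do return different results may make the fromLast argument easier to see. This is a minimal sketch with a made-up vector (visits is a hypothetical name) in which repeated values are not adjacent:

```r
# Hypothetical example vector; the name and values are for illustration only
visits <- c("a", "b", "a", "c", "b")

# Keep the first occurrence of each value
first_kept <- visits[!duplicated(visits)]
first_kept
# [1] "a" "b" "c"

# Keep the last occurrence of each value
last_kept <- visits[!duplicated(visits, fromLast = TRUE)]
last_kept
# [1] "a" "c" "b"
```

The set of values is identical, but the order reflects which occurrence survived.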

From a Data Frame

Base R

unique() also works on data frames, where it removes duplicate rows:

df <- data.frame(
  id = c(1, 2, 2, 3, 4, 1),
  name = c("A", "B", "B", "C", "D", "A")
)

unique(df)
#   id name
# 1  1    A
# 2  2    B
# 4  3    C
# 5  4    D
# Row names are carried over from the original rows (1, 2, 4, 5)
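
unique() compares whole rows. To deduplicate on a single column in base R while keeping the other columns, one option is to index with duplicated(). The sketch below assumes a made-up data frame (scores is a hypothetical name) in which the same id appears with different values:

```r
# Hypothetical data: id 2 appears twice with different scores
scores <- data.frame(
  id = c(1, 2, 2, 3),
  score = c(10, 20, 25, 30)
)

# unique() sees four distinct rows because the scores differ
nrow(unique(scores))
# [1] 4

# Keep the first row for each id
first_per_id <- scores[!duplicated(scores$id), ]
first_per_id
#   id score
# 1  1    10
# 2  2    20
# 4  3    30
```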

dplyr

Use distinct() to get unique rows:

library(dplyr)

df <- data.frame(
  id = c(1, 2, 2, 3, 4, 1),
  name = c("A", "B", "B", "C", "D", "A")
)

distinct(df)
#   id name
# 1  1    A
# 2  2    B
# 3  3    C
# 4  4    D

# Get unique values from specific columns only
distinct(df, id)
#   id
# 1  1
# 2  2
# 3  3
# 4  4

distinct(df, name)
#   name
# 1    A
# 2    B
# 3    C
# 4    D
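
When distinct() is given column names it returns only those columns. If the remaining columns are needed as well, the .keep_all argument keeps them, using the first row found for each distinct value. A short sketch, assuming a made-up data frame (orders is a hypothetical name):

```r
library(dplyr)

# Hypothetical data: customer 2 appears twice with different amounts
orders <- data.frame(
  customer = c(1, 2, 2, 3),
  amount = c(10, 20, 25, 30)
)

# Deduplicate on customer only, keeping the remaining columns
first_order <- distinct(orders, customer, .keep_all = TRUE)
first_order
#   customer amount
# 1        1     10
# 2        2     20
# 3        3     30
```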

data.table

data.table provides its own unique() method for data.table objects:

library(data.table)

dt <- data.table(
  id = c(1, 2, 2, 3, 4, 1),
  name = c("A", "B", "B", "C", "D", "A")
)

unique(dt)
#    id name
# 1:  1    A
# 2:  2    B
# 3:  3    C
# 4:  4    D

# Unique by specific column (keeps all columns, first row per id)
unique(dt, by = "id")
#    id name
# 1:  1    A
# 2:  2    B
# 3:  3    C
# 4:  4    D

Counting Unique Values

To count how many unique values exist:

x <- c(1, 2, 2, 3, 3, 3, 4, 5, 5)

# Base R
length(unique(x))
# [1] 5

# With a pipe (%>% is available once dplyr is loaded)
library(dplyr)
x %>% unique() %>% length()
# [1] 5

# Or with dplyr's n_distinct() (more concise)
n_distinct(x)
# [1] 5

For a column of a data frame:

library(dplyr)

df <- data.frame(
  id = c(1, 2, 2, 3, 4, 1),
  name = c("A", "B", "B", "C", "D", "A")
)

n_distinct(df$id)
# [1] 4

n_distinct(df$name)
# [1] 4
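
If you also need to know how often each unique value occurs, base R's table() gives the counts in one step. A minimal sketch using the same name values as above, stored in a standalone vector for illustration:

```r
# Same values as the name column used above
name <- c("A", "B", "B", "C", "D", "A")

# Frequency of each unique value
counts <- table(name)
counts
# name
# A B C D 
# 2 2 1 1 

# The number of unique values is the length of the table
length(counts)
# [1] 4

# Values that occur more than once
repeated <- names(counts)[counts > 1]
repeated
# [1] "A" "B"
```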

Practical Examples

Get unique values in a pipe

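library(dplyr)

# Hypothetical data: user_id and category are illustrative columns
df <- data.frame(user_id = c(101, 102, 102, 103),
                 category = c("active", "active", "active", "inactive"))

df %>%
  filter(category == "active") %>%
  distinct(user_id) %>%
  pull(user_id)
# [1] 101 102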

Unique with NA values

x <- c(1, 2, NA, 3, NA, 4)

# unique() preserves NA
unique(x)
# [1]  1  2 NA  3  4

# Exclude NA
unique(x[!is.na(x)])
# [1] 1 2 3 4

# Or in a pipe (na.omit() comes from the stats package, not dplyr)
library(dplyr)
x %>% na.omit() %>% unique()
# [1] 1 2 3 4

Get unique combinations of multiple columns

library(dplyr)

df <- data.frame(
  year = c(2020, 2020, 2021, 2021, 2022, 2022),
  quarter = c(1, 1, 2, 2, 3, 3),
  value = c(10, 20, 30, 40, 50, 60)
)

distinct(df, year, quarter)
#   year quarter
# 1 2020       1
# 2 2021       2
# 3 2022       3
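
The same result is available without dplyr by selecting the columns first and then calling unique(). A sketch in base R on the same data; note that base R keeps the original row names, so the rows are labeled 1, 3, and 5:

```r
df <- data.frame(
  year = c(2020, 2020, 2021, 2021, 2022, 2022),
  quarter = c(1, 1, 2, 2, 3, 3),
  value = c(10, 20, 30, 40, 50, 60)
)

# Select the columns of interest, then drop duplicate rows
combos <- unique(df[, c("year", "quarter")])
combos
#   year quarter
# 1 2020       1
# 3 2021       2
# 5 2022       3
```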

Performance Comparison

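For large datasets, data.table's unique() method is typically the fastest of the three. Note that it is a method for the base unique() generic, not an exported function, so call it as unique(dt) rather than data.table::unique(dt). Timings depend on the machine and the data; the figures below come from a single run: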

library(data.table)

dt <- data.table(x = sample(1e6, 1e7, replace = TRUE))

system.time(unique(dt))
#    user  system elapsed 
#   0.452   0.088   0.539

For plain vectors, base R unique() is generally fast and needs no extra package, while dplyr::distinct() is the most readable choice for data frames in a pipeline.
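
Performance statements like these are best checked on your own data. The following is a rough sketch using only base R that times unique() against the duplicated() approach on a random integer vector. The vector size and seed are arbitrary assumptions, and a single system.time() run is only a coarse indication:

```r
set.seed(42)  # arbitrary seed, used only for reproducibility
x <- sample(1e5, 1e6, replace = TRUE)

# Time both approaches; the assignments happen in the calling environment
t_unique <- system.time(u1 <- unique(x))
t_dup <- system.time(u2 <- x[!duplicated(x)])

# Both approaches return the same values in the same order
identical(u1, u2)
# [1] TRUE

# Elapsed seconds for each approach (machine-dependent, so no output shown)
c(unique = t_unique[["elapsed"]], duplicated = t_dup[["elapsed"]])
```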

See Also

  • unique() — Base R function for extracting unique values
  • duplicated() — Find duplicate elements
  • table() — Count frequency of values