How to extract unique values from a vector in R
· 3 min read · Updated March 14, 2026 · beginner
Extracting unique values is a common data manipulation task in R. Here’s how to do it with base R, dplyr, and data.table.
From a Vector
Base R
The unique() function removes duplicate values:
x <- c(1, 2, 2, 3, 3, 3, 4, 5, 5)
unique(x)
# [1] 1 2 3 4 5
For character vectors:
colors <- c("red", "blue", "green", "red", "yellow", "blue")
unique(colors)
# [1] "red" "blue" "green" "yellow"
Alternative: duplicated()
Use duplicated() to identify (not remove) duplicates, then negate to get unique values:
x <- c(1, 2, 2, 3, 3, 3, 4, 5, 5)
x[!duplicated(x)]
# [1] 1 2 3 4 5
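This is useful when you want more control over which occurrence of each duplicate is kept:
x <- c(1, 2, 2, 3, 3, 3, 4, 5, 5)
# Keep first occurrence only
x[!duplicated(x)]
# [1] 1 2 3 4 5
# Keep last occurrence instead
x[!duplicated(x, fromLast = TRUE)]
# [1] 1 2 3 4 5
# The results match here because equal values sit next to each other;
# with scattered duplicates the two calls return the values in a different order.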
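The first/last choice only changes the result when equal values are not adjacent. A minimal sketch, using a made-up vector (the name `visits` and its contents are illustrative, not from the examples above):

```r
# Hypothetical vector with scattered duplicates
visits <- c("a", "b", "a", "c", "b")

# Keep the first occurrence of each value
first_kept <- visits[!duplicated(visits)]
first_kept
# [1] "a" "b" "c"

# Keep the last occurrence of each value
last_kept <- visits[!duplicated(visits, fromLast = TRUE)]
last_kept
# [1] "a" "c" "b"
```

The set of values is the same in both cases; only the order (and, for data frames, which row survives) differs.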
From a Data Frame
Base R
df <- data.frame(
id = c(1, 2, 2, 3, 4, 1),
name = c("A", "B", "B", "C", "D", "A")
)
unique(df)
# id name
# 1 1 A
# 2 2 B
# 4 3 C
# 5 4 D
# Note: base R keeps the original row names (1, 2, 4, 5) rather than renumbering.
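Base R has no direct "unique by column" argument for data frames, but duplicated() on a single column can serve that purpose. A small sketch, assuming you want the first row per id (the variable name `first_per_id` is just for illustration):

```r
df <- data.frame(
  id = c(1, 2, 2, 3, 4, 1),
  name = c("A", "B", "B", "C", "D", "A")
)

# Keep the first row for each id; all columns are retained
first_per_id <- df[!duplicated(df$id), ]
first_per_id
#   id name
# 1  1    A
# 2  2    B
# 4  3    C
# 5  4    D

# Optionally reset the row names afterwards
rownames(first_per_id) <- NULL
```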
dplyr
Use distinct() to get unique rows:
library(dplyr)
df <- data.frame(
id = c(1, 2, 2, 3, 4, 1),
name = c("A", "B", "B", "C", "D", "A")
)
distinct(df)
# id name
# 1 1 A
# 2 2 B
# 3 3 C
# 4 4 D
# Get unique values from specific columns only
distinct(df, id)
# id
# 1 1
# 2 2
# 3 3
# 4 4
distinct(df, name)
# name
# 1 A
# 2 B
# 3 C
# 4 D
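When you deduplicate on one column but still need the others, distinct() accepts .keep_all = TRUE, which keeps the first row for each distinct value. A brief sketch with a hypothetical `scores` data frame (not part of the examples above):

```r
library(dplyr)

# Hypothetical data: id 2 appears twice with different scores
scores <- data.frame(
  id = c(1, 2, 2, 3),
  score = c(10, 20, 25, 30)
)

# Unique by id, keeping the remaining columns from the first matching row
first_scores <- distinct(scores, id, .keep_all = TRUE)
first_scores
#   id score
# 1  1    10
# 2  2    20
# 3  3    30
```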
data.table
library(data.table)
dt <- data.table(
id = c(1, 2, 2, 3, 4, 1),
name = c("A", "B", "B", "C", "D", "A")
)
unique(dt)
# id name
# 1: 1 A
# 2: 2 B
# 3: 3 C
# 4: 4 D
# Unique by specific column (keeps the first row per id; all columns are returned)
unique(dt, by = "id")
# id name
# 1: 1 A
# 2: 2 B
# 3: 3 C
# 4: 4 D
Counting Unique Values
To count how many unique values exist:
x <- c(1, 2, 2, 3, 3, 3, 4, 5, 5)
# Base R
length(unique(x))
# [1] 5
# With a pipe (%>% is available once dplyr is loaded)
library(dplyr)
x %>% unique() %>% length()
# [1] 5
# Or with dplyr's n_distinct(), which does the same in a single call
n_distinct(x)
# [1] 5
For data frames:
library(dplyr)
df <- data.frame(
id = c(1, 2, 2, 3, 4, 1),
name = c("A", "B", "B", "C", "D", "A")
)
n_distinct(df$id)
# [1] 4
n_distinct(df$name)
# [1] 4
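To count unique values in every column at once without extra packages, one option is sapply() over the data frame. This is a sketch of one possible approach, with `counts` as an illustrative variable name:

```r
df <- data.frame(
  id = c(1, 2, 2, 3, 4, 1),
  name = c("A", "B", "B", "C", "D", "A")
)

# Apply length(unique()) to each column; the result is a named integer vector
counts <- sapply(df, function(col) length(unique(col)))
counts
#   id name 
#    4    4 
```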
Practical Examples
Get unique values in a pipe
library(dplyr)
# Assumes df has `category` and `user_id` columns
df %>%
filter(category == "active") %>%
distinct(user_id) %>%
pull(user_id)
Unique with NA values
x <- c(1, 2, NA, 3, NA, 4)
# unique() preserves NA
unique(x)
# [1] 1 2 NA 3 4
# Exclude NA
unique(x[!is.na(x)])
# [1] 1 2 3 4
# Or in a pipe (na.omit() comes from stats; dplyr supplies %>%)
library(dplyr)
x %>% na.omit() %>% unique()
# [1] 1 2 3 4
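The same NA question comes up when counting. A short sketch showing that NA counts as its own value by default, and two ways to leave it out:

```r
library(dplyr)

x <- c(1, 2, NA, 3, NA, 4)

# NA is treated as one distinct value by default
n_with_na <- n_distinct(x)
n_with_na
# [1] 5

# Drop NA before counting
n_without_na <- n_distinct(x, na.rm = TRUE)
n_without_na
# [1] 4

# Base R equivalent
n_base <- length(unique(x[!is.na(x)]))
n_base
# [1] 4
```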
Get unique combinations of multiple columns
library(dplyr)
df <- data.frame(
year = c(2020, 2020, 2021, 2021, 2022, 2022),
quarter = c(1, 1, 2, 2, 3, 3),
value = c(10, 20, 30, 40, 50, 60)
)
distinct(df, year, quarter)
# year quarter
# 1 2020 1
# 2 2021 2
# 3 2022 3
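If dplyr is not available, the same combinations can be obtained in base R by selecting the columns first and then calling unique(). A minimal sketch (`combos` is an illustrative name):

```r
df <- data.frame(
  year = c(2020, 2020, 2021, 2021, 2022, 2022),
  quarter = c(1, 1, 2, 2, 3, 3),
  value = c(10, 20, 30, 40, 50, 60)
)

# Select the columns of interest, then drop duplicate rows
combos <- unique(df[c("year", "quarter")])
combos
#   year quarter
# 1 2020       1
# 3 2021       2
# 5 2022       3
```

As with any base R unique() call on a data frame, the original row names (1, 3, 5) are kept.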
Performance Comparison
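For large datasets, data.table's unique() method (dispatched automatically when you call unique() on a data.table) is typically fastest:
library(data.table)
dt <- data.table(x = sample(1e6, 1e7, replace = TRUE))
system.time(unique(dt))
# user system elapsed
# 0.452 0.088 0.539
# (example timings; results vary by machine and package version)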
Base R unique() is usually the quickest option for plain vectors, while dplyr::distinct() reads most naturally for data frames in a pipeline.
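To check such claims on your own machine, you can time the alternatives directly. The sketch below compares the two base R approaches on a vector; the sizes and seed are arbitrary choices, and no timings are shown because they depend on hardware and R version:

```r
set.seed(42)  # arbitrary seed, only for reproducibility
x <- sample(1e5, 1e6, replace = TRUE)

# Time unique() and the duplicated() idiom on the same input
t_unique <- system.time(u1 <- unique(x))
t_dup <- system.time(u2 <- x[!duplicated(x)])

# Both approaches return the same values in the same order
identical(u1, u2)
# [1] TRUE

# Elapsed seconds for each approach
t_unique["elapsed"]
t_dup["elapsed"]
```

For more stable numbers, repeat each measurement several times rather than relying on a single run.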
See Also
- unique() — Base R function for extracting unique values
- duplicated() — Find duplicate elements
- table() — Count frequency of values