How to Find Duplicate Rows in R and Remove Them
You can find duplicate rows in R with duplicated(), a base R function that returns a logical vector with one element per row of a data frame. It marks the second and later occurrences of a row as TRUE and leaves the first occurrence as FALSE, which makes copies easy to spot or filter out during data cleaning. Together with dplyr::distinct(), it covers both finding duplicate rows and removing them.
df <- data.frame(
  id = c(1, 2, 2, 3, 4, 1),
  name = c("A", "B", "B", "C", "D", "A"),
  score = c(85, 90, 90, 78, 88, 85)
)
# Which rows are duplicates?
duplicated(df)
# [1] FALSE FALSE  TRUE FALSE FALSE  TRUE
# Extract duplicate rows
df[duplicated(df), ]
#   id name score
# 3  2    B    90
# 6  1    A    85
# Remove all duplicates, keep first occurrence
df[!duplicated(df), ]
#   id name score
# 1  1    A    85
# 2  2    B    90
# 4  3    C    78
# 5  4    D    88
Because duplicated() leaves the first occurrence as FALSE, flagging every copy, including the first, takes two passes: duplicated(df) | duplicated(df, fromLast = TRUE). To pull out every row that belongs to a repeated group inside a pipeline, dplyr::group_by() %>% filter(n() > 1) works well; dplyr::count() gives the number of copies per group.
library(dplyr)
# Keep every row whose (id, name) combination appears more than once
df %>% group_by(id, name) %>% filter(n() > 1) %>% ungroup()
# Remove duplicates by id only: keep the first row per id, with all columns
distinct(df, id, .keep_all = TRUE)
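The two-pass expression for flagging every copy of a duplicated row can be sketched in base R as follows. This is a minimal illustration on the same example data; the variable name all_copies is only chosen for this example.

```r
df <- data.frame(
  id = c(1, 2, 2, 3, 4, 1),
  name = c("A", "B", "B", "C", "D", "A"),
  score = c(85, 90, 90, 78, 88, 85)
)

# Forward pass flags later copies; backward pass flags earlier copies.
# Combining them with | flags every row that has at least one twin.
all_copies <- duplicated(df) | duplicated(df, fromLast = TRUE)
all_copies
# [1]  TRUE  TRUE  TRUE FALSE FALSE  TRUE

# Every row involved in a duplicate, first occurrences included
df[all_copies, ]

# Rows that occur exactly once
df[!all_copies, ]
```

Note that df[!all_copies, ] drops every copy of a repeated row, whereas df[!duplicated(df), ] keeps one representative of each.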
For large datasets, calling unique(dt, by = "id") on a data.table is typically faster than the base R equivalent on a data frame. To check only whether any duplicates exist, use anyDuplicated(df): it returns the index of the first duplicated row, or 0 if there is none, and is generally more efficient than any(duplicated(df)). Whether to keep or remove duplicates depends on context: survey data often needs deduplication by respondent ID, while log data may treat repeated entries as distinct events worth counting rather than cleaning.
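As a rough sketch of the existence check and the data.table variant, the snippet below reuses the example data. It assumes data.table may not be installed, so that part is guarded with requireNamespace(); the names first_dup, has_dups, and dt_unique are illustrative only.

```r
df <- data.frame(
  id = c(1, 2, 2, 3, 4, 1),
  name = c("A", "B", "B", "C", "D", "A"),
  score = c(85, 90, 90, 78, 88, 85)
)

# Index of the first duplicated row, or 0 when every row is unique
first_dup <- anyDuplicated(df)
first_dup
# [1] 3

# Turn the index into a simple yes/no answer
has_dups <- first_dup > 0

# Optional: only runs when the data.table package is available
if (requireNamespace("data.table", quietly = TRUE)) {
  dt <- data.table::as.data.table(df)
  # unique() dispatches to the data.table method; by = "id" compares on id only
  dt_unique <- unique(dt, by = "id")
  print(dt_unique)
}
```

Testing first_dup > 0 is a convenient guard before running a more expensive deduplication step.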
See also
- duplicated(), Find duplicate elements in a vector or rows in a data frame
- unique(), Extract unique elements or rows
- table(), Count frequency of values