How to find rows with duplicate values in R
· 7 min read · Updated March 14, 2026 · beginner
r duplicates dplyr data.table data-frame
Finding duplicate rows is a common data cleaning task. This guide shows you multiple approaches using base R, dplyr, and data.table.
Finding Duplicates in a Vector
Base R: duplicated()
The duplicated() function returns a logical vector indicating which elements have been seen before:
x <- c(1, 2, 2, 3, 3, 3, 4, 5, 5)
duplicated(x)
# [1] FALSE FALSE  TRUE FALSE  TRUE  TRUE FALSE FALSE  TRUE
To flag every occurrence of a duplicated value (including the first occurrence, which plain duplicated() skips):
x <- c(1, 2, 2, 3, 3, 3, 4, 5, 5)
duplicated(x) | duplicated(x, fromLast = TRUE)
# [1] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE
Which elements are duplicates?
x <- c(1, 2, 2, 3, 3, 3, 4, 5, 5)
# Find positions of duplicated values
which(duplicated(x))
# [1] 3 5 6 9
# Get the actual duplicate values
unique(x[duplicated(x)])
# [1] 2 3 5
Finding Duplicate Rows in a Data Frame
Base R
df <- data.frame(
id = c(1, 2, 2, 3, 4, 1),
name = c("A", "B", "B", "C", "D", "A"),
score = c(85, 90, 90, 78, 88, 85)
)
duplicated(df)
# [1] FALSE FALSE TRUE FALSE FALSE TRUE
Get the row numbers of duplicates:
which(duplicated(df))
# [1] 3 6
Extract the duplicate rows:
df[duplicated(df), ]
# id name score
# 3 2 B 90
# 6 1 A 85
Find all duplicates including first occurrence
df[duplicated(df) | duplicated(df, fromLast = TRUE), ]
#   id name score
# 1  1    A    85
# 2  2    B    90
# 3  2    B    90
# 6  1    A    85
dplyr
Use duplicated() or the group_by() approach:
library(dplyr)
df <- data.frame(
id = c(1, 2, 2, 3, 4, 1),
name = c("A", "B", "B", "C", "D", "A"),
score = c(85, 90, 90, 78, 88, 85)
)
# Find duplicate rows
df %>%
group_by(across(everything())) %>%
filter(n() > 1) %>%
ungroup()
# # A tibble: 4 × 3
# id name score
# <dbl> <chr> <dbl>
# 1     1 A        85
# 2     2 B        90
# 3     2 B        90
# 4     1 A        85
Find duplicates in specific columns only
library(dplyr)
df <- data.frame(
id = c(1, 2, 2, 3, 4, 1),
name = c("A", "B", "B", "C", "D", "A"),
score = c(85, 90, 90, 78, 88, 85)
)
# Find duplicates based on id column only
df %>%
group_by(id) %>%
filter(n() > 1) %>%
ungroup()
# # A tibble: 4 × 3
# id name score
# <dbl> <chr> <dbl>
# 1     1 A        85
# 2     2 B        90
# 3     2 B        90
# 4     1 A        85
# Or using the id and name columns
df %>%
group_by(id, name) %>%
filter(n() > 1) %>%
ungroup()
data.table
library(data.table)
dt <- data.table(
id = c(1, 2, 2, 3, 4, 1),
name = c("A", "B", "B", "C", "D", "A"),
score = c(85, 90, 90, 78, 88, 85)
)
# Find duplicate rows
duplicated(dt)
# [1] FALSE FALSE TRUE FALSE FALSE TRUE
# Get duplicate rows
dt[duplicated(dt)]
# id name score
# 1: 2 B 90
# 2: 1 A 85
# Using the anyDuplicated() function - returns index of first duplicate
anyDuplicated(dt)
# [1] 3
# Find duplicates by specific columns
dt[duplicated(dt, by = c("id", "name"))]
# id name score
# 1: 2 B 90
# 2: 1 A 85
Counting Duplicates
Count total duplicate rows
df <- data.frame(
id = c(1, 2, 2, 3, 4, 1),
name = c("A", "B", "B", "C", "D", "A"),
score = c(85, 90, 90, 78, 88, 85)
)
# Base R
sum(duplicated(df))
# [1] 2
# dplyr
library(dplyr)
df %>%
summarise(n_dup = sum(duplicated(.)))
#   n_dup
# 1     2
Count duplicates per group
library(dplyr)
df <- data.frame(
id = c(1, 2, 2, 3, 4, 1),
name = c("A", "B", "B", "C", "D", "A"),
score = c(85, 90, 90, 78, 88, 85)
)
# Count how many times each combination appears
df %>%
group_by(id, name) %>%
mutate(count = n()) %>%
filter(count > 1)
# # A tibble: 4 × 4
# # Groups:   id, name [2]
#      id name  score count
#   <dbl> <chr> <dbl> <int>
# 1     1 A        85     2
# 2     2 B        90     2
# 3     2 B        90     2
# 4     1 A        85     2
Count unique vs total rows
df <- data.frame(
id = c(1, 2, 2, 3, 4, 1),
name = c("A", "B", "B", "C", "D", "A"),
score = c(85, 90, 90, 78, 88, 85)
)
nrow(df) # Total rows: 6
# [1] 6
nrow(unique(df)) # Unique rows: 4
# [1] 4
nrow(df) - nrow(unique(df)) # Duplicate rows: 2
# [1] 2
Removing Duplicate Rows
Base R
df <- data.frame(
id = c(1, 2, 2, 3, 4, 1),
name = c("A", "B", "B", "C", "D", "A"),
score = c(85, 90, 90, 78, 88, 85)
)
# Remove all duplicates (keeps first occurrence)
unique(df)
# id name score
# 1 1 A 85
# 2 2 B 90
# 4 3 C 78
# 5 4 D 88
# Or use duplicated()
df[!duplicated(df), ]
# id name score
# 1 1 A 85
# 2 2 B 90
# 4 3 C 78
# 5 4 D 88
# Keep last occurrence instead
df[!duplicated(df, fromLast = TRUE), ]
# id name score
# 3 2 B 90
# 4 3 C 78
# 5 4 D 88
# 6 1 A 85
dplyr
library(dplyr)
df <- data.frame(
id = c(1, 2, 2, 3, 4, 1),
name = c("A", "B", "B", "C", "D", "A"),
score = c(85, 90, 90, 78, 88, 85)
)
# Remove duplicate rows
distinct(df)
#   id name score
# 1  1    A    85
# 2  2    B    90
# 3  3    C    78
# 4  4    D    88
# Remove duplicates based on specific columns
distinct(df, id, .keep_all = TRUE)
#   id name score
# 1  1    A    85
# 2  2    B    90
# 3  3    C    78
# 4  4    D    88
data.table
library(data.table)
dt <- data.table(
id = c(1, 2, 2, 3, 4, 1),
name = c("A", "B", "B", "C", "D", "A"),
score = c(85, 90, 90, 78, 88, 85)
)
unique(dt)
# id name score
# 1: 1 A 85
# 2: 2 B 90
# 3: 3 C 78
# 4: 4 D 88
# Remove duplicates by specific columns
unique(dt, by = c("id", "name"))
# id name score
# 1: 1 A 85
# 2: 2 B 90
# 3: 3 C 78
# 4: 4 D 88
Practical Examples
Find customers with multiple orders
library(dplyr)
orders <- data.frame(
customer_id = c(101, 101, 102, 103, 103, 103),
order_id = c(1, 2, 3, 4, 5, 6),
amount = c(50, 75, 100, 25, 30, 45)
)
# Find customers with more than one order
orders %>%
group_by(customer_id) %>%
filter(n() > 1) %>%
ungroup()
# # A tibble: 5 × 3
# customer_id order_id amount
# <dbl> <dbl> <dbl>
# 1 101 1 50
# 2 101 2 75
# 3 103 4 25
# 4 103 5 30
# 5 103 6 45
Find records modified more than once
library(dplyr)
logs <- data.frame(
record_id = c("A", "A", "B", "C", "C"),
action = c("create", "update", "create", "create", "delete"),
timestamp = c(1, 2, 3, 4, 5)
)
# Find records that were touched multiple times
logs %>%
group_by(record_id) %>%
filter(n() > 1) %>%
ungroup()
Keep the row with maximum value per group
library(dplyr)
df <- data.frame(
id = c(1, 1, 2, 2, 3, 3),
time = c(10, 20, 15, 25, 30, 35),
value = c(100, 150, 200, 180, 90, 120)
)
# Keep row with maximum value per id
df %>%
group_by(id) %>%
slice_max(value, n = 1) %>%
ungroup()
# # A tibble: 3 × 3
# id time value
# <dbl> <dbl> <dbl>
# 1 1 20 150
# 2     2    15   200
# 3 3 35 120
Performance Comparison
For large datasets, data.table is fastest:
library(data.table)
dt <- data.table(
id = rep(1:1e6, each = 3),
value = rnorm(3e6)
)
system.time(dt[!duplicated(dt)])
# user system elapsed
# 0.089 0.010 0.099
Base R unique() is fastest for vectors. Use dplyr::distinct() for readability in pipelines, and data.table::unique() for large data.
See Also
- duplicated() — Find duplicate elements in a vector or rows in a data frame
- unique() — Extract unique elements or rows
- table() — Count frequency of values