How to join two data frames in R
· 2 min read · Updated March 14, 2026 · beginner
r joining dplyr data.table merge
Joining data frames is essential for combining datasets. Here are the main ways to do it.
Sample Data
Create two data frames to join:
employees <- data.frame(
id = c(1, 2, 3, 4),
name = c("Alice", "Bob", "Carol", "David"),
dept_id = c(10, 20, 10, 30)
)
departments <- data.frame(
dept_id = c(10, 20, 30),
department = c("Engineering", "Sales", "Marketing")
)
With dplyr
The tidyverse provides intuitive join functions:
Left Join
Keep all rows from the left table:
library(dplyr)
result <- employees %>%
left_join(departments, by = "dept_id")
# id name dept_id department
# 1 1 Alice 10 Engineering
# 2 2 Bob 20 Sales
# 3 3 Carol 10 Engineering
# 4 4 David 30 Marketing
Inner Join
Keep only rows that match in both tables:
result <- employees %>%
inner_join(departments, by = "dept_id")
Right Join
Keep all rows from the right table:
result <- employees %>%
right_join(departments, by = "dept_id")
Full Join
Keep all rows from both tables:
result <- employees %>%
full_join(departments, by = "dept_id")
Multiple Join Keys
When columns have different names:
result <- employees %>%
left_join(departments, by = c("dept_id" = "dept_id"))
With Base R
The merge() function works similarly:
# Left join equivalent
result <- merge(employees, departments, by = "dept_id", all.x = TRUE)
# Inner join
result <- merge(employees, departments, by = "dept_id")
# Full join
result <- merge(employees, departments, by = "dept_id", all = TRUE)
With data.table
For large datasets, data.table is extremely fast:
library(data.table)
employees_dt <- as.data.table(employees)
departments_dt <- as.data.table(departments)
# Left join
result <- employees_dt[departments_dt, on = "dept_id"]
# Inner join
result <- merge(employees_dt, departments_dt, by = "dept_id")
Join Types Summary
| Join Type | Keeps |
|---|---|
| left_join | All left rows, matching right |
| right_join | All right rows, matching left |
| inner_join | Only matching rows |
| full_join | All rows from both |
Common Issues
Column Name Conflicts
When tables share column names besides the join key:
result <- employees %>%
left_join(departments, by = "dept_id", suffix = c("_emp", "_dept"))
Duplicate Keys
If join keys are not unique, you get a many-to-many join:
# Check for duplicates before joining
table(duplicated(employees$dept_id))
NA Values in Join Keys
NA values never match in joins:
# Filter out NAs first or use join_by() with na_matches = "never"
result <- employees %>%
filter(!is.na(dept_id)) %>%
left_join(departments, by = "dept_id")
See Also
- left_join() — Left join documentation
- filter() — Filter rows
- arrange() — Sort results