How to join two data frames in R

· 2 min read · Updated March 14, 2026 · beginner
r joining dplyr data.table merge

Joining data frames is essential for combining datasets. Here are the main ways to do it.

Sample Data

Create two data frames to join:

employees <- data.frame(
  id = c(1, 2, 3, 4),
  name = c("Alice", "Bob", "Carol", "David"),
  dept_id = c(10, 20, 10, 30)
)

departments <- data.frame(
  dept_id = c(10, 20, 30),
  department = c("Engineering", "Sales", "Marketing")
)

With dplyr

The tidyverse provides intuitive join functions:

Left Join

Keep all rows from the left table:

library(dplyr)

result <- employees %>%
  left_join(departments, by = "dept_id")
#   id  name dept_id  department
# 1  1 Alice      10 Engineering
# 2  2   Bob      20      Sales
# 3  3 Carol      10 Engineering
# 4  4 David      30   Marketing

Inner Join

Keep only rows that match in both tables:

result <- employees %>%
  inner_join(departments, by = "dept_id")

Right Join

Keep all rows from the right table:

result <- employees %>%
  right_join(departments, by = "dept_id")

Full Join

Keep all rows from both tables:

result <- employees %>%
  full_join(departments, by = "dept_id")

Multiple Join Keys

When columns have different names:

result <- employees %>%
  left_join(departments, by = c("dept_id" = "dept_id"))

With Base R

The merge() function works similarly:

# Left join equivalent
result <- merge(employees, departments, by = "dept_id", all.x = TRUE)

# Inner join
result <- merge(employees, departments, by = "dept_id")

# Full join
result <- merge(employees, departments, by = "dept_id", all = TRUE)

With data.table

For large datasets, data.table is extremely fast:

library(data.table)

employees_dt <- as.data.table(employees)
departments_dt <- as.data.table(departments)

# Left join
result <- employees_dt[departments_dt, on = "dept_id"]

# Inner join
result <- merge(employees_dt, departments_dt, by = "dept_id")

Join Types Summary

Join TypeKeeps
left_joinAll left rows, matching right
right_joinAll right rows, matching left
inner_joinOnly matching rows
full_joinAll rows from both

Common Issues

Column Name Conflicts

When tables share column names besides the join key:

result <- employees %>%
  left_join(departments, by = "dept_id", suffix = c("_emp", "_dept"))

Duplicate Keys

If join keys are not unique, you get a many-to-many join:

# Check for duplicates before joining
table(duplicated(employees$dept_id))

NA Values in Join Keys

NA values never match in joins:

# Filter out NAs first or use join_by() with na_matches = "never"
result <- employees %>%
  filter(!is.na(dept_id)) %>%
  left_join(departments, by = "dept_id")

See Also