How to merge data frames with different key names in R
When merging data frames in R, the columns you want to join on often have different names. This guide shows you how to handle mismatched column names in dplyr, base R, and data.table.
Sample Data
Create two data frames with different key column names:
# First data frame: employees have a dept_id
employees <- data.frame(
id = c(1, 2, 3, 4),
name = c("Alice", "Bob", "Carol", "David"),
dept_id = c(10, 20, 10, 30)
)
# Second data frame: departments use department_id
departments <- data.frame(
department_id = c(10, 20, 30),
department = c("Engineering", "Sales", "Marketing")
)
The join key is dept_id in employees but department_id in departments.
With dplyr
Use the by parameter with a named character vector:
library(dplyr)
result <- employees %>%
left_join(departments, by = c("dept_id" = "department_id"))
The syntax c(“dept_id” = “department_id”) tells dplyr to match dept_id from the left table to department_id in the right table.
Output
# id name dept_id department
# 1 1 Alice 10 Engineering
# 2 2 Bob 20 Sales
# 3 3 Carol 10 Engineering
# 4 4 David 30 Marketing
All dplyr Join Types
All join functions accept the same by syntax:
# Inner join
inner <- employees %>%
inner_join(departments, by = c("dept_id" = "department_id"))
# Full outer join
full <- employees %>%
full_join(departments, by = c("dept_id" = "department_id"))
# Right join
right <- employees %>%
right_join(departments, by = c("dept_id" = "department_id"))
Multiple Keys with Different Names
When joining on multiple columns that have different names:
# Another example with two keys
orders <- data.frame(
order_id = c(101, 102, 103),
customer_id = c(1, 2, 1),
amount = c(100, 200, 150)
)
customers <- data.frame(
cust_id = c(1, 2, 3),
name = c("Alice", "Bob", "Carol")
)
# Join on both customer_id = cust_id
result <- orders %>%
left_join(customers, by = c("customer_id" = "cust_id"))
With Base R
The merge() function handles different column names with the by.x and by.y parameters:
# Left join equivalent
result <- merge(
employees,
departments,
by.x = "dept_id",
by.y = "department_id",
all.x = TRUE
)
# Inner join
result <- merge(
employees,
departments,
by.x = "dept_id",
by.y = "department_id"
)
# Full outer join
result <- merge(
employees,
departments,
by.x = "dept_id",
by.y = "department_id",
all = TRUE
)
Base R Merge Options
| Parameter | Value | Effect |
|---|---|---|
| all.x | TRUE | Left join |
| all.y | TRUE | Right join |
| all | TRUE | Full outer join |
| all | FALSE (default) | Inner join |
With data.table
Use the on parameter with column names:
library(data.table)
employees_dt <- as.data.table(employees)
departments_dt <- as.data.table(departments)
# Left join
result <- employees_dt[departments_dt, on = c("dept_id" = "department_id")]
# Inner join
result <- merge(
employees_dt,
departments_dt,
by.x = "dept_id",
by.y = "department_id"
)
The on parameter accepts named character vectors similar to dplyr.
Common Problems
Column Name Conflicts
When both tables have columns with the same name (other than the join key):
# Both have a name column - use suffix to differentiate
result <- employees %>%
left_join(
departments,
by = c("dept_id" = "department_id"),
suffix = c("_emp", "_dept")
)
Missing Matches
Check for unmatched rows after joining:
# Find rows in employees with no matching department
unmatched <- employees %>%
anti_join(departments, by = c("dept_id" = "department_id"))
# This tells you which dept_id values do not exist in departments
Type Mismatch
Make sure key columns are the same type:
# If dept_id is character but department_id is numeric
employees$dept_id <- as.integer(employees$dept_id)
# Or convert in the join
result <- employees %>%
left_join(
departments,
by = c("dept_id" = "department_id")
)
See Also
- left_join() — Full join documentation
- how-to-join-two-data-frames — Joining by matching column names
- how-to-group-and-summarise — Data transformation after joining