How to calculate correlation between two columns in R
Updated March 14, 2026
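Correlation measures the strength and direction of the association between two variables. This guide covers the Pearson, Spearman, and Kendall methods in R. Pearson captures linear relationships, while Spearman and Kendall capture monotonic ones.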
With base R: cor()
The base R cor() function handles all common correlation types:
df <- data.frame(
  height = c(150, 160, 170, 180, 175, 165, 155, 185),
  weight = c(50, 60, 65, 80, 75, 70, 55, 90)
)
# Pearson correlation (default)
cor(df$height, df$weight)
# [1] 0.972
Correlation Methods
Pearson (linear relationship)
Pearson correlation is the standard choice for continuous data with a roughly linear relationship:
# Pearson (linear)
cor(df$height, df$weight, method = "pearson")
# [1] 0.972
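To see what the Pearson method computes, the coefficient can be rebuilt from the covariance and the two standard deviations. This is a minimal sketch that reuses the height and weight values from above as plain vectors; the names r_manual and r_builtin are only illustrative.

```r
# Example data copied from the guide
height <- c(150, 160, 170, 180, 175, 165, 155, 185)
weight <- c(50, 60, 65, 80, 75, 70, 55, 90)

# Pearson's r is the covariance scaled by both standard deviations
r_manual <- cov(height, weight) / (sd(height) * sd(weight))

# Built-in result for comparison
r_builtin <- cor(height, weight, method = "pearson")

# all.equal() compares the two while tolerating floating-point noise
all.equal(r_manual, r_builtin)
```

Both values should agree, which confirms that cor() with method = "pearson" is a standardized covariance.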
Spearman (rank-based)
Use Spearman for ordinal variables, non-normal data, or relationships that are monotonic but not linear:
# Spearman rank correlation
cor(df$height, df$weight, method = "spearman")
# [1] 0.976
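Spearman's coefficient is the Pearson correlation applied to the ranks of the data rather than the raw values. The sketch below checks this on the same height and weight values; rho_builtin and rho_ranks are illustrative names.

```r
# Example data copied from the guide
height <- c(150, 160, 170, 180, 175, 165, 155, 185)
weight <- c(50, 60, 65, 80, 75, 70, 55, 90)

# Built-in Spearman correlation
rho_builtin <- cor(height, weight, method = "spearman")

# Same quantity computed as Pearson correlation on the ranks
rho_ranks <- cor(rank(height), rank(weight), method = "pearson")

all.equal(rho_builtin, rho_ranks)
```

Because only the ordering matters, any monotonic transformation of a variable (for example, taking logs of positive values) leaves the Spearman result unchanged.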
Kendall (concordant pairs)
Kendall's tau is often preferred for small datasets or data with many tied values:
# Kendall's tau
cor(df$height, df$weight, method = "kendall")
# [1] 0.929
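Kendall's tau compares every pair of observations and asks whether the two variables move in the same direction (concordant) or in opposite directions (discordant). The following sketch counts the pairs by hand. It assumes there are no tied values, as is the case for this data; cor() adjusts for ties, so with tied data this simple count would not match.

```r
# Example data copied from the guide
height <- c(150, 160, 170, 180, 175, 165, 155, 185)
weight <- c(50, 60, 65, 80, 75, 70, 55, 90)

# All index pairs (i, j) with i < j, one pair per column
pairs <- combn(length(height), 2)

# Direction of change within each pair
dh <- height[pairs[1, ]] - height[pairs[2, ]]
dw <- weight[pairs[1, ]] - weight[pairs[2, ]]

concordant <- sum(sign(dh) * sign(dw) > 0)
discordant <- sum(sign(dh) * sign(dw) < 0)

# Tau without tie correction: net concordance over the number of pairs
tau_manual <- (concordant - discordant) / ncol(pairs)

all.equal(tau_manual, cor(height, weight, method = "kendall"))
```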
Handling Missing Values
The use parameter controls how missing values are handled:
df <- data.frame(
  x = c(1, 2, NA, 4, 5, 6),
  y = c(2, 4, 6, NA, 8, 10)
)
# Pairwise deletion: drop pairs where either value is missing
cor(df$x, df$y, use = "pairwise.complete.obs")
# [1] 0.997
# Listwise deletion: gives the same result when only two vectors are involved
cor(df$x, df$y, use = "complete.obs")
# [1] 0.997
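Common use options:
- "pairwise.complete.obs": for each pair of variables, use every row where both values are present
- "complete.obs": listwise deletion, dropping any row that has a missing value in any column
- "everything" (default): return NA for any pair that involves a missing value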
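With only two vectors, the pairwise and listwise options drop the same rows, so the difference only shows up in a correlation matrix. Here is a minimal sketch with a made-up data frame (df_na and its columns x, y, z are hypothetical and not part of the guide's data):

```r
# Hypothetical data: only z contains missing values
df_na <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(2, 1, 4, 3, 6, 5),
  z = c(NA, 1, 2, 3, 4, NA)
)

# Default ("everything"): pairs involving z come back as NA
cor_default <- cor(df_na)

# Pairwise: the x-y entry uses all 6 rows, entries with z use 4 rows
cor_pairwise <- cor(df_na, use = "pairwise.complete.obs")

# Listwise: rows 1 and 6 are dropped for every pair, including x-y
cor_listwise <- cor(df_na, use = "complete.obs")

cor_pairwise["x", "y"]
cor_listwise["x", "y"]
```

The two x-y values differ because listwise deletion discards rows that are complete for x and y but missing in z.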
With data.frame
Calculate a correlation matrix for multiple columns:
df <- data.frame(
  height = c(150, 160, 170, 180, 175, 165, 155, 185),
  weight = c(50, 60, 65, 80, 75, 70, 55, 90),
  age = c(25, 30, 35, 40, 38, 32, 28, 45)
)
# Correlation matrix, rounded to 3 decimals for display
round(cor(df), 3)
#        height weight   age
# height  1.000  0.972 0.993
# weight  0.972  1.000 0.975
# age     0.993  0.975 1.000
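cor() only accepts numeric input, so a data frame that also contains character or factor columns has to be filtered first. This is a sketch under the assumption that the data frame has a non-numeric column; df_mixed and its group column are hypothetical.

```r
# Hypothetical data frame with one non-numeric column
df_mixed <- data.frame(
  height = c(150, 160, 170, 180, 175, 165, 155, 185),
  weight = c(50, 60, 65, 80, 75, 70, 55, 90),
  group = c("a", "b", "a", "b", "a", "b", "a", "b")
)

# Logical vector marking which columns are numeric
numeric_cols <- sapply(df_mixed, is.numeric)

# Keep only numeric columns; drop = FALSE preserves the data frame
# even if a single column remains
cor_numeric <- cor(df_mixed[, numeric_cols, drop = FALSE])
cor_numeric
```

Calling cor(df_mixed) directly would stop with an error because of the group column.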
With dplyr and tidyr
Create a correlation matrix in tidy format:
library(dplyr)
library(tidyr)
df <- data.frame(
  height = c(150, 160, 170, 180, 175, 165, 155, 185),
  weight = c(50, 60, 65, 80, 75, 70, 55, 90),
  age = c(25, 30, 35, 40, 38, 32, 28, 45)
)
cor_matrix <- cor(df)
# Convert to tidy format
as.data.frame(cor_matrix) %>%
  mutate(var1 = rownames(cor_matrix)) %>%
  pivot_longer(-var1, names_to = "var2", values_to = "correlation")
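If adding dplyr and tidyr is not an option, a similar long table can be built in base R. This sketch is an alternative to the pipeline above rather than a replacement; it also keeps each pair of variables only once, and the name cor_long is illustrative.

```r
df <- data.frame(
  height = c(150, 160, 170, 180, 175, 165, 155, 185),
  weight = c(50, 60, 65, 80, 75, 70, 55, 90),
  age = c(25, 30, 35, 40, 38, 32, 28, 45)
)
cor_matrix <- cor(df)

# as.table() + as.data.frame() gives one row per matrix cell
cor_long <- as.data.frame(as.table(cor_matrix), responseName = "correlation")
names(cor_long)[1:2] <- c("var1", "var2")

# var1 and var2 are factors in matrix order; comparing their level codes
# keeps each unordered pair once and drops the diagonal
cor_long <- cor_long[as.integer(cor_long$var1) < as.integer(cor_long$var2), ]
cor_long
```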
With Hmisc (p-values)
Get correlation with p-values for significance testing:
# Install if needed: install.packages("Hmisc")
library(Hmisc)
df <- data.frame(
  height = c(150, 160, 170, 180, 175, 165, 155, 185),
  weight = c(50, 60, 65, 80, 75, 70, 55, 90)
)
rcorr(as.matrix(df))
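For a single pair of variables, the cor.test() function from the stats package (loaded by default) gives a p-value without installing anything. A minimal sketch using the same height and weight values; the variable name result is illustrative.

```r
height <- c(150, 160, 170, 180, 175, 165, 155, 185)
weight <- c(50, 60, 65, 80, 75, 70, 55, 90)

# Test whether the Pearson correlation differs from zero
result <- cor.test(height, weight, method = "pearson")

result$estimate  # correlation coefficient
result$p.value   # two-sided p-value
result$conf.int  # 95% confidence interval (Pearson only)
```

Setting method = "spearman" or method = "kendall" tests the corresponding rank-based coefficient instead.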
Visualizing Correlation
Base R
plot(df$height, df$weight,
     xlab = "Height (cm)",
     ylab = "Weight (kg)",
     main = "Height vs Weight")
With ggplot2
library(ggplot2)
ggplot(df, aes(x = height, y = weight)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Height vs Weight Correlation")
See Also
- stats::var(): Variance calculation
- stats::sd(): Standard deviation
- Descriptive Statistics in R: Summary statistics guide