How to calculate correlation between two columns in R

3 min read · Updated March 14, 2026 · beginner
r statistics correlation data-analysis

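Correlation measures the strength and direction of the association between two variables, on a scale from -1 to 1. Pearson's coefficient captures linear relationships, while Spearman's and Kendall's rank-based coefficients capture monotonic ones. This guide covers all three methods in R.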

With base R: cor()

The base R cor() function (from the stats package) computes all three coefficients. Choose one with the method argument, which defaults to "pearson":

df <- data.frame(
  height = c(150, 160, 170, 180, 175, 165, 155, 185),
  weight = c(50, 60, 65, 80, 75, 70, 55, 90)
)

# Pearson correlation (default)
cor(df$height, df$weight)
# [1] 0.9722857

Correlation Methods

Pearson (linear relationship)

Pearson's r measures linear association and is the usual choice for continuous variables. It is sensitive to outliers:

# Pearson (linear)
cor(df$height, df$weight, method = "pearson")
# [1] 0.9722857
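Pearson's r is the covariance of the two variables divided by the product of their standard deviations. As a sanity check, this sketch recomputes it by hand with base R's cov(), sd(), and mean() and compares the result with cor(); the variable names are only for illustration.

```r
# Same example data as above
height <- c(150, 160, 170, 180, 175, 165, 155, 185)
weight <- c(50, 60, 65, 80, 75, 70, 55, 90)

# Pearson r = covariance / (sd_x * sd_y)
r_manual <- cov(height, weight) / (sd(height) * sd(weight))

# Equivalent form on centered values (the n - 1 denominators cancel)
hc <- height - mean(height)
wc <- weight - mean(weight)
r_centered <- sum(hc * wc) / sqrt(sum(hc^2) * sum(wc^2))

# Built-in result for comparison
r_builtin <- cor(height, weight, method = "pearson")
print(c(manual = r_manual, centered = r_centered, builtin = r_builtin))
```

All three values should agree (about 0.972 for this data). Because the n - 1 terms in cov() and sd() cancel, the centered-sum form needs no division by the sample size.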

Spearman (rank-based)

Spearman's rho correlates the ranks of the values, so it suits non-normal data, ordinal variables, and relationships that are monotonic but not linear:

# Spearman rank correlation
cor(df$height, df$weight, method = "spearman")
# [1] 0.9761905
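Spearman's rho is Pearson's r applied to ranks. A quick way to check this, assuming the same height and weight data, is to rank both vectors with rank() and correlate the ranks. When there are no ties, the textbook formula based on squared rank differences gives the same number.

```r
height <- c(150, 160, 170, 180, 175, 165, 155, 185)
weight <- c(50, 60, 65, 80, 75, 70, 55, 90)

# Spearman's rho = Pearson's r computed on the ranks of each variable
rho_ranks <- cor(rank(height), rank(weight))
rho_builtin <- cor(height, weight, method = "spearman")
print(c(ranks = rho_ranks, builtin = rho_builtin))

# Shortcut formula, valid only when neither variable has ties:
# rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), with d the rank differences
d <- rank(height) - rank(weight)
n <- length(height)
rho_formula <- 1 - 6 * sum(d^2) / (n * (n^2 - 1))
print(rho_formula)
```

rank() assigns average ranks to ties by default, which is also how cor() ranks the data for Spearman, so the first comparison holds even with tied values; the shortcut formula does not.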

Kendall (concordant pairs)

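Kendall's tau compares every pair of observations and weighs concordant pairs (ordered the same way on both variables) against discordant ones. It is often recommended for small samples and for data with many tied values: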

# Kendall's tau
cor(df$height, df$weight, method = "kendall")
# [1] 0.9285714
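To make "concordant pairs" concrete, the hypothetical helper below counts them directly. It computes Kendall's tau-a, (concordant - discordant) / (n(n - 1)/2), which makes no adjustment for ties, so it only matches cor(..., method = "kendall") when neither variable has repeated values, as is the case in this example.

```r
height <- c(150, 160, 170, 180, 175, 165, 155, 185)
weight <- c(50, 60, 65, 80, 75, 70, 55, 90)

# Hypothetical helper (not part of base R): Kendall's tau-a from pair counts.
# Only equals cor(..., method = "kendall") when there are no ties.
kendall_tau_a <- function(x, y) {
  n <- length(x)
  concordant <- 0
  discordant <- 0
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      # +1 if the pair is ordered the same way on x and y, -1 if opposite
      s <- sign(x[i] - x[j]) * sign(y[i] - y[j])
      if (s > 0) concordant <- concordant + 1
      if (s < 0) discordant <- discordant + 1
    }
  }
  (concordant - discordant) / (n * (n - 1) / 2)
}

tau_manual <- kendall_tau_a(height, weight)
tau_builtin <- cor(height, weight, method = "kendall")
print(c(manual = tau_manual, builtin = tau_builtin))
```

For this data, 27 of the 28 pairs are concordant and one is discordant, giving 26/28. The double loop compares every pair, so it is meant for illustration rather than large datasets.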

Handling Missing Values

The use parameter controls how missing values are handled:

df <- data.frame(
  x = c(1, 2, NA, 4, 5, 6),
  y = c(2, 4, 6, NA, 8, 10)
)

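# Default (use = "everything"): any NA makes the result NA
cor(df$x, df$y)
# [1] NA

# Pairwise deletion: use every pair where both values are present
cor(df$x, df$y, use = "pairwise.complete.obs")
# [1] 0.9970545

# Listwise deletion: use only rows with no missing values
cor(df$x, df$y, use = "complete.obs")
# [1] 0.9970545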

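Common use options:

  • "everything" (default): return NA if any value is missing
  • "all.obs": stop with an error if any value is missing
  • "complete.obs": listwise deletion; drop every row that has a missing value
  • "pairwise.complete.obs": for each pair of variables, use all rows where both values are present

With only two vectors, "complete.obs" and "pairwise.complete.obs" give the same result; they can differ when cor() receives a matrix or data frame with three or more columns.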
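The sketch below illustrates these options on the example vectors. It reproduces "complete.obs" by hand with complete.cases(), then adds a made-up third column z (an assumption for illustration only) to show where listwise and pairwise deletion diverge.

```r
x <- c(1, 2, NA, 4, 5, 6)
y <- c(2, 4, 6, NA, 8, 10)

# Default use = "everything": the NA values propagate to the result
r_default <- cor(x, y)

# complete.cases() flags rows where both x and y are present;
# correlating only those rows reproduces use = "complete.obs"
keep <- complete.cases(x, y)
r_manual <- cor(x[keep], y[keep])
r_complete <- cor(x, y, use = "complete.obs")

# With a hypothetical column z, the two deletion rules diverge:
# complete.obs keeps rows 1, 2, 5, 6 for every pair, while
# pairwise.complete.obs uses rows 1, 2, 4, 5, 6 for the x-z pair
m <- data.frame(x = x, y = y, z = c(1, 3, 2, 5, 4, 6))
r_xz_listwise <- cor(m, use = "complete.obs")["x", "z"]
r_xz_pairwise <- cor(m, use = "pairwise.complete.obs")["x", "z"]

print(c(listwise = r_xz_listwise, pairwise = r_xz_pairwise))
```

Pairwise deletion keeps more data per pair, but each entry of the resulting matrix may be based on a different subset of rows.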

With a data frame: correlation matrix

Pass a data frame of numeric columns to cor() to get the correlation of every pair of columns at once:

df <- data.frame(
  height = c(150, 160, 170, 180, 175, 165, 155, 185),
  weight = c(50, 60, 65, 80, 75, 70, 55, 90),
  age = c(25, 30, 35, 40, 38, 32, 28, 45)
)

# Correlation matrix, rounded to 3 decimals
round(cor(df), 3)
#        height weight   age
# height  1.000  0.972 0.993
# weight  0.972  1.000 0.975
# age     0.993  0.975 1.000
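cor() only accepts numeric input, so a data frame that also holds text or factor columns needs those dropped first. A minimal sketch, assuming a made-up name column has been added to the example data:

```r
df <- data.frame(
  name   = c("a", "b", "c", "d", "e", "f", "g", "h"),  # hypothetical text column
  height = c(150, 160, 170, 180, 175, 165, 155, 185),
  weight = c(50, 60, 65, 80, 75, 70, 55, 90),
  age    = c(25, 30, 35, 40, 38, 32, 28, 45)
)

# cor(df) would stop with an error because of the name column,
# so keep only the numeric columns first
num_cols <- vapply(df, is.numeric, logical(1))
cor_matrix <- cor(df[, num_cols])

round(cor_matrix, 3)
```

vapply() with logical(1) guarantees one TRUE/FALSE per column, which makes it a slightly safer choice than sapply() for building the column filter.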

With dplyr and tidyr

Reshape the correlation matrix into long (tidy) format, with one row per combination of variables:

library(dplyr)
library(tidyr)

df <- data.frame(
  height = c(150, 160, 170, 180, 175, 165, 155, 185),
  weight = c(50, 60, 65, 80, 75, 70, 55, 90),
  age = c(25, 30, 35, 40, 38, 32, 28, 45)
)

cor_matrix <- cor(df)

# Convert to tidy format
as.data.frame(cor_matrix) %>%
  mutate(var1 = rownames(cor_matrix)) %>%
  pivot_longer(-var1, names_to = "var2", values_to = "correlation")
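The tidy result above has nine rows: each variable paired with itself, plus every pair listed twice. If you want each pair only once, or prefer to skip dplyr and tidyr, one base R option is to index the upper triangle of the matrix with upper.tri(). A sketch on the same data:

```r
df <- data.frame(
  height = c(150, 160, 170, 180, 175, 165, 155, 185),
  weight = c(50, 60, 65, 80, 75, 70, 55, 90),
  age    = c(25, 30, 35, 40, 38, 32, 28, 45)
)
cor_matrix <- cor(df)

# Row/column positions of the upper triangle (the diagonal is excluded)
idx <- which(upper.tri(cor_matrix), arr.ind = TRUE)

# One row per unique pair of variables
pairs <- data.frame(
  var1 = rownames(cor_matrix)[idx[, "row"]],
  var2 = colnames(cor_matrix)[idx[, "col"]],
  correlation = cor_matrix[idx]
)

# Strongest relationships first
pairs[order(-abs(pairs$correlation)), ]
```

Sorting by absolute value puts strong negative correlations next to strong positive ones, which is usually what you want when scanning for related variables.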

With Hmisc (p-values)

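Hmisc's rcorr() returns the correlation matrix together with the number of observations and a p-value for each pair. It supports Pearson and Spearman, but not Kendall: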

# Install if needed: install.packages("Hmisc")
library(Hmisc)

df <- data.frame(
  height = c(150, 160, 170, 180, 175, 165, 155, 185),
  weight = c(50, 60, 65, 80, 75, 70, 55, 90)
)

# Prints the correlation matrix, the sample size (n), and the p-values (P)
rcorr(as.matrix(df))
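For a single pair of variables, base R's cor.test() (in the stats package) gives a significance test without an extra dependency, and unlike rcorr() it also accepts method = "kendall". A minimal sketch for the height and weight data:

```r
height <- c(150, 160, 170, 180, 175, 165, 155, 185)
weight <- c(50, 60, 65, 80, 75, 70, 55, 90)

# Test H0: the true correlation is 0 (two-sided by default)
res <- cor.test(height, weight, method = "pearson")

res$estimate   # sample correlation r
res$p.value    # p-value for H0: correlation = 0
res$conf.int   # 95% confidence interval for r (Pearson only)
```

Printing res on its own shows the full test summary. The confidence interval is only reported for Pearson's method.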

Visualizing Correlation

Base R

plot(df$height, df$weight,
     xlab = "Height (cm)",
     ylab = "Weight (kg)",
     main = "Height vs Weight")

With ggplot2

library(ggplot2)

ggplot(df, aes(x = height, y = weight)) +
  geom_point() +
  # Add a least-squares line without its confidence band
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(title = "Height vs Weight Correlation")

See Also