How to detect outliers in a vector in R

· 3 min read · Updated March 15, 2026 · beginner
r statistics outliers data-cleaning

Outliers are data points that differ markedly from the rest of the observations. Detecting them matters because a single extreme value can distort summary statistics such as the mean and standard deviation. This guide covers common methods for detecting outliers in R.

With the IQR Method

The Interquartile Range (IQR) method identifies outliers as values below Q1 - 1.5×IQR or above Q3 + 1.5×IQR:

values <- c(10, 12, 14, 15, 16, 17, 18, 19, 100)

q1 <- quantile(values, 0.25)
q3 <- quantile(values, 0.75)
iqr <- q3 - q1

lower_bound <- q1 - 1.5 * iqr
upper_bound <- q3 + 1.5 * iqr

outliers <- values[values < lower_bound | values > upper_bound]
outliers
# [1] 100
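
The same steps can be wrapped in a small helper. The sketch below is one possible version: the name `iqr_outliers` and its `k` argument are illustrative and do not come from any package, and `na.rm = TRUE` is added because `quantile()` stops with an error when the input contains `NA`.

```r
# Hypothetical helper (not part of base R): return the values outside
# Q1 - k * IQR and Q3 + k * IQR, ignoring missing values.
iqr_outliers <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  fence <- k * (q[2] - q[1])
  x[!is.na(x) & (x < q[1] - fence | x > q[2] + fence)]
}

values <- c(10, 12, 14, 15, 16, 17, 18, 19, 100, NA)

iqr_outliers(values)
# [1] 100
```

A larger multiplier, such as `k = 3`, is sometimes used to flag only the most extreme values.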

Using boxplot.stats()

The boxplot.stats() function provides a convenient way to find outliers:

values <- c(10, 12, 14, 15, 16, 17, 18, 19, 100)

boxplot.stats(values)$out
# [1] 100

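By default this function applies the 1.5×IQR rule (controlled by its coef argument). It measures the box from Tukey's hinges, as returned by fivenum(), rather than from quantile(), so for some sample sizes its bounds differ slightly from the manual calculation.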
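
To see what `boxplot.stats()` works from, it can help to compare its box edges with `quantile()`. The sketch below uses a ten-value vector chosen for illustration; `coef` is the function's own argument for the whisker multiplier.

```r
x <- c(10, 12, 14, 15, 16, 17, 18, 19, 100, 200)

# boxplot.stats() takes the box edges from fivenum() (Tukey's hinges)
hinges <- fivenum(x)[c(2, 4)]
hinges
# [1] 14 19

# quantile() with its default type gives slightly different quartiles here
quartiles <- quantile(x, c(0.25, 0.75), names = FALSE)
quartiles
# [1] 14.25 18.75

# coef sets the multiplier; a larger value widens the fences
far_out <- boxplot.stats(x, coef = 3)$out
far_out
# [1] 100 200
```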

With the Z-Score Method

Z-scores measure how many standard deviations a point is from the mean:

values <- c(10, 12, 14, 15, 16, 17, 18, 19, 100)

# scale() returns a one-column matrix; as.vector() turns it back into a plain vector
z_scores <- as.vector(scale(values))

outliers <- values[abs(z_scores) > 2]
outliers
# [1] 100

Common thresholds:

  • |z| > 2 — unusual but not extreme
  • |z| > 3 — extremely unusual
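
A practical caveat with these thresholds is that the largest possible z-score depends on the sample size: with the sample standard deviation, no point can exceed (n - 1) / sqrt(n). The sketch below computes the z-scores by hand for the same vector and shows that a cutoff of 3 flags nothing here, even though 100 is clearly extreme.

```r
values <- c(10, 12, 14, 15, 16, 17, 18, 19, 100)

# Same calculation scale() performs, written out
z <- (values - mean(values)) / sd(values)

# Largest |z| attainable for a sample of this size (n = 9)
n <- length(values)
max_z <- (n - 1) / sqrt(n)
max_z
# [1] 2.666667

flagged_at_2 <- values[abs(z) > 2]
flagged_at_3 <- values[abs(z) > 3]

flagged_at_2
# [1] 100
flagged_at_3
# numeric(0)
```

For samples of ten or fewer values, |z| > 3 can never be met, so a robust method such as the IQR rule is usually a safer choice for short vectors.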

With Modified Z-Score (MAD)

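The modified z-score replaces the mean and standard deviation with the median and the Median Absolute Deviation (MAD), both of which are barely affected by extreme values: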

values <- c(10, 12, 14, 15, 16, 17, 18, 19, 100)

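median_val <- median(values)
# mad() scales by 1.4826 by default; constant = 1 gives the raw MAD,
# which is what the 0.6745 factor below expects
mad_val <- mad(values, constant = 1)

modified_z <- 0.6745 * (values - median_val) / mad_val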

outliers <- values[abs(modified_z) > 3.5]
outliers
# [1] 100

The cutoff of 3.5 is the one recommended by Iglewicz and Hoaglin (1993) for modified z-scores.
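
One detail worth checking when writing this in R is that `mad()` multiplies the raw median absolute deviation by 1.4826 by default, and 0.6745 is approximately 1 / 1.4826. The sketch below shows two equivalent ways to compute the score, each applying the scaling once.

```r
values <- c(10, 12, 14, 15, 16, 17, 18, 19, 100)
med <- median(values)

# Option 1: raw MAD (constant = 1) with the explicit 0.6745 factor
raw_mad <- mad(values, constant = 1)
z_explicit <- 0.6745 * (values - med) / raw_mad

# Option 2: default mad() already includes the 1.4826 scaling
z_default <- (values - med) / mad(values)

# The two agree up to rounding of the constants
all.equal(z_explicit, z_default, tolerance = 1e-4)
# [1] TRUE

values[abs(z_explicit) > 3.5]
# [1] 100
```

Combining the 0.6745 factor with the default `mad()` would shrink every score by roughly a third, which can hide borderline outliers.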

Detecting Outliers in a Data Frame

Use dplyr to keep only the rows whose value in a given column is an outlier (the native pipe |> requires R 4.1 or later):

library(dplyr)

df <- data.frame(
  id = 1:10,
  value = c(10, 12, 14, 15, 16, 17, 18, 19, 100, 200)
)

find_outliers <- function(x) {
  q1 <- quantile(x, 0.25)
  q3 <- quantile(x, 0.75)
  iqr <- q3 - q1
  x < (q1 - 1.5 * iqr) | x > (q3 + 1.5 * iqr)
}

df |>
  filter(find_outliers(value))
#   id value
# 1  9   100
# 2 10   200
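
Filtering discards the ordinary rows. If you would rather keep every row and mark the outliers, a logical column works as well. This sketch uses base R only and repeats the `find_outliers()` helper so that it runs on its own; the column name `is_outlier` is an arbitrary choice.

```r
df <- data.frame(
  id = 1:10,
  value = c(10, 12, 14, 15, 16, 17, 18, 19, 100, 200)
)

find_outliers <- function(x) {
  q1 <- quantile(x, 0.25)
  q3 <- quantile(x, 0.75)
  iqr <- q3 - q1
  x < (q1 - 1.5 * iqr) | x > (q3 + 1.5 * iqr)
}

# Add a TRUE/FALSE flag instead of dropping rows
df$is_outlier <- find_outliers(df$value)

# Count the flagged rows
sum(df$is_outlier)
# [1] 2
```

With dplyr, the equivalent step would be `mutate(is_outlier = find_outliers(value))`.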

Visualizing Outliers

Boxplot

values <- c(10, 12, 14, 15, 16, 17, 18, 19, 100)

boxplot(values, main = "Boxplot with Outlier")
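
`boxplot()` also returns the statistics it draws, so the plotted outliers can be retrieved without a separate calculation. A small sketch, using `plot = FALSE` to skip the drawing:

```r
values <- c(10, 12, 14, 15, 16, 17, 18, 19, 100)

# boxplot() returns a list; $out holds the points drawn beyond the whiskers
bp <- boxplot(values, plot = FALSE)
bp$out
# [1] 100
```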

With ggplot2

library(ggplot2)

df <- data.frame(
  id = 1:10,
  value = c(10, 12, 14, 15, 16, 17, 18, 19, 100, 200)
)

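ggplot(df, aes(y = value)) +
  # geom_boxplot() already draws the points beyond the whiskers;
  # the outlier.* arguments only change how they look
  geom_boxplot(outlier.colour = "red", outlier.size = 3) +
  labs(title = "Boxplot Showing Outliers")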

Removing Outliers

To drop outliers, keep only the values that fall inside the IQR bounds:

values <- c(10, 12, 14, 15, 16, 17, 18, 19, 100)

q1 <- quantile(values, 0.25)
q3 <- quantile(values, 0.75)
iqr <- q3 - q1

clean_values <- values[values >= (q1 - 1.5 * iqr) & values <= (q3 + 1.5 * iqr)]
clean_values
# [1] 10 12 14 15 16 17 18 19
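
The same result can be had from `boxplot.stats()`, and in some cases it is preferable to blank the outliers out rather than drop them so that the vector keeps its length. A sketch of both variants:

```r
values <- c(10, 12, 14, 15, 16, 17, 18, 19, 100)
out <- boxplot.stats(values)$out

# Variant 1: drop every value that boxplot.stats() reports as an outlier
clean_values <- values[!values %in% out]
clean_values
# [1] 10 12 14 15 16 17 18 19

# Variant 2: keep positions intact by replacing outliers with NA
masked_values <- ifelse(values %in% out, NA, values)
masked_values
# [1] 10 12 14 15 16 17 18 19 NA
```

The second variant is handy when the vector is a column in a data frame and the rows must stay aligned.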

See Also