How to detect outliers in a vector in R
· 3 min read · Updated March 15, 2026 · beginner
r statistics outliers data-cleaning
Outliers are data points that differ markedly from the rest of the observations. Detecting them matters because a few extreme values can distort summary statistics such as the mean and standard deviation, and any model built on them. This guide covers common methods for detecting outliers in R.
Using the IQR Method
The interquartile range (IQR) method flags values below Q1 - 1.5×IQR or above Q3 + 1.5×IQR, where Q1 and Q3 are the 25th and 75th percentiles and IQR = Q3 - Q1:
values <- c(10, 12, 14, 15, 16, 17, 18, 19, 100)
q1 <- quantile(values, 0.25)
q3 <- quantile(values, 0.75)
iqr <- q3 - q1
lower_bound <- q1 - 1.5 * iqr
upper_bound <- q3 + 1.5 * iqr
outliers <- values[values < lower_bound | values > upper_bound]
outliers
# [1] 100
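To reuse the rule, it can be wrapped in a function. This is a minimal sketch rather than a standard API: iqr_outliers() and its k argument are hypothetical names, and it assumes missing values should be ignored when computing the quartiles. It relies on stats::IQR(), which returns Q3 - Q1 using the same default quantile type as quantile().

```r
# Hypothetical helper: TRUE for values outside [Q1 - k * IQR, Q3 + k * IQR]
iqr_outliers <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  spread <- IQR(x, na.rm = TRUE)  # Q3 - Q1, same default quantile type
  x < q[1] - k * spread | x > q[2] + k * spread
}

values <- c(10, 12, 14, 15, 16, 17, 18, 19, 100)
values[iqr_outliers(values)]         # default 1.5 x IQR fences
# [1] 100
values[iqr_outliers(values, k = 3)]  # wider fences for extreme outliers
# [1] 100
```

Tukey's convention uses k = 1.5 for ordinary outliers and k = 3 for extreme ones. Missing values stay NA in the returned logical vector.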
Using boxplot.stats()
The boxplot.stats() function (from grDevices, which is attached by default) returns the points a boxplot would draw as outliers in its $out element:
values <- c(10, 12, 14, 15, 16, 17, 18, 19, 100)
boxplot.stats(values)$out
# [1] 100
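It applies the same 1.5×IQR rule (set by its coef argument, default 1.5), but it computes the quartiles with fivenum() (Tukey's hinges) rather than quantile(), so for some inputs its fences differ slightly from the manual calculation above.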
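boxplot.stats() returns more than $out. This sketch, using the same values, shows $stats (lower whisker end, lower hinge, median, upper hinge, upper whisker end), $n, and the coef argument that sets the fence multiplier.

```r
values <- c(10, 12, 14, 15, 16, 17, 18, 19, 100)
bs <- boxplot.stats(values)

# Whiskers stop at the most extreme values that are not outliers
bs$stats
# [1] 10 14 16 18 19

# Number of non-missing observations used
bs$n
# [1] 9

# Widen the fences to 3 x the hinge spread to keep only extreme outliers
boxplot.stats(values, coef = 3)$out
# [1] 100
```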
Using the Z-Score Method
Z-scores measure how many standard deviations a point is from the mean:
values <- c(10, 12, 14, 15, 16, 17, 18, 19, 100)
z_scores <- as.numeric(scale(values))  # scale() returns a one-column matrix
outliers <- values[abs(z_scores) > 2]
outliers
# [1] 100
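Common thresholds:
- |z| > 2: unusual but not extreme
- |z| > 3: extremely unusual
These cutoffs work best in larger samples. The outlier itself inflates the mean and standard deviation, and with only nine values no |z| can exceed (n - 1)/√n ≈ 2.67. The z-score of 100 here is about 2.65, so the |z| > 3 rule would miss it.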
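The small-sample ceiling on |z| is easy to verify numerically: for a sample of size n, no z-score computed with the sample standard deviation can exceed (n - 1)/sqrt(n). This sketch computes z-scores directly with mean() and sd(), which is equivalent to scale(), and compares the largest |z| with that bound.

```r
values <- c(10, 12, 14, 15, 16, 17, 18, 19, 100)
n <- length(values)

# Same as as.numeric(scale(values)): centre on the mean, divide by the sample SD
z <- (values - mean(values)) / sd(values)

# Largest possible |z| for this sample size (sample SD uses n - 1)
max_possible_z <- (n - 1) / sqrt(n)

max(abs(z))         # about 2.65, just under the ceiling
max_possible_z      # about 2.67
values[abs(z) > 3]  # the |z| > 3 rule flags nothing here
# numeric(0)
```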
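Using the Modified Z-Score (MAD)
The median absolute deviation (MAD) is a robust measure of spread: unlike the standard deviation, it is barely affected by the extreme values you are trying to detect.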
values <- c(10, 12, 14, 15, 16, 17, 18, 19, 100)
median_val <- median(values)
mad_val <- mad(values, constant = 1)  # raw MAD; mad() scales by 1.4826 by default
modified_z <- 0.6745 * (values - median_val) / mad_val
outliers <- values[abs(modified_z) > 3.5]
outliers
# [1] 100
Iglewicz and Hoaglin, who proposed the modified z-score, suggest flagging values with |modified z| > 3.5.
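The constants are easy to mix up: 0.6745 is the 0.75 quantile of the standard normal, and R's mad() already multiplies the raw MAD by its default constant = 1.4826 (about 1/0.6745). This sketch checks that the two ways of writing the modified z-score agree, so the scaling should be applied exactly once.

```r
values <- c(10, 12, 14, 15, 16, 17, 18, 19, 100)
med <- median(values)

# Iglewicz-Hoaglin form: 0.6745 times the deviation over the raw MAD
z_raw <- 0.6745 * (values - med) / mad(values, constant = 1)

# Default mad() is already scaled by 1.4826, so no extra factor is needed
z_scaled <- (values - med) / mad(values)

# The two forms agree up to rounding of the constants
all.equal(z_raw, z_scaled, tolerance = 1e-4)
# [1] TRUE

values[abs(z_scaled) > 3.5]
# [1] 100
```

If more than half of the values are identical, the MAD is 0 and the scores become Inf or NaN, so it is worth checking mad(x) > 0 first.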
Detecting Outliers in a Data Frame
Wrap the IQR rule in a function and use it inside dplyr::filter() to keep only the outlying rows (the native pipe |> requires R 4.1 or later):
library(dplyr)
df <- data.frame(
  id = 1:10,
  value = c(10, 12, 14, 15, 16, 17, 18, 19, 100, 200)
)
# Returns TRUE for values outside the 1.5 x IQR fences
find_outliers <- function(x) {
  q1 <- quantile(x, 0.25)
  q3 <- quantile(x, 0.75)
  iqr <- q3 - q1
  x < (q1 - 1.5 * iqr) | x > (q3 + 1.5 * iqr)
}
df |>
  filter(find_outliers(value))
#   id value
# 1  9   100
# 2 10   200
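To screen several columns at once, one option is to apply the same kind of helper with dplyr::across() and keep a logical flag per variable instead of filtering. The height and weight columns are made-up example data, and the "_outlier" suffix is an arbitrary naming choice; the helper here also ignores NA when computing the quartiles.

```r
library(dplyr)

# IQR helper: TRUE where x falls outside the 1.5 x IQR fences
find_outliers <- function(x) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr
}

# Hypothetical data with two numeric columns
df2 <- data.frame(
  id = 1:10,
  height = c(160, 162, 165, 168, 170, 171, 173, 175, 178, 250),
  weight = c(55, 58, 60, 62, 65, 67, 70, 72, 75, 78)
)

# One logical flag column per variable, e.g. height_outlier
flagged <- df2 |>
  mutate(across(c(height, weight), find_outliers, .names = "{.col}_outlier"))

# Rows flagged in at least one column
flagged |> filter(height_outlier | weight_outlier)
```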
Visualizing Outliers
Boxplot
values <- c(10, 12, 14, 15, 16, 17, 18, 19, 100)
boxplot(values, main = "Boxplot with Outlier")
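boxplot() invisibly returns the same summary that boxplot.stats() computes, including $out, so the flagged points can be labelled on the plot. A minimal sketch, assuming a single vertical box, which base R draws at x = 1:

```r
values <- c(10, 12, 14, 15, 16, 17, 18, 19, 100)

# Draw the plot and keep the invisibly returned statistics
b <- boxplot(values, main = "Boxplot with Outlier")

# b$out holds the outlying values; label each one to the right of its point
if (length(b$out) > 0) {
  text(x = rep(1, length(b$out)), y = b$out, labels = b$out, pos = 4)
}
```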
With ggplot2
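library(ggplot2)
df <- data.frame(
  id = 1:10,
  value = c(10, 12, 14, 15, 16, 17, 18, 19, 100, 200)
)
# geom_boxplot() already draws points beyond the 1.5 x IQR whiskers;
# style those outlier points instead of overplotting every observation
ggplot(df, aes(y = value)) +
  geom_boxplot(outlier.colour = "red", outlier.size = 3) +
  labs(title = "Boxplot Showing Outliers")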
Removing Outliers
To remove outliers, keep only the values that fall inside the fences:
values <- c(10, 12, 14, 15, 16, 17, 18, 19, 100)
q1 <- quantile(values, 0.25)
q3 <- quantile(values, 0.75)
iqr <- q3 - q1
clean_values <- values[values >= (q1 - 1.5 * iqr) & values <= (q3 + 1.5 * iqr)]
clean_values
# [1] 10 12 14 15 16 17 18 19
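The same fences can drop rows from a data frame rather than values from a vector. A base R sketch using the data frame from the earlier examples; keep_inliers is a hypothetical name, and rows with a missing value are excluded explicitly, because indexing with an NA logical would otherwise produce all-NA rows.

```r
df <- data.frame(
  id = 1:10,
  value = c(10, 12, 14, 15, 16, 17, 18, 19, 100, 200)
)

q <- quantile(df$value, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
fence <- 1.5 * (q[2] - q[1])

# TRUE for rows inside the fences; missing values become FALSE
keep_inliers <- with(df, !is.na(value) & value >= q[1] - fence & value <= q[2] + fence)

clean_df <- df[keep_inliers, ]
nrow(clean_df)
# [1] 8
```

With dplyr and a find_outliers() helper like the one above, df |> filter(!find_outliers(value)) gives the same rows here, since filter() also drops rows where the condition is NA.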
See Also
- stats::sd() — Standard deviation calculation
- base::range() — Range of values
- How to Calculate Correlation in R — Related statistical measures