Descriptive Statistics in R

March 7, 2026 · 3 min read · Updated March 7, 2026 · beginner

Descriptive statistics summarize the main features of a dataset. They give you a quick overview of your data before you dive into more complex analyses. In this tutorial, you’ll learn how to calculate descriptive statistics in R using both base R functions and the tidyverse approach.

What Are Descriptive Statistics?

Descriptive statistics include measures of central tendency (mean, median, mode) and measures of spread (variance, standard deviation, range, quartiles). These metrics help you understand where your data clusters and how much it varies.

Measures of Central Tendency

Mean

The mean is the arithmetic average of a dataset:

# Calculate mean
heights <- c(165, 170, 175, 180, 168, 172, 178, 162, 185, 169)
mean(heights)
# [1] 172.4
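
To see what mean() is doing, you can compute the same value as the sum divided by the count. A minimal sketch, reusing the heights vector from above (the name manual_mean is just for illustration):

```r
heights <- c(165, 170, 175, 180, 168, 172, 178, 162, 185, 169)

# Sum of the values divided by how many there are
manual_mean <- sum(heights) / length(heights)
manual_mean
# [1] 172.4

# all.equal() compares numbers with a small tolerance
all.equal(manual_mean, mean(heights))
# [1] TRUE
```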

Median

The median is the middle value of the sorted data. With an even number of values, as here, it is the average of the two middle values:

median(heights)
# [1] 171

The median is more robust to outliers than the mean. If you have extreme values, the median often better represents the “typical” value.
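
Here is a small sketch of that robustness. It assumes a hypothetical data-entry error that adds a value of 250 to the heights vector; the name with_outlier is illustrative only:

```r
heights <- c(165, 170, 175, 180, 168, 172, 178, 162, 185, 169)

# Hypothetical data-entry error: one extra value recorded as 250
with_outlier <- c(heights, 250)

# The mean is pulled up by about 7 units
mean(with_outlier)
# [1] 179.4545

# The median moves by only 1 unit
median(with_outlier)
# [1] 172
```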

Mode

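Base R has no built-in function for the statistical mode (the mode() function reports an object’s storage mode instead). Here’s a small helper that returns the most frequent value, or the first one encountered if there is a tie: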

# Custom mode function
get_mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Test it
colors <- c("red", "blue", "red", "green", "blue", "blue")
get_mode(colors)
# [1] "blue"
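
If ties matter for your data, one possible variation returns every value that shares the highest count. This is a sketch, and get_modes is a hypothetical helper name rather than a base R function:

```r
# Hypothetical helper: return all values that share the top count
get_modes <- function(x) {
  ux <- unique(x)
  counts <- tabulate(match(x, ux))
  ux[counts == max(counts)]
}

# Two values tie for most frequent
get_modes(c("red", "blue", "red", "blue", "green"))
# [1] "red"  "blue"

# A single clear mode
get_modes(c(3, 1, 3, 2))
# [1] 3
```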

Measures of Spread

Variance and Standard Deviation

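Variance measures how spread out the values are: it is the average squared deviation from the mean. var() computes the sample variance, which divides by n - 1:

# Variance
var(heights)
# [1] 50.48889

# Standard deviation
sd(heights)
# [1] 7.105553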

The standard deviation is the square root of variance. It’s often easier to interpret because it’s in the same units as your data.
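
To make the n - 1 denominator concrete, here is a sketch that rebuilds both values by hand and checks them against var() and sd(). The variable names are illustrative only:

```r
heights <- c(165, 170, 175, 180, 168, 172, 178, 162, 185, 169)

n <- length(heights)
deviations <- heights - mean(heights)

# Sample variance: sum of squared deviations divided by n - 1
manual_var <- sum(deviations^2) / (n - 1)
manual_sd <- sqrt(manual_var)

all.equal(manual_var, var(heights))
# [1] TRUE
all.equal(manual_sd, sd(heights))
# [1] TRUE

# Population variance divides by n instead (base R has no function for this)
pop_var <- sum(deviations^2) / n
pop_var
# [1] 45.44
```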

Range

The range is the difference between the maximum and minimum:

max(heights) - min(heights)
# [1] 23

# Or use range() to get min and max
range(heights)
# [1] 162 185

Interquartile Range (IQR)

The IQR is the difference between the 75th and 25th percentiles:

IQR(heights)
# [1] 9
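
As a quick check on that definition, the same number can be rebuilt from quantile(). A minimal sketch, with manual_iqr as an illustrative name:

```r
heights <- c(165, 170, 175, 180, 168, 172, 178, 162, 185, 169)

# 25th and 75th percentiles
q <- quantile(heights, probs = c(0.25, 0.75))

# Their difference is the IQR; unname() drops the "75%" label
manual_iqr <- unname(q["75%"] - q["25%"])

all.equal(manual_iqr, IQR(heights))
# [1] TRUE
```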

Quartiles and Percentiles

Using quantile()

The quantile() function calculates any percentile:

quantile(heights)
#     0%    25%    50%    75%   100%
# 162.00 168.25 171.00 177.25 185.00

# Specific quartiles
quantile(heights, probs = c(0.25, 0.5, 0.75))
#    25%    50%    75%
# 168.25 171.00 177.25
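
The probs argument accepts any values between 0 and 1, and the type argument selects one of several interpolation rules (the default is type 7). A short sketch, with p and q6 as illustrative names:

```r
heights <- c(165, 170, 175, 180, 168, 172, 178, 162, 185, 169)

# 10th and 90th percentiles with the default rule (type = 7)
p <- quantile(heights, probs = c(0.1, 0.9))
p
#   10%   90%
# 164.7 180.5

# A different interpolation rule can give a slightly different answer
q6 <- quantile(heights, probs = 0.25, type = 6)
q6
#    25%
# 167.25
```

Small samples like this one are where the choice of rule is most visible, so it helps to note which type you used when comparing results with other software.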

Using summary()

The summary() function gives you a quick overview:

summary(heights)
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#   162.0   168.2   171.0   172.4   177.2   185.0

Grouped Statistics

Using tapply()

Calculate statistics by group:

# Create grouped data
gender <- c("M", "F", "M", "F", "M", "F", "M", "F", "M", "F")
values <- c(100, 90, 95, 85, 88, 92, 98, 87, 91, 89)

tapply(values, gender, mean)
#    F    M
# 88.6 94.4
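
tapply() applies one function at a time. If you want several statistics per group in base R, one option is to split the data and summarise each piece. A sketch under that approach, with stats_by_group as an illustrative name:

```r
gender <- c("M", "F", "M", "F", "M", "F", "M", "F", "M", "F")
values <- c(100, 90, 95, 85, 88, 92, 98, 87, 91, 89)

# split() makes one vector per group; sapply() summarises each one
stats_by_group <- sapply(split(values, gender), function(v) {
  c(mean = mean(v), sd = sd(v), n = length(v))
})

# Result: a numeric matrix with one column per group (F, M)
# and one row per statistic (mean, sd, n)
stats_by_group

# Pick out a single cell by row and column name
stats_by_group["mean", "M"]
```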

Using dplyr

The tidyverse approach:

library(dplyr)

df <- data.frame(
  gender = rep(c("M", "F"), each = 5),
  score = c(100, 95, 88, 91, 98, 90, 85, 92, 87, 89)
)

df %>%
  group_by(gender) %>%
  summarise(
    mean = mean(score),
    median = median(score),
    sd = sd(score),
    n = n()
  )
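
If dplyr is not installed, base R's aggregate() gives comparable group means, which is also a handy cross-check on the pipeline above. A sketch using the same data frame (by_gender is an illustrative name):

```r
df <- data.frame(
  gender = rep(c("M", "F"), each = 5),
  score = c(100, 95, 88, 91, 98, 90, 85, 92, 87, 89)
)

# Formula interface: summarise score within each level of gender
by_gender <- aggregate(score ~ gender, data = df, FUN = mean)
by_gender
#   gender score
# 1      F  88.6
# 2      M  94.4
```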

Handling Missing Values

Many statistical functions take an na.rm argument. Setting it to TRUE drops missing values before the calculation:

data_with_na <- c(1, 2, NA, 4, 5, NA, 7)

mean(data_with_na)
# [1] NA

mean(data_with_na, na.rm = TRUE)
# [1] 3.8

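# Column means for a data frame: keep only the numeric columns,
# because colMeans() fails on character columns such as gender
colMeans(df[sapply(df, is.numeric)], na.rm = TRUE)
# score
#  91.5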
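
The same argument works for median(), var(), sd(), and quantile(). Alternatively, you can drop the missing values once up front. A brief sketch of both options, with clean as an illustrative name:

```r
data_with_na <- c(1, 2, NA, 4, 5, NA, 7)

# na.rm = TRUE works the same way across these functions
median(data_with_na, na.rm = TRUE)
# [1] 4
var(data_with_na, na.rm = TRUE)
# [1] 5.7

# Or remove the NAs once and reuse the cleaned vector
clean <- as.numeric(na.omit(data_with_na))
length(clean)
# [1] 5
mean(clean)
# [1] 3.8
```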

Common Gotchas

NA Handling

Always check for missing values before calculating statistics:

# Check for NAs
anyNA(heights)
# [1] FALSE

# Count NAs
sum(is.na(data_with_na))
# [1] 2
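
For a data frame, the same idea extends to every column at once. A sketch using a small made-up data frame (survey and its columns are hypothetical, invented for illustration):

```r
# Hypothetical example data with some missing entries
survey <- data.frame(
  age = c(25, NA, 31, 40),
  income = c(NA, NA, 52000, 61000)
)

# Number of NAs in each column
na_counts <- colSums(is.na(survey))
na_counts
#    age income
#      1      2

# Number of rows with no missing values at all
n_complete <- sum(complete.cases(survey))
n_complete
# [1] 2
```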

Outliers

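Outliers pull the mean toward them but barely move the median, so compare several measures:

# A large gap between mean and median suggests outliers or skew
mean(heights) - median(heights)
# [1] 1.4

# Values more than 1.5 times the box length beyond the hinges are flagged
boxplot.stats(heights)$out
# numeric(0)
# (an empty result means no outliers were flagged)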
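
To see the check fire, here is a sketch that adds a hypothetical extreme value of 250 and applies the common 1.5 × IQR rule by hand. Note that boxplot.stats() works from hinges rather than quantile() quartiles, so the two approaches can differ slightly on other data:

```r
heights <- c(165, 170, 175, 180, 168, 172, 178, 162, 185, 169)

# Hypothetical extreme value added for illustration
heights_out <- c(heights, 250)

# Fences at 1.5 * IQR below Q1 and above Q3
q <- unname(quantile(heights_out, probs = c(0.25, 0.75)))
fence_low <- q[1] - 1.5 * IQR(heights_out)
fence_high <- q[2] + 1.5 * IQR(heights_out)

flagged <- heights_out[heights_out < fence_low | heights_out > fence_high]
flagged
# [1] 250

# The built-in check flags the same value here
boxplot.stats(heights_out)$out
# [1] 250
```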

Summary

Use mean() and median() for central tendency
Use var() and sd() for spread
Use quantile() for percentiles
Use summary() for a quick overview
Remember na.rm = TRUE for missing data
Compare multiple measures to understand outliers

Next Steps

Continue with Hypothesis Testing in R to learn how to make inferences from your data.

Practical Examples

Descriptive Stats on Real Data

R comes with built-in datasets you can practice on:
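
A minimal sketch, assuming the built-in mtcars dataset as the practice data (the choice of dataset and of the mpg and cyl columns is an assumption made for illustration):

```r
# Fuel economy (miles per gallon) for the 32 cars in mtcars
mpg <- mtcars$mpg

mean(mpg)
# [1] 20.09062
median(mpg)
# [1] 19.2
sd(mpg)
# [1] 6.026948

summary(mpg)
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#   10.40   15.43   19.20   20.09   22.80   33.90

# Mean mpg by number of cylinders
by_cyl <- tapply(mtcars$mpg, mtcars$cyl, mean)
by_cyl
#        4        6        8
# 26.66364 19.74286 15.10000
```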

Using fivenum()

For a quick five-number summary:
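
A minimal sketch, reusing the heights vector from earlier in this tutorial (five is an illustrative name):

```r
heights <- c(165, 170, 175, 180, 168, 172, 178, 162, 185, 169)

# Minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(heights)
five
# [1] 162 168 171 178 185
```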

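This returns the minimum, lower hinge, median, upper hinge, and maximum. The hinges are usually close to the quartiles from quantile(), but they are computed differently (roughly, as medians of the lower and upper halves of the data), so the two can disagree slightly on small samples.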