The apply Family in R

4 min read · Updated March 10, 2026 · intermediate
iteration apply lapply sapply tapply functional-programming

The apply family is a set of base R functions that let you iterate over vectors, lists, or matrices without writing explicit loops. These functions are fundamental to writing idiomatic R code that is both concise and expressive. This guide covers apply(), lapply(), sapply(), and tapply(), showing when to use each one.

apply() for Matrices

apply() works on matrices or arrays. It takes three main arguments: the data, the margin (1 for rows, 2 for columns), and the function to apply.

# Create a sample matrix
mat <- matrix(1:9, nrow = 3, ncol = 3)
mat
#      [,1] [,2] [,3]
# [1,]    1    4    7
# [2,]    2    5    8
# [3,]    3    6    9

# Sum each row (margin = 1)
apply(mat, 1, sum)
# [1] 12 15 18

# Sum each column (margin = 2)
apply(mat, 2, sum)
# [1]  6 15 24

You can pass additional arguments to the function being applied:

# Calculate mean of each column, removing NAs
mat_with_na <- mat
mat_with_na[1, 1] <- NA
apply(mat_with_na, 2, mean, na.rm = TRUE)
# [1] 2.5 5.0 8.0

The function you pass can be built-in or custom:

# Custom function: range (max - min)
apply(mat, 1, function(x) max(x) - min(x))
# [1] 6 6 6
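
When the function returns more than one value per row or column, apply() returns a matrix instead of a vector, and each result becomes a column. The sketch below reuses the 3x3 matrix from the earlier examples; row_ranges is a name introduced here purely for illustration.

```r
# Same matrix as in the examples above
mat <- matrix(1:9, nrow = 3, ncol = 3)

# range() returns two values (min, max) for each row
row_ranges <- apply(mat, 1, range)
row_ranges
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    7    8    9

# Each column holds the result for one row of mat,
# so transpose if you want one output row per input row
t(row_ranges)
#      [,1] [,2]
# [1,]    1    7
# [2,]    2    8
# [3,]    3    9
```

The orientation is easy to overlook with MARGIN = 1, so checking dim() on the result is a cheap safeguard.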

lapply() for Lists and Vectors

lapply() always returns a list. It takes a vector or list as input and applies a function to each element.

# Create a list of vectors
data_list <- list(
  a = c(1, 2, 3),
  b = c(4, 5, 6),
  c = c(7, 8, 9)
)

# Calculate mean of each element
lapply(data_list, mean)
# $a
# [1] 2
# $b
# [1] 5
# $c
# [1] 8

lapply() is the foundation of many data manipulation workflows in R:

# Collect multiple data frames in a list
df1 <- data.frame(x = 1:3, y = c("a", "b", "c"))
df2 <- data.frame(x = 4:6, y = c("d", "e", "f"))
all_df <- list(df1 = df1, df2 = df2)

# Check dimensions of each data frame
lapply(all_df, dim)
# $df1
# [1] 3 2
# $df2
# [1] 3 2
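
Because lapply() returns a list of the same length as its input, it pairs naturally with do.call() when the pieces need to be recombined. The following is a minimal sketch of that pattern, not the only way to do it; the column name x_doubled is hypothetical and chosen only for this example.

```r
# Same two data frames as above
df1 <- data.frame(x = 1:3, y = c("a", "b", "c"))
df2 <- data.frame(x = 4:6, y = c("d", "e", "f"))
all_df <- list(df1 = df1, df2 = df2)

# Apply the same transformation to every data frame in the list
with_double <- lapply(all_df, function(d) {
  d$x_doubled <- d$x * 2  # hypothetical derived column
  d                       # return the modified data frame
})

# Stack the transformed data frames into one
combined <- do.call(rbind, with_double)
dim(combined)
# [1] 6 3
```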

sapply() for Simplified Output

sapply() is a user-friendly wrapper around lapply(). It tries to simplify the output into a vector or matrix when possible.

# With sapply, lists get simplified to vectors
result <- sapply(data_list, mean)
result
# a b c 
# 2 5 8

# Check the type
typeof(result)
# [1] "double"

When the results cannot be combined into a vector or matrix, sapply() falls back to returning a list:

# This returns a list because the results have different lengths
mixed_list <- list(
  vec1 = c(1, 2),
  vec2 = c(3, 4, 5)
)
sapply(mixed_list, function(x) x * 2)
# $vec1
# [1] 2 4
# $vec2
# [1]  6  8 10

Use sapply() when you want concise code and are comfortable with automatic type conversion. For production code where predictable output matters, lapply() gives you more control.
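
If you want simplified output with a guaranteed shape, base R's vapply() is one option: you declare what each result should look like through FUN.VALUE, and it raises an error when a result does not match. A minimal sketch, with the two lists redefined so the block runs on its own; the error message string is made up for this example.

```r
data_list <- list(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9))

# FUN.VALUE = numeric(1) declares that each result must be a single double
means <- vapply(data_list, mean, FUN.VALUE = numeric(1))
means
# a b c 
# 2 5 8

# Results of length 2 and 3 do not match the template, so vapply()
# stops with an error instead of quietly returning a list
mixed_list <- list(vec1 = c(1, 2), vec2 = c(3, 4, 5))
outcome <- tryCatch(
  vapply(mixed_list, function(x) x * 2, FUN.VALUE = numeric(1)),
  error = function(e) "vapply() stopped with an error"
)
outcome
# [1] "vapply() stopped with an error"
```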

tapply() for Grouped Operations

tapply() applies a function to subsets of a vector, defined by a grouping factor.

# Sample data: values and groups
scores <- c(85, 92, 78, 88, 95, 72)
group <- c("A", "A", "B", "B", "A", "B")

# Calculate mean score by group
tapply(scores, group, mean)
#        A        B 
# 90.66667 79.33333

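This is comparable to the group_by() plus summarise() pattern in dplyr, or to aggregate() in base R, which returns a data frame instead of a named array:

# Same grouped means using base R's aggregate()
df <- data.frame(scores = scores, group = group)
aggregate(scores ~ group, data = df, FUN = mean)
#   group   scores
# 1     A 90.66667
# 2     B 79.33333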

tapply() works with multiple factors for more complex groupings:

# Two grouping factors
treatment <- c("drug", "drug", "placebo", "drug", "placebo", "placebo")
tapply(scores, list(group, treatment), mean)
#   drug placebo
# A 88.5      95
# B 88.0      75
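
One way to see what tapply() does with a single grouping factor is to carry out the two steps by hand: split the values by group, then apply the function to each piece. This is an illustrative sketch rather than a description of tapply()'s exact internals; by_group and manual_means are names introduced here.

```r
scores <- c(85, 92, 78, 88, 95, 72)
group <- c("A", "A", "B", "B", "A", "B")

# Step 1: split the values into one vector per group
by_group <- split(scores, group)
by_group
# $A
# [1] 85 92 95
# $B
# [1] 78 88 72

# Step 2: apply the function to each piece
manual_means <- sapply(by_group, mean)

# Compare the values only; tapply() returns a 1-d array, sapply() a named vector
all.equal(as.vector(tapply(scores, group, mean)), unname(manual_means))
# [1] TRUE
```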

When to Use Which

Choose the right function based on your data structure and output needs:

  • apply() — matrices or arrays, row/column operations
  • lapply() — vectors or lists, when you need list output
  • sapply() — quick exploratory work with automatic simplification
  • tapply() — grouped statistics by factor levels

Common Patterns

The apply family shines in data analysis workflows:

# Apply a function to multiple columns
numeric_cols <- mtcars[, c("mpg", "disp", "hp")]
lapply(numeric_cols, function(x) round(mean(x), 2))
# $mpg
# [1] 20.09
# $disp
# [1] 230.72
# $hp
# [1] 146.69
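
A related pattern is computing several statistics per column at once. When the function returns a named vector of fixed length, sapply() simplifies the results into a matrix with one column per input column. A small sketch using the same mtcars columns; the choice of mean and sd is only an example.

```r
numeric_cols <- mtcars[, c("mpg", "disp", "hp")]

# Each call returns a named vector of length 2,
# so sapply() binds the results into a 2 x 3 matrix
stats <- sapply(numeric_cols, function(x) c(mean = mean(x), sd = sd(x)))

dim(stats)
# [1] 2 3
rownames(stats)
# [1] "mean" "sd"
colnames(stats)
# [1] "mpg"  "disp" "hp"
```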

See Also

  • purrr::map() — the tidyverse alternative with consistent syntax
  • vapply() — the safer version of sapply() with explicit type specification

Performance Notes

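The apply functions are not all implemented the same way: lapply() does its looping in C, while apply() is written in R and uses an ordinary for loop internally. In both cases your function is still called once per element, so the speed advantage over a well-written loop (one that preallocates its result) is usually small. For most everyday tasks the difference is negligible, and readability should guide your choice.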
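
A rough way to check this on your own machine is to time a preallocated loop against an apply-style call. The sketch below uses system.time() from base R; the input size and variable names are arbitrary choices for illustration, and timings will vary between machines and R versions, so no expected numbers are shown. A vectorized call is included as a reference point, since it avoids per-element function calls entirely.

```r
x <- seq_len(1e5)

# Explicit loop with a preallocated result vector
loop_time <- system.time({
  loop_result <- numeric(length(x))
  for (i in seq_along(x)) loop_result[i] <- sqrt(x[i])
})

# Apply-style version with a declared result type
apply_time <- system.time(
  apply_result <- vapply(x, sqrt, numeric(1))
)

# Vectorized version: one call for the whole vector
vec_time <- system.time(
  vec_result <- sqrt(x)
)

# All three approaches compute the same values
stopifnot(isTRUE(all.equal(loop_result, apply_result)))
stopifnot(isTRUE(all.equal(apply_result, vec_result)))

# Elapsed seconds for each approach
c(loop = loop_time[["elapsed"]],
  vapply = apply_time[["elapsed"]],
  vectorized = vec_time[["elapsed"]])
```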

For large datasets, consider data.table or dplyr for better performance. These packages rely on optimized compiled code (C and C++) under the hood.

Conclusion

Master the apply family and you will write more expressive R code. Each function serves a specific purpose. Start with lapply() for safety, sapply() for exploration, and apply() for matrices. Use tapply() whenever you need grouped summaries.