tapply()

Returns: array or list · Updated March 13, 2026 · Base Functions

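tapply() splits a vector into groups defined by a factor (or a list of factors) and applies a function to each group. For simple grouped summaries it fills the role of dplyr::group_by() %>% summarise() in base R. However, it works on one vector at a time and returns an array or list rather than a data frame.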
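Conceptually, tapply() is "split, then apply". The minimal sketch below uses hypothetical sales data and compares tapply() with an explicit split() followed by sapply(). For a single grouping factor and a scalar-valued function the values agree. The difference is that tapply() returns a one-dimensional array rather than a plain named vector.

```r
# Hypothetical example data
sales  <- c(12, 7, 3, 15, 9, 4)
region <- factor(c("north", "south", "north", "south", "north", "south"))

# tapply(): one result per factor level, returned as a 1-d array
by_tapply <- tapply(sales, region, sum)

# The same computation written out: split into groups, then apply sum()
by_split <- sapply(split(sales, region), sum)

by_tapply  # north 24, south 26
by_split   # same values, as a plain named numeric vector

# Compare values only, ignoring the array structure
all.equal(as.vector(by_tapply), unname(by_split))  # TRUE
```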

Syntax

tapply(X, INDEX, FUN = NULL, ..., default = NA, simplify = TRUE)

Parameters

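Parameter  Type            Default  Description
X          vector          (none)   The values to split into groups (typically an atomic vector)
INDEX      factor or list  (none)   Grouping variable(s): a factor, or a list of factors, each the same length as X; non-factors are converted with as.factor()
FUN        function        NULL     Function to apply to each group; if NULL, tapply() returns each element's integer group index instead
...        any             (none)   Additional arguments passed to FUN
default    any             NA       Fill value for empty groups (level combinations with no data) when the result is simplified to an array
simplify   logical         TRUE     If TRUE and FUN returns a single value for every group, the result is an atomic array; otherwise (including simplify = FALSE) it is a list with a dim attribute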
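The FUN = NULL case is easy to overlook. In this minimal sketch with hypothetical scores, tapply() returns each element's integer group index. That index can then be used to look up a per-group result for every element. Base R's ave() performs the same broadcast in one step.

```r
scores <- c(4, 8, 6, 10, 3)
grp    <- factor(c("b", "a", "b", "c", "a"))

# FUN omitted (NULL): the integer group index of each element
idx <- tapply(scores, grp)
idx  # 2 1 2 3 1

# Per-group means, then broadcast back to each element's position
grp_means <- tapply(scores, grp, mean)
per_row   <- as.vector(grp_means[idx])
per_row   # 5.0 5.5 5.0 10.0 5.5

# Base R's ave() gives the same per-element group means directly
all.equal(per_row, ave(scores, grp))  # TRUE
```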

Examples

Basic usage

# Sample data: values grouped by category
values <- c(10, 20, 30, 40, 50)
group <- factor(c("A", "A", "B", "B", "B"))

tapply(values, group, sum)
#   A   B
#  30 120
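Any arguments given after FUN are forwarded to it through `...`. The small sketch below uses hypothetical readings that contain a missing value. It shows the effect of passing na.rm = TRUE through to mean().

```r
readings <- c(10, 20, NA, 40, 50)
site     <- factor(c("A", "A", "B", "B", "B"))

# Without na.rm, a single NA makes that group's mean NA
with_na <- tapply(readings, site, mean)
with_na  # A: 15, B: NA

# Arguments after FUN are passed on to FUN through `...`
no_na <- tapply(readings, site, mean, na.rm = TRUE)
no_na    # A: 15, B: 45
```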

With multiple grouping variables

# Create data frame
df <- data.frame(
  score = c(85, 92, 78, 88, 95, 72),
  gender = c("M", "M", "F", "F", "M", "F"),
  dept = c("Sales", "Sales", "Sales", "IT", "IT", "IT")
)

# Mean score by gender and department
# Levels are sorted alphabetically, so IT comes before Sales
tapply(df$score, list(df$gender, df$dept), mean)
#   IT Sales
# F 80  78.0
# M 95  88.5
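A cell whose combination of grouping levels has no observations is filled with `default`. That value is NA unless you set it, and the argument is available from R 3.4.0 on. The sketch below uses a hypothetical subset of the data above with no F/IT row.

```r
# Hypothetical subset of the data above: nobody is in the F/IT cell
score  <- c(85, 92, 78, 95)
gender <- c("M", "M", "F", "M")
dept   <- c("Sales", "Sales", "Sales", "IT")

# Empty cells get `default`, which is NA unless set otherwise
means <- tapply(score, list(gender, dept), mean)
means["F", "IT"]     # NA

# For counts, 0 is a more natural fill value than NA
counts <- tapply(score, list(gender, dept), length, default = 0)
counts["F", "IT"]    # 0
counts["M", "Sales"] # 2
```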

Using simplify = FALSE

# Returns a list (one element per group) instead of an atomic array
result <- tapply(values, group, sum, simplify = FALSE)
result
# $A
# [1] 30
#
# $B
# [1] 120

# Access individual elements
result[["A"]]
# [1] 30
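Even the default simplify = TRUE yields an atomic array only when FUN returns a single value for every group. The sketch below uses range(), which returns two values per group, so the result is a list. One common way to stack such a list into a matrix is do.call(rbind, ...).

```r
values <- c(10, 20, 30, 40, 50)
group  <- factor(c("A", "A", "B", "B", "B"))

# range() returns two numbers per group, so the default simplify = TRUE
# cannot produce an atomic array: the result is a list with a dim attribute
r <- tapply(values, group, range)
is.list(r)  # TRUE
r[["B"]]    # 30 50

# Stack the per-group vectors into a matrix, one row per group
m <- do.call(rbind, r)
m
```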

With custom function

# Get range (max - min) by group
tapply(values, group, function(x) diff(range(x)))
#  A  B
# 10 20

# Count NAs per group
x <- c(1, 2, NA, 4, NA, 6)
g <- factor(c("A", "A", "A", "B", "B", "B"))
tapply(x, g, function(v) sum(is.na(v)))
# A B
# 1 1
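Here is a variation on the NA count. Because mean() of a logical vector is the proportion of TRUE values, mean(is.na(v)) gives the share of missing values in each group. The temperature data below are hypothetical.

```r
temps   <- c(21.5, NA, 19.0, NA, NA, 18.2)
station <- factor(c("A", "A", "A", "B", "B", "B"))

# mean() of a logical vector is the proportion of TRUE values,
# so this is the share of missing readings per station
na_share <- tapply(temps, station, function(v) mean(is.na(v)))
na_share  # A: 1/3, B: 2/3
```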

Common Patterns

Computing summary statistics by group

# Create sample dataset
set.seed(123)
df <- data.frame(
  treatment = sample(c("control", "treatment"), 100, replace = TRUE),
  outcome = rnorm(100)
)

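# Multiple statistics: summary() returns six values per group,
# so the result is a list with one summary per treatment level
tapply(df$outcome, df$treatment, summary)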
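The list of per-group summaries can be reshaped into a matrix with one row per group and one column per statistic. The sketch below shows one way to do this, reusing the same simulated data.

```r
set.seed(123)
df <- data.frame(
  treatment = sample(c("control", "treatment"), 100, replace = TRUE),
  outcome = rnorm(100)
)

# One six-number summary per treatment level, returned as a list
grp_summary <- tapply(df$outcome, df$treatment, summary)

# Stack the summaries into a groups-by-statistics matrix
summary_mat <- do.call(rbind, grp_summary)
summary_mat
```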

Conditional aggregation

# Sum values above threshold per group
values <- c(1, 5, 3, 8, 2, 9)
group <- c("A", "A", "B", "B", "B", "B")  # a character vector is converted with as.factor()
tapply(values, group, function(x) sum(x[x > 3]))
#  A  B
#  5 17
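Filtering inside FUN is not equivalent to filtering before calling tapply() when a group has no qualifying values. The sketch below uses hypothetical data in which team A has nothing above 3.

```r
amount <- c(1, 2, 8, 9)
team   <- factor(c("A", "A", "B", "B"))

# Filtering inside FUN: team A has no values above 3, and sum() of an
# empty vector is 0
inside <- tapply(amount, team, function(x) sum(x[x > 3]))
inside   # A: 0, B: 17

# Filtering first: A keeps its factor level but has no rows left,
# so its cell is filled with `default` (NA)
keep   <- amount > 3
before <- tapply(amount[keep], team[keep], sum)
before   # A: NA, B: 17

# default = 0 makes the pre-filtered version match the first one
before0 <- tapply(amount[keep], team[keep], sum, default = 0)
```

If the grouping variable were a character vector instead of a factor, a group that loses all its rows would disappear from the pre-filtered result entirely rather than appear as NA.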

See Also