
ANOVA in R: Learn how to perform Analysis of Variance (ANOVA)

Analysis of Variance (ANOVA) is a statistical method for comparing means across three or more groups. In this tutorial, you’ll learn how to run one-way and two-way ANOVA, check the underlying assumptions, and use post-hoc tests to identify which specific groups differ.

What you’ll learn

This tutorial covers the key concepts and practical techniques for running ANOVA in R. By the end, you will know how to apply the core functions in real data analysis workflows.

One-Way ANOVA

One-way ANOVA compares the means of three or more groups that are defined by a single categorical variable. For example, you might compare test scores across three different teaching methods.

The aov() function

R provides the aov() function for ANOVA. Here’s a practical example using plant growth data:

# Load the data (built-in dataset)
data(PlantGrowth)

# Examine the structure
str(PlantGrowth)
# 'data.frame':	30 obs. of  2 variables:
#  $ weight: num  4.17 5.58 5.18 ...
#  $ group : Factor w/ 3 levels "ctrl","trt1","trt2"

# View the groups
unique(PlantGrowth$group)
# [1] ctrl trt1 trt2
# Levels: ctrl trt1 trt2

# Run one-way ANOVA
model <- aov(weight ~ group, data = PlantGrowth)

# View the results
summary(model)
#             Df Sum Sq Mean Sq F value Pr(>F)
# group        2  3.766  1.8832   4.846  0.0159 *
# Residuals   27 10.492  0.3886

The key output is the p-value (Pr(>F)). With p = 0.0159, below the conventional 0.05 threshold, we reject the null hypothesis that all group means are equal. At least one group mean differs significantly from the others.

Interpreting the output

The ANOVA table contains several components:

  • Df (Degrees of Freedom): group has k-1 degrees (k = number of groups), residuals have n-k
  • Sum Sq: Sum of squares between groups and within groups (residuals)
  • Mean Sq: Sum of squares divided by degrees of freedom
  • F value: The test statistic (ratio of between-group to within-group variance)
  • Pr(>F): The p-value
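To make these components concrete, the sketch below rebuilds the one-way table for PlantGrowth by hand using only base R. The variable names (ss_between, ms_within, and so on) are illustrative choices, not part of any R API.

```r
# Rebuild the one-way ANOVA table for PlantGrowth step by step (base R only)
data(PlantGrowth)
y <- PlantGrowth$weight
g <- PlantGrowth$group

k <- nlevels(g)   # number of groups
n <- length(y)    # total number of observations

group_means <- tapply(y, g, mean)
group_sizes <- tapply(y, g, length)

# Sum Sq: between-group and within-group (residual) sums of squares
ss_between <- sum(group_sizes * (group_means - mean(y))^2)
ss_within  <- sum((y - ave(y, g))^2)   # ave() returns each row's group mean

# Mean Sq: sum of squares divided by its degrees of freedom
ms_between <- ss_between / (k - 1)
ms_within  <- ss_within / (n - k)

# F value and its upper-tail p-value
f_value <- ms_between / ms_within
p_value <- pf(f_value, df1 = k - 1, df2 = n - k, lower.tail = FALSE)

# Reference values from aov() for comparison
ref <- summary(aov(weight ~ group, data = PlantGrowth))[[1]]
c(by_hand = f_value, aov = ref[["F value"]][1])
```

The two F values should agree, which confirms that the table is just this arithmetic applied to the data.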

A significant p-value tells you something is different, but it doesn’t tell you which groups differ. That’s where post-hoc tests come in.

Checking ANOVA assumptions

ANOVA makes several assumptions that you should verify before trusting the results:

1. Independence

This comes from your study design. If measurements in one group affect another, ANOVA isn’t appropriate. Random sampling and assignment help ensure independence.

2. Normality

The residuals (or each group’s data) should be approximately normally distributed:

# Check normality with Shapiro-Wilk test
shapiro.test(residuals(model))
# 	Shapiro-Wilk normality test
# data:  residuals(model)
# W = 0.96607, p-value = 0.4379

# Or test each group separately
by(PlantGrowth$weight, PlantGrowth$group, shapiro.test)

For large samples (roughly n > 30 per group), ANOVA is robust to minor departures from normality because of the Central Limit Theorem.

3. Homogeneity of variances

Variances should be equal across groups. Use Levene’s test:

# Install and load car package if needed
# install.packages("car")
library(car)

leveneTest(weight ~ group, data = PlantGrowth)
# Levene's Test for Homogeneity of Variance (center = median)
#        Df F value Pr(>F)
# group   2  1.1192 0.3412
#        27

A non-significant p-value (like 0.34) gives no evidence against equal variances. If this assumption fails, consider Welch’s ANOVA or a Kruskal-Wallis test.

4. No significant outliers

Outliers can distort results. Check with:

# Identify outliers
boxplot(weight ~ group, data = PlantGrowth)

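# Or list the outlying values within each group
# (pooling all groups would hide group-specific outliers)
tapply(PlantGrowth$weight, PlantGrowth$group, function(x) boxplot.stats(x)$out)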

Two-Way ANOVA

Two-way ANOVA extends the concept to two categorical independent variables. This lets you test the effect of each factor while controlling for the other, plus check for interactions.

Running two-way ANOVA

# Create a dataset with two factors
# Using mtcars: transmission (am) and cylinders (cyl)
data(mtcars)

# Convert to factors
mtcars$am <- factor(mtcars$am, labels = c("Automatic", "Manual"))
mtcars$cyl <- factor(mtcars$cyl)

# Two-way ANOVA (without interaction)
model_two <- aov(mpg ~ am + cyl, data = mtcars)
summary(model_two)
# Rounded values (both p-values are well below 0.001):
#             Df Sum Sq Mean Sq F value
# am           1  405.2   405.2    42.9
# cyl          2  456.4   228.2    24.2
# Residuals   28  264.5     9.4

# Two-way ANOVA (with interaction)
model_int <- aov(mpg ~ am * cyl, data = mtcars)
summary(model_int)

The formula am * cyl includes both main effects and the interaction. Use + for main effects only, and * for main effects plus interaction.

Interpreting interaction effects

When you have a significant interaction, the effect of one factor depends on the level of the other. Visualize this:

# Interaction plot
interaction.plot(
  x.factor = mtcars$cyl,
  trace.factor = mtcars$am,
  response = mtcars$mpg,
  xlab = "Cylinders",
  ylab = "Miles per Gallon",
  trace.label = "Transmission"
)

Roughly parallel lines suggest no interaction. Clearly non-parallel or crossing lines suggest an interaction effect, which the am:cyl row of summary(model_int) tests formally.

Common pitfalls

Confusing statistical and practical significance

A significant p-value doesn’t always mean the difference matters in practice. Always check effect sizes (eta-squared or omega-squared) alongside p-values.

Ignoring assumptions

Running ANOVA without checking assumptions can lead to false conclusions. The test is reasonably robust to mild violations, but severe violations call for alternative approaches.

Multiple comparisons without adjustment

Running multiple t-tests increases the chance of a false positive. Always use adjusted p-values (like Tukey’s HSD provides) or control the false discovery rate.
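The inflation is easy to quantify. The sketch below assumes the tests are independent (pairwise tests on shared groups are not exactly independent, so treat the figure as an approximation) and uses three made-up p-values purely to show how p.adjust() changes them.

```r
# Family-wise error rate for m unadjusted tests at level alpha,
# assuming independent tests (an approximation for pairwise comparisons)
alpha <- 0.05
m <- choose(3, 2)            # 3 pairwise comparisons among 3 groups
fwer <- 1 - (1 - alpha)^m
fwer
# [1] 0.142625

# Hypothetical raw p-values, for illustration only
raw_p <- c(0.01, 0.02, 0.04)

bonf <- p.adjust(raw_p, method = "bonferroni")  # multiply each by m
holm <- p.adjust(raw_p, method = "holm")        # step-down, less conservative
rbind(raw = raw_p, bonferroni = bonf, holm = holm)
```

With only three groups the chance of at least one false positive already rises from 5% to about 14%, which is why the adjusted values are the ones to report.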

Summary

  • Use aov() for one-way and two-way ANOVA in R
  • Check assumptions: independence, normality, homogeneity of variance, no outliers
  • Use TukeyHSD() for post-hoc pairwise comparisons
  • For two-way ANOVA, test for interaction effects with * in the formula
  • Visualize interactions with interaction.plot()

Next steps

Explore Linear Regression in R to extend these concepts to continuous predictors, or Chi-Square Test for comparing categorical variables.

Assumptions

ANOVA assumes: (1) independent observations, (2) normally distributed residuals, (3) equal variance across groups (homoscedasticity). Check normality with shapiro.test(residuals(aov_model)) and a Q-Q plot. Check equal variance with bartlett.test() or car::leveneTest(). For non-normal data or small samples, kruskal.test() is the non-parametric alternative to one-way ANOVA.

ANOVA fundamentals

Analysis of variance (ANOVA) tests whether group means differ significantly. The F-test statistic is the ratio of between-group variance to within-group variance. A large F ratio (with a small p-value) indicates that group membership explains more variance than expected by chance.

aov(y ~ group, data = df) fits a one-way ANOVA. summary(aov_model) shows the F statistic, degrees of freedom, and p-value. Two-way ANOVA: aov(y ~ factor1 + factor2, data = df) tests main effects. aov(y ~ factor1 * factor2, data = df) includes the interaction term.

The interaction effect tests whether the effect of factor1 differs across levels of factor2. Always check the interaction before interpreting main effects in two-way ANOVA: a significant interaction means the main effects cannot be interpreted in isolation.
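One way to check the interaction first is sketched below on the built-in mtcars data, with transmission (am) and cylinder count (cyl) as the two factors. The object names cars, model_add, and model_int are illustrative, and the data are copied so that mtcars itself is not modified.

```r
# Work on a copy so the built-in mtcars data stays unchanged
cars <- mtcars
cars$am  <- factor(cars$am, labels = c("Automatic", "Manual"))
cars$cyl <- factor(cars$cyl)

model_add <- aov(mpg ~ am + cyl, data = cars)   # main effects only
model_int <- aov(mpg ~ am * cyl, data = cars)   # main effects + interaction

# Pull the interaction row out of the ANOVA table
int_table <- summary(model_int)[[1]]
rownames(int_table) <- trimws(rownames(int_table))  # row names are space-padded
int_table["am:cyl", ]

# Equivalent check: does adding the interaction improve the fit?
comparison <- anova(model_add, model_int)
comparison
```

If the am:cyl row is not significant, the simpler additive model is usually the one to interpret; if it is, describe the effect of one factor separately at each level of the other.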

Post-hoc tests

A significant ANOVA F-test shows that at least one group differs but does not identify which groups. Post-hoc tests compare all pairwise combinations with correction for multiple comparisons.

TukeyHSD(aov_model) performs Tukey’s Honest Significant Difference test. It controls the family-wise error rate for all pairwise comparisons. plot(TukeyHSD(aov_model)) visualizes the confidence interval for each pairwise difference in means; an interval that excludes zero indicates a significant difference.

pairwise.t.test(y, group, p.adjust.method = "bonferroni") applies Bonferroni correction. p.adjust.method = "holm" uses Holm’s method, which is less conservative than Bonferroni while controlling family-wise error. p.adjust.method = "fdr" controls the false discovery rate, appropriate when many comparisons are made and some false positives are acceptable.
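As a minimal sketch, both approaches can be applied to the built-in PlantGrowth data. The 0.05 cut-off is a conventional choice, and the object names tukey, sig_pairs, and holm are illustrative.

```r
data(PlantGrowth)
aov_model <- aov(weight ~ group, data = PlantGrowth)

# Tukey's HSD: one row per pair with columns diff, lwr, upr, p adj
tukey <- TukeyHSD(aov_model)
tukey$group

# Pairs whose adjusted p-value falls below 0.05
sig_pairs <- rownames(tukey$group)[tukey$group[, "p adj"] < 0.05]
sig_pairs
# [1] "trt2-trt1"

# Pairwise t-tests with Holm correction, for comparison
holm <- pairwise.t.test(PlantGrowth$weight, PlantGrowth$group,
                        p.adjust.method = "holm")
holm$p.value   # lower-triangular matrix of adjusted p-values
```

Here only the two treatments differ from each other; neither treatment differs significantly from the control, which the overall F-test alone could not reveal.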

Checking assumptions

ANOVA assumes: (1) normality of residuals, (2) homogeneity of variance (homoscedasticity) across groups, (3) independence of observations.

shapiro.test(residuals(aov_model)) tests residual normality. For large samples, Q-Q plots are more informative than formal tests: qqnorm(residuals(aov_model)); qqline(residuals(aov_model)). ANOVA is robust to mild non-normality with large, balanced groups.

bartlett.test(y ~ group, data = df) tests equality of variances. leveneTest(y ~ group, data = df) (from the car package) is more robust to non-normality. Welch’s ANOVA, oneway.test(y ~ group, data = df), does not assume equal variances.
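The base R pieces of this workflow can be sketched on PlantGrowth as follows (leveneTest() is left out so the example needs no extra packages). Running oneway.test() a second time with var.equal = TRUE is only there to show that it then reproduces the classic F-test.

```r
data(PlantGrowth)

# Bartlett's test of equal variances (sensitive to non-normality)
bt <- bartlett.test(weight ~ group, data = PlantGrowth)
bt$p.value

# Welch's ANOVA: var.equal = FALSE is the default
welch <- oneway.test(weight ~ group, data = PlantGrowth)
welch

# With var.equal = TRUE, oneway.test() matches the aov() F-test
classic <- oneway.test(weight ~ group, data = PlantGrowth, var.equal = TRUE)
unname(classic$statistic)   # same F value as summary(aov(...))
```

Welch’s version adjusts the denominator degrees of freedom, so its F and p-value will generally differ somewhat from the classic test even when variances are similar.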

Non-parametric alternatives

When ANOVA assumptions are violated, use non-parametric tests. kruskal.test(y ~ group, data = df) tests whether groups differ in location without assuming normality. It is the non-parametric analog of one-way ANOVA. For post-hoc comparisons, dunn.test::dunn.test(y, group, method = "bonferroni") performs Dunn’s test, which compares pairs of groups using rank sums from the pooled Kruskal-Wallis ranking and adjusts the p-values for multiple comparisons.
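A base R sketch of the rank-based route on PlantGrowth is shown below. Because dunn.test is an add-on package, the follow-up here uses pairwise.wilcox.test() from base R instead, which is a different procedure from Dunn’s test (it re-ranks within each pair). Setting exact = FALSE is a choice made here to avoid warnings about ties.

```r
data(PlantGrowth)

# Kruskal-Wallis rank sum test: non-parametric analog of one-way ANOVA
kw <- kruskal.test(weight ~ group, data = PlantGrowth)
kw

# Base R follow-up: pairwise Wilcoxon (Mann-Whitney) tests with Holm
# adjustment. exact = FALSE requests the normal approximation, which
# avoids warnings when the data contain ties.
pw <- pairwise.wilcox.test(PlantGrowth$weight, PlantGrowth$group,
                           p.adjust.method = "holm", exact = FALSE)
pw$p.value
```

The Kruskal-Wallis result plays the role of the overall F-test, and the pairwise table plays the role of the post-hoc comparison.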

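vegan::adonis2(dist(y) ~ group, data = df) fits a permutation-based multivariate ANOVA (PERMANOVA). It obtains p-values by permutation rather than from a normality assumption, although it still assumes exchangeable observations and is sensitive to differences in within-group dispersion. It suits multivariate responses and non-normal data and is commonly used in ecology and genetics.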

Repeated measures ANOVA

When the same subjects are measured multiple times, standard ANOVA violates the independence assumption. Repeated measures ANOVA accounts for within-subject correlation.

aov(y ~ time * treatment + Error(subject/time), data = df) specifies a repeated measures design in base R. The Error(subject/time) term tells aov() that time varies within subject, so within-subject effects are tested against the appropriate error stratum. lme4::lmer(y ~ time * treatment + (1 | subject), data = df) fits a linear mixed model, a more flexible approach that handles unbalanced designs and continuous time variables.
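The sketch below runs the base R version on simulated data, since no repeated measures dataset appears in this tutorial. Everything about the data is an assumption made for illustration: 12 subjects, 3 time points, a between-subject treatment with two arms, and arbitrary effect sizes.

```r
set.seed(42)   # simulated data, for illustration only

n_subj <- 12
df <- expand.grid(subject = factor(seq_len(n_subj)),
                  time    = factor(c("t1", "t2", "t3")))

# Treatment is a between-subject factor: first half A, second half B
df$treatment <- factor(ifelse(as.integer(df$subject) <= n_subj / 2, "A", "B"))

# Response = baseline + subject-specific shift + time trend + noise
subj_shift <- rnorm(n_subj, sd = 1)
df$y <- 10 +
  subj_shift[as.integer(df$subject)] +
  0.5 * (as.integer(df$time) - 1) +
  rnorm(nrow(df), sd = 0.5)

# Repeated measures ANOVA: time varies within subject
rm_model <- aov(y ~ time * treatment + Error(subject/time), data = df)
summary(rm_model)

# The fit is split into error strata
names(rm_model)
```

The summary prints one table per stratum: treatment is tested in the subject stratum, while time and the time:treatment interaction are tested in the subject:time stratum.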

Interpreting ANOVA results

The ANOVA F-statistic tests the null hypothesis that all group means are equal. A significant F-test tells you that at least one group mean differs, but not which groups or by how much. This is why post-hoc tests follow a significant ANOVA: they identify the specific pairwise differences.

Effect size measures complement the significance test. Eta-squared (η²) and partial eta-squared measure the proportion of variance explained by the factor. Cohen’s conventions describe small (0.01), medium (0.06), and large (0.14) eta-squared values. Reporting effect sizes alongside p-values is increasingly expected in scientific publications because they communicate practical significance, not just statistical significance.
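For a one-way design, eta-squared can be computed directly from the sums of squares in the ANOVA table, as sketched below for PlantGrowth. The omega-squared line uses the standard one-way formula, and the variable names are illustrative. Packages such as effectsize offer ready-made functions, but none are assumed here.

```r
data(PlantGrowth)
aov_model <- aov(weight ~ group, data = PlantGrowth)

tab <- summary(aov_model)[[1]]
ss  <- tab[["Sum Sq"]]    # c(between-group SS, residual SS)
dfs <- tab[["Df"]]        # c(k - 1, n - k)

ss_between  <- ss[1]
ss_total    <- sum(ss)
ms_residual <- ss[2] / dfs[2]

# Eta-squared: share of total variance explained by group
eta_sq <- ss_between / ss_total
round(eta_sq, 3)
# [1] 0.264

# Omega-squared: a less biased estimate of the same quantity
omega_sq <- (ss_between - dfs[1] * ms_residual) / (ss_total + ms_residual)
round(omega_sq, 3)
```

By the conventions quoted above, an eta-squared of about 0.26 counts as a large effect, though with only 30 plants the estimate is imprecise.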

ANOVA assumptions and violations

ANOVA assumes independence of observations, normality of residuals within each group, and homogeneity of variance across groups. The independence assumption must be met by study design — ANOVA cannot correct for correlated observations. The normality assumption can be relaxed for large samples due to the central limit theorem. The homogeneity of variance assumption is testable with Levene’s test.

When variance homogeneity is violated, Welch’s ANOVA provides a correction. For non-normal data that does not become normal with transformation, Kruskal-Wallis serves as the non-parametric alternative. When the study design involves repeated measures — the same subjects measured multiple times — repeated measures ANOVA accounts for the within-subject correlation that standard ANOVA ignores.