Hypothesis Testing in R

· 7 min read · Updated March 7, 2026 · intermediate
hypothesis-testing statistics t-test p-value anova

Hypothesis testing is a fundamental statistical method for making data-driven decisions. It allows you to draw conclusions about a population based on sample data by evaluating whether observed results are statistically significant or merely due to random chance. In this tutorial, you’ll learn how to conduct common hypothesis tests in R, interpret p-values, and make informed decisions using significance levels.

Understanding Hypothesis Testing

Every hypothesis test begins with two competing claims about a population parameter:

  • Null hypothesis (H₀): The default assumption that there is no effect or no difference. It represents the status quo or a baseline claim you seek to disprove.
  • Alternative hypothesis (H₁): The claim you seek evidence for—typically that there is an effect, a difference, or a specific relationship in the population.

For example, if you’re testing whether a new medication lowers blood pressure, the null hypothesis would state that the medication has no effect, while the alternative hypothesis would state that it does lower blood pressure.

The decision to reject or fail to reject the null hypothesis hinges on a p-value, which measures the probability of observing results at least as extreme as your sample data, assuming the null hypothesis is true. A small p-value (typically below a threshold called α, usually 0.05) provides evidence against the null hypothesis, leading you to reject it in favor of the alternative.
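This definition can be made concrete with a small simulation. The sketch below (sample size, means, and standard deviation are all illustrative numbers, not from any real dataset) draws many samples under the null hypothesis and checks how often the t-statistic is at least as extreme as the one observed:

```r
# Sketch: approximate a two-sided p-value by simulation under the null.
# All numbers here (n = 25, means, sd = 10) are purely illustrative.
set.seed(1)
observed <- rnorm(25, mean = 103, sd = 10)  # hypothetical sample
t_obs <- (mean(observed) - 100) / (sd(observed) / sqrt(25))

# Generate the t-statistic's distribution when H0 (mean = 100) is true
null_t <- replicate(5000, {
  x <- rnorm(25, mean = 100, sd = 10)
  (mean(x) - 100) / (sd(x) / sqrt(25))
})

# Proportion of null statistics at least as extreme as the observed one
mean(abs(null_t) >= abs(t_obs))
```

This proportion approximates the p-value that t.test(observed, mu = 100) computes analytically from the t distribution.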

One-Sample t-Test

The one-sample t-test determines whether the mean of a single sample differs significantly from a known or hypothesized population mean. This test is useful when you have a baseline comparison value and want to test if your sample comes from a population with a different mean.

When to Use It

Use a one-sample t-test when you have continuous numerical data from one group and want to compare its mean to a specific value. Common applications include:

  • Testing if a machine fills containers to the claimed volume
  • Checking if average delivery times meet a service level agreement
  • Determining if student test scores differ from a national average

Performing the Test in R

R’s built-in stats package contains the t.test() function for performing hypothesis tests. Here’s how to conduct a one-sample t-test:

# Sample data: daily website visits for 30 days
website_visits <- c(1050, 980, 1120, 1085, 995, 1030, 1015, 1075, 1100, 1045,
                   1005, 1060, 1080, 1025, 1055, 1035, 1070, 1010, 1090, 1040,
                   1050, 1020, 1065, 1085, 1030, 1045, 1075, 1015, 1050, 1060)

# Test if mean differs from 1000 visits (the target)
result <- t.test(website_visits, mu = 1000)

# View results
result

The output shows the t-statistic, degrees of freedom, p-value, and a 95% confidence interval for the true mean. If the p-value is less than your chosen significance level (α = 0.05), you reject the null hypothesis and conclude the mean significantly differs from 1000.

# Extract specific components
cat("t-statistic:", result$statistic, "\n")
cat("p-value:", result$p.value, "\n")
cat("95% CI:", result$conf.int, "\n")
cat("Sample mean:", result$estimate, "\n")
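To see where these numbers come from, the statistic and p-value can be recomputed by hand from the textbook formula t = (x̄ − μ₀) / (s / √n), using the website_visits vector from above:

```r
# Recompute the one-sample t-test from first principles
n <- length(website_visits)
t_manual <- (mean(website_visits) - 1000) / (sd(website_visits) / sqrt(n))
p_manual <- 2 * pt(-abs(t_manual), df = n - 1)  # two-sided p-value

# Both should match the values t.test() reported
t_manual
p_manual
```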

Two-Sample t-Test

The two-sample t-test compares means between two independent groups to determine whether their population means differ. This is one of the most common statistical tests for A/B testing, comparative studies, and experiments with control and treatment groups.

Independent Samples t-Test

When your samples are independent (no relationship between observations in one group and the other), use an independent samples t-test:

# Simulated data: treatment group vs control group
treatment <- c(85, 88, 82, 89, 87, 86, 84, 90, 88, 85,
               87, 86, 83, 89, 85, 88, 84, 87, 86, 89)
control <- c(78, 82, 79, 80, 77, 81, 79, 83, 80, 78,
             79, 82, 77, 81, 80, 79, 78, 82, 80, 81)

# Perform independent two-sample t-test
# var.equal = TRUE assumes equal variances; R's default (var.equal = FALSE)
# runs the Welch test, which does not require that assumption
result <- t.test(treatment, control, var.equal = TRUE)

result

The test output indicates whether the difference between group means is statistically significant. For a two-tailed test (the default), a p-value below 0.05 suggests the groups have different means. For a one-tailed test where you hypothesize one group is greater, add alternative = "greater" or alternative = "less".
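As a sketch of the one-tailed variant, the same treatment and control vectors can be retested with alternative = "greater"; note that the direction refers to the first argument's mean:

```r
# One-tailed test: is the treatment mean greater than the control mean?
one_sided <- t.test(treatment, control, alternative = "greater")
two_sided <- t.test(treatment, control)

# When the observed difference points in the hypothesized direction,
# the one-sided p-value is half the two-sided one
one_sided$p.value
two_sided$p.value
```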

Paired Samples t-Test

When your observations are paired (before/after measurements on the same subjects, or matched pairs), use a paired t-test:

# Before and after treatment measurements for 15 patients
before <- c(142, 138, 145, 140, 137, 143, 139, 141, 136, 144,
            138, 140, 142, 139, 137)
after <- c(132, 130, 135, 128, 131, 134, 130, 133, 129, 136,
           131, 132, 134, 130, 128)

# Paired t-test (same subjects measured twice)
result <- t.test(after, before, paired = TRUE)

result

The paired t-test is typically more powerful than the independent samples t-test because it removes between-subject variability by comparing each subject to themselves.
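One way to see why the pairing matters: a paired t-test is mathematically identical to a one-sample t-test on the within-subject differences, which can be verified with the before and after vectors above:

```r
# The paired t-test is equivalent to a one-sample t-test on the differences
diffs <- after - before
diff_test <- t.test(diffs, mu = 0)
paired_test <- t.test(after, before, paired = TRUE)

# Identical statistic and p-value
diff_test$p.value
paired_test$p.value
```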

Chi-Square Test

The chi-square test analyzes categorical data to determine if there is an association between variables. It compares observed frequencies in categories to expected frequencies under the null hypothesis of independence.

Chi-Square Test of Independence

Use this test when you have contingency table data and want to know whether two categorical variables are related:

# Create a contingency table from survey data
# Example: Preference for feature A vs B by age group
survey_data <- matrix(c(50, 30, 20,
                        35, 45, 30,
                        25, 35, 50),
                      nrow = 3, byrow = TRUE)

rownames(survey_data) <- c("18-25", "26-40", "41+")
colnames(survey_data) <- c("Feature A", "Feature B", "No Preference")

# Perform chi-square test
chi_result <- chisq.test(survey_data)

chi_result

The chi-square statistic measures the overall discrepancy between observed and expected frequencies. A small p-value indicates a significant association between the variables.

# View expected frequencies (under null hypothesis)
chi_result$expected

# Check residuals to see which cells contribute most to the result
chi_result$residuals
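The expected frequencies themselves come from a simple formula: each cell's expected count is (row total × column total) / grand total. This can be checked directly against what chisq.test() reports:

```r
# Expected counts under independence, computed from the marginals
expected_manual <- outer(rowSums(survey_data), colSums(survey_data)) /
  sum(survey_data)

# Matches chi_result$expected
expected_manual
```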

ANOVA (Analysis of Variance)

ANOVA extends the t-test to compare means across three or more groups. It tests whether at least one group mean differs significantly from the others while controlling the overall Type I error rate.

One-Way ANOVA

# Simulated data: test scores from three different teaching methods
set.seed(42)
method_a <- rnorm(20, mean = 75, sd = 8)
method_b <- rnorm(20, mean = 80, sd = 9)
method_c <- rnorm(20, mean = 72, sd = 7)

# Combine into a data frame for analysis
scores <- data.frame(
  score = c(method_a, method_b, method_c),
  method = factor(rep(c("A", "B", "C"), each = 20))
)

# Perform one-way ANOVA
anova_result <- aov(score ~ method, data = scores)

# View summary (this is the F-test output)
summary(anova_result)

If the ANOVA shows significance (p < 0.05), it tells you that at least one group differs, but not which ones. To identify specific differences, perform post-hoc pairwise comparisons with Tukey’s HSD:

# Tukey HSD post-hoc test
tukey_result <- TukeyHSD(anova_result)
tukey_result
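The Tukey output can also be inspected programmatically. A small sketch, assuming the tukey_result object from above (the list element is named after the factor, here "method"):

```r
# Each row is a pairwise comparison; "p adj" is the Tukey-adjusted p-value
comparisons <- tukey_result$method          # matrix: diff, lwr, upr, p adj
significant <- comparisons[, "p adj"] < 0.05

# Keep only the pairs that differ at alpha = 0.05
comparisons[significant, , drop = FALSE]
```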

Interpreting Results and Common Pitfalls

Understanding what hypothesis tests tell you—and don’t tell you—is crucial for proper statistical inference.

Key Points to Remember

The p-value is not the probability that the null hypothesis is true. It measures the compatibility of your data with the null hypothesis. A p-value of 0.03 means there’s a 3% chance of observing data at least as extreme as yours if the null hypothesis were true—not that there’s a 97% chance the alternative is true.

Statistical significance does not imply practical significance. With large sample sizes, even tiny differences can become statistically significant. Always consider effect size (the magnitude of the difference) alongside p-values.
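A common effect size for two groups is Cohen’s d. Base R has no built-in function for it, but it is straightforward to compute—a minimal sketch using a pooled standard deviation, applied to the treatment and control vectors from earlier:

```r
# Cohen's d: standardized mean difference between two independent groups
cohens_d <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  pooled_sd <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2))
  (mean(x) - mean(y)) / pooled_sd
}

# Common rules of thumb: ~0.2 small, ~0.5 medium, ~0.8 large
cohens_d(treatment, control)
```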

Multiple comparisons inflate the family-wise error rate. Conducting many tests without correction increases the chance of false positives. Use corrections like Bonferroni or Benjamini-Hochberg when making multiple comparisons.
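Base R’s p.adjust() implements both corrections. A small sketch with made-up p-values:

```r
# Hypothetical raw p-values from five separate tests
raw_p <- c(0.010, 0.040, 0.030, 0.005, 0.200)

# Bonferroni: multiply each p-value by the number of tests (capped at 1)
p.adjust(raw_p, method = "bonferroni")

# Benjamini-Hochberg: controls the false discovery rate, less conservative
p.adjust(raw_p, method = "BH")
```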

Assumptions Matter

Most parametric tests assume your data meet certain conditions:

  • Independence: Observations are unrelated and randomly sampled.
  • Normality: Data follow a normal distribution (less critical with larger samples due to the central limit theorem).
  • Homogeneity of variance: Groups have similar variability (can be tested with Levene’s test).

Always check these assumptions before reporting results:

# Check normality with Shapiro-Wilk test
shapiro.test(website_visits)

# Check homogeneity of variance with an F-test (assumes normality);
# the more robust Levene's test is available via car::leveneTest()
var.test(treatment, control)

Summary

Hypothesis testing provides a rigorous framework for making decisions based on data. In R, the stats package makes it straightforward to conduct t-tests, chi-square tests, and ANOVA. Remember these key points:

  1. Start with clear hypotheses — define your null and alternative before collecting data.
  2. Choose the right test — one-sample, two-sample, paired, or ANOVA based on your data structure.
  3. Check assumptions — normality, independence, and variance homogeneity affect validity.
  4. Interpret carefully — p-values measure evidence against the null, not probability of truth.
  5. Consider effect size — statistical significance alone doesn’t tell the full story.

With these tools, you can confidently analyze experimental results, evaluate business metrics, and draw valid conclusions from your data in R.