
Testing with testthat

The testthat package is the standard unit testing framework for R. Hadley Wickham built it to make testing feel less like a chore and more like writing documentation that happens to run. If you have ever put stopifnot() calls in your code and wondered if there was a better way, there is.

What you’ll learn

This tutorial covers the key concepts and practical techniques for testing with testthat. By the end, you will know how to apply the core functions in real data analysis workflows.

The AAA pattern

Good tests follow a structure called Arrange-Act-Assert. First you set up some data, then you run the code you want to test, then you check the result. In testthat, test_that() names the behavior under test and the expect_*() functions play the role of the Assert step.

test_that("multiplication produces correct results", {
  # Arrange
  x <- 3
  y <- 4

  # Act
  result <- x * y

  # Assert
  expect_equal(result, 12)
})
# Reports "Test passed" when run interactively

The string in test_that() reads like a sentence. This is no accident: when a test fails during R CMD check, you see that description in the output. Writing it as a natural-language statement makes the failure message meaningful.

Structuring a test

The test_that() function takes two arguments: a description and a code block. Always wrap the code block in braces, even if it contains only one expectation. Without braces, testthat reports failures at the wrong line.

test_that("division by zero returns Inf", {
  # Base R does not raise an error here; 1 / 0 evaluates to Inf
  expect_equal(1 / 0, Inf)
})

Inside the block, you follow Arrange-Act-Assert. Set up your inputs, run the function under test, then assert what you expect. The expectations are the assertions.

Equality checks

expect_equal() is the most common assertion. It tests approximate equality using a tolerance for floating-point differences:

expect_equal(0.1 + 0.2, 0.3, tolerance = 1e-7)
# passes even though 0.1 + 0.2 is not exactly 0.3 in floating point

expect_identical() checks exact equality including type. This is stricter and will fail if an integer and a numeric are compared:

expect_identical(1L, 1L)   # passes
expect_identical(1L, 1)    # fails — different types

Boolean checks

For true/false conditions, expect_true() and expect_false() work directly:

expect_true(is.numeric(5))
expect_false(is.character(5))
expect_null(NULL)

Error, warning, and message checks

When your code is supposed to fail, use expect_error(). Pass a regexp to match the error message:

divide <- function(a, b) {
  if (b == 0) stop("division by zero")
  a / b
}

test_that("division by zero throws", {
  expect_error(divide(1, 0), "division by zero")
  expect_error(divide(1, 0), class = "error")
})

Similarly, expect_warning() checks for warnings and expect_message() checks for messages. In the testthat 3rd edition, messages and warnings bubble up rather than being silently suppressed, so you must handle them explicitly with suppressMessages(), suppressWarnings(), or the corresponding expect_*() functions.
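
As a minimal sketch of both checks, the example below uses a hypothetical parse_count() function (not part of testthat or of this tutorial's earlier code) that emits a message on every call and a warning when parsing fails. The second expectation silences the message explicitly because it is not the subject of that check.

```r
library(testthat)

# Hypothetical function, defined only for this illustration
parse_count <- function(x) {
  message("Parsing ", length(x), " value(s)")
  n <- suppressWarnings(as.integer(x))
  if (anyNA(n)) warning("some values could not be parsed")
  n
}

test_that("parse_count() signals the documented conditions", {
  local_edition(3)

  # The second argument is a regular expression matched against the message
  expect_message(parse_count("1"), "Parsing 1 value")

  # Silence the message so only the warning is under test
  expect_warning(suppressMessages(parse_count("a")), "could not be parsed")
})
```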

Structure checks

You can inspect objects without comparing values:

expect_type(1L, "integer")
expect_s3_class(factor("a"), "factor")
expect_length(letters, 26)
expect_named(list(a = 1, b = 2), c("a", "b"))

Snapshot testing

For output that changes infrequently, expect_snapshot() captures the actual output and stores it. On subsequent runs, testthat compares new output against the stored snapshot. This is useful for testing print methods or formatted output:

test_that("print method output is stable", {
  expect_snapshot(print_my_object(x))
  # On first run, creates tests/testthat/_snaps/filename.md
})

Skipping tests

Sometimes a test is correct but cannot run in the current environment. Use skip functions to conditionally skip:

test_that("HTTP endpoint responds", {
  skip_if_offline()
  skip_if_not_installed("httr2")

  response <- httr2::request("https://httpbin.org/get") |>
    httr2::req_perform()
  expect_equal(httr2::resp_status(response), 200)
})

The difference between skip_if() and skip_if_not() is direction: skip_if(condition) skips when the condition is true, while skip_if_not(condition) skips when it is false. skip_if_not_installed("pkg") is the idiomatic way to handle optional dependencies; it is cleaner than wrapping everything in requireNamespace().
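
The two directions can be sketched as follows. The conditions are illustrative assumptions: the operating-system check is just one example of a condition, and MYPKG_RUN_SLOW is a hypothetical environment variable, not something testthat defines.

```r
library(testthat)

test_that("basename() strips POSIX directories", {
  # skip_if(): the test is skipped when the condition is TRUE
  skip_if(.Platform$OS.type == "windows", "POSIX-style paths only")
  expect_equal(basename("/tmp/data.csv"), "data.csv")
})

test_that("large sequences have the requested length", {
  # skip_if_not(): the test is skipped when the condition is FALSE
  # MYPKG_RUN_SLOW is a hypothetical opt-in flag for slow tests
  skip_if_not(nzchar(Sys.getenv("MYPKG_RUN_SLOW")), "slow tests disabled")
  expect_length(seq_len(1e6), 1e6)
})
```

The second argument to both functions is the message shown in the test report, which helps explain why a test did not run.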

Fixtures and cleanup

Shared setup code goes in tests/testthat/helper*.R files, which testthat sources before the tests run. For per-test setup and automatic teardown, use the withr package. The local_*() functions register a cleanup that runs when the test exits:

library(withr)

test_that("temp file is written correctly", {
  # local_tempfile() returns a path and registers the file for deletion
  path <- local_tempfile()

  writeLines("hello", path)
  expect_equal(readLines(path), "hello")
})  # temp file deleted automatically when test exits

This is the modern replacement for the now-deprecated setup() and teardown() functions in testthat 3rd edition. Using withr::local_*() is safer because the cleanup runs even if your test throws an error.

testthat 3rd edition

The 3rd edition of testthat, introduced in version 3.0.0, includes breaking changes that make tests more reliable but require some migration work. Activate it in your DESCRIPTION file:

Config/testthat/edition: 3

Or temporarily within a test file with local_edition(3).
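
A minimal sketch of the temporary form, scoped here to a single test block so that the setting reverts when the block exits. The edition_get() call is used only to confirm which edition is active.

```r
library(testthat)

test_that("this block runs under the 3rd edition", {
  local_edition(3)  # reverts automatically when the block exits
  expect_equal(edition_get(), 3)
})
```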

One of the main changes is that messages and warnings are no longer swallowed silently. A message or warning that your test does not capture now surfaces in the test output, and uncaught warnings are reported in the results, so you need to handle them explicitly. This is a good thing: it means your package is not producing hidden output you did not know about.

The expect_equal() and expect_identical() functions now use the waldo package for comparison. This makes tolerance handling consistent but can surface differences that the 2nd edition ignored, particularly around timezones and factor levels. If you see new failures after upgrading, check whether timezone or factor comparisons are involved.

Common mistakes

Missing braces around the test body. Without braces, testthat reports failures at the wrong line. Always use braces, even for single expectations.

Using expect_identical() for floating-point values. The function uses identical() which requires exact bit-level equality. expect_equal(0.1 + 0.2, 0.3) passes because it uses a tolerance; expect_identical(0.1 + 0.2, 0.3) fails because they are not bit-identical.

Forgetting that options and the working directory persist across tests. Each test gets its own environment for objects, but global options and the current directory are shared. Use withr::local_options() or withr::local_dir() to isolate changes.

Timezone differences on CI. If tests pass locally but fail on a CI server, set Sys.setenv(TZ = "UTC") in your test helper. Timezone handling changed in waldo and differences that did not matter before may now cause failures.
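
The last two points can be sketched with withr. The tests below are illustrative assumptions rather than code from a real package. The third one uses withr::local_timezone() to pin the timezone for a single test, which is an alternative to setting TZ globally in a helper.

```r
library(testthat)

test_that("formatting respects the digits option", {
  withr::local_options(digits = 3)   # option restored when the test exits
  expect_equal(format(pi), "3.14")
})

test_that("relative paths resolve inside a scratch directory", {
  dir <- withr::local_tempdir()      # directory removed when the test exits
  withr::local_dir(dir)              # working directory restored on exit
  writeLines("a,b", "data.csv")
  expect_true(file.exists(file.path(dir, "data.csv")))
})

test_that("strings without a timezone are parsed as UTC", {
  withr::local_timezone("UTC")       # TZ restored when the test exits
  parsed <- as.POSIXct("2024-01-01 12:00:00")
  expect_equal(as.numeric(parsed), 1704110400)
})
```

Because each local_*() call is undone when its test exits, the three tests can run in any order without affecting one another.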

Test structure

testthat organizes tests in files under tests/testthat/. Each file typically tests one source file. test_that("description", { ... }) groups related expect_*() calls. When a test fails, testthat reports the description, the failing expectation, and the difference between expected and actual values. Run tests with devtools::test() or testthat::test_dir("tests/testthat/").
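
To make the layout concrete, here is a small sketch that writes one test file into a temporary directory and runs it with test_dir(). The directory and file names are made up for the illustration; in a package the files would live under tests/testthat/ instead.

```r
library(testthat)

# A throwaway directory standing in for tests/testthat/
tests_dir <- file.path(tempdir(), "demo-tests")
dir.create(tests_dir, showWarnings = FALSE)

# Test files must start with "test" to be picked up
writeLines(c(
  'test_that("toupper() upper-cases letters", {',
  '  expect_equal(toupper("abc"), "ABC")',
  '})'
), file.path(tests_dir, "test-strings.R"))

# Run every test file in the directory and keep the results
results <- test_dir(tests_dir, reporter = "summary", stop_on_failure = FALSE)

# One row per test_that() block
as.data.frame(results)[, c("file", "test", "nb", "failed")]
```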

Writing effective tests

Test behavior, not implementation. “Returns the top 3 elements” is a behavior test; “calls order() internally” is an implementation test. Behavior tests survive refactoring; implementation tests break when you improve the code without changing behavior. Test edge cases explicitly: empty input, single element, duplicates, NA values, negative numbers. Each test_that() block should test one specific scenario with a clear description.
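
As a sketch of behavior-focused tests, the example below checks a hypothetical top_n() function (defined here only for illustration) purely through its inputs and outputs. None of the tests depend on whether it uses sort(), order(), or something else internally.

```r
library(testthat)

# Hypothetical function under test
top_n <- function(x, n = 3) {
  head(sort(x, decreasing = TRUE), n)
}

test_that("top_n() returns the three largest values in descending order", {
  expect_equal(top_n(c(5, 1, 9, 3, 7)), c(9, 7, 5))
})

test_that("top_n() returns everything when fewer than n values are supplied", {
  expect_equal(top_n(c(2, 8)), c(8, 2))
})

test_that("top_n() returns an empty vector for empty input", {
  expect_equal(top_n(numeric()), numeric())
})

test_that("top_n() keeps duplicates", {
  expect_equal(top_n(c(4, 4, 1, 4)), c(4, 4, 4))
})
```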

Test fixtures with withr

withr::local_tempfile() creates a temporary file that is deleted after the test. withr::local_options(option = value) sets options for the test scope. withr::with_envvar(c(MY_VAR = "test"), { ... }) sets environment variables. These ensure tests do not leave side effects and do not depend on global state. Clean test state makes failures predictable and independent of test execution order.
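
A short sketch of two of these helpers in use. The data frame and the MY_VAR variable are placeholders for illustration.

```r
library(testthat)

test_that("a data frame round-trips through CSV", {
  path <- withr::local_tempfile(fileext = ".csv")  # deleted when the test exits
  df <- data.frame(id = 1:3, score = c(0.5, 1.5, 2.5))

  write.csv(df, path, row.names = FALSE)
  expect_equal(read.csv(path), df)
})

test_that("settings are read from the environment", {
  # The variable is set only while the braced block runs
  withr::with_envvar(c(MY_VAR = "test"), {
    expect_equal(Sys.getenv("MY_VAR"), "test")
  })
})
```

The with_*() form scopes a change to a block of code, while the local_*() form scopes it to the rest of the enclosing test.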

Continuous integration

Add a GitHub Actions workflow with usethis::use_github_action("check-standard") to run tests automatically on every push and pull request. The workflow installs dependencies from DESCRIPTION and runs R CMD check, which runs all tests. For coverage, usethis::use_github_action("test-coverage") adds a workflow that uses covr to track test coverage over time. A test suite that runs in CI on every commit catches regressions before they reach the main branch.

Testing as package infrastructure

Tests in an R package are not optional extras; they are infrastructure that makes future development safe. Without tests, every change to a function requires manually verifying that it still works. With tests, running the test suite confirms that the code behaves as expected after every modification. The R CMD check process that CRAN runs before accepting a package runs your tests, so passing tests is a prerequisite for submission.

The testthat package is the standard testing framework for R packages. Tests live in the tests/testthat directory and are organized into test files. Each test file contains multiple test blocks. Each test block exercises one aspect of one function and contains one or more expectations. Running the full suite with devtools::test() reports passes, failures, and errors with clear messages about what failed and where.

Writing meaningful tests

Tests have the most value when they encode knowledge about the function’s contract — the relationship between inputs and outputs that the function promises to maintain. Tests that check only that a function runs without error do not catch regressions where the function runs but returns a wrong value. Tests that check specific output values for specific inputs pin the behavior and fail when it changes unexpectedly.

Edge cases deserve explicit tests. Empty inputs, NULL inputs, inputs with NA values, zero-length vectors, single-element vectors, and boundary values at the limits of valid ranges are all places where functions commonly have bugs. Listing the edge cases for a function and writing one test per case turns implicit assumptions about input handling into verified behavior.
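
One way to turn such a list into tests, sketched with a hypothetical clamp() function that restricts values to a range. Each edge case gets its own block and its own description.

```r
library(testthat)

# Hypothetical function under test: restrict x to the range [lo, hi]
clamp <- function(x, lo, hi) {
  pmin(pmax(x, lo), hi)
}

test_that("clamp() pulls out-of-range values to the nearest bound", {
  expect_equal(clamp(c(-1, 2), 0, 1), c(0, 1))
})

test_that("clamp() leaves values on the boundary unchanged", {
  expect_equal(clamp(c(0, 1), 0, 1), c(0, 1))
})

test_that("clamp() returns a zero-length vector for zero-length input", {
  expect_equal(clamp(numeric(), 0, 1), numeric())
})

test_that("clamp() propagates NA", {
  expect_equal(clamp(NA_real_, 0, 1), NA_real_)
})
```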

Test fixtures and helper functions

When multiple tests need the same setup data, define it in a fixture. helper*.R files in the tests/testthat directory are sourced before tests run and can define shared objects and helper functions. For expensive setup that should run only once per test session, use a setup*.R file and register the cleanup with withr::defer(..., teardown_env()) rather than the deprecated setup() and teardown() hooks. Keeping fixture data close to the tests makes it clear what each test depends on.
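
A minimal sketch of a helper file, assuming hypothetical file names and a made-up sample_scores() data set. The custom local_scores_csv() fixture follows the withr convention of taking the caller's environment, so the cleanup runs when the calling test exits rather than when the helper returns.

```r
library(testthat)

# --- tests/testthat/helper-fixtures.R (hypothetical file name) ---

# Shared data used by several tests
sample_scores <- function() {
  data.frame(id = 1:3, score = c(10, 20, 30))
}

# Custom fixture: writes the data to a temporary CSV that is deleted
# when the calling frame (normally a test_that() block) exits
local_scores_csv <- function(env = parent.frame()) {
  path <- withr::local_tempfile(fileext = ".csv", .local_envir = env)
  write.csv(sample_scores(), path, row.names = FALSE)
  path
}

# --- tests/testthat/test-scores.R (hypothetical file name) ---

test_that("scores file has one row per id", {
  path <- local_scores_csv()
  expect_equal(nrow(read.csv(path)), 3)
})
```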

For functions that interact with external systems — databases, files, APIs — testing requires controlling the external dependency. The withr package provides temporary environments for files and environment variables. The httptest2 package records and replays HTTP interactions for API clients. These tools let you test external-facing code without making real network calls or filesystem changes in the test environment.

Next steps

Now that you understand testing with testthat, explore these related topics to deepen your knowledge and apply these techniques in more complex scenarios.

See also