rguides

Benchmarking R Code — Learn to measure and compare R code

Benchmarking is essential for identifying performance bottlenecks and comparing alternative implementations in R. While you could use Sys.time() for rough timing, the microbenchmark package uses high-resolution timers (recording results in nanoseconds) and evaluates each expression many times. This lets it capture small but meaningful differences between code variants.

Installing microbenchmark

The microbenchmark package is available on CRAN and installs like any other R package:

install.packages("microbenchmark")
library(microbenchmark)

How microbenchmark works

The microbenchmark() function runs your R expressions multiple times and records how long each execution takes. By default, it runs each expression 100 times (set with the times argument) and interleaves the expressions in a randomized order. This gives you a distribution of timings rather than a single noisy measurement, without taking excessive time.

Here’s the basic syntax:

microbenchmark(
  expression_name = some_r_code(),
  another_expression = alternative_code(),
  times = 100L  # number of iterations
)

The function returns a microbenchmark object that you can print directly or plot for visual comparison.
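The block below is a minimal runnable sketch of this syntax. It times two illustrative ways of computing a sum of squares; the expressions and the names loop_version and vectorized are stand-ins, not part of the package. It also shows what the returned object contains: a data frame with one row per timed evaluation, where the expr column names the expression and the time column holds nanoseconds. summary() condenses this into the table that print() displays.

```r
library(microbenchmark)

x <- runif(10000)

# Two illustrative ways to compute the same sum of squares
mb <- microbenchmark(
  loop_version = {
    total <- 0
    for (v in x) total <- total + v^2
    total
  },
  vectorized = sum(x^2),
  times = 50L
)

# One row per timed evaluation: 2 expressions x 50 iterations
dim(mb)
head(mb$time)   # raw timings, in nanoseconds

# The summary statistics that print() shows, as a data frame
summary(mb)
```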

Your first benchmark: custom vs built-in median()

Let’s start with a practical example. Imagine you’ve written a custom median function and want to compare it to R’s built-in median():

custom_median <- function(x) {
  sorted_x <- sort(x)
  n <- length(sorted_x)
  
  if (n %% 2 == 1) {
    return(sorted_x[(n + 1) / 2])
  } else {
    return((sorted_x[n / 2] + sorted_x[n / 2 + 1]) / 2)
  }
}

# Create test data
set.seed(42)
random_ints <- sample(1:1000, 500, replace = TRUE)

# Verify both produce the same result
all.equal(custom_median(random_ints), median(random_ints))
# [1] TRUE

# Run the benchmark
bench_median <- microbenchmark(
  custom_median = custom_median(random_ints),
  built_in_median = median(random_ints)
)

print(bench_median)

Output will look something like this (exact timings depend on your machine and R version):

Unit: microseconds
            expr     min      lq    mean  median      uq     max
   custom_median 150.212 165.103 180.456 172.345 188.234 350.123
 built_in_median  35.421  38.567  42.891  40.123  44.567 120.345

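The built-in median() is consistently faster here, about 4-5x. Part of the reason is likely that your custom version fully sorts the input with sort(), an O(n log n) operation. In contrast, median.default() asks sort() for a partial sort that only places the middle element(s) in their final positions, which avoids most of the work of a full sort. Both approaches rely on sorting routines implemented in C.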
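To see the partial-sort idea in isolation, here is a sketch of a hypothetical custom_median_partial() that asks sort() for a partial sort instead of a full one. Printing median.default in the console shows that the built-in function takes the same approach. NA handling is omitted, and the timings you get will vary by machine.

```r
library(microbenchmark)

# Hypothetical variant of custom_median(): sort(x, partial = k) only
# guarantees that position(s) k hold the values a full sort would put there.
# NA handling is deliberately omitted.
custom_median_partial <- function(x) {
  n <- length(x)
  half <- (n + 1L) %/% 2L
  if (n %% 2L == 1L) {
    sort(x, partial = half)[half]
  } else {
    mean(sort(x, partial = c(half, half + 1L))[c(half, half + 1L)])
  }
}

set.seed(42)
random_ints <- sample(1:1000, 500, replace = TRUE)

# Confirm agreement with median() for even and odd lengths before timing
all.equal(custom_median_partial(random_ints), median(random_ints))
all.equal(custom_median_partial(random_ints[-1]), median(random_ints[-1]))

microbenchmark(
  partial_sort = custom_median_partial(random_ints),
  built_in_median = median(random_ints)
)
```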

Visualizing results

The microbenchmark package integrates with ggplot2 through autoplot(), making it easy to visualize performance differences:

library(ggplot2)
autoplot(bench_median)

This creates a violin plot showing the distribution of timings across all iterations. For this example, expect the built-in median() to show a lower median and, often, a tighter distribution than the custom version.

Real-World example: reading CSV files

Benchmarking becomes critical when comparing packages that solve the same problem differently. A common scenario is reading CSV files—should you use base R’s read.csv() or data.table::fread()?

library(data.table)
library(microbenchmark)

# Benchmark CSV reading
bench_read <- microbenchmark(
  read_base = read.csv("your_data.csv"),
  read_data_table = fread("your_data.csv"),
  times = 50L
)

print(bench_read)

On files with millions of rows, fread() is often 5-10x faster than read.csv(). Part of the difference comes from fread() reading the file with multiple threads and detecting separators and column types automatically from a sample of rows. read.csv(), by comparison, is single-threaded and more general-purpose.
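Because your_data.csv is a placeholder, this self-contained sketch writes a synthetic CSV to a temporary file first. The data (100,000 rows of made-up values) is purely illustrative and much smaller than the multi-million-row files mentioned above, so expect a smaller gap. Repeated reads of the same file are usually served from the operating system's file cache, which means this sketch measures warm-cache reads.

```r
library(data.table)
library(microbenchmark)

# Write a synthetic CSV to a temporary file (illustrative data only)
set.seed(1)
n <- 100000
df <- data.frame(
  id = seq_len(n),
  group = sample(letters, n, replace = TRUE),
  value = round(rnorm(n), 4)
)
csv_path <- tempfile(fileext = ".csv")
write.csv(df, csv_path, row.names = FALSE)

# Check that both readers return the same data before timing them
from_base <- read.csv(csv_path)
from_dt <- as.data.frame(fread(csv_path))
all.equal(from_base, from_dt, check.attributes = FALSE)

bench_read <- microbenchmark(
  read_base = read.csv(csv_path),
  read_data_table = fread(csv_path),
  times = 10L
)
print(bench_read)

unlink(csv_path)  # remove the temporary file
```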

Real-World example: data aggregation

Another common bottleneck is data aggregation. Let’s compare dplyr and data.table for grouping and summarizing:

library(dplyr)
library(data.table)

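# Simulated example data (illustrative): 1 million rows, 100 categories.
# Replace these with your own data.frame and its data.table copy.
set.seed(42)
n_rows <- 1e6
data_dplyr <- data.frame(
  category = sample(paste0("cat_", 1:100), n_rows, replace = TRUE),
  value = rnorm(n_rows)
)
data_dt <- as.data.table(data_dplyr)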

agg_dplyr <- function(data) {
  data %>%
    group_by(category) %>%
    summarise(
      count = n(),
      mean_value = mean(value, na.rm = TRUE),
      sum_value = sum(value, na.rm = TRUE)
    )
}

agg_datatable <- function(data) {
  data[, .(
    count = .N,
    mean_value = mean(value, na.rm = TRUE),
    sum_value = sum(value, na.rm = TRUE)
  ), by = category]
}

bench_agg <- microbenchmark(
  dplyr_approach = agg_dplyr(data_dplyr),
  datatable_approach = agg_datatable(data_dt),
  times = 10L
)

print(bench_agg)

For large datasets (millions of rows), data.table can be 10-100x faster than dplyr for grouped aggregations. The size of the gap depends on the number of groups, the column types, and the package versions. For small datasets (thousands of rows), the difference may be negligible.

Interpreting benchmark results

When reading microbenchmark output, focus on the median time rather than the mean. The median is more robust to outliers caused by system interrupts or garbage collection.

Key metrics:

  • min: Fastest execution time (best case)
  • lq, uq: Lower and upper quartiles (25th and 75th percentiles)
  • median: Typical execution time (most reliable)
  • mean: Average (can be skewed by outliers)
  • max: Slowest execution (often system-related)

For decision-making, compare medians. If one approach is consistently 2x faster across multiple runs, that’s a real performance difference worth exploiting. The min time represents the best-case performance with no interference. A large gap between min and median suggests variable background load or JIT compilation effects on the first few iterations.
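You can also compare medians programmatically instead of reading them off the printed table, by working with the raw timings. This sketch relies only on the expr and time (nanoseconds) columns of a microbenchmark object. The two expressions being compared are illustrative.

```r
library(microbenchmark)

x <- runif(1e5)
mb <- microbenchmark(
  sort_full = sort(x),
  sort_partial = sort(x, partial = 50000L),
  times = 100L
)

# Median time per expression, converted from nanoseconds to microseconds
medians_us <- tapply(mb$time, mb$expr, median) / 1e3
medians_us

# How many times slower each expression is than the fastest one
ratio <- medians_us / min(medians_us)
round(ratio, 2)

# Spread check: a large median-to-min ratio suggests noisy measurements
mins_us <- tapply(mb$time, mb$expr, min) / 1e3
round(medians_us / mins_us, 2)
```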

The bench package's bench::mark(), an alternative to microbenchmark, also reports memory allocations alongside timing. An operation that is fast but allocates many objects triggers more frequent garbage collection, which can slow down surrounding code. The mem_alloc column shows the total memory R allocated while evaluating the expression. The n_gc column shows how many garbage collections occurred during the timed iterations.

Best practices for accurate benchmarks

  1. Warm-up runs: By default, microbenchmark() performs 2 warm-up iterations before timing begins, to bring the CPU out of idle or power-saving states. Leave them enabled unless you are measuring cold-start performance.

  2. Control the environment: Close other applications and avoid system load. Background processes introduce variability.

  3. Use realistic data: Test with data sizes similar to production. Benchmarks on tiny vectors don’t reflect real-world performance.

  4. Increase iterations for stable results: For quick estimates, 10-100 iterations is fine. For publication or critical decisions, run 1000+ iterations.

  5. Repeat the benchmark: Run your benchmark multiple times to ensure consistency. If results vary wildly, investigate the cause.
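Practices 1, 4 and 5 can be combined in code. This sketch raises the warm-up count through microbenchmark()'s control list. warmup and order are control options of the package, but the values used here are arbitrary. It then repeats the whole benchmark three times so you can see whether the medians are stable. The helper run_once() and the two cumulative-sum expressions are hypothetical examples.

```r
library(microbenchmark)

x <- runif(1e4)

# Hypothetical helper: one full benchmark run, returning median time per expression
run_once <- function() {
  mb <- microbenchmark(
    cumsum_builtin = cumsum(x),
    cumsum_reduce = Reduce(`+`, x, accumulate = TRUE),
    times = 50L,
    control = list(warmup = 10L, order = "random")  # more warm-up, random order
  )
  tapply(mb$time, mb$expr, median)
}

# Repeat the benchmark three times; rows = expressions, columns = runs
runs <- replicate(3, run_once())
runs / 1e3   # medians in microseconds
```

If the medians in one row differ widely between columns, look for background load or other sources of noise before trusting any single run.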

Common pitfalls

Comparing apples to oranges: Make sure your compared functions produce identical results. Use all.equal() or identical() to verify before benchmarking.
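identical() and all.equal() answer different questions. identical() requires the same type and exactly the same values. all.equal() allows a small numeric tolerance and treats integer and double vectors with equal values as equal. When the objects differ, all.equal() returns a description of the differences rather than FALSE, so wrap it in isTRUE(). The hypothetical function pairs below illustrate both cases.

```r
# Hypothetical pair 1: same values, different types
squares_pow <- function(n) seq_len(n)^2                # ^ always returns double
squares_mult <- function(n) seq_len(n) * seq_len(n)    # integer * integer stays integer

identical(squares_pow(5), squares_mult(5))             # FALSE: double vs integer
isTRUE(all.equal(squares_pow(5), squares_mult(5)))     # TRUE

# Hypothetical pair 2: floating-point rounding differences
scale_div <- function(x) x / 10
scale_mult <- function(x) x * 0.1

identical(scale_div(3), scale_mult(3))                 # FALSE: 0.3 vs 0.30000000000000004
isTRUE(all.equal(scale_div(3), scale_mult(3)))         # TRUE within tolerance
```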

Benchmarking the wrong thing: If you’re measuring IO, include file loading in the benchmark. If you’re measuring computation, use pre-loaded data.
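To illustrate the difference, this sketch times the same summation twice: once with data generation inside the timed expression and once with pre-generated data. The vector size and expression names are illustrative. Only the second expression measures the computation alone.

```r
library(microbenchmark)

set.seed(1)
x <- runif(1e5)  # generated once, outside the timed expressions

mb <- microbenchmark(
  includes_data_generation = sum(runif(1e5)),  # times random number generation + sum()
  computation_only = sum(x),                    # times only sum()
  times = 50L
)
print(mb)
```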

Ignoring memory allocation: Creating large objects repeatedly adds overhead that may dominate your timing. Consider whether your benchmark reflects realistic usage patterns.
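A common case is building a result by appending to it. In the sketch below, which uses hypothetical helper names, grow_vector() calls c() on every iteration. Each call creates a new, longer vector and copies the old contents. prealloc_vector() allocates the result once and fills it in. Both return the same values, and the identical() check confirms this before timing.

```r
library(microbenchmark)

n <- 5000

# Appends with c(): each iteration builds a new vector and copies the old one
grow_vector <- function(n) {
  out <- c()
  for (i in seq_len(n)) out <- c(out, i^2)
  out
}

# Allocates the full result once, then fills it element by element
prealloc_vector <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) out[i] <- i^2
  out
}

# Same result, different allocation behaviour
identical(grow_vector(n), prealloc_vector(n))

microbenchmark(
  grow = grow_vector(n),
  preallocate = prealloc_vector(n),
  times = 20L
)
```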

Comparing multiple approaches

You can compare as many expressions as needed:

bench_compare <- microbenchmark(
  base_sapply = sapply(1:1000, function(x) x^2),
  base_vapply = vapply(1:1000, function(x) x^2, numeric(1)),
  purrr_map = purrr::map_dbl(1:1000, ~ .x^2),
  times = 1000L
)

print(bench_compare)

This pattern is useful for comparing multiple packages or approaches to the same problem.

When to use microbenchmark

microbenchmark is ideal for:

  • Comparing two or more functions that do the same thing
  • Measuring the performance impact of code changes
  • Deciding between base R, tidyverse, or data.table approaches
  • Timing small, isolated pieces of code that you have already identified as slow

For profiling specific functions or understanding where time is spent in complex code, use R’s built-in Rprof() instead.
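Here is a minimal Rprof() sketch that profiles a hypothetical pipeline() whose slow_step() dominates the run time. Rprof() samples the call stack at a fixed interval and writes the samples to a file, and summaryRprof() then tabulates where they landed. The workload sizes are arbitrary and only need to run long enough to collect samples.

```r
# Hypothetical pipeline: one expensive step and one cheap step
slow_step <- function(x) sort(x)
fast_step <- function(x) sum(x)

pipeline <- function(reps) {
  for (i in seq_len(reps)) {
    x <- runif(1e6)
    slow_step(x)
    fast_step(x)
  }
}

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)   # start sampling every 10 ms
pipeline(20)
Rprof(NULL)                         # stop profiling

prof <- summaryRprof(prof_file)
head(prof$by.total)   # time including callees, largest first
```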

Avoiding common benchmarking mistakes

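Benchmark in a fresh R session where possible. This keeps leftover objects and state from earlier work (for example, a large heap that makes garbage collection slower) from distorting the timings. Warm-up is handled automatically. microbenchmark() runs warm-up iterations before timing, which also absorbs one-off costs such as R's byte-code JIT compiling a function on its first calls. bench::mark() evaluates each expression once, to record memory use and check results, before its timed iterations.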

Test at realistic data sizes: results on small inputs can reverse direction at scale. bench::press(n = c(100, 1000, 10000), { bench::mark(...) }) runs the same benchmark across multiple input sizes, and you can plot its results to show the scaling behavior.

Avoid looping over the same operation inside the benchmarked expression, because microbenchmark() already handles the repetition. The correct pattern is microbenchmark(expr, times = 1000), not microbenchmark({ for(i in 1:1000) expr }, times = 1). The loop version produces a single total time, leaving no distribution to summarize.

Memory profiling alongside timing

bench::mark() reports both time and memory allocations for each benchmarked expression. The mem_alloc column shows the total memory R allocated while evaluating the expression; allocations made outside R's heap, for example directly by C code, are not tracked. High allocation counts trigger garbage collection, which can cause variable timing. A function that allocates less memory tends to be more consistently fast, even if its baseline speed is similar to a function that allocates more.
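The sketch below compares a vectorized expression with an lapply()-based one that builds an intermediate list, and shows the timing, mem_alloc and n_gc columns side by side. It assumes the bench package is installed, and the expressions are illustrative. By default, bench::mark() checks that both expressions return equal results. Memory measurements need an R build with memory profiling enabled, which capabilities("profmem") reports.

```r
library(bench)

set.seed(1)
x <- runif(1e5)

res <- bench::mark(
  vectorized = x * 2,                                  # one result vector
  via_lapply = unlist(lapply(x, function(v) v * 2)),   # builds a 100,000-element list first
  iterations = 20
)

# Timing, memory and garbage-collection columns side by side
res[, c("expression", "median", "mem_alloc", "n_gc")]
```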

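For profiling allocations specifically, profmem::profmem(expr) records each memory allocation made during evaluation, along with the call stack that made it, showing which functions allocate the most memory. It requires an R build with memory profiling enabled, which capabilities("profmem") reports. This complements profvis for time profiling: together, they identify both CPU and memory bottlenecks.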

Benchmarking at multiple scales

Performance characteristics often depend on input size. A function that is faster at small sizes may be slower at large sizes because of differences in algorithmic complexity. Consider bench::press(n = c(10, 100, 1000, 10000), { x <- make_data(n); bench::mark(f1(x), f2(x)) }), where make_data(), f1() and f2() are placeholders for your own code. It runs the benchmark at each size and returns one combined table with an n column that ggplot2 can plot to show the scaling behavior. Building the input before calling bench::mark() keeps data generation out of the timings. This reveals whether one approach's advantage grows, shrinks, or reverses as data size increases.
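Here is a concrete version of the bench::press() pattern, comparing a hypothetical sum_loop() against the built-in sum(). The input is built inside the press block but outside bench::mark(), so data generation is not timed. The returned table has one row per size and expression, plus an n column you can map to the x-axis in ggplot2.

```r
library(bench)

# Hypothetical function to compare against the built-in sum()
sum_loop <- function(x) {
  total <- 0
  for (v in x) total <- total + v
  total
}

results <- bench::press(
  n = c(10, 100, 1000, 10000),
  {
    x <- runif(n)   # input for this size, built outside the timed code
    bench::mark(
      loop = sum_loop(x),
      builtin = sum(x),
      iterations = 50
    )
  }
)

# One row per (size, expression) pair
results[, c("n", "expression", "median")]
```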

Conclusion

The microbenchmark package gives you accurate, repeatable performance measurements for R code. Start with simple comparisons to understand the performance characteristics of different approaches, then use those insights to optimize the parts of your code that matter most.

In data-heavy workflows, I/O operations (reading and writing files) are often the largest single cost, so they are usually worth checking first. Then optimize in-memory computations where needed.


See also