Profiling and Optimizing R Code: Tools and Techniques

Profiling and optimizing R code begins with identifying what actually makes it slow: before changing any line, measure where runtime is spent. This guide shows how to use Rprof, profvis, and bench to pinpoint bottlenecks, then apply targeted optimizations, using microbenchmarking to compare alternative implementations.

Why profile first

Donald Knuth famously said that programmers waste enormous amounts of time optimizing noncritical parts of their programs. Profiling tells you exactly where your code spends time, so you can focus on the parts that actually matter.

R uses a sampling profiler. It pauses code execution at a fixed interval (every 20 milliseconds by default in base R's Rprof(), every 10 milliseconds in profvis) and records the current call stack. Over time, this builds a picture of which functions consume the most time.
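To make the sampling idea concrete, here is a minimal sketch that runs base R's Rprof() and then reads the raw sample file. The function busy_work() and its workload are hypothetical, chosen only so the run lasts long enough to collect samples.

```r
# Hypothetical workload: slow enough to be sampled several times.
busy_work <- function(n = 200) {
  total <- 0
  for (i in seq_len(n)) {
    total <- total + mean(sort(runif(20000)))
  }
  total
}

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)  # sample every 10 ms
invisible(busy_work())
Rprof(NULL)                        # stop profiling

# The first line is a header recording the sampling interval. Every
# later line is one sample: the call stack at that instant, with the
# innermost call listed first.
raw_samples <- readLines(prof_file)
head(raw_samples, 5)
```

Functions that appear in many of these lines are the ones that were running most often when the profiler looked, which is all that "time spent" means in a sampling profile.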

Using profvis for profiling

The profvis package provides the easiest way to profile R code. It wraps the base R profiler and displays results in an interactive visualization. The base R profiler is called Rprof(), and it writes sampling data to a file. profvis takes that data and makes it readable.

You can also use Rprof directly:

Rprof(tmp <- tempfile(), interval = 0.01)
# Run your code here
your_function()
Rprof(NULL)

# Summarize results
summaryRprof(tmp)

The summaryRprof() output lists each function with its total and self-time, sorted by time consumed on the call stack. However, this flat tabular format makes it difficult to understand the calling context. Running the same profiling data through profvis instead produces an interactive flame graph that maps the full call hierarchy, showing which functions are expensive, which parent invoked them, and what proportion of runtime each child branch consumed. The flame graph encodes this relationship through nested horizontal bars, making it immediately obvious whether the bottleneck is in a leaf function or in a parent that calls many cheap children.
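As a rough illustration of both views, the sketch below profiles a throwaway workload, inspects the two tables that summaryRprof() returns, and then hands the same file to profvis through its prof_input argument. The workload is hypothetical, and the profvis step assumes the package is installed.

```r
prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)
for (i in 1:300) invisible(sort(runif(50000)))  # hypothetical workload
Rprof(NULL)

prof_summary <- summaryRprof(prof_file)
head(prof_summary$by.self)   # time spent in each function's own code
head(prof_summary$by.total)  # time including everything it called
prof_summary$sampling.time   # total profiled time in seconds

# Assumption: profvis is installed. prof_input reads an existing Rprof
# file instead of profiling an expression.
if (requireNamespace("profvis", quietly = TRUE)) {
  flame <- profvis::profvis(prof_input = prof_file)
  # print(flame) opens the interactive view
}
```

Comparing by.self with by.total is the tabular equivalent of asking whether a bar in the flame graph is wide on its own or only because of its children.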

# Install profvis if needed
install.packages("profvis")

# Profile a function
library(profvis)

profile_slow <- function() {
  # Simulate some work
  result <- numeric(1000)
  for (i in 1:1000) {
    result[i] <- mean(rnorm(1000))
  }
  # More work
  data.frame(
    x = rnorm(10000),
    y = runif(10000)
  )
}

profvis(profile_slow())

When you run this, profvis opens an HTML page with two panes. The top shows your source code with bar graphs overlaid on each line. The bottom shows a flame graph of the call stack.

The flame graph reveals which functions called which, and how much time each branch consumed. If you see <GC> taking significant time, your code is probably creating many short-lived temporary objects.

Memory profiling

Memory problems show up clearly in profvis. The memory column shows allocation (right bar) and deallocation (left bar) for each line.

Watch for patterns like this:

# This creates many copies - slow!
x <- integer()
for (i in 1:10000) {
  x <- c(x, i)  # Creates a new copy each time
}

When you profile this loop, the garbage collector (<GC>) dominates the flame graph because R copies the entire growing vector on each iteration, allocating a new vector and discarding the old one for every append. The output shows large memory-allocation bars alongside the c() calls, confirming the cost of the repeated copying. The two alternatives below avoid this entirely. Pre-allocation reserves the full vector upfront so assignment into existing slots triggers no copying. The vectorized 1:10000 constructor eliminates the loop altogether, creating the result in a single allocation. In profvis, both alternatives show negligible <GC> activity and a dramatic drop in total runtime, confirming that you have identified and fixed the bottleneck.

# Fast: pre-allocate
x <- integer(10000)
for (i in 1:10000) {
  x[i] <- i
}

# Even faster: vectorized
x <- 1:10000
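A quick way to check the three variants against each other without profvis is to wrap them in functions, confirm they return the same vector, and time each with base R's system.time(). The function names and the size n are illustrative choices, and the timings will vary by machine.

```r
grow_vector <- function(n) {
  x <- integer()
  for (i in seq_len(n)) x <- c(x, i)  # allocates a new vector on every append
  x
}

preallocate_vector <- function(n) {
  x <- integer(n)
  for (i in seq_len(n)) x[i] <- i     # writes into an existing slot
  x
}

n <- 20000

# All three approaches must produce the same result before timing matters.
stopifnot(
  identical(grow_vector(n), preallocate_vector(n)),
  identical(preallocate_vector(n), seq_len(n))
)

system.time(grow_vector(n))
system.time(preallocate_vector(n))
system.time(seq_len(n))
```

Checking equality first guards against the common mistake of "optimizing" code into something that computes a different answer.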

Microbenchmarking with bench

Microbenchmarking measures the execution time of small code snippets with high precision, serving a different purpose from memory profiling: while profvis shows you where your program spends time and memory across an entire run, bench compares the raw speed of alternative implementations for a specific operation. The bench package is the modern choice for this task, automatically determining how many iterations to run for statistically meaningful results and tracking memory allocation alongside timing so you can evaluate both speed and space efficiency in a single call.

install.packages("bench")

library(bench)
x <- runif(100)

result <- bench::mark(
  sqrt(x),
  x ^ 0.5,
  exp(log(x) / 2)
)

result

The output shows min and median times plus iterations per second and a memory allocation column (recent versions of bench do not print a mean). Focus on the minimum (best possible) and median (typical) times rather than a mean, since benchmark distributions are heavily skewed. The full result table includes many columns beyond these essentials, such as garbage collection counts, iteration counts, and the raw per-iteration timings. When you only care about comparing which expression is fastest and how much memory it uses, subsetting to the five key columns shown next keeps the output readable while still giving you the metrics that drive optimization decisions.

# Focus on the important columns
result[, c("expression", "min", "median", "itr/sec", "mem_alloc")]

The distribution of timings is heavily right-skewed because the operating system occasionally interrupts execution for background tasks, producing a long tail of slow outliers. This is why comparing means is misleading: the arithmetic mean gets inflated by a handful of unlucky iterations, while the median remains stable.
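The effect is easy to see with made-up numbers. The timings below are hypothetical, not measured: 95 ordinary runs plus 5 runs that were interrupted.

```r
# Hypothetical timings in milliseconds: 95 ordinary runs and 5 runs
# slowed down by background activity.
timings_ms <- c(rep(1, 95), rep(20, 5))

mean(timings_ms)    # 1.95: pulled upward by the five slow runs
median(timings_ms)  # 1: unaffected by the outliers
```

Five percent of the runs nearly double the mean, while the median still describes the typical run.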

Common optimization patterns

Once profiling identifies bottlenecks, these patterns often help, and microbenchmarking with bench confirms whether each pattern actually delivers a measurable improvement for your specific data size and hardware.

Vectorize over loops. Replace explicit loops with vectorized operations implemented in R’s compiled C code, which avoid the per-iteration interpreter overhead that dominates tight R-level loops:

# Slow: accumulate in an R-level loop
# (named total so it does not mask the built-in sum())
total <- 0
for (i in x) {
  total <- total + i
}

# Fast: one call into compiled code
sum(x)

Pre-allocate. Vectorization works when a function already accepts a vector argument, but many custom calculations require a loop because each iteration depends on the previous result or calls a function with no vectorized equivalent. In these cases, allocating the full result vector before the loop eliminates repeated copying. Pre-allocation tells R upfront how much memory to reserve, so each assignment into an existing slot modifies the vector in place rather than building a new one element by element. The difference in profvis is stark: instead of a growing sawtooth of allocation calls, you see a single allocation at the start followed by a loop with no memory churn:

# Slow - keeps growing
result <- c()
for (i in 1:1000) {
  result <- c(result, compute(i))
}

# Fast - fixed size
result <- numeric(1000)
for (i in 1:1000) {
  result[i] <- compute(i)
}

Use matrices efficiently. Data frames store each column as a separate vector with its own type and attributes, so numeric routines cannot treat the data as one block. When your data is purely numeric, converting to a matrix places it in contiguous memory. Functions like colMeans() and rowSums() run faster on matrices because they would otherwise convert the data frame with as.matrix() on every call, and %*% requires a matrix so that it can hand the data to R’s BLAS routines. The gain tends to grow with the number of columns and with how often the conversion would be repeated:

# Convert to matrix for numeric-heavy work
mat <- as.matrix(df[, c("a", "b", "c")])
colMeans(mat)
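The snippet above assumes a data frame df already exists. A self-contained sketch with a hypothetical numeric data frame shows that both forms give the same answer and lets you time the difference on your own machine:

```r
set.seed(1)
# Hypothetical purely numeric data frame
df <- data.frame(a = rnorm(1e5), b = rnorm(1e5), c = rnorm(1e5))

mat <- as.matrix(df[, c("a", "b", "c")])  # convert once, reuse many times

# Both inputs give the same column means
stopifnot(isTRUE(all.equal(colMeans(df), colMeans(mat))))

system.time(for (i in 1:200) colMeans(df))   # converts on every call
system.time(for (i in 1:200) colMeans(mat))  # no conversion needed
```

The repeated calls in the loop stand in for code that summarizes the same data many times, which is where converting once pays off.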

Avoid copies. R’s copy-on-modify behavior can surprise you because operations that look like in-place mutations often create a new object. R defers copying until a modification actually occurs, but once it does, the modified object is duplicated: an atomic vector is copied in full, while a list or data frame gets a new top-level structure whose untouched columns are still shared (since R 3.1.0). The memory column in bench or profvis reveals these hidden allocations, and their cost grows with object size. When you need to remove several columns, selecting the ones you want to keep in a single subsetting step avoids creating an intermediate data frame for each removal. Even a single innocent-looking column removal produces a new data frame rather than editing the old one:

# This doesn't modify x in place
x <- data.frame(a = 1:10)
x$a <- NULL  # Actually creates a new data frame
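The advice about keeping columns rather than dropping them one at a time can be sketched as follows. The data frame and the column names are hypothetical.

```r
# Hypothetical data frame with two columns to discard
x <- data.frame(a = 1:10, b = 11:20, c = 21:30, d = 31:40)

# One removal at a time: each assignment produces a new data frame
y <- x
y$b <- NULL
y$c <- NULL

# One subsetting step: keep only the wanted columns
keep <- c("a", "d")
z <- x[keep]

# Both routes end with the same columns and values
stopifnot(isTRUE(all.equal(y, z)))
```

For two small columns the difference is negligible; it starts to matter when the data frame is large and many columns are removed inside a loop.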

Use vectorized alternatives to apply(). colSums(m) is faster than apply(m, 2, sum). Replace apply() with vectorized operations wherever an equivalent exists, since apply() essentially runs an R-level loop that calls the function once per row or column.
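A short sketch on a hypothetical 1000 x 1000 matrix confirms the two calls agree and lets you compare their timings:

```r
set.seed(42)
m <- matrix(runif(1e6), nrow = 1000)  # hypothetical 1000 x 1000 matrix

# Same result either way
stopifnot(isTRUE(all.equal(apply(m, 2, sum), colSums(m))))

system.time(apply(m, 2, sum))  # one R-level function call per column
system.time(colSums(m))        # a single call into compiled code
```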

Choose the right indexing method. Use which() instead of logical indexing when you need the indices more than once and the logical vector has many FALSE values: which() scans the logical vector once and returns only the positions of the TRUE values, so each subsequent subsetting operation works from a short integer vector instead of rescanning the full logical vector.
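As an illustration, the sketch below builds a sparse condition on hypothetical data and reuses the integer index across two vectors:

```r
set.seed(7)
values <- rnorm(1e6)  # hypothetical data
scores <- runif(1e6)

is_extreme <- values > 3          # logical vector, almost all FALSE
extreme_idx <- which(is_extreme)  # short integer vector of positions

# Both forms select the same elements; the integer index can be reused
# without rescanning a million logical values each time.
stopifnot(
  identical(values[is_extreme], values[extreme_idx]),
  identical(scores[is_extreme], scores[extreme_idx])
)

length(extreme_idx)
```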

Match the tool to the data format. For I/O bottlenecks, data.table::fread() typically reads large CSV files several times faster than read.csv() (5-10x is a commonly quoted figure), and arrow::read_parquet() is usually faster still for columnar data. For large model matrices, Matrix::sparse.model.matrix() avoids dense matrix allocation. For text operations, stringi functions are generally at least as fast as their stringr equivalents, which are built on top of stringi.
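Speed-ups like these depend on file size and hardware, so it is worth measuring on your own data. The sketch below writes a small hypothetical CSV to a temporary file and times both readers; the fread() step assumes data.table is installed and is skipped otherwise.

```r
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(id = 1:50000, value = rnorm(50000)),  # hypothetical data
  csv_path,
  row.names = FALSE
)

base_time <- system.time(base_df <- read.csv(csv_path))

# Assumption: data.table is installed; skipped otherwise.
if (requireNamespace("data.table", quietly = TRUE)) {
  fread_time <- system.time(
    fread_df <- data.table::fread(csv_path, data.table = FALSE)
  )
  # Both readers should recover the same table
  stopifnot(isTRUE(all.equal(base_df, fread_df)))
  print(c(read.csv = base_time[["elapsed"]], fread = fread_time[["elapsed"]]))
}
```

On a file this small the gap may be hard to see; the difference becomes clearer as the file grows.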

When to use which tool

  • profvis - Find which parts of your code are slow
  • bench - Compare multiple implementations precisely
  • gc() - Check current memory usage
  • object.size() - See how much memory an object uses
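The last two tools in the list are base R functions, so a minimal sketch needs no packages. The vector x is a hypothetical example object.

```r
x <- rnorm(1e6)  # one million doubles, roughly 8 MB of data

size_bytes <- object.size(x)
print(size_bytes, units = "MB")

# gc() triggers a collection and returns a matrix describing memory
# in use; the second column is the amount currently used, in Mb.
mem <- gc()
mem[, 2]
```

object.size() reports an estimate for a single object, while gc() describes the whole R session, so the two answer different questions.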

Profiling limitations

Be aware of what profiling cannot tell you:

  • It cannot see inside C/C++ code called from R; that time is attributed to the R function that made the call
  • Anonymous functions are hard to identify in call stacks
  • Lazy evaluation can make call stacks harder to interpret

Measuring execution time

For quick checks, system.time() works for single expressions:

system.time({
  x <- rnorm(1000000)
  mean(x)
})

The output shows user time (CPU time), system time (OS calls), and elapsed time. For longer-running code, this gives you a rough sense of performance. However, system.time() runs the expression only once, so it is not precise enough for comparing similar implementations, which is why bench exists.
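If you want to use the numbers rather than just read them, the result can be stored and indexed by name. A minimal sketch:

```r
timing <- system.time({
  x <- rnorm(1000000)
  mean(x)
})

timing[["elapsed"]]    # wall-clock seconds
timing[["user.self"]]  # CPU seconds spent in the R process
timing[["sys.self"]]   # CPU seconds spent in system calls
```

Elapsed time is usually the figure users notice; a large gap between elapsed and user time hints that the code is waiting on something other than the CPU, such as disk or network.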

Practical example

Here’s a realistic workflow:

library(profvis)

# Your actual analysis code
analyze_data <- function() {
  # Load and process data
  df <- read.csv("large_file.csv")

  # Transformations
  df$z <- (df$x - mean(df$x)) / sd(df$x)

  # Model fitting
  results <- lapply(unique(df$group), function(g) {
    group_df <- df[df$group == g, ]  # named to avoid masking base::subset()
    lm(y ~ z, data = group_df)
  })

  # Summaries
  lapply(results, summary)
}

profvis(analyze_data())

The profvis output shows you which parts take the most time. Maybe the file read is slow (consider readr). Maybe the loop over groups creates too many copies (consider dplyr group_by). You won’t know until you profile.
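To try the workflow without a real file, you can substitute synthetic data. The sketch below is one possible stand-in, not the original analysis: the data, the coefficient 2, and the group labels are all invented, and split() is used as a base R alternative to filtering the data frame once per group.

```r
set.seed(123)
# Hypothetical stand-in for large_file.csv with the columns the
# workflow expects: x, y, and group.
n <- 3000
df <- data.frame(
  x = rnorm(n),
  group = sample(c("a", "b", "c"), n, replace = TRUE)
)
df$y <- 2 * df$x + rnorm(n)
df$z <- (df$x - mean(df$x)) / sd(df$x)

# split() partitions the rows once instead of filtering per group
models <- lapply(split(df, df$group), function(group_df) {
  lm(y ~ z, data = group_df)
})

# Slope of z in each group's model
sapply(models, function(m) coef(m)[["z"]])
```

Whether split() actually beats the per-group filter for your data is something to confirm with a profile or a benchmark rather than assume.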

Reading profvis output

The profvis call tree, shown in the Data tab, tells you where time is spent. Each row is a function call, indented to show its depth in the call stack, and the width of its bar corresponds to execution time. Click a row to expand it and see the calls beneath it. Wide bars on deeply nested rows (inner functions) indicate the actual bottleneck; wide bars on outer rows indicate that a parent function is slow because its children are slow.

The Data tab is built from the same samples as the flame graph. Export as HTML to share a profile with colleagues: htmlwidgets::saveWidget(profvis(expr), "profile.html").

When not to optimize

Profile before optimizing: optimizing the wrong function is wasted effort. Donald Knuth’s rule applies: “premature optimization is the root of all evil.” A function that takes 100ms in a 10-second pipeline is not the bottleneck. Focus on the functions that consume the most total time, not the ones with the highest per-call time.

For Shiny app profiling, profvis::profvis(shiny::runApp('myapp')) profiles an interactive session. The reactlog tool (call reactlog::reactlog_enable() before starting the app, then shiny::reactlogShow()) visualizes the reactive dependency graph and identifies which inputs trigger expensive recalculations.

Conclusion

Profile before optimizing. Use profvis to find bottlenecks, then bench to compare solutions. Focus on the biggest wins first. Most of the time, a single optimization in a hot path matters more than micro-optimizing everything. bench::mark(expr1, expr2) compares expressions across multiple iterations and reports median time and memory; pass relative = TRUE to express each result relative to the fastest.

Summary

Profiling identifies where R code actually spends time, which is rarely where you expect. profvis produces a flame graph that shows call stack and timing side by side with your source code, making bottlenecks immediately visible. The most common findings are loops that could be replaced by vectorized operations, repeated calls to slow functions inside apply(), and unnecessary copying triggered by growing vectors or data frames incrementally. Fix the identified bottleneck, re-profile to confirm the improvement, and stop when performance is acceptable; premature optimization in R is as wasteful as in any language.
