Memory Management in R

· 6 min read · Updated March 11, 2026 · advanced
r memory performance advanced

Memory management in R works differently than in languages like C or Python. R uses copy-on-modify semantics and automatic garbage collection, which simplifies coding but can cause unexpected memory usage patterns. Understanding these internals helps you write code that handles large datasets efficiently.

How R Manages Memory

R allocates memory dynamically and manages it through a garbage collector that runs automatically. When you create objects, R allocates space from the heap. When objects are no longer referenced, the garbage collector reclaims that memory.
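One way to watch this cycle is to compare the collector's own statistics before and after dropping a large object (a sketch; exact numbers depend on your session):

```r
# Sketch: watch the collector reclaim a large vector.
big <- numeric(1e7)                 # Allocate ~80 MB of doubles on the heap
before <- gc()["Vcells", "used"]    # Heap cells in use while big exists

rm(big)                             # Drop the only reference
after <- gc()["Vcells", "used"]     # The collector has reclaimed the vector

after < before                      # TRUE
```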

Copy-on-Modify

When you assign an object to a new name, R doesn’t immediately create a copy. Instead, it uses copy-on-modify semantics: both names point to the same underlying data, and R only duplicates the data when one of the names is used to modify it.

x <- 1:1000
y <- x  # No copy made yet - y points to same memory as x

# Modifying y triggers the copy
y[1] <- 0
# Now y has its own copy

This behavior matters in functions. Passing a large vector to a function doesn’t duplicate it unless the function modifies the input.
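You can verify this with the built-in tracemem() (a sketch; the reported addresses will differ on your machine):

```r
x <- runif(1e6)   # ~8 MB double vector; runif() avoids ALTREP sequences
tracemem(x)       # Report whenever x is duplicated

just_read <- function(v) sum(v)           # Only reads its argument
modify    <- function(v) { v[1] <- 0; v } # Writes to its argument

just_read(x)    # Silent: the vector is passed without copying
y <- modify(x)  # tracemem reports a duplication inside modify()
untracemem(x)
```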

Garbage Collection

R’s garbage collector runs automatically when the memory heap reaches a threshold. You don’t need to call gc() manually for normal usage. The collector identifies objects with no remaining references and frees their memory.

Use gcinfo(TRUE) to see when garbage collection occurs:

gcinfo(TRUE)
x <- integer(100000)
x <- c(x, 1:18)  # Growing x allocates a new vector; gc may run here
gcinfo(FALSE)

The output shows memory used by different object types and how much was collected.

Measuring Object Size

R provides several ways to measure memory usage, but the numbers can be confusing.

object.size()

The base R function object.size() returns the memory allocation for an object:

x <- rep(1:10, each = 100)
object.size(x)
# 4048 bytes

This function reports what R would allocate for a standalone copy of the object. For objects with shared components it overestimates, because every reference is counted separately.

lobstr::obj_size()

The lobstr package provides a more accurate measurement that accounts for object sharing:

library(lobstr)

x <- 1:1000
y <- list(x, x, x)  # Three references to same object

object.size(y)
# 24568 bytes (counts x three times)

obj_size(y)
# 8120 bytes (recognizes shared memory)

The difference matters when you measure complex objects with shared components. Exact byte counts also vary by R version; since R 3.5, sequences like 1:1000 may be stored compactly via ALTREP, making both numbers smaller.

Total Memory Usage

Check R’s total memory consumption with lobstr’s mem_used():

mem_used()
# e.g. 1.2 GB in this session

This totals the heap memory and cons cells tracked by R’s own gc() statistics, which can differ from what the operating system reports for the R process.

Common Memory Pitfalls

Several patterns cause unexpected memory growth.

Growing Vectors in Loops

The most common mistake is growing a vector inside a loop:

# BAD: Creates a new vector on each iteration
result <- numeric(0)
for (i in 1:10000) {
  result <- c(result, i)
}

# GOOD: Pre-allocate the vector
result <- numeric(10000)
for (i in 1:10000) {
  result[i] <- i
}

Each c() call copies the entire vector. With 10,000 iterations, the loop performs roughly 10,000 reallocations and copies on the order of 50 million elements in total, making it quadratic rather than linear.
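The cost difference is easy to measure (a sketch using system.time(); timings vary by machine):

```r
n <- 50000

grow <- function(n) {
  out <- numeric(0)
  for (i in 1:n) out <- c(out, i)  # Reallocates and copies every iteration
  out
}

prealloc <- function(n) {
  out <- numeric(n)
  for (i in 1:n) out[i] <- i       # Writes into existing storage
  out
}

system.time(grow(n))      # Quadratic: takes seconds as n grows
system.time(prealloc(n))  # Linear: effectively instant
identical(grow(n), prealloc(n))  # TRUE - same result either way
```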

Unintended Object Retention

Objects remain in memory until explicitly removed or they go out of scope. In scripts and notebooks, old objects accumulate:

ls()  # Shows all objects in your environment
rm(large_object)  # Remove specific object
rm(list = ls())   # Clear everything

The workspace grows over time, consuming memory even when objects aren’t actively used.

Data Frame Copies

Operations that seem to modify data in place often create copies:

df <- data.frame(x = 1:1e6, y = rnorm(1e6))

# This creates a copy
df$z <- df$x + df$y

# Use transform or within, still creates copy
df <- transform(df, z = x + y)

For very large data frames, these intermediate copies double memory usage temporarily.
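tracemem() makes the copy visible (a sketch; addresses vary per session, and in recent R versions the duplication is shallow, so the column vectors themselves are shared rather than re-copied):

```r
df <- data.frame(x = runif(1e5), y = runif(1e5))
tracemem(df)

df$z <- df$x + df$y  # tracemem reports the data frame being duplicated
untracemem(df)
```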

Reducing Memory Usage

Remove Unneeded Objects

The simplest approach is deleting objects you no longer need:

rm(object1, object2)
gc()  # Force garbage collection to return memory to OS

Call gc() after removing large objects if you need the memory back immediately.

Use Appropriate Data Types

Smaller types use less memory:

# numeric stores 8 bytes per element
x <- as.numeric(1:1000)  # Uses ~8 KB

# integer stores 4 bytes per element - prefer it for whole numbers
y <- 1:1000  # Uses ~4 KB

# logical (4 bytes per element) is far cheaper than character for flags,
# which cost an 8-byte pointer per element plus the cached strings
flags <- rep(TRUE, 1000)  # Uses ~4 KB

Process Data in Chunks

For files larger than available memory, read and process in chunks:

read_chunked <- function(filepath, process, chunk_size = 10000) {
  con <- file(filepath, "r")
  on.exit(close(con))
  
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0) break
    
    process(lines)  # Caller-supplied function handles one chunk
  }
}

The readr package provides read_csv_chunked() for this pattern.
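A sketch of that pattern, assuming a hypothetical file data.csv with a group column:

```r
library(readr)
library(dplyr)

# Keep only per-chunk summaries in memory, never the full file
per_chunk <- DataFrameCallback$new(function(chunk, pos) {
  chunk %>% count(group)
})

counts <- read_csv_chunked("data.csv", per_chunk, chunk_size = 10000) %>%
  count(group, wt = n)  # Merge the per-chunk counts into one total
```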

Use disk.frame or arrow for Large Data

External memory solutions handle datasets larger than RAM:

library(disk.frame)
library(arrow)

# disk.frame keeps data on disk, processes in chunks
df <- as.disk.frame(mtcars, outdir = "df/", nchunks = 2)

# Arrow allows memory-mapped files
tbl <- arrow::open_dataset("parquet_files/")

The gc() Function

The gc() function triggers garbage collection and reports memory statistics. Despite what you might think, calling gc() is rarely necessary for performance:

gc()
#           used (Mb) gc trigger (Mb) max used (Mb)
# Ncells  450000 24.1    1200000 64.1   676000 36.1
# Vcells 1200000  9.2    1600000 12.2  1300000  9.9

R automatically runs garbage collection when needed. The only reason to call gc() manually is to return memory to the operating system, which matters in long-running processes or when you want accurate memory reporting.
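For example, to confirm that a large allocation really was released (a sketch using lobstr; sizes are approximate):

```r
library(lobstr)

big <- numeric(1e7)  # ~80 MB of doubles
before <- mem_used()

rm(big)
gc()                 # Run the collector and release the freed pages
after <- mem_used()

before - after       # Roughly 80 MB
```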

Call gc(reset = TRUE) to reset the "max used" statistics:

gc(reset = TRUE)
# Now gc() reports from a clean baseline

Memory Profiling Tools

profvis

The profvis package visualizes memory usage during code execution:

library(profvis)

profvis({
  data <- data.frame(x = rnorm(1e6), y = rnorm(1e6))
  data$z <- data$x + data$y
  mean(data$z)
})

The profiling output shows memory allocated and released at each line, helping identify where memory spikes occur.

lobstr::obj_addr()

Get the memory address of an object to verify whether two variables share memory:

library(lobstr)

x <- 1:10
y <- x
obj_addr(x) == obj_addr(y)  # TRUE - same address

y <- c(x)  # Forces a copy
obj_addr(x) == obj_addr(y)  # FALSE - different addresses

tracemem()

The built-in tracemem() function prints a message each time an object is duplicated:

x <- 1:1e6
tracemem(x)

x[1] <- 0  # Coercing integer to double forces a copy
# Prints: tracemem[0x... -> 0x...]:

This helps identify exactly where copy-on-modify triggers in your code.

Working with Large Datasets Efficiently

Avoid Loading Entire Files

Use column selection and filtering in database queries or with packages like disk.frame:

library(dplyr)

# db is a remote table reference, e.g. db <- tbl(con, "measurements")
# Only select needed columns and filter early, before collect()
result <- db %>%
  select(id, value, date) %>%
  filter(date > "2024-01-01") %>%
  collect()

Clone Less

Every object you create uses memory. Reuse existing objects when possible:

# Instead of creating multiple transformed versions
df1 <- transform(df, log_x = log(x))
df2 <- transform(df1, sqrt_y = sqrt(y))

# Transform once
df <- transform(df, log_x = log(x), sqrt_y = sqrt(y))

Monitor Memory Limits

On Windows, older versions of R let you check and raise the memory limit with memory.limit():

memory.limit()      # Current limit in MB

memory.limit(8000)  # Raise the limit to 8 GB

The function only ever worked on Windows and is defunct as of R 4.2.0; current R simply uses whatever memory the operating system provides. Linux and macOS have never enforced a limit this way, though you can cap the R process with OS tools such as ulimit.

Summary

R’s memory management is automatic but not invisible. The copy-on-modify behavior means passing objects to functions is cheap until you modify them. Use lobstr::obj_size() for accurate measurements, avoid growing vectors in loops, and remove objects you no longer need. For truly large data, consider external memory solutions like disk.frame or Arrow. The gc() function is rarely needed for performance but helps return memory to the OS in long-running processes.