
R5 Reference Classes in R

Reference Classes (often nicknamed R5 or RC, although "Reference Classes" is the official name) are R’s third object-oriented system. They sit alongside S3 and S4 but behave quite differently. If you have used languages like Java or C#, Reference Classes will feel familiar. If you have only worked with S3 and S4, the difference will surprise you.

The key difference: reference semantics

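S3 and S4 objects follow what R programmers call copy-on-modify semantics. When you assign an object to a new variable, the two names behave as independent copies: R defers the actual copying until one of them is modified, but the observable effect is the same. Modify the copy, and the original stays unchanged.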

# S3 object - copy on modify
person <- function(name, age) {
  structure(list(name = name, age = age), class = "person")
}

p1 <- person("Alice", 30)
p2 <- p1
p2$age <- 31
p1$age  # Still 30
p2$age  # 31

Reference Classes break this pattern. They use reference semantics: when you assign a refclass object to a new variable, both variables point to the same underlying object. Modify one, and you modify both.

# Reference Class - same object
Person <- setRefClass("Person", 
  fields = c("name", "age"),
  methods = list(
    greet = function() paste("Hello, I'm", name)
  )
)

p1 <- Person$new(name = "Alice", age = 30)
p2 <- p1
p2$age <- 31
p1$age  # Now 31 - both point to same object
p2$age  # 31

This is the single most important thing to understand about Reference Classes. It affects everything: how you pass objects to functions, how you think about equality, and when refclasses are appropriate.
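The effect on function calls can be seen in a minimal sketch. The Counter class and the two bump helpers are hypothetical names introduced only for this illustration; the point is that a function can change a refclass object it receives, while a plain list passed the same way is left untouched.

```r
library(methods)  # usually attached already; loaded explicitly for safety

# Hypothetical class used only for this illustration
Counter <- setRefClass("Counter", fields = list(count = "numeric"))

# Mutates the object it receives; nothing needs to be returned
bump_ref <- function(counter) {
  counter$count <- counter$count + 1
  invisible(NULL)
}

# Same shape of function, but for a plain list (copy-on-modify)
bump_list <- function(x) {
  x$count <- x$count + 1   # changes a local copy only
  invisible(NULL)
}

ref <- Counter$new(count = 0)
bump_ref(ref)
ref$count      # 1: the caller's object was changed

plain <- list(count = 0)
bump_list(plain)
plain$count    # 0: the caller's list is unchanged
```

With ordinary R values a function has to return the modified object; with a refclass object the change is visible to every variable that refers to it.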

Defining a reference class

You create refclasses with setRefClass(). Unlike S4’s setClass(), whose return value is often ignored, you keep the return value around because it is the generator object you use to create instances.

Person <- setRefClass("Person")
class(Person)
# "refObjectGenerator", a class defined in the methods package

The generator has several important methods:

  • $new() - create instances of the class
  • $fields() - list defined fields
  • $methods() - add or modify methods
  • $help() - get help on methods
  • $lock() - lock fields so they can only be set once
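The generator methods listed above can be tried out in a short sketch. The Point class and its norm method are hypothetical, chosen only to keep the example small, and the locked-field assignment is wrapped in tryCatch() because it is expected to fail.

```r
library(methods)

# Hypothetical class used only for this illustration
Point <- setRefClass("Point", fields = list(x = "numeric", y = "numeric"))

Point$fields()   # named character vector giving each field's class

# Add a method through the generator
Point$methods(
  norm = function() sqrt(x^2 + y^2)
)

# Lock x so that it can be assigned only once per object
Point$lock("x")

p <- Point$new(x = 3, y = 4)
p$norm()         # 5
p$y <- 0         # y is not locked, so this is allowed

# A second assignment to the locked field should raise an error
result <- tryCatch({
  p$x <- 10
  "assigned"
}, error = function(e) "rejected")
result
```

Calling $lock() and $methods() right after the class is defined, before any objects exist, keeps all instances consistent.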

Adding fields

Fields hold data. You specify them with the fields argument, either as a character vector of names or as a named list with types:

# Just names - defaults to "ANY"
Person <- setRefClass("Person",
  fields = c("name", "age")
)

# With types
BankAccount <- setRefClass("BankAccount",
  fields = list(
    holder = "character",
    balance = "numeric",
    transactions = "list"
  )
)

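A field type can be the name of any class R knows about. That includes basic types such as character, numeric, integer, logical, list, and environment, as well as S4 classes and other reference classes. ANY accepts any value.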
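Declared types are enforced when a field is assigned. The following sketch uses two hypothetical classes, Typed and Loose, to contrast a typed field with an untyped one; the failing assignment is wrapped in tryCatch() so the script keeps running.

```r
library(methods)

# Hypothetical classes used only for this illustration
Typed <- setRefClass("Typed", fields = list(balance = "numeric"))
Loose <- setRefClass("Loose", fields = c("anything"))   # class "ANY"

t1 <- Typed$new(balance = 100)

# Assigning a character value to a numeric field raises an error
outcome <- tryCatch({
  t1$balance <- "a lot"
  "accepted"
}, error = function(e) "rejected")
outcome        # "rejected"
t1$balance     # 100: the failed assignment left the field unchanged

# An untyped field accepts values of any class
l1 <- Loose$new(anything = 1)
l1$anything <- "now a string"
class(l1$anything)   # "character"
```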

When you create an instance, pass initial values to $new():

account <- BankAccount$new(
  holder = "Bob",
  balance = 1000,
  transactions = list()
)
account$holder    # "Bob"
account$balance  # 1000
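When some fields should have default values, one common pattern is to define an initialize method. The sketch below is an illustration rather than part of the BankAccount class above: the Account class and the choice of a zero default balance are assumptions made for the example.

```r
library(methods)

# Hypothetical class; the zero default is an assumption for illustration
Account <- setRefClass("Account",
  fields = list(holder = "character", balance = "numeric"),
  methods = list(
    initialize = function(...) {
      balance <<- 0      # default, applied first
      initFields(...)    # then apply any values passed to $new()
      invisible(.self)
    }
  )
)

a <- Account$new(holder = "Bob")
a$balance    # 0: the default was used

b <- Account$new(holder = "Eve", balance = 250)
b$balance    # 250: the supplied value overrides the default
```

The initialize method should also work when called with no arguments, because $copy() creates a new object that way.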

You can also read and write fields by name with the $field() method:

account$field("balance")        # 1000
account$field("balance", 2000)  # a second argument sets the field
account$balance                 # 2000

Adding methods

Methods are functions that operate on object fields. You define them in the methods argument to setRefClass():

BankAccount <- setRefClass("BankAccount",
  fields = list(
    holder = "character",
    balance = "numeric",
    transactions = "list"
  ),
  methods = list(
    deposit = function(amount) {
      if (amount <= 0) stop("Deposit must be positive")
      balance <<- balance + amount
      transactions <<- c(transactions, list(deposit = amount))
      invisible(.self)
    },
    withdraw = function(amount) {
      if (amount > balance) stop("Insufficient funds")
      balance <<- balance - amount
      transactions <<- c(transactions, list(withdrawal = amount))
      invisible(.self)
    },
    get_balance = function() balance
  )
)

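Notice the <<- assignment operator. This is how methods modify fields. Inside a method, <<- looks for the name in the enclosing environment, which is the object itself, and updates the field there. A regular <- would only create a local variable inside the method call and leave the field unchanged.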
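Because methods like these update fields with <<- and return invisible(.self), calls can be chained. A minimal, self-contained sketch, using a cut-down hypothetical Wallet class rather than the full BankAccount above:

```r
library(methods)

# Hypothetical cut-down class used only for this illustration
Wallet <- setRefClass("Wallet",
  fields = list(balance = "numeric"),
  methods = list(
    deposit = function(amount) {
      balance <<- balance + amount   # updates the field on the object
      invisible(.self)               # returning the object allows chaining
    },
    withdraw = function(amount) {
      if (amount > balance) stop("Insufficient funds")
      balance <<- balance - amount
      invisible(.self)
    }
  )
)

w <- Wallet$new(balance = 100)
w$deposit(50)$withdraw(30)   # each call returns the same object
w$balance                    # 120
```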

You can also add methods after class creation using the generator:

Person$methods(
  celebrate_birthday = function() {
    age <<- age + 1
    message("Happy birthday!")
  }
)

Inheritance with contains

Reference Classes support inheritance through the contains argument:

Employee <- setRefClass("Employee",
  contains = "Person",
  fields = c("employee_id", "department"),
  methods = list(
    greet = function() {
      paste("Hi, I'm", name, "from", department)
    }
  )
)

The child class inherits all fields and methods from the parent. You can override methods by redefining them.
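An overriding method can still reach the parent's version through callSuper(). The sketch below redefines small Person and Employee classes so that it runs on its own; the field values are made up for the example.

```r
library(methods)

Person <- setRefClass("Person",
  fields = list(name = "character"),
  methods = list(
    greet = function() paste("Hello, I'm", name)
  )
)

Employee <- setRefClass("Employee",
  contains = "Person",
  fields = list(department = "character"),
  methods = list(
    # Extend the parent's greeting instead of rewriting it
    greet = function() paste(callSuper(), "from", department)
  )
)

e <- Employee$new(name = "Carol", department = "Finance")
e$greet()         # "Hello, I'm Carol from Finance"
is(e, "Person")   # TRUE: an Employee is also a Person
```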

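The contains argument accepts a vector of class names, so a class can list more than one parent. In this example Person is already inherited through Employee, so naming it again is redundant; it appears only to show the syntax: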

Manager <- setRefClass("Manager",
  contains = c("Employee", "Person"),
  fields = "team_size"
)

Common methods

All Reference Class objects inherit from envRefClass and get several built-in methods:

account$copy()              # Return an independent copy of the object
account$field("balance")    # Get a field value by name
account$initFields()        # Set fields from named arguments (normally called inside initialize)
account$trace("deposit")    # Trace calls to a method
account$untrace("deposit")  # Stop tracing
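Of these, $copy() matters most in practice, because plain assignment never duplicates a refclass object. A small sketch with a hypothetical Ledger class shows the difference between an alias and a copy:

```r
library(methods)

# Hypothetical class used only for this illustration
Ledger <- setRefClass("Ledger", fields = list(balance = "numeric"))

original <- Ledger$new(balance = 100)
alias    <- original          # second name for the same object
snapshot <- original$copy()   # separate object with the same field values

original$balance <- 50
alias$balance      # 50: follows the original
snapshot$balance   # 100: unaffected by later changes

# $field() reads a field by name, or sets it when given a value
original$field("balance")       # 50
original$field("balance", 75)
original$balance                # 75
```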

When to use reference classes

Reference Classes shine in specific situations:

  1. Simulation and modelling - when you are modelling complex state that changes over time, like game state or statistical simulations
  2. GUI programming - when you need objects that persist and mutate in response to user actions
  3. State machines - when you have objects that transition through defined states
  4. Caching and memoization - when you need objects that can update their cached values in place
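As an illustration of the caching case, here is a minimal sketch of a cache that updates itself in place. The Memo class, its get_or_compute method, and the hit counter are all hypothetical names invented for this example, not an established API.

```r
library(methods)

# Hypothetical in-place cache used only for this illustration
Memo <- setRefClass("Memo",
  fields = list(store = "list", hits = "numeric"),
  methods = list(
    initialize = function(...) {
      store <<- list()
      hits <<- 0
      initFields(...)
      invisible(.self)
    },
    get_or_compute = function(key, compute) {
      cached <- store[[key]]
      if (!is.null(cached)) {
        hits <<- hits + 1        # record the cache hit on the object
        return(cached)
      }
      value <- compute()         # run the expensive step only once
      store[[key]] <<- value     # update the cache in place
      value
    }
  )
)

m <- Memo$new()
v1 <- m$get_or_compute("answer", function() 6 * 7)               # computed
v2 <- m$get_or_compute("answer", function() stop("not called"))  # from cache
m$hits   # 1
```

Every piece of code holding a reference to m sees the same cache, which is the behaviour copy-on-modify objects cannot provide without returning and reassigning the cache on every call.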

When not to use reference classes

Most R code should avoid Reference Classes:

  1. Data analysis pipelines - prefer data frames and the tidyverse; functional pipelines are cleaner
  2. Statistical modelling - use S3/S4 for model objects; they fit R’s ecosystem better
  3. Package development - unless you specifically need mutation, S3 is usually the right choice
  4. Parallel computing - objects sent to worker processes are serialized, so each worker mutates its own copy and the changes never reach the original object

The majority of your R code should be functional and side-effect free. That is easier to test, reason about, and share with other R programmers. Use Reference Classes only where mutable state is genuinely required.
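For comparison, the functional style looks like this: a function takes a value and returns a new one, and the caller decides whether to keep it. The new_account and deposit functions below are hypothetical helpers written for this sketch.

```r
# Hypothetical S3-style helpers used only for this illustration
new_account <- function(holder, balance = 0) {
  structure(list(holder = holder, balance = balance), class = "account")
}

# Pure function: returns an updated copy and leaves its input alone
deposit <- function(account, amount) {
  stopifnot(amount > 0)
  account$balance <- account$balance + amount
  account
}

a1 <- new_account("Bob", 100)
a2 <- deposit(a1, 50)

a1$balance   # 100: the input is unchanged
a2$balance   # 150: the result is a new value
```

Because deposit() has no side effects, testing it only requires comparing its return value with the expected one.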

Limitations

Reference Classes have some constraints:

  • You cannot add fields after creation (that would invalidate existing objects)
  • Field names starting with . are reserved for internal use
  • The enclosing environment is used for the object itself, so you cannot use closures in the usual way
  • Reference semantics can be surprising if you are not expecting them: assignment does not copy, so you must call $copy() whenever you need an independent object

R5 vs R6 in practice

R5 reference classes ship with R in the methods package, so no extra packages are required. R6 (from the R6 package) provides a cleaner, faster API with less boilerplate. For new code, R6 is almost always preferable. R5 is mainly encountered when maintaining old code or working with packages that use it internally (some Bioconductor packages use R5).

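The two systems look superficially similar because both create objects with MyClass$new(). The core difference in syntax is inside method bodies. R5 methods refer to fields directly by name, modify them with <<-, and refer to the whole object as .self. R6 methods reach everything through self$ (or private$) and modify fields with ordinary assignment, as in self$x <- value. R5’s implicit field lookup makes it harder to tell at a glance which names are fields and which are local variables, whereas R6’s explicit self$ prefix makes that obvious to both readers and code navigation tools.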
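A small counter written both ways makes the comparison concrete. The class names CounterR5 and CounterR6 are hypothetical, and the R6 half is guarded with requireNamespace() so the sketch still runs when the R6 package is not installed.

```r
library(methods)

# R5: fields are visible by name inside methods and modified with <<-;
# the object itself is available as .self
CounterR5 <- setRefClass("CounterR5",
  fields = list(count = "numeric"),
  methods = list(
    add = function(n = 1) {
      count <<- count + n
      invisible(.self)
    }
  )
)

r5 <- CounterR5$new(count = 0)
r5$add()$add(2)
r5$count     # 3

# R6: the same class, run only if the R6 package is available
if (requireNamespace("R6", quietly = TRUE)) {
  CounterR6 <- R6::R6Class("CounterR6",
    public = list(
      count = 0,
      add = function(n = 1) {
        self$count <- self$count + n   # explicit self$, ordinary assignment
        invisible(self)
      }
    )
  )

  r6 <- CounterR6$new()
  r6$add()$add(2)
  r6$count   # 3
}
```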

When reference semantics are necessary

Most R programming avoids mutable state because functions that modify their inputs are harder to reason about and test. Reference semantics (R5, R6, environments) are appropriate when:

  1. An object represents a shared resource: a database connection, a file handle, a network socket.
  2. The object accumulates state across many operations and copying the full state on each update would be prohibitively expensive.
  3. You are implementing a data structure (tree, graph, cache) where node references must survive operations.

For everything else, standard copy-on-modify semantics with S3 classes or simple lists is preferred.
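The data-structure case can be sketched with a small tree in which parents hold references to their children. The Node class and its methods are hypothetical names used only for this illustration.

```r
library(methods)

# Hypothetical tree node used only for this illustration
Node <- setRefClass("Node",
  fields = list(value = "numeric", children = "list"),
  methods = list(
    add_child = function(child) {
      children[[length(children) + 1]] <<- child   # store a reference, not a copy
      invisible(.self)
    },
    total = function() {
      # Sum this node's value and, recursively, all of its descendants
      value + sum(vapply(children, function(ch) ch$total(), numeric(1)))
    }
  )
)

root <- Node$new(value = 1, children = list())
leaf <- Node$new(value = 2, children = list())
root$add_child(leaf)
root$total()       # 3

# Changing the leaf through its own variable is visible from the root
leaf$value <- 10
root$total()       # 11
```

With copy-on-modify lists the root would hold a snapshot of the leaf, so the second total would still be 3.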

For most new R code that needs reference semantics, prefer R6 (from the R6 package) over R5 reference classes. R6 has a cleaner API, better performance, and is more widely used in modern packages. R5 knowledge is mainly needed for reading and maintaining legacy code.

When to use R5

R5 reference classes are appropriate when you need true mutable state, meaning objects that change without reassignment, or when implementing stateful abstractions like database connections, caches, or simulation engines. For most data analysis work, S3 is a better choice, and R6 offers a cleaner syntax when mutable objects are needed. R5 is part of the methods package bundled with R, so it requires no additional dependencies, which is an advantage in package development where minimizing imports matters.

Summary

Reference Classes give R something it historically lacked: true mutable objects with reference semantics. They behave like objects in mainstream OOP languages, which can be either a benefit or a curse depending on context. The key is recognizing when you actually need mutation - and when you do not.

For most R programming, S3 remains the right tool. But when you are building simulations, GUIs, or stateful systems, Reference Classes are exactly what you need.
