R5 Reference Classes in R
Reference Classes (informally called R5; the official name is Reference Classes) are R’s third object-oriented system. They sit alongside S3 and S4, but behave quite differently. If you have used languages like Java or C#, Reference Classes will feel familiar. If you have only worked with S3 and S4, the difference will surprise you.
The key difference: reference semantics
S3 and S4 objects follow what R programmers call copy-on-modify semantics. When you assign an object to a new variable, it behaves as if you had made a complete copy (R defers the actual copying until one of the two is modified). Modify the copy, and the original stays unchanged.
# S3 object - copy on modify
person <- function(name, age) {
  structure(list(name = name, age = age), class = "person")
}
p1 <- person("Alice", 30)
p2 <- p1
p2$age <- 31
p1$age # Still 30
p2$age # 31
Reference Classes break this pattern. They use reference semantics: when you assign a refclass object to a new variable, both variables point to the same underlying object. Modify one, and you modify both.
# Reference Class - same object
Person <- setRefClass("Person",
  fields = c("name", "age"),
  methods = list(
    greet = function() paste("Hello, I'm", name)
  )
)
p1 <- Person$new(name = "Alice", age = 30)
p2 <- p1
p2$age <- 31
p1$age # Now 31 - both point to same object
p2$age # 31
This is the single most important thing to understand about Reference Classes. It affects everything: how you pass objects to functions, how you think about equality, and when refclasses are appropriate.
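To make the point about passing objects to functions concrete, here is a minimal sketch. The Account class and the add_bonus() function are hypothetical names invented for this illustration. The same function is applied to a refclass object and to a plain list, and only the refclass object is changed for the caller.

```r
# Hypothetical class for illustration only
Account <- setRefClass("Account", fields = list(balance = "numeric"))

# A function that modifies whatever it is given
add_bonus <- function(acct) {
  acct$balance <- acct$balance + 100
  invisible(NULL)
}

# Refclass object: the function changes the caller's object
ref_acct <- Account$new(balance = 500)
add_bonus(ref_acct)
ref_acct$balance   # 600

# Plain list: the function only changes its own local copy
list_acct <- list(balance = 500)
add_bonus(list_acct)
list_acct$balance  # 500
```

With copy-on-modify objects, a function has to return the modified value and the caller has to reassign it. With refclasses, the side effect happens in place, which is exactly why they need more care.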
Defining a reference class
You create refclasses with setRefClass(). Unlike S4’s setClass(), you keep the return value around because it is your generator function.
Person <- setRefClass("Person")
class(Person)
# [1] "refObjectGenerator"
# attr(,"package")
# [1] "methods"
The generator has several important methods:
- $new() - create instances of the class
- $fields() - list defined fields
- $methods() - add or modify methods
- $help() - get help on methods
- $lock() - lock fields so they can only be set once
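As a rough sketch of two of these generator methods in use, the example below inspects the fields of a class and locks one of them. Point is a hypothetical class invented for the illustration, and the lock is applied before any objects are created.

```r
# Hypothetical class used only for this illustration
Point <- setRefClass("Point", fields = list(x = "numeric", y = "numeric"))

# $fields() returns a named character vector: field name -> declared class
Point$fields()

# Lock x before creating any objects; it can then be set only once
Point$lock("x")

pt <- Point$new(x = 1, y = 2)  # the single allowed assignment to x
pt$y <- 5                      # y is not locked, so this works

# A second assignment to the locked field raises an error
result <- tryCatch(
  {
    pt$x <- 10
    "modified"
  },
  error = function(e) "locked"
)
result  # "locked"
pt$x    # 1
```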
Adding fields
Fields hold data. You specify them with the fields argument, either as a character vector of names or as a named list with types:
# Just names - defaults to "ANY"
Person <- setRefClass("Person",
  fields = c("name", "age")
)
# With types
BankAccount <- setRefClass("BankAccount",
  fields = list(
    holder = "character",
    balance = "numeric",
    transactions = "list"
  )
)
Common field types include character, numeric, integer, logical, list, environment, and ANY (allows anything). The name of any other defined class, including another reference class, works as well.
When you create an instance, pass initial values to $new():
account <- BankAccount$new(
  holder = "Bob",
  balance = 1000,
  transactions = list()
)
account$holder # "Bob"
account$balance # 1000
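Declared field types are checked on assignment. The sketch below uses a hypothetical Reading class to show that a value of the wrong class is rejected and the field keeps its previous value.

```r
# Hypothetical class for illustration only
Reading <- setRefClass("Reading",
  fields = list(sensor = "character", value = "numeric")
)

r <- Reading$new(sensor = "t1", value = 21.5)
r$value <- 22  # numeric, so the assignment is accepted

# Assigning a character string to a numeric field raises an error
outcome <- tryCatch(
  {
    r$value <- "hot"
    "accepted"
  },
  error = function(e) "rejected"
)
outcome  # "rejected"
r$value  # 22
```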
You can also read and set fields by name with the $field() method, which is useful when the field name is stored in a variable:
account$field("balance") # 1000
account$field("balance", 2000)
account$balance # 2000
Adding methods
Methods are functions that operate on object fields. You define them in the methods argument to setRefClass():
BankAccount <- setRefClass("BankAccount",
  fields = list(
    holder = "character",
    balance = "numeric",
    transactions = "list"
  ),
  methods = list(
    deposit = function(amount) {
      if (amount <= 0) stop("Deposit must be positive")
      balance <<- balance + amount
      transactions <<- c(transactions, list(deposit = amount))
      invisible(.self)
    },
    withdraw = function(amount) {
      if (amount > balance) stop("Insufficient funds")
      balance <<- balance - amount
      transactions <<- c(transactions, list(withdrawal = amount))
      invisible(.self)
    },
    get_balance = function() balance
  )
)
Notice the <<- assignment operator. This is how methods modify fields. It assigns to the enclosing environment (the object). Using regular <- would just create a local variable.
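The difference between the two operators is easy to see side by side. In this sketch, Counter and its two methods are hypothetical names. One method uses <- by mistake and the other uses <<-. R may warn about the local assignment when the class is defined, so the definition is wrapped in suppressWarnings() here to keep the output quiet.

```r
# Hypothetical class; broken_increment() is deliberately wrong
Counter <- suppressWarnings(setRefClass("Counter",
  fields = list(count = "numeric"),
  methods = list(
    broken_increment = function() {
      count <- count + 1   # creates a local variable; the field is untouched
      invisible(.self)
    },
    increment = function() {
      count <<- count + 1  # updates the field on the object
      invisible(.self)
    }
  )
))

cnt <- Counter$new(count = 0)

cnt$broken_increment()
cnt$count  # 0

# Returning invisible(.self) allows method calls to be chained
cnt$increment()$increment()
cnt$count  # 2
```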
You can also add methods after class creation using the generator:
Person$methods(
  celebrate_birthday = function() {
    age <<- age + 1
    message("Happy birthday!")
  }
)
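A self-contained sketch of the same pattern follows, using a hypothetical Timer class. The object is created after the call to $methods(), an ordering chosen here to stay on the safe side rather than a documented requirement.

```r
# Hypothetical class for illustration only
Timer <- setRefClass("Timer",
  fields = list(ticks = "numeric"),
  methods = list(
    tick = function() {
      ticks <<- ticks + 1
      invisible(.self)
    }
  )
)

# Add a method through the generator after the class has been defined
Timer$methods(
  reset = function() {
    ticks <<- 0
    invisible(.self)
  }
)

tm <- Timer$new(ticks = 0)
tm$tick()$tick()
tm$ticks  # 2
tm$reset()
tm$ticks  # 0
```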
Inheritance with contains
Reference Classes support inheritance through the contains argument:
Employee <- setRefClass("Employee",
  contains = "Person",
  fields = c("employee_id", "department"),
  methods = list(
    greet = function() {
      paste("Hi, I'm", name, "from", department)
    }
  )
)
The child class inherits all fields and methods from the parent. You can override methods by redefining them.
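An overriding method can still reach the parent's version through callSuper(). The sketch below redefines small, typed versions of Person and Employee so that it runs on its own. The field choices are assumptions made for the illustration.

```r
# Minimal parent class, redefined here so the example is self-contained
Person <- setRefClass("Person",
  fields = list(name = "character"),
  methods = list(
    greet = function() paste("Hello, I'm", name)
  )
)

# Child class: overrides greet() but reuses the parent's result
Employee <- setRefClass("Employee",
  contains = "Person",
  fields = list(department = "character"),
  methods = list(
    greet = function() paste(callSuper(), "from", department)
  )
)

e <- Employee$new(name = "Alice", department = "Finance")
e$greet()        # "Hello, I'm Alice from Finance"
is(e, "Person")  # TRUE
```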
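The contains argument also accepts a vector of superclass names. In the example below, listing Person is redundant because Employee already inherits from it: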
Manager <- setRefClass("Manager",
contains = c("Employee", "Person"),
fields = "team_size"
)
Common methods
All Reference Class objects inherit from envRefClass and get several built-in methods:
account$copy() # Copy the object
account$field("balance") # Get field value
account$initFields() # Initialize fields from named arguments; mainly called inside an initialize() method
account$trace("deposit") # Trace method calls
account$untrace("deposit") # Stop tracing
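Of these, $copy() is the one you are most likely to need, because it is the way to get an independent object. The sketch below uses a hypothetical Basket class to contrast plain assignment with $copy().

```r
# Hypothetical class for illustration only
Basket <- setRefClass("Basket", fields = list(items = "character"))

original <- Basket$new(items = "apple")
alias <- original         # second name for the same object
clone <- original$copy()  # independent object with the same field values

clone$items <- c(clone$items, "pear")
alias$items <- c(alias$items, "kiwi")

original$items  # "apple" "kiwi"  (changed through the alias)
clone$items     # "apple" "pear"  (the original never saw this change)

identical(original, alias)  # TRUE
identical(original, clone)  # FALSE
```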
When to use reference classes
Reference Classes shine in specific situations:
- Simulation and modelling - when you are modelling complex state that changes over time, like game state or statistical simulations
- GUI programming - when you need objects that persist and mutate in response to user actions
- State machines - when you have objects that transition through defined states
- Caching and memoization - when you need objects that can update their cached values in place
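As one illustration of the caching case, here is a minimal sketch of an object that stores results in place. SquareCache and lookup() are hypothetical names, and squaring a number stands in for an expensive computation.

```r
# Hypothetical cache class; n^2 stands in for a slow calculation
SquareCache <- setRefClass("SquareCache",
  fields = list(store = "list", hits = "numeric"),
  methods = list(
    lookup = function(n) {
      key <- as.character(n)
      if (!is.null(store[[key]])) {
        hits <<- hits + 1        # count a cache hit
        return(store[[key]])
      }
      value <- n^2               # compute on a cache miss
      store[[key]] <<- value     # remember the result on the object
      value
    }
  )
)

cache <- SquareCache$new(hits = 0)
cache$lookup(4)      # 16, computed
cache$lookup(4)      # 16, served from the cache
cache$hits           # 1
length(cache$store)  # 1
```

Because the cache is a reference object, any function that receives it can add entries that every other holder of the object will see.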
When not to use reference classes
Most R code should avoid Reference Classes:
- Data analysis pipelines - prefer data frames and the tidyverse; functional pipelines are cleaner
- Statistical modelling - use S3/S4 for model objects; they fit R’s ecosystem better
- Package development - unless you specifically need mutation, S3 is usually the right choice
- Parallel computing - refclass objects are serialized and copied to worker processes, so changes made on a worker do not reach the original object
The majority of your R code should be functional and side-effect free. That is easier to test, reason about, and share with other R programmers. Use Reference Classes only where mutable state is genuinely required.
Limitations
Reference Classes have some constraints:
- You cannot add fields after creation (that would invalidate existing objects)
- Field names starting with . are reserved for internal use
- The enclosing environment is used for the object itself, so you cannot use closures in the usual way
- Reference semantics can be surprising if you expect copy-on-modify: plain assignment never copies, so you must call $copy() explicitly
R5 vs R6 in practice
R5 reference classes ship with R as part of the methods package, so no extra packages are required. R6 (from the R6 package) provides a cleaner, faster API with less boilerplate. For new code, R6 is almost always preferable. R5 is mainly encountered when maintaining old code or working with packages that use it internally (some Bioconductor packages use R5).
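The two systems look superficially similar: both create objects with MyClass$new() and call methods with object$method(). The core difference is inside method bodies. R5 methods refer to fields by bare name, modify them with <<-, and refer to the object itself as .self. R6 methods access everything explicitly through self$ (and private$ for private members) and assign with the ordinary <- operator. R5’s setRefClass() declares fields as strings (names, or names mapped to class names), whereas R6’s R6Class() declares fields and methods together as elements of its public and private lists. The explicit self$ prefix makes R6 code easier for readers and code navigation tools to follow.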
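A side-by-side sketch of the same counter in both systems may help. Counter5 and Counter6 are hypothetical names. The R6 half runs only if the R6 package happens to be installed, which this example does not assume.

```r
# R5: fields are visible by bare name; <<- updates them; .self is the object
Counter5 <- setRefClass("Counter5",
  fields = list(count = "numeric"),
  methods = list(
    add = function(n = 1) {
      count <<- count + n
      invisible(.self)
    }
  )
)

c5 <- Counter5$new(count = 0)
c5$add(2)$add(3)
c5$count  # 5

# R6: everything goes through self$, and ordinary <- is used
if (requireNamespace("R6", quietly = TRUE)) {
  Counter6 <- R6::R6Class("Counter6",
    public = list(
      count = 0,
      add = function(n = 1) {
        self$count <- self$count + n
        invisible(self)
      }
    )
  )

  c6 <- Counter6$new()
  c6$add(2)$add(3)
  c6$count  # 5
}
```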
When reference semantics are necessary
Most R programming avoids mutable state because functions that modify their inputs are harder to reason about and test. Reference semantics (R5, R6, environments) are appropriate when:
- An object represents a shared resource: a database connection, a file handle, a network socket.
- The object accumulates state across many operations and copying the full state on each update would be prohibitively expensive.
- You are implementing a data structure (tree, graph, cache) where node references must survive operations.
For everything else, standard copy-on-modify semantics with S3 classes or simple lists is preferred.
For most new R code that needs reference semantics, prefer R6 (from the R6 package) over R5 reference classes. R6 has a cleaner API, better performance, and is more widely used in modern packages. R5 knowledge is mainly needed for reading and maintaining legacy code.
When to use R5
R5 reference classes are appropriate when you need true mutable state — objects that change without reassignment — or when implementing stateful abstractions like database connections, caches, or simulation engines. For most data analysis work, S3 or R6 (which has a cleaner syntax) is a better choice. R5 is part of the methods package bundled with base R, so it requires no additional dependencies, which is an advantage in package development where minimizing imports matters.
Summary
Reference Classes give R something it historically lacked: true mutable objects with reference semantics. They behave like objects in mainstream OOP languages, which can be either a benefit or a curse depending on context. The key is recognizing when you actually need mutation - and when you do not.
For most R programming, S3 remains the right tool. But when you are building simulations, GUIs, or stateful systems, Reference Classes are exactly what you need.
See also
- S3 Classes in R — R’s simplest OOP system
- S4 Classes in R — R’s formal OOP system with multiple dispatch