S4 Classes in R
S4 is R’s formal object-oriented system. Unlike S3, which is informal and based on attributes, S4 provides a rigorous framework with formal class definitions, multiple dispatch, and built-in validation. If you are building packages for others to use, or need complex data structures with strict contracts, S4 is often the right tool.
What S4 Gives You
S4 solves problems that S3 cannot handle well. It gives you multiple dispatch, formal class definitions with slots, automatic validation, and inheritance through a proper class hierarchy. The downside is more boilerplate code.
When should you use S4? When you need multiple dispatch across several arguments. When you want enforced type checking on your objects. When you are building a package that others will depend on and you need clear contracts. For quick data analysis scripts, S3 is usually sufficient.
Defining an S4 Class
You define S4 classes with setClass(). Unlike S3, you specify exactly what data each object holds using named slots.
# Define a class with two slots
setClass("Person",
  slots = c(
    name = "character",
    age = "numeric"
  ))
# Create an instance
alice <- new("Person", name = "Alice", age = 30)
# Access slots with @, not $
alice@name
## [1] "Alice"
alice@age
## [1] 30
The slot definition includes the expected class. R will enforce this when you create objects. If you try to pass a character to a numeric slot, you will get an error.
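To see the slot type check in action, the sketch below redefines the Person class and deliberately supplies a character string for age. The tryCatch() wrappers are only there so the script keeps running after the expected failures; the exact wording of the error messages may differ between R versions.

```r
library(methods)  # attached by default in interactive R; explicit for Rscript

# Same Person class as in the text
setClass("Person",
  slots = c(
    name = "character",
    age  = "numeric"
  ))

# new() rejects a character value for the numeric slot 'age'
msg <- tryCatch(
  {
    new("Person", name = "Alice", age = "thirty")
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# The same class check applies when a slot is replaced later with @
alice <- new("Person", name = "Alice", age = 30)
msg2 <- tryCatch(
  {
    alice@age <- "thirty-one"
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg2)

# The failed assignment leaves the object unchanged
alice@age

# List the slots the class declares
slotNames("Person")
```

Both failures are reported as ordinary R errors, so callers can handle them with the usual condition-handling tools.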
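Older code often declares slots with the representation argument instead. It defines the same kind of class and is kept for backward compatibility; for new code, slots is the recommended form:
setClass("Employee",
  representation = list(
    name = "character",
    salary = "numeric",
    department = "character"
  ))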
Creating Generics and Methods
S4 uses generics to define an interface. A generic is a function that dispatches to specific methods based on the class of its arguments. You create generics with setGeneric().
# Create a generic for describing a person
setGeneric("describe", function(object) {
  standardGeneric("describe")
})
# Define a method for the Person class
setMethod("describe",
  "Person",
  function(object) {
    paste(object@name, "is", object@age, "years old")
  })
# Use it
describe(alice)
## [1] "Alice is 30 years old"
The signature argument in setMethod() specifies which class the method handles. For single-argument dispatch, the signature is just the class name.
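As a small sketch of how a generic and its methods can be inspected, the block below rebuilds the describe example and then asks R what is registered. It also shows what happens when the generic is called on a class that has no method; the value 42 is just an arbitrary numeric input chosen for illustration.

```r
library(methods)

setClass("Person", slots = c(name = "character", age = "numeric"))

setGeneric("describe", function(object) {
  standardGeneric("describe")
})

setMethod("describe", "Person", function(object) {
  paste(object@name, "is", object@age, "years old")
})

alice <- new("Person", name = "Alice", age = 30)

isGeneric("describe")                # is 'describe' an S4 generic?
existsMethod("describe", "Person")   # a method defined directly for Person?
existsMethod("describe", "numeric")  # none has been defined for numeric

# Calling the generic on a class with no applicable method signals an error
no_method <- tryCatch(describe(42), error = function(e) conditionMessage(e))
print(no_method)
```

showMethods("describe") prints a similar overview interactively.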
Multiple Dispatch
S4’s real power shows with multiple dispatch: the method selected depends on the classes of all the arguments in the generic’s signature, not just the first.
setClass("Project",
  slots = c(
    title = "character",
    budget = "numeric"
  ))
# Generic with two arguments
# (named assignTo so that it does not mask base::assign)
setGeneric("assignTo",
  function(person, project) {
    standardGeneric("assignTo")
  })
setMethod("assignTo",
  signature(person = "Person", project = "Project"),
  function(person, project) {
    paste("Assigning", person@name, "to", project@title)
  })
setMethod("assignTo",
  signature(person = "Employee", project = "Project"),
  function(person, project) {
    paste("Assigning employee", person@name,
      "to", project@title, "with budget", project@budget)
  })
# Create objects
proj <- new("Project", title = "Website Redesign", budget = 50000)
emp <- new("Employee", name = "Bob", salary = 75000, department = "Engineering")
# Different methods called based on argument classes
assignTo(alice, proj)
## [1] "Assigning Alice to Website Redesign"
assignTo(emp, proj)
## [1] "Assigning employee Bob to Website Redesign with budget 50000"
This is the key difference from S3, which dispatches on the class of a single argument (normally the first). In S4, behavior can depend on the classes of several arguments at once.
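A minimal sketch of dispatch on two arguments, using only built-in classes. The generic describePair and its return strings are hypothetical names made up for this illustration. The "ANY" signature acts as a fallback for combinations that have no more specific method.

```r
library(methods)

# describePair is a hypothetical generic used only for this illustration
setGeneric("describePair", function(x, y) standardGeneric("describePair"))

setMethod("describePair", signature(x = "numeric", y = "character"),
  function(x, y) "number then text")

setMethod("describePair", signature(x = "character", y = "numeric"),
  function(x, y) "text then number")

# "ANY" matches every class, so this method catches everything else
setMethod("describePair", signature(x = "ANY", y = "ANY"),
  function(x, y) "some other combination")

describePair(1, "a")       # numeric, character
describePair("a", 1)       # character, numeric
describePair(TRUE, FALSE)  # no specific method, falls back to ANY, ANY
```

Swapping the argument order selects a different method, which is something first-argument dispatch cannot express directly.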
Object Validation
S4 lets you define a validity function that runs automatically when objects are created. You can supply it through the validity argument of setClass(), as below, or attach it to an existing class with setValidity().
setClass("Account",
  slots = c(
    balance = "numeric",
    owner = "character"
  ),
  validity = function(object) {
    errors <- character()
    if (object@balance < 0) {
      errors <- c(errors, "Balance cannot be negative")
    }
    if (nchar(object@owner) == 0) {
      errors <- c(errors, "Owner name cannot be empty")
    }
    if (length(errors) == 0) TRUE else errors
  })
# This works
account <- new("Account", balance = 1000, owner = "Alice")
# This fails validation
bad_account <- new("Account", balance = -50, owner = "Bob")
## Error in validObject(.Object) :
##   invalid class “Account” object: Balance cannot be negative
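new() calls the validity function after the slots have been filled in, provided at least one slot value was supplied. The function returns TRUE if the object is valid, or a character vector of error messages otherwise. This catches problems early, at object creation time. Validity is not rechecked when a slot is later modified with @; call validObject() to rerun it.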
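The sketch below shows the setValidity() route and the limits of automatic checking. It defines an Account class without a validity argument, attaches the check afterwards, and then modifies a slot directly. The length checks and the message wording are additions for this illustration, not part of the example above.

```r
library(methods)

setClass("Account",
  slots = c(
    balance = "numeric",
    owner   = "character"
  ))

# setValidity() attaches a validity function to an existing class
setValidity("Account", function(object) {
  errors <- character()
  # The length checks keep the comparisons from failing on empty slots
  if (length(object@balance) != 1 || object@balance < 0) {
    errors <- c(errors, "Balance must be a single non-negative number")
  }
  if (length(object@owner) != 1 || nchar(object@owner) == 0) {
    errors <- c(errors, "Owner must be a single non-empty string")
  }
  if (length(errors) == 0) TRUE else errors
})

account <- new("Account", balance = 1000, owner = "Alice")

# Direct slot assignment checks the slot's class but does not rerun validity
account@balance <- -50

# validObject() reruns the check on demand
recheck <- tryCatch(
  validObject(account),
  error = function(e) conditionMessage(e)
)
print(recheck)
```

If objects are modified after creation, calling validObject() at the end of each modifying function is one way to keep the contract intact.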
Inheritance with contains
S4 supports inheritance through the contains argument in setClass(). A class that contains another inherits its slots and can override its methods.
# Define a base class
setClass("Vehicle",
  slots = c(
    make = "character",
    model = "character"
  ))
# Define a subclass
setClass("Car",
  contains = "Vehicle",
  slots = c(
    doors = "numeric",
    drivetrain = "character"
  ))
# Create objects
base_vehicle <- new("Vehicle", make = "Toyota", model = "Camry")
my_car <- new("Car", make = "Honda", model = "Civic", doors = 4, drivetrain = "FWD")
# Check inheritance
is(base_vehicle, "Vehicle")
## [1] TRUE
is(my_car, "Car")
## [1] TRUE
is(my_car, "Vehicle")
## [1] TRUE
Methods defined on the parent class work on child objects too. Define a method for Vehicle, and Car objects will use it unless you override it.
setGeneric("getMakeModel", function(x) standardGeneric("getMakeModel"))
setMethod("getMakeModel", "Vehicle",
  function(x) paste(x@make, x@model))
getMakeModel(base_vehicle)
## [1] "Toyota Camry"
getMakeModel(my_car)
## [1] "Honda Civic"
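Overriding the inherited method might look like the sketch below. The Car method calls callNextMethod() to reuse the Vehicle behavior and then appends a detail of its own; the "(4-door)" label format is an arbitrary choice for this illustration.

```r
library(methods)

setClass("Vehicle", slots = c(make = "character", model = "character"))
setClass("Car", contains = "Vehicle",
  slots = c(doors = "numeric", drivetrain = "character"))

setGeneric("getMakeModel", function(x) standardGeneric("getMakeModel"))
setMethod("getMakeModel", "Vehicle", function(x) paste(x@make, x@model))

# Override for Car: reuse the Vehicle method, then add Car-specific detail
setMethod("getMakeModel", "Car", function(x) {
  base_label <- callNextMethod()  # runs the Vehicle method on the same object
  paste0(base_label, " (", x@doors, "-door)")
})

base_vehicle <- new("Vehicle", make = "Toyota", model = "Camry")
my_car <- new("Car", make = "Honda", model = "Civic", doors = 4, drivetrain = "FWD")

getMakeModel(base_vehicle)  # uses the Vehicle method
getMakeModel(my_car)        # uses the Car override
```

Using callNextMethod() rather than repeating the parent's code means later changes to the Vehicle method are picked up by Car automatically.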
Multiple Inheritance
S4 supports multiple inheritance, though it gets complicated fast. Use the contains argument with a character vector:
setClass("ElectricVehicle",
  slots = c(battery_kwh = "numeric"))
setClass("ElectricCar",
  contains = c("Car", "ElectricVehicle"),
  slots = c(charge_level = "numeric"))
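Multiple inheritance means the class graph is not a simple tree. When more than one superclass supplies a method for the same generic, R has to choose between them, and the choice is not always obvious from reading the code. Only use it when you have to.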
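A short sketch of what an ElectricCar object ends up with. The block repeats the class definitions so it runs on its own, and the slot values are placeholders invented for this illustration.

```r
library(methods)

setClass("Vehicle", slots = c(make = "character", model = "character"))
setClass("Car", contains = "Vehicle",
  slots = c(doors = "numeric", drivetrain = "character"))
setClass("ElectricVehicle", slots = c(battery_kwh = "numeric"))
setClass("ElectricCar",
  contains = c("Car", "ElectricVehicle"),
  slots = c(charge_level = "numeric"))

# Placeholder values, made up for illustration
ev <- new("ElectricCar",
  make = "ExampleMake", model = "ExampleModel",
  doors = 4, drivetrain = "FWD",
  battery_kwh = 40, charge_level = 0.8)

is(ev, "Car")
is(ev, "ElectricVehicle")
is(ev, "Vehicle")          # inherited indirectly, through Car

slotNames("ElectricCar")   # slots collected from every ancestor
is(ev)                     # every class the object extends
```

The object carries the slots of both parents, plus those of Vehicle through Car, so a method written for any of those classes can be applied to it.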
Practical Example: Data Frame Wrapper
Here is a more realistic example that shows why you would actually use S4 in production code.
# Define a validated data wrapper
setClass("AnalysisResult",
  slots = c(
    data = "data.frame",
    test_name = "character",
    p_value = "numeric"
  ),
  validity = function(object) {
    errors <- character()
    if (!is.data.frame(object@data)) {
      errors <- c(errors, "data must be a data.frame")
    }
    # Check the length first so the comparisons never see an empty or long vector
    if (length(object@p_value) != 1 ||
        object@p_value < 0 || object@p_value > 1) {
      errors <- c(errors, "p_value must be a single number between 0 and 1")
    }
    if (length(errors) == 0) TRUE else errors
  })
# Define print behavior
setMethod("show", "AnalysisResult",
  function(object) {
    cat("AnalysisResult:", object@test_name, "\n")
    cat(" p-value:", object@p_value, "\n")
    cat(" rows:", nrow(object@data), "\n")
  })
# Create a result
result <- new("AnalysisResult",
  data = data.frame(x = 1:10, y = rnorm(10)),
  test_name = "t-test",
  p_value = 0.032)
# print() calls the show method for S4 objects
print(result)
## AnalysisResult: t-test
##  p-value: 0.032
##  rows: 10
The validation ensures no invalid results slip through at creation time. The show method makes console output clean. This pattern is common in Bioconductor packages, which rely heavily on S4.
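One way to package this for users is a small constructor function, sketched below. The helper name analysis_result is hypothetical, and the validity function is a condensed variant written for this illustration (it also rejects NA). The second call shows an out-of-range p-value being rejected.

```r
library(methods)

setClass("AnalysisResult",
  slots = c(data = "data.frame", test_name = "character", p_value = "numeric"),
  validity = function(object) {
    p <- object@p_value
    if (length(p) != 1 || is.na(p) || p < 0 || p > 1) {
      "p_value must be a single number between 0 and 1"
    } else {
      TRUE
    }
  })

# Hypothetical helper: a plain function wrapping new(), so callers
# do not need to know how the slots are laid out
analysis_result <- function(data, test_name, p_value) {
  new("AnalysisResult", data = data, test_name = test_name, p_value = p_value)
}

ok <- analysis_result(data.frame(x = 1:3), "t-test", 0.032)

# An out-of-range p-value is rejected before any object is returned
rejected <- tryCatch(
  analysis_result(data.frame(x = 1:3), "t-test", 1.5),
  error = function(e) conditionMessage(e)
)
print(rejected)
```

A constructor like this gives a package one place to document arguments and set defaults, while the class definition stays responsible for enforcing the contract.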
Summary
S4 gives you formal object-oriented programming in R. Use setClass() to define classes with typed slots. Use setGeneric() to create dispatchable functions and setMethod() to implement them. Validation methods catch invalid states at object creation. Inheritance through contains lets you build class hierarchies.
The trade-off is more code upfront versus runtime safety and explicit contracts. For packages and complex systems, that trade-off usually pays off.
See Also
- S3 Classes in R: R’s simpler OOP system, good for quick data transformations
- R6 Classes: encapsulated classes with reference semantics and mutable state, provided by the R6 package
- Building R Packages: package development, including OOP systems