rguides

S4 Classes in R: Formal OOP with setClass and Methods

S4 classes form R’s formal object-oriented system. Unlike S3, which is informal and based on attributes, S4 provides a rigorous framework with formal class definitions, multiple dispatch, and built-in validation. If you are building packages for others to use or need complex data structures with strict contracts, S4 classes are a strong fit.

What S4 gives you

S4 solves problems that S3 cannot handle well. It gives you multiple dispatch, formal class definitions with slots, automatic validation, and inheritance through a proper class hierarchy. The downside is more boilerplate code.

When should you use S4? Use it when you need multiple dispatch across several arguments, when you want enforced type checking on your objects, or when you are building a package that others will depend on and you need clear contracts. For quick data analysis scripts, S3 is usually sufficient.

Defining an S4 class

You define S4 classes with setClass(). Unlike S3, you specify exactly what data each object holds using named slots.

# Define a class with two slots
setClass("Person",
         slots = c(
           name = "character",
           age = "numeric"
         ))

# Create an instance
alice <- new("Person", name = "Alice", age = 30)

# Access slots with @, not $
alice@name
## [1] "Alice"

alice@age
## [1] 30

The slot definition includes the expected class, and R enforces it when you create objects. If you pass a character string to a numeric slot, new() stops with an error.

Older code often declares slots with the representation argument instead. setClass() still accepts it for compatibility, but slots is the recommended form for new code:

setClass("Employee",
         representation = list(
           name = "character",
           salary = "numeric",
           department = "character"
         ))
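The type check described above happens inside new(). The following minimal sketch shows what the failure looks like. It uses a hypothetical Reading class that is not part of the other examples, and it captures the error message with tryCatch() instead of stopping.

```r
library(methods)

# Hypothetical class used only for this sketch
setClass("Reading",
         slots = c(
           sensor = "character",
           value = "numeric"
         ))

# Matching types: the object is created normally
ok <- new("Reading", sensor = "t1", value = 21.5)

# A character string in the numeric slot is rejected at construction
err <- tryCatch(
  new("Reading", sensor = "t1", value = "warm"),
  error = function(e) conditionMessage(e)
)

# The message names the offending slot and the class it expected
cat(err, "\n")
```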

Creating generics and methods

Defining a class gives you a data structure, but you need generics and methods to define behavior. S4 uses generics to define an interface. A generic is a function that dispatches to specific methods based on the class of its arguments. You create generics with setGeneric() and attach implementations with setMethod(), which keeps the interface separate from each class’s specific logic.

# Create a generic for describing a person
setGeneric("describe", function(object) {
  standardGeneric("describe")
})

# Define a method for the Person class
setMethod("describe",
          "Person",
          function(object) {
            paste(object@name, "is", object@age, "years old")
          })

# Use it
describe(alice)
## [1] "Alice is 30 years old"

The signature argument in setMethod() specifies which class the method handles. For single-argument dispatch, the signature is just the class name and the method is selected based on that one argument’s type.
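This sketch makes the selection rule concrete. It uses a hypothetical Pet class and a hypothetical label() generic. The method registered for the special class "ANY" acts as a fallback. Without it, calling label() on an unsupported class would fail with an "unable to find an inherited method" error.

```r
library(methods)

# Hypothetical class and generic for this sketch
setClass("Pet", slots = c(name = "character", species = "character"))

setGeneric("label", function(object) standardGeneric("label"))

# Selected when the argument is a Pet (or a subclass of Pet)
setMethod("label", "Pet",
          function(object) paste(object@name, "the", object@species))

# Fallback selected for any other class
setMethod("label", "ANY",
          function(object) paste("an object of class", class(object)[1]))

label(new("Pet", name = "Rex", species = "dog"))
## [1] "Rex the dog"

label(42)
## [1] "an object of class numeric"
```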

Multiple dispatch

S4’s real power shows with multiple dispatch. S3 dispatches on the class of the first argument only. An S4 generic can choose a method based on the classes of several arguments at once.

setClass("Project",
         slots = c(
           title = "character",
           budget = "numeric"
         ))

# Generic with two arguments
# (named assignTo because a generic called "assign" would mask base::assign())
setGeneric("assignTo",
          function(person, project) {
            standardGeneric("assignTo")
          })

setMethod("assignTo",
          signature(person = "Person", project = "Project"),
          function(person, project) {
            paste("Assigning", person@name, "to", project@title)
          })

setMethod("assignTo",
          signature(person = "Employee", project = "Project"),
          function(person, project) {
            paste("Assigning employee", person@name,
                  "to", project@title, "with budget", project@budget)
          })

# Create objects
proj <- new("Project", title = "Website Redesign", budget = 50000)
emp <- new("Employee", name = "Bob", salary = 75000, department = "Engineering")

# Different methods called based on argument classes
assignTo(alice, proj)
## [1] "Assigning Alice to Website Redesign"

assignTo(emp, proj)
## [1] "Assigning employee Bob to Website Redesign with budget 50000"

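This is the key difference from S3: a method can be chosen based on the classes of several arguments. The two methods above differ only in their first argument (Person versus Employee), so S3 could handle this particular case too. Multiple dispatch becomes essential when the choice also depends on the class of the second argument.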
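Both methods in the example above take a Project as their second argument, so the choice really turns on the first argument alone. The next sketch keeps the first argument fixed and lets the second one decide. The Developer, Sprint, and Workshop classes and the enroll() generic are hypothetical names used only for illustration.

```r
library(methods)

# Hypothetical classes for this sketch
setClass("Developer", slots = c(name = "character"))
setClass("Sprint", slots = c(goal = "character"))
setClass("Workshop", slots = c(topic = "character"))

setGeneric("enroll", function(person, activity) standardGeneric("enroll"))

# Same class for the first argument; the second argument decides the method
setMethod("enroll", signature(person = "Developer", activity = "Sprint"),
          function(person, activity) {
            paste(person@name, "joins the sprint:", activity@goal)
          })

setMethod("enroll", signature(person = "Developer", activity = "Workshop"),
          function(person, activity) {
            paste(person@name, "attends a workshop on", activity@topic)
          })

dev <- new("Developer", name = "Alice")

enroll(dev, new("Sprint", goal = "ship the login page"))
## [1] "Alice joins the sprint: ship the login page"

enroll(dev, new("Workshop", topic = "S4 classes"))
## [1] "Alice attends a workshop on S4 classes"
```

In S3 you would have to inspect class(activity) inside a single method for Developer. Here, each combination of classes gets its own method.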

Object validation

S4 lets you attach a validity function that runs automatically when objects are created. You can pass it through the validity argument of setClass(), as below, or register it separately with setValidity().

setClass("Account",
         slots = c(
           balance = "numeric",
           owner = "character"
         ),
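         validity = function(object) {
           errors <- character()
           
           # Check lengths first so the comparisons below see a single value
           if (length(object@balance) != 1 || is.na(object@balance)) {
             errors <- c(errors, "Balance must be a single number")
           } else if (object@balance < 0) {
             errors <- c(errors, "Balance cannot be negative")
           }
           
           if (length(object@owner) != 1 || !nzchar(object@owner)) {
             errors <- c(errors, "Owner name cannot be empty")
           }
           
           if (length(errors) == 0) TRUE else errors
         })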

# This works
account <- new("Account", balance = 1000, owner = "Alice")

# This fails validation
bad_account <- new("Account", balance = -50, owner = "Bob")
## Error in validObject(.Object) : 
##   invalid class “Account” object: Balance cannot be negative

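The validity function receives the object after its slots have been filled in, before new() returns it. It returns TRUE if the object is valid, or a character vector of error messages otherwise. This catches problems early, at object creation time. The check does not run again when you later modify a slot with @<-, so call validObject() to re-check such an object.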
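Assigning to a slot with @<- checks only the slot's declared class, not the validity function. The following minimal sketch shows that gap and how validObject() closes it. It uses a hypothetical Wallet class instead of the Account class above.

```r
library(methods)

# Hypothetical class for this sketch
setClass("Wallet",
         slots = c(balance = "numeric"),
         validity = function(object) {
           # any(..., na.rm = TRUE) copes with empty or missing balances
           if (any(object@balance < 0, na.rm = TRUE)) {
             "Balance cannot be negative"
           } else {
             TRUE
           }
         })

w <- new("Wallet", balance = 100)

# @<- only checks that the value is numeric; the validity function does not run
w@balance <- -5

# validObject() runs the validity function on demand and reports the problem
msg <- tryCatch(validObject(w), error = function(e) conditionMessage(e))
msg
```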

Inheritance with contains

S4 supports inheritance through the contains argument in setClass(). A class that contains another inherits its slots and can override its methods.

# Define a base class
setClass("Vehicle",
         slots = c(
           make = "character",
           model = "character"
         ))

# Define a subclass
setClass("Car",
         contains = "Vehicle",
         slots = c(
           doors = "numeric",
           drivetrain = "character"
         ))

# Create objects
base_vehicle <- new("Vehicle", make = "Toyota", model = "Camry")
my_car <- new("Car", make = "Honda", model = "Civic", doors = 4, drivetrain = "FWD")

# Check inheritance: a Car is also a Vehicle
is(my_car, "Vehicle")
## [1] TRUE

Methods defined on the parent class work on child objects too. Define a method for Vehicle, and Car objects will use it unless you override it.

setGeneric("getMakeModel", function(x) standardGeneric("getMakeModel"))

setMethod("getMakeModel", "Vehicle",
          function(x) paste(x@make, x@model))

getMakeModel(base_vehicle)
## [1] "Toyota Camry"

getMakeModel(my_car)
## [1] "Honda Civic"
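To see inheritance and overriding side by side, this sketch repeats the classes and generic from above so it runs on its own. existsMethod() reports only methods defined for exactly the given class, while hasMethod() also counts inherited ones. Adding a Car method then overrides the Vehicle method for Car objects only.

```r
library(methods)

# Same classes and generic as above, repeated so the sketch runs on its own
setClass("Vehicle", slots = c(make = "character", model = "character"))
setClass("Car", contains = "Vehicle",
         slots = c(doors = "numeric", drivetrain = "character"))
setGeneric("getMakeModel", function(x) standardGeneric("getMakeModel"))
setMethod("getMakeModel", "Vehicle", function(x) paste(x@make, x@model))

# Nothing is defined for Car itself yet...
direct_before <- existsMethod("getMakeModel", "Car")
direct_before
## [1] FALSE

# ...but the Vehicle method is available through inheritance
hasMethod("getMakeModel", "Car")
## [1] TRUE

# Override: the Car-specific method now takes precedence for Car objects
setMethod("getMakeModel", "Car",
          function(x) paste0(x@make, " ", x@model, " (", x@drivetrain, ")"))

getMakeModel(new("Car", make = "Honda", model = "Civic",
                 doors = 4, drivetrain = "FWD"))
## [1] "Honda Civic (FWD)"

getMakeModel(new("Vehicle", make = "Toyota", model = "Camry"))
## [1] "Toyota Camry"
```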

Multiple inheritance

When a single class needs to combine behavior from more than one parent, S4 supports multiple inheritance through a vector of class names. This gets complicated fast because method dispatch must resolve conflicting implementations from different branches of the hierarchy. Use the contains argument with a character vector, but only when single inheritance cannot model your domain:

setClass("ElectricVehicle",
         slots = c(battery_kwh = "numeric"))

setClass("ElectricCar",
         contains = c("Car", "ElectricVehicle"),
         slots = c(charge_level = "numeric"))

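Multiple inheritance means the class graph is no longer a simple tree. When a method could be inherited from several ancestors, S4 prefers the one that is the fewest inheritance steps away. If two candidates are equally close, it picks one and prints a note naming the other. Only use multiple inheritance when you have to, and keep the hierarchy shallow.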
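The following sketch shows which ancestor's method an ElectricCar picks up. It repeats the hierarchy above and adds a hypothetical powerSource() generic with methods on Vehicle and on ElectricVehicle. ElectricVehicle is a direct parent of ElectricCar, while Vehicle is two steps away (through Car), so the nearer ElectricVehicle method is selected.

```r
library(methods)

# The hierarchy from above, repeated so the sketch runs on its own
setClass("Vehicle", slots = c(make = "character", model = "character"))
setClass("Car", contains = "Vehicle",
         slots = c(doors = "numeric", drivetrain = "character"))
setClass("ElectricVehicle", slots = c(battery_kwh = "numeric"))
setClass("ElectricCar", contains = c("Car", "ElectricVehicle"),
         slots = c(charge_level = "numeric"))

# Hypothetical generic with methods on two different ancestors
setGeneric("powerSource", function(x) standardGeneric("powerSource"))
setMethod("powerSource", "Vehicle", function(x) "combustion engine")
setMethod("powerSource", "ElectricVehicle", function(x) "battery")

ev <- new("ElectricCar", make = "Acme", model = "E1", doors = 4,
          drivetrain = "FWD", battery_kwh = 40, charge_level = 0.8)

# ElectricVehicle is one step away; Vehicle is two steps away via Car
powerSource(ev)
## [1] "battery"

# A plain Car only reaches the Vehicle method
powerSource(new("Car", make = "Honda", model = "Civic",
                doors = 4, drivetrain = "FWD"))
## [1] "combustion engine"
```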

Practical example: data frame wrapper

Here is a more realistic example that shows why you would actually use S4 in production code. This pattern combines class definition, slot validation, and a custom show method into a single, self-contained unit that prevents invalid analysis results from propagating through your pipeline.

# Define a validated data wrapper
setClass("AnalysisResult",
         slots = c(
           data = "data.frame",
           test_name = "character",
           p_value = "numeric"
         ),
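         validity = function(object) {
           errors <- character()
           
           # The data slot is already guaranteed to be a data.frame by its
           # declared type, so only the p-value needs checking here
           if (length(object@p_value) != 1 || is.na(object@p_value)) {
             errors <- c(errors, "p_value must be a single number")
           } else if (object@p_value < 0 || object@p_value > 1) {
             errors <- c(errors, "p_value must be between 0 and 1")
           }
           
           if (length(errors) == 0) TRUE else errors
         })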

# Define print behavior
setMethod("show", "AnalysisResult",
          function(object) {
            cat("AnalysisResult:", object@test_name, "\n")
            cat("  p-value:", object@p_value, "\n")
            cat("  rows:", nrow(object@data), "\n")
          })

# Create a result
result <- new("AnalysisResult",
              data = data.frame(x = 1:10, y = rnorm(10)),
              test_name = "t-test",
              p_value = 0.032)

print(result)
## AnalysisResult: t-test
##   p-value: 0.032
##   rows: 10

The validity function rejects out-of-range p-values at construction time, and the show method keeps console output readable. This pattern is common in biostatistics packages.

S4 generics and methods

S4 uses formal generic functions defined with setGeneric() and method implementations defined with setMethod(). setGeneric("area", function(shape) standardGeneric("area")) defines the generic. setMethod("area", "Circle", function(shape) pi * shape@radius^2) implements it for Circle objects. Calling area(my_circle) dispatches to the correct implementation based on the class.
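Written out as a runnable sketch, the area example looks like this. The Circle and Rectangle classes, and their radius, width, and height slots, are assumptions made for illustration.

```r
library(methods)

# Hypothetical shape classes for this sketch
setClass("Circle", slots = c(radius = "numeric"))
setClass("Rectangle", slots = c(width = "numeric", height = "numeric"))

# One generic, one method per class
setGeneric("area", function(shape) standardGeneric("area"))
setMethod("area", "Circle", function(shape) pi * shape@radius^2)
setMethod("area", "Rectangle", function(shape) shape@width * shape@height)

my_circle <- new("Circle", radius = 2)
area(my_circle)
## [1] 12.56637

area(new("Rectangle", width = 3, height = 4))
## [1] 12
```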

S4 supports multiple dispatch: methods can be defined based on the classes of more than one argument. setMethod("combine", signature("Matrix", "Matrix"), function(a, b) ...) dispatches based on both arguments’ classes. This is more powerful than S3 dispatch, which only dispatches on the first argument.
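The "Matrix" classes in that snippet come from the Matrix package. This sketch uses base classes instead so it runs without extra packages. The two methods are a hypothetical illustration of a combine() generic. One matches only when both arguments are numeric. The other matches a character first argument paired with "ANY" second argument. A numeric-then-character call matches neither method.

```r
library(methods)

# "combine" follows the text; these methods are a hypothetical illustration
setGeneric("combine", function(a, b) standardGeneric("combine"))

# Selected only when both arguments are numeric
setMethod("combine", signature("numeric", "numeric"),
          function(a, b) a + b)

# Selected when the first argument is character, whatever the second one is
setMethod("combine", signature("character", "ANY"),
          function(a, b) paste(a, b))

combine(1, 2)
## [1] 3

combine("n =", 5)
## [1] "n = 5"

# No method matches numeric followed by character, so dispatch fails
res <- tryCatch(combine(1, "a"), error = function(e) "no applicable method")
res
## [1] "no applicable method"
```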

Validity checking

setValidity("MyClass", function(object) { if (object@n < 0) "n must be non-negative" else TRUE }) defines a validation function, and validObject(my_obj) triggers validation manually. new() runs the validity check automatically, but only when you pass it at least one slot value. new("MyClass") with no arguments returns the class prototype unchecked. If you write a custom initialize method, call callNextMethod() first so the default method can fill in and validate the slots passed to new(). If your method then modifies slots itself, finish by calling validObject() on the result.
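Here is a sketch of that construction pattern, using a hypothetical Counter class. The custom initialize method lets the default method fill and validate the slots. It then sets a default label and validates again, because that last change happens after the default check.

```r
library(methods)

# Hypothetical class for this sketch; the prototype makes new("Counter") valid
setClass("Counter",
         slots = c(n = "numeric", label = "character"),
         prototype = list(n = 0))

setValidity("Counter", function(object) {
  if (length(object@n) != 1 || is.na(object@n) || object@n < 0) {
    "n must be a single non-negative number"
  } else {
    TRUE
  }
})

setMethod("initialize", "Counter", function(.Object, ...) {
  # The default method fills the slots passed to new() and validates them
  .Object <- callNextMethod(.Object, ...)
  # This slot is set after that check, so validate again explicitly
  if (length(.Object@label) == 0) .Object@label <- "unnamed"
  validObject(.Object)
  .Object
})

counter <- new("Counter", n = 3)
counter@label
## [1] "unnamed"

# An invalid value is caught inside callNextMethod()
bad <- tryCatch(new("Counter", n = -1), error = function(e) conditionMessage(e))
```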

When to use S4

S4 is the standard for Bioconductor packages and statistical software where formal interfaces and multiple dispatch are needed. For most data science and data engineering packages, S3 is sufficient and simpler. Use S4 when you need multiple dispatch, formal class hierarchies with validity checking, or when contributing to an ecosystem that uses S4 (Bioconductor).

S4 in Bioconductor

Bioconductor packages use S4 extensively for genomics data structures like SummarizedExperiment, GRanges, and SingleCellExperiment. If you work with Bioconductor, understanding S4 is essential: the accessor methods, slot structure, and inheritance hierarchies follow S4 conventions throughout the ecosystem. isVirtualClass() checks whether a class is virtual (abstract), and extends() shows the class hierarchy.
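Bioconductor classes are normally used through accessor functions rather than @. The following minimal sketch shows that style together with isVirtualClass() and extends(). It uses hypothetical Region and Interval classes, not any real Bioconductor class, and the startPos() accessor names are also assumptions.

```r
library(methods)

# Hypothetical classes: a virtual base class and one concrete subclass
setClass("Region", representation("VIRTUAL"))
setClass("Interval",
         contains = "Region",
         slots = c(start = "numeric", end = "numeric"),
         validity = function(object) {
           if (any(object@start > object@end)) "start must not exceed end" else TRUE
         })

isVirtualClass("Region")
## [1] TRUE

extends("Interval", "Region")
## [1] TRUE

# A getter generic keeps users away from the slot layout
setGeneric("startPos", function(x) standardGeneric("startPos"))
setMethod("startPos", "Interval", function(x) x@start)

# A replacement accessor that re-validates after changing the slot
setGeneric("startPos<-", function(x, value) standardGeneric("startPos<-"))
setReplaceMethod("startPos", "Interval", function(x, value) {
  x@start <- value
  validObject(x)
  x
})

iv <- new("Interval", start = 10, end = 20)
startPos(iv) <- 12
startPos(iv)
## [1] 12
```

Because the replacement method calls validObject(), startPos(iv) <- 30 would fail here, unlike a direct iv@start <- 30.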

When S4 is appropriate

S4 is R’s formal object system. Unlike S3, S4 requires explicit class definitions with typed slots and explicit method signatures, and it supports optional validity functions. This formality adds overhead but provides guarantees. Slot types are checked when objects are created. Methods can dispatch on multiple arguments rather than just the first. Introspection tools such as showMethods() can enumerate the methods defined for a class. Bioconductor strongly encourages packages to build on its shared S4 classes, which is why the genomics ecosystem is largely S4.

Choose S4 when you need multiple dispatch: methods that behave differently based on the types of two or more arguments simultaneously. Matrix operations are a natural fit when the behavior depends on whether both operands are dense matrices, sparse matrices, or one of each. S3 dispatches on the class of one argument, so implementing this in S3 requires explicit type checking inside methods. S4’s formal dispatch replaces that checking.

Class definitions and slots

Define an S4 class with setClass, specifying slot names and their types. The types are R class names: numeric, character, logical, or the names of other S4 classes. Slots are accessed with the at-sign operator, which is analogous to the dollar-sign operator for lists but enforces the declared type.
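The following sketch shows the at-sign operator enforcing the declared type on assignment, plus two related helpers from the methods package: slotNames() and slot(). The Sample class is hypothetical.

```r
library(methods)

# Hypothetical class for this sketch
setClass("Sample", slots = c(id = "character", value = "numeric"))
s <- new("Sample", id = "s1", value = 2.5)

# Reading and writing slots with @
s@value <- 3
s@value
## [1] 3

# @<- checks the declared slot class, so a character string is refused
err <- tryCatch({ s@value <- "high"; "no error" },
                error = function(e) conditionMessage(e))

# Introspection helpers
slotNames("Sample")   # the slot names: "id" and "value"
slot(s, "id")         # same as s@id
## [1] "s1"
```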

The validity argument to setClass takes a function that checks whether a newly created object satisfies the class’s invariants. This function receives the object and returns TRUE for valid objects or a character vector describing the problems for invalid ones. Validity checking runs at construction time and can be run explicitly with validObject. Well-written validity functions catch problems early with messages that identify which constraint was violated, rather than letting invalid objects cause confusing errors later.

Generics and methods

S4 generics are defined with setGeneric, which creates a placeholder that dispatches to methods defined with setMethod. The signature argument to setMethod specifies which argument classes trigger this particular method. For single dispatch, the signature names one argument. For multiple dispatch, it names two or more. The method is eligible when each named argument’s class matches, or inherits from, the declared class. Arguments left out of the signature are treated as "ANY".

Documentation for S4 classes and methods in roxygen2 uses mostly the same tags as S3, plus @slot for documenting slots, but exports are specified differently. In the NAMESPACE file, classes are exported with exportClasses and methods with exportMethods. roxygen2 writes these directives when you add @export to the class and method definitions. A common convention is to document all methods for a generic on one man page by giving each method the same @rdname, adding @aliases where needed.
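The next sketch shows how the roxygen2 comments might look for the Person class and describe() generic from earlier. The tag layout is an assumption following common roxygen2 usage, not the only valid arrangement. It uses @slot for each slot, @export on the class, generic, and method, and a shared @rdname for the method. The export directives are written to NAMESPACE when roxygen2 processes the package, for example with roxygen2::roxygenise().

```r
#' A person
#'
#' @slot name The person's name.
#' @slot age Age in years.
#' @export
setClass("Person", slots = c(name = "character", age = "numeric"))

#' Describe an object
#'
#' @param object The object to describe.
#' @export
setGeneric("describe", function(object) standardGeneric("describe"))

#' @rdname describe
#' @export
setMethod("describe", "Person", function(object) {
  paste(object@name, "is", object@age, "years old")
})
```

The roxygen comments are ordinary R comments, so the file also runs as plain R code outside a package.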

Inheritance

S4 classes inherit slots and methods from parent classes through the contains argument to setClass. A subclass has all the slots of its parent plus any additional slots. Methods defined for the parent class are available to the subclass unless overridden. The callNextMethod function inside a method calls the parent class implementation, enabling cooperative method chaining in the same way that super calls work in other object systems.
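The following minimal sketch shows callNextMethod() chaining from a Car method to its Vehicle parent. It reuses the Vehicle and Car definitions from the inheritance section and a hypothetical summaryLine() generic. Called with no arguments, callNextMethod() passes the current arguments on to the next method in the inheritance chain.

```r
library(methods)

# Vehicle and Car as defined earlier, repeated so the sketch runs on its own
setClass("Vehicle", slots = c(make = "character", model = "character"))
setClass("Car", contains = "Vehicle",
         slots = c(doors = "numeric", drivetrain = "character"))

# Hypothetical generic for this sketch
setGeneric("summaryLine", function(x) standardGeneric("summaryLine"))

setMethod("summaryLine", "Vehicle", function(x) paste(x@make, x@model))

setMethod("summaryLine", "Car", function(x) {
  # Reuse the Vehicle implementation, then add the Car-specific part
  paste0(callNextMethod(), ", ", x@doors, " doors")
})

summaryLine(new("Car", make = "Honda", model = "Civic",
                doors = 4, drivetrain = "FWD"))
## [1] "Honda Civic, 4 doors"
```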

Summary

S4 gives you formal object-oriented programming in R. Use setClass() to define classes with typed slots. Use setGeneric() to create dispatchable functions and setMethod() to implement them. Validation methods catch invalid states at object creation. Inheritance through contains lets you build class hierarchies.

The trade-off is more code upfront versus runtime safety and explicit contracts. For packages and complex systems, that trade-off usually pays off.

See also