Writing Good Functions in R
Functions are the building blocks of reusable R code. Well-written functions make your scripts shorter, easier to test, and simpler to maintain. This guide covers the essential principles of writing good functions in R.
Why Write Functions?
Repeating code is a maintenance nightmare. When you copy-paste code with slight modifications, you create multiple places that need updating when requirements change. Functions solve this by encapsulating logic in one place.
Consider this repetitive pattern:
# Calculate mean of score column
mean(scores$score)
# Calculate mean of height column
mean(heights$height)
# Calculate mean of weight column
mean(weights$weight)
A function eliminates this repetition:
calc_mean <- function(data, column) {
  mean(data[[column]])
}
calc_mean(scores, "score")
calc_mean(heights, "height")
calc_mean(weights, "weight")
Functions also reduce bugs. When you fix a bug in a function, the fix applies everywhere the function is used. Without functions, you’d need to find and fix every copy of the repeated code.
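As a rough illustration of that point, suppose the averages should skip missing values. With the calc_mean() helper above, the change is made once and every call picks it up. The na.rm argument and the sample data frame are assumptions added for this sketch.

```r
# Hypothetical revision of calc_mean(): the na.rm argument is added for this sketch
calc_mean <- function(data, column, na.rm = TRUE) {
  mean(data[[column]], na.rm = na.rm)
}

# Made-up sample data containing a missing value
scores <- data.frame(score = c(80, 90, NA))

calc_mean(scores, "score")
# [1] 85
```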
Function Structure
R functions use the function() keyword followed by arguments in parentheses. The body is enclosed in curly braces.
function_name <- function(arg1, arg2 = 0) {
  # function body; arg2 has a default value, so callers may omit it
  result <- arg1 + arg2
  return(result)
}
The return() statement is optional: R returns the value of the last evaluated expression by default. However, explicit returns improve readability, especially in longer functions with multiple exit points.
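A minimal sketch of both styles in one function may help. The helper describe_sign() is hypothetical and only serves to show early exits with return() next to an implicit return on the last line.

```r
# Hypothetical helper: describes the sign of a single number
describe_sign <- function(x) {
  if (is.na(x)) {
    return("missing")   # early exit for missing input
  }
  if (x < 0) {
    return("negative")  # early exit for negative numbers
  }
  if (x == 0) {
    return("zero")      # early exit for zero
  }
  "positive"            # last expression, returned implicitly
}

describe_sign(-3)
# [1] "negative"
describe_sign(7)
# [1] "positive"
```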
Naming Conventions
Use descriptive names that convey purpose. In R, both snake_case and dot.case are common:
# Snake case (recommended for tidyverse style)
calculate_summary_statistics <- function(data) { }
# Dot case (traditional R style)
calculate.summary.statistics <- function(data) { }
Avoid single-letter names except for well-known mathematical variables like x for a vector or i for an index.
Working with Arguments
Function arguments should have sensible defaults when appropriate. This makes the function easier to use for common cases.
normalize <- function(x, center = TRUE, scale = TRUE) {
  if (center) {
    x <- x - mean(x)
  }
  if (scale) {
    x <- x / sd(x)
  }
  x
}
# Use defaults
normalize(c(1, 2, 3, 4, 5))
# Override defaults
normalize(c(1, 2, 3, 4, 5), center = FALSE)
Argument Order
Place required arguments before optional ones. This makes it easy to call the function with just the essentials:
# Good: required arg first, optional args after
calculate_rmse <- function(actual, predicted, na.rm = FALSE) { }
# Called with just required args
calculate_rmse(df$actual, df$predicted)
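The stub above leaves the body empty. One possible way to fill it in is sketched here. The body and the sample data frame are assumptions made for illustration, not the original author's implementation.

```r
# One possible body for calculate_rmse(); an assumption for this sketch
calculate_rmse <- function(actual, predicted, na.rm = FALSE) {
  # Root mean squared error: square root of the mean squared difference
  sqrt(mean((actual - predicted)^2, na.rm = na.rm))
}

# Made-up example data
df <- data.frame(actual = c(10, 20), predicted = c(9, 13))

# Only the required arguments are supplied; na.rm keeps its default
calculate_rmse(df$actual, df$predicted)
# [1] 5
```

Because na.rm comes last and has a default, the common call stays short, while a caller with missing values can still write calculate_rmse(actual, predicted, na.rm = TRUE).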
Dot Arguments
The special ... argument passes additional arguments to nested functions:
summarize_column <- function(data, col, ...) {
  # Extra arguments in ... are forwarded unchanged to summary()
  result <- summary(data[[col]], ...)
  return(result)
}
This is useful when you want to give users flexibility without explicitly listing every possible parameter.
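To see the forwarding in action, here is a small sketch built around base R's quantile(). The wrapper column_quantiles() and the sample data are hypothetical. The point is that probs and na.rm are never declared by the wrapper, yet they still reach quantile().

```r
# Hypothetical wrapper: everything in ... is handed straight to quantile()
column_quantiles <- function(data, col, ...) {
  quantile(data[[col]], ...)
}

# Made-up data with a missing value
heights <- data.frame(height = c(150, 160, 170, 180, NA))

# probs and na.rm are not arguments of column_quantiles(); they travel through ...
column_quantiles(heights, "height", probs = c(0.25, 0.75), na.rm = TRUE)
```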
Error Handling
Robust functions validate their inputs and provide clear error messages:
divide <- function(a, b) {
  if (!is.numeric(a) || !is.numeric(b)) {
    stop("Both arguments must be numeric")
  }
  if (b == 0) {
    stop("Cannot divide by zero")
  }
  a / b
}
divide(10, 2)
# [1] 5
divide("a", 2)
# Error in divide("a", 2) : Both arguments must be numeric
The stop() function halts execution and displays an error. Use warning() for non-fatal issues and message() for informational output.
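The three signalling functions can be combined, as in the following sketch. The wrapper safe_divide() is hypothetical, and divide() is repeated from above only so the block runs on its own. It uses message() for progress output and tryCatch() to turn the fatal error from stop() into a warning plus an NA result.

```r
# divide() repeated from above so this block is self-contained
divide <- function(a, b) {
  if (!is.numeric(a) || !is.numeric(b)) {
    stop("Both arguments must be numeric")
  }
  if (b == 0) {
    stop("Cannot divide by zero")
  }
  a / b
}

# Hypothetical wrapper: reports progress and downgrades errors to warnings
safe_divide <- function(a, b) {
  message("Dividing ", a, " by ", b)  # informational output
  tryCatch(
    divide(a, b),
    error = function(e) {
      # Non-fatal: warn, then continue with a placeholder result
      warning("Returning NA: ", conditionMessage(e))
      NA_real_
    }
  )
}

safe_divide(10, 2)
# Dividing 10 by 2
# [1] 5

safe_divide(10, 0)  # emits a warning and returns NA
```

Whether an error should be downgraded like this depends on the caller. In many cases letting stop() halt execution is the safer choice.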
Checking Types
Use is.* functions to validate input types:
process_vector <- function(x) {
  if (!is.numeric(x)) {
    stop("x must be a numeric vector")
  }
  if (length(x) == 0) {
    warning("x is empty, returning NA")
    return(NA_real_)
  }
  # processing logic
}
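When custom messages are not needed, base R's stopifnot() offers a more compact way to express the same kind of checks. The function rescale01() below is a hypothetical example, not part of the original guide.

```r
# Hypothetical function: rescales a numeric vector to the range [0, 1]
rescale01 <- function(x) {
  # stopifnot() raises an error unless every condition is TRUE
  stopifnot(is.numeric(x), length(x) > 0)
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

rescale01(c(2, 4, 6))
# [1] 0.0 0.5 1.0
```

The trade-off is that the generated error message is generic, so explicit if/stop() checks remain the better option when users need guidance on how to fix their input.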
Documentation
Comment your functions to explain the what, why, and how:
# Calculate the geometric mean of a numeric vector
# Handles zero and negative values by returning NA
#
# Args:
# x: numeric vector
# Returns:
# numeric value or NA for invalid input
geometric_mean <- function(x) {
  if (any(x <= 0, na.rm = TRUE)) {
    return(NA_real_)
  }
  exp(mean(log(x)))
}
For larger projects, consider roxygen2-style comments, from which manual pages can be generated automatically.
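As a sketch of what that might look like, here is geometric_mean() from above with its comment block rewritten as roxygen2 comments. The wording of the descriptions is illustrative. The tags (@param, @return, @examples, @export) are standard roxygen2 tags, and the lines start with #' rather than #.

```r
#' Calculate the geometric mean of a numeric vector
#'
#' Returns NA when any value is zero or negative.
#'
#' @param x A numeric vector.
#' @return A single numeric value, or NA for invalid input.
#' @examples
#' geometric_mean(c(1, 4))
#' @export
geometric_mean <- function(x) {
  if (any(x <= 0, na.rm = TRUE)) {
    return(NA_real_)
  }
  exp(mean(log(x)))
}
```

Outside a package these lines are ordinary comments, so the function runs unchanged. The manual page is only generated when roxygen2 processes a package's source.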
Scope and Environments
Understanding variable scope helps avoid confusing bugs. Variables defined inside a function don’t affect the global environment:
modify_value <- function(x) {
  x <- x * 2
  message("Inside function, x is ", x)
}
x <- 10
modify_value(x)
# Inside function, x is 20
x
# [1] 10
The original x remains unchanged because the x inside the function is a local variable: assigning to it creates a modified copy and leaves the caller’s value alone.
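If the caller's variable should change, the usual approach is to return the new value and reassign it explicitly. The function double_value() below is a hypothetical variant written for this sketch.

```r
# Hypothetical variant that returns the new value instead of only printing it
double_value <- function(x) {
  x * 2
}

x <- 10
x <- double_value(x)  # reassign explicitly to update the caller's variable
x
# [1] 20
```

Keeping the update visible at the call site makes it clear where a variable changes, which is harder to trace when functions modify outside state.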
When to Use Functions
Create a function when:
- You repeat similar code three or more times
- A logical unit can be named (e.g., calculate_metrics(), validate_input())
- You need to test individual pieces of logic
- The code might be reused across projects
- The operation has configurable parameters
When Not to Over-Engineer
Avoid creating functions for one-off calculations or when the logic may change significantly between uses. Premature abstraction adds complexity without benefit. If you’re not sure whether code will be reused, write it inline first and refactor into a function when patterns emerge.
Summary
Good functions are small, focused, and well-documented. They validate inputs, provide clear error messages, and have sensible defaults. Apply these principles consistently to write R code that is easier to maintain and share.