
Writing Good Functions in R

Functions are the building blocks of reusable R code. Well-written functions make your scripts shorter, easier to test, and simpler to maintain. This guide covers the essential principles of writing good functions in R.

Why write functions?

Repeating code is a maintenance nightmare. When you copy-paste code with slight modifications, you create multiple places that need updating when requirements change. Functions solve this by encapsulating logic in one place.

Consider this repetitive pattern:

# Calculate mean of score column
mean(scores$score)

# Calculate mean of height column  
mean(heights$height)

# Calculate mean of weight column
mean(weights$weight)

A function eliminates this repetition:

calc_mean <- function(data, column) {
  mean(data[[column]])
}

calc_mean(scores, "score")
calc_mean(heights, "height")
calc_mean(weights, "weight")

Functions also reduce bugs. When you fix a bug in a function, the fix applies everywhere the function is used. Without functions, you’d need to find and fix every copy of the repeated code.

Function structure

R functions use the function() keyword followed by arguments in parentheses. The body is enclosed in curly braces.

function_name <- function(arg1, arg2 = default) {
  # function body
  result <- arg1 + arg2
  return(result)
}

The return() statement is optional—R returns the last evaluated expression by default. However, explicit returns improve readability, especially in longer functions with multiple exit points.

Naming conventions

Use descriptive names that convey purpose. In R, both snake_case and dot.case are common:

# Snake case (recommended for tidyverse style)
calculate_summary_statistics <- function(data) { }

# Dot case (traditional R style)
calculate.summary.statistics <- function(data) { }

Avoid single-letter names except for well-known mathematical variables like x for a vector or i for an index.

Working with arguments

Function arguments should have sensible defaults when appropriate. This makes the function easier to use for common cases.

normalize <- function(x, center = TRUE, scale = TRUE) {
  if (center) {
    x <- x - mean(x)
  }
  if (scale) {
    x <- x / sd(x)
  }
  x
}

# Use defaults
normalize(c(1, 2, 3, 4, 5))

# Override defaults
normalize(c(1, 2, 3, 4, 5), center = FALSE)

Argument order

Place required arguments before optional ones. This makes it easy to call the function with just the essentials:

# Good: required arg first, optional args after
calculate_rmse <- function(actual, predicted, na.rm = FALSE) { }

# Called with just required args
calculate_rmse(df$actual, df$predicted)
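
The definition above has an empty body. As a rough sketch of how it might be filled in (the length check and the intermediate variable are assumptions added for illustration, not part of the original), the required arguments do the work and the optional one is simply passed along:

```r
# Possible body for calculate_rmse(); the length check is an added assumption
calculate_rmse <- function(actual, predicted, na.rm = FALSE) {
  stopifnot(length(actual) == length(predicted))
  squared_error <- (actual - predicted)^2
  sqrt(mean(squared_error, na.rm = na.rm))
}

# Required arguments only
calculate_rmse(c(1, 2, 3), c(1, 2, 5))

# Optional argument supplied by name
calculate_rmse(c(1, NA, 3), c(1, 2, 5), na.rm = TRUE)
```

Because na.rm comes last and has a default, callers who have no missing values never need to think about it.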

Dot arguments

The special ... argument passes additional arguments to nested functions:

summarize_column <- function(data, col, ...) {
  # Extra arguments in ... are forwarded unchanged to summary()
  summary(data[[col]], ...)
}

This is useful when you want to give users flexibility without explicitly listing every possible parameter.
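
To see the forwarding in action, here is a small sketch using a hypothetical column_mean() helper and a made-up scores data frame. The wrapper never mentions na.rm, yet callers can still pass it through to mean():

```r
# Hypothetical helper: forwards any extra arguments to mean()
column_mean <- function(data, col, ...) {
  mean(data[[col]], ...)
}

# Made-up example data with one missing value
scores <- data.frame(score = c(1, NA, 3))

# No extra arguments: mean() sees the NA and returns NA
column_mean(scores, "score")

# na.rm travels through ... to mean()
column_mean(scores, "score", na.rm = TRUE)
```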

Error handling

Reliable functions validate their inputs and provide clear error messages:

divide <- function(a, b) {
  if (!is.numeric(a) || !is.numeric(b)) {
    stop("Both arguments must be numeric")
  }
  if (b == 0) {
    stop("Cannot divide by zero")
  }
  a / b
}

divide(10, 2)
# [1] 5

divide("a", 2)
# Error in divide("a", 2) : Both arguments must be numeric

The stop() function halts execution and displays an error. Use warning() for non-fatal issues and message() for informational output.
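
The three signals can sit side by side in one function. The following is a minimal sketch with a hypothetical safe_log() helper; the rule for what counts as fatal or non-fatal here is an assumption chosen only for illustration:

```r
# Hypothetical helper showing stop(), warning(), and message() together
safe_log <- function(x) {
  if (!is.numeric(x)) {
    stop("x must be numeric")                       # fatal: cannot continue
  }
  if (any(x <= 0, na.rm = TRUE)) {
    warning("Non-positive values replaced with NA") # non-fatal: result still returned
    x[which(x <= 0)] <- NA
  }
  message("Taking the log of ", length(x), " values") # informational only
  log(x)
}

safe_log(c(1, 10, -5))
```

Callers can silence the informational output with suppressMessages() without hiding warnings or errors, which is one reason to prefer message() over print() for status text.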

Checking types

Use is.* functions to validate input types:

process_vector <- function(x) {
  if (!is.numeric(x)) {
    stop("x must be a numeric vector")
  }
  if (length(x) == 0) {
    warning("x is empty, returning NA")
    return(NA_real_)
  }
  # processing logic
}

Documentation

Comment your functions to explain the what, why, and how:

# Calculate the geometric mean of a numeric vector
# Handles zero and negative values by returning NA
# 
# Args:
#   x: numeric vector
# Returns:
#   numeric value or NA for invalid input
geometric_mean <- function(x) {
  if (any(x <= 0, na.rm = TRUE)) {
    return(NA_real_)
  }
  exp(mean(log(x)))
}

For larger projects, consider roxygen2 comments, which generate manual pages automatically.

Scope and environments

Understanding variable scope helps avoid confusing bugs. Variables defined inside a function don’t affect the global environment:

modify_value <- function(x) {
  x <- x * 2
  message("Inside function, x is ", x)
}

x <- 10
modify_value(x)
# Inside function, x is 20
x
# [1] 10

The original x remains unchanged because the assignment inside the function creates a new local x; R’s copy-on-modify semantics leave the caller’s object untouched.

When to use functions

Create a function when:

  • You repeat similar code three or more times
  • A logical unit can be named (e.g., calculate_metrics(), validate_input())
  • You need to test individual pieces of logic
  • The code might be reused across projects
  • The operation has configurable parameters

When not to over-engineer

Avoid creating functions for one-off calculations or when the logic may change significantly between uses. Premature abstraction adds complexity without benefit. If you’re not sure whether code will be reused, write it inline first and refactor into a function when patterns emerge.

Default arguments

Default argument values are evaluated lazily: they are computed when the argument is first used inside the function, not when the function is defined. Because a default is evaluated in the function’s own environment, it can reference other arguments: in f <- function(x, n = length(x)), n takes the length of x as it stands when n is first used. It also means complex objects (data frames, lists) used as defaults are created fresh on each call, not shared across calls.
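
Both consequences are easy to check. In this sketch, first_n() and append_item() are hypothetical helpers written only to illustrate the behaviour:

```r
# Hypothetical helper: the default for n refers to another argument
first_n <- function(x, n = length(x)) {
  x <- x[!is.na(x)]  # x is modified before n is first used
  head(x, n)         # n is evaluated here, so it sees the filtered x
}

# The default n is 2 (the length after dropping NA), not 3
first_n(c(1, NA, 3))

# Hypothetical helper: a list default is rebuilt on every call
append_item <- function(item, acc = list()) {
  acc[[length(acc) + 1]] <- item
  acc
}

# Each call starts from a fresh empty list, so both results have length 1
length(append_item("a"))
length(append_item("b"))
```

The first function also shows the flip side of lazy defaults: changing x before n is touched changes what the default means, so it is usually clearer to use such defaults early in the body.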

Documentation standards

roxygen2 comments directly above the function definition generate both help pages and the NAMESPACE file. The minimum set is @param name description for each argument, @return for the return value, and @examples with at least one example. Examples must be self-contained and runnable. @seealso links to related functions. @export adds the function to the exports listed in NAMESPACE. Run devtools::document() after editing roxygen comments to regenerate the .Rd files and NAMESPACE.
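
As an illustration of that minimum set, here is a sketch of a documented function. rescale01() is a hypothetical example, not something defined elsewhere in this guide:

```r
#' Rescale a numeric vector to the range 0 to 1
#'
#' @param x A numeric vector.
#' @param na.rm Should missing values be ignored when finding the range?
#' @return A numeric vector the same length as `x`.
#' @examples
#' rescale01(c(2, 4, 6))
#' @export
rescale01 <- function(x, na.rm = TRUE) {
  rng <- range(x, na.rm = na.rm)
  (x - rng[1]) / (rng[2] - rng[1])
}
```

The #' lines are ordinary comments as far as R is concerned, so the function still works when sourced outside a package.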

Returning values

R functions return the value of the last evaluated expression. return() is explicit and exits immediately; use it for early returns in complex conditional logic. invisible() returns a value without triggering auto-printing. It is the idiomatic return for functions called mainly for their side effects, such as print methods or functions that write a file or draw a plot: returning the input invisibly enables chaining, but the value is not printed at the console.
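
A short sketch of both ideas, using two hypothetical helpers: describe_sign() exits early with return(), and log_rows() reports something as a side effect and hands its input back invisibly.

```r
# Hypothetical helper: early returns keep each case flat and readable
describe_sign <- function(x) {
  if (is.na(x)) {
    return("missing")
  }
  if (x < 0) {
    return("negative")
  }
  "non-negative"  # last expression is the return value
}

# Hypothetical helper: side effect first, then return the input invisibly
log_rows <- function(data) {
  message("Rows: ", nrow(data))
  invisible(data)
}

describe_sign(-3)

# Nothing is auto-printed, but the value can still be captured
result <- log_rows(data.frame(a = 1:3))
nrow(result)
```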

Function design principles

Well-designed functions have one responsibility, clear input/output contracts, and predictable behavior. Name functions as verbs (what they do) and their arguments as nouns (what they operate on). compute_rmse() is better than rmse_thing(). x is a conventional argument name for the primary input.

Default argument values should reflect the most common use case. When a function is called 90% of the time without an argument, the default handles those cases cleanly. Keep the number of arguments small: functions with more than 4-5 mandatory arguments are hard to call correctly. Use ... to pass additional arguments to underlying functions.

NULL is a better default than a sentinel value. filter_by_date(df, from = NULL, to = NULL) with if (!is.null(from)) df <- filter(df, date >= from) is cleaner than inventing a “no filter” sentinel like from = "1900-01-01".
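
Written out in full, the pattern might look like the sketch below. It swaps dplyr's filter() for base R subsetting so that it runs without extra packages, and the events data frame is made up for illustration:

```r
# NULL means "no bound on this side"; base R subsetting stands in for dplyr::filter()
filter_by_date <- function(df, from = NULL, to = NULL) {
  if (!is.null(from)) {
    df <- df[df$date >= from, , drop = FALSE]
  }
  if (!is.null(to)) {
    df <- df[df$date <= to, , drop = FALSE]
  }
  df
}

# Made-up example data
events <- data.frame(
  id = 1:3,
  date = as.Date(c("2024-01-01", "2024-06-01", "2024-12-01"))
)

filter_by_date(events)                               # no filtering at all
filter_by_date(events, from = as.Date("2024-03-01")) # lower bound only
```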

Input validation

Validate inputs early, before expensive computation. stopifnot(is.numeric(x)) stops with an error naming the check that failed. match.arg(method, c("pearson", "spearman")) validates enumerated arguments and returns the matched value (supporting partial matching). rlang::arg_match() is stricter: it does not allow partial matching, and its error message lists the valid choices.
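
Here is a minimal base R sketch combining the two checks. correlate() is a hypothetical wrapper around cor(); listing the choices in the argument default lets match.arg() pick them up without repeating them:

```r
# Hypothetical wrapper: validate first, compute second
correlate <- function(x, y, method = c("pearson", "spearman")) {
  stopifnot(is.numeric(x), is.numeric(y), length(x) == length(y))
  method <- match.arg(method)  # first choice is the default; partial matches allowed
  cor(x, y, method = method)
}

x <- 1:10
y <- x^2

correlate(x, y)                    # uses "pearson"
correlate(x, y, method = "spear")  # partial match resolves to "spearman"
```

An unsupported value such as method = "kendall" fails immediately in match.arg(), before cor() is ever called.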

For package functions, use rlang::abort() rather than stop() to provide structured error conditions with machine-readable data:

if (!is.numeric(x)) {
  # class(x) can have several elements, so use only the first in the message
  rlang::abort(
    paste0("Expected numeric, got ", class(x)[1]),
    class = "invalid_type",
    expected = "numeric",
    received = class(x)
  )
}

This allows callers to write handlers for specific error types.
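
To show what such a handler can look like, the sketch below uses base R's errorCondition() in place of rlang::abort() so that it runs without extra packages; the idea of a classed error with attached fields is the same. check_numeric() is a hypothetical helper:

```r
# Hypothetical helper: signals a classed error with extra fields (base R stand-in for rlang::abort())
check_numeric <- function(x) {
  if (!is.numeric(x)) {
    stop(errorCondition(
      paste0("Expected numeric, got ", class(x)[1]),
      class = "invalid_type",
      expected = "numeric",
      received = class(x)[1]
    ))
  }
  invisible(x)
}

# The handler is chosen by condition class and can read the attached fields
result <- tryCatch(
  check_numeric("a"),
  invalid_type = function(e) paste("Handled:", e$received)
)
result
```

Other errors raised inside the tryCatch() would not match invalid_type and would propagate as usual.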

Function environments and closures

Each function call creates a new execution environment. Variables created inside a function are local; they do not affect the caller’s environment. <<- assigns to an existing binding in an enclosing environment (creating a global variable if none is found), but using it is usually a sign that a better design would pass values explicitly.

Closures capture the environment where they are defined. A function returned from another function has access to the outer function’s variables even after the outer function returns. This enables function factories: functions that return customized functions.

make_adder <- function(n) function(x) x + n
add5 <- make_adder(5)
add5(3)  # 8

environment(f) returns a function’s enclosing environment. environment(f) <- new_env changes it; this is rarely needed but useful for certain metaprogramming patterns.
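
One place where <<- is commonly considered reasonable is inside a closure that keeps private state. The sketch below uses a hypothetical make_counter() factory and then inspects the captured state with environment():

```r
# Hypothetical function factory: each counter keeps its own private count
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1  # updates count in the enclosing environment, not globally
    count
  }
}

counter <- make_counter()
counter()
counter()

# The enclosing environment holds the captured variable
environment(counter)$count
```

Because count already exists in the factory's environment, <<- finds it there and never reaches the global environment.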

Vectorization

R functions should be vectorized where practical: accept a vector and return a vector. ifelse(condition, yes, no) is vectorized (all three arguments can be vectors). if (condition) value1 else value2 is not; it expects a single TRUE or FALSE, and since R 4.2.0 a condition of length greater than one is an error.

When a function contains logic that inherently varies by element, dplyr::case_when() handles vectorized conditional logic: case_when(x < 0 ~ "negative", x == 0 ~ "zero", x > 0 ~ "positive"). In base R, nest ifelse() calls, or apply switch() to each element with vapply().
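
The two base R options might look like this. Both helpers, classify_sign() and unit_label(), are hypothetical and exist only to illustrate the pattern:

```r
# Nested ifelse(): vectorized three-way classification
classify_sign <- function(x) {
  ifelse(x < 0, "negative", ifelse(x == 0, "zero", "positive"))
}

classify_sign(c(-2, 0, 3))

# switch() handles one value at a time, so vapply() applies it to each element
unit_label <- function(codes) {
  vapply(
    codes,
    function(code) switch(code, kg = "kilograms", m = "metres", "unknown"),
    character(1),
    USE.NAMES = FALSE
  )
}

unit_label(c("kg", "m", "x"))
```

vapply() also checks that every element produces a single character value, which catches mistakes that sapply() would silently pass through.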

Vectorize(f) wraps a scalar function to work on vectors using mapply(). It is convenient but not as fast as a truly vectorized implementation. For performance-critical code, rewrite the logic to avoid element-by-element loops.
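
As a sketch of the trade-off, the hypothetical scalar_clamp() below only works on single values. Vectorize() makes it accept vectors, while a rewrite with pmin() and pmax() gets the same result without calling the function once per element:

```r
# Hypothetical scalar function: if/else only works on a single value
scalar_clamp <- function(x, lower, upper) {
  if (x < lower) lower else if (x > upper) upper else x
}

# Convenient: Vectorize() applies scalar_clamp() element by element via mapply()
clamp_wrapped <- Vectorize(scalar_clamp)
clamp_wrapped(c(-1, 0.5, 2), 0, 1)

# Truly vectorized rewrite using pmin() and pmax()
clamp_fast <- function(x, lower, upper) {
  pmin(pmax(x, lower), upper)
}
clamp_fast(c(-1, 0.5, 2), 0, 1)
```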

Documentation with roxygen2

#' comments above a function are processed by roxygen2 into .Rd documentation files. #' @param name Description documents an argument. #' @return Description documents the return value. #' @examples starts an examples block.

#' @inheritParams other_function copies parameter documentation from another function — useful when multiple functions share arguments. #' @seealso [other_function()] adds cross-references.
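
A sketch of how these tags fit together for two related functions follows. compute_mae() is a hypothetical companion to compute_rmse(), and the bodies are illustrative only:

```r
#' Compute the root mean squared error
#'
#' @param actual Numeric vector of observed values.
#' @param predicted Numeric vector of predictions, the same length as `actual`.
#' @return A single numeric value.
#' @seealso [compute_mae()]
#' @export
compute_rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

#' Compute the mean absolute error
#'
#' @inheritParams compute_rmse
#' @return A single numeric value.
#' @seealso [compute_rmse()]
#' @export
compute_mae <- function(actual, predicted) {
  mean(abs(actual - predicted))
}
```

The argument descriptions live in one place, so a change to the wording for actual or predicted only has to be made once.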

devtools::document() regenerates all documentation from roxygen2 comments. ?fn_name displays the documentation in the help pane. For interactive development, devtools::load_all() loads functions without reinstalling the package.

Summary

Good functions are small, focused, and well-documented. They validate inputs, provide clear error messages, and have sensible defaults. Apply these principles consistently to write R code that is easier to maintain and share.