Writing Good Functions in R

· 4 min read · Updated March 10, 2026 · beginner

Functions are the building blocks of reusable R code. Well-written functions make your scripts shorter, easier to test, and simpler to maintain. This guide covers the essential principles of writing good functions in R.

Why Write Functions?

Repeating code is a maintenance nightmare. When you copy-paste code with slight modifications, you create multiple places that need updating when requirements change. Functions solve this by encapsulating logic in one place.

Consider this repetitive pattern:

# Calculate mean of score column
mean(scores$score)

# Calculate mean of height column  
mean(heights$height)

# Calculate mean of weight column
mean(weights$weight)

A function eliminates this repetition:

calc_mean <- function(data, column) {
  mean(data[[column]])
}

calc_mean(scores, "score")
calc_mean(heights, "height")
calc_mean(weights, "weight")

Functions also reduce bugs. When you fix a bug in a function, the fix applies everywhere the function is called. Without functions, you’d need to find and fix every copy of the repeated code.
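
As a rough sketch of that point, suppose a column turns out to contain missing values. The scores data below is made up for illustration, and calc_mean() is the helper defined earlier; one change inside the function corrects every call that uses it:

```r
# Hypothetical data with a missing value (not from the original example)
scores <- data.frame(score = c(80, 90, NA, 70))

# Original version: returns NA as soon as any value is missing
calc_mean <- function(data, column) {
  mean(data[[column]])
}
before <- calc_mean(scores, "score")

# Fixed version: the change is made once, inside the function
calc_mean <- function(data, column) {
  mean(data[[column]], na.rm = TRUE)
}
after <- calc_mean(scores, "score")
```

Every existing call to calc_mean() picks up the na.rm = TRUE behaviour without being edited.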

Function Structure

R functions are created with the function keyword, followed by the arguments in parentheses. The body is usually enclosed in curly braces.

function_name <- function(arg1, arg2 = 0) {
  # function body: arg2 falls back to 0 when the caller omits it
  result <- arg1 + arg2
  return(result)
}

The return() statement is optional: R returns the value of the last evaluated expression by default. However, explicit returns can improve readability, especially in longer functions with multiple exit points.
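
The two styles can be compared side by side. This is a minimal sketch; add_implicit() and safe_sqrt() are hypothetical names chosen for illustration, and safe_sqrt() assumes a single number as input:

```r
# Implicit return: the value of the last expression is returned
add_implicit <- function(a, b) {
  a + b
}

# Explicit return used as an early exit
safe_sqrt <- function(x) {
  if (x < 0) {
    return(NA_real_)  # leave the function early for invalid input
  }
  sqrt(x)             # reached only when x is non-negative
}

add_implicit(2, 3)
safe_sqrt(16)
safe_sqrt(-1)
```

Here return() marks the unusual path, while the normal result is simply the last expression.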

Naming Conventions

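Use descriptive names that convey purpose. In R, both snake_case and dot.case are common, although dot.case names can be mistaken for S3 methods, which use the generic.class naming pattern: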

# Snake case (recommended for tidyverse style)
calculate_summary_statistics <- function(data) { }

# Dot case (traditional R style)
calculate.summary.statistics <- function(data) { }

Avoid single-letter names except for well-known mathematical variables like x for a vector or i for an index.

Working with Arguments

Function arguments should have sensible defaults when appropriate. This makes the function easier to use for common cases.

normalize <- function(x, center = TRUE, scale = TRUE) {
  if (center) {
    x <- x - mean(x)
  }
  if (scale) {
    x <- x / sd(x)
  }
  x
}

# Use defaults
normalize(c(1, 2, 3, 4, 5))

# Override defaults
normalize(c(1, 2, 3, 4, 5), center = FALSE)

Argument Order

Place required arguments before optional ones. This makes it easy to call the function with just the essentials:

# Good: required args first, optional args after
calculate_rmse <- function(actual, predicted, na.rm = FALSE) { }

# Called with just required args
calculate_rmse(df$actual, df$predicted)
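
To make the calling pattern concrete, here is a minimal sketch with one possible body filled in. The implementation and the sample vectors are assumptions for illustration, not part of the original example:

```r
# Illustrative implementation: root mean squared error
calculate_rmse <- function(actual, predicted, na.rm = FALSE) {
  sqrt(mean((actual - predicted)^2, na.rm = na.rm))
}

# Made-up sample data with one missing observation
actual    <- c(3, 5, NA, 7)
predicted <- c(2, 5, 4, 9)

# Required arguments by position, default used for the optional one
rmse_default <- calculate_rmse(actual, predicted)

# Optional argument supplied by name
rmse_clean <- calculate_rmse(actual, predicted, na.rm = TRUE)
```

Passing optional arguments by name keeps the call readable even if more options are added later.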

Dot Arguments

The special ... argument passes additional arguments to nested functions:

summarize_column <- function(data, col, ...) {
  # Extra arguments in ... are forwarded to summary()
  result <- summary(data[[col]], ...)
  return(result)
}

This is useful when you want to give users flexibility without explicitly listing every possible parameter.
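
A small sketch of the forwarding in action, using a hypothetical column_mean() wrapper around mean() and made-up data:

```r
# Hypothetical wrapper: anything passed in ... is forwarded to mean()
column_mean <- function(data, col, ...) {
  mean(data[[col]], ...)
}

# Example data, invented for illustration
df <- data.frame(height = c(150, 160, NA, 170, 400))

# No extra arguments: the missing value propagates
plain <- column_mean(df, "height")

# na.rm is not an argument of column_mean(); it travels through ... to mean()
no_na <- column_mean(df, "height", na.rm = TRUE)

# Several arguments can be forwarded at once
trimmed <- column_mean(df, "height", na.rm = TRUE, trim = 0.25)
```

One trade-off is that a misspelled argument name in ... may be passed along silently, so it helps to document which function receives the extra arguments.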

Error Handling

Robust functions validate their inputs and provide clear error messages:

divide <- function(a, b) {
  if (!is.numeric(a) || !is.numeric(b)) {
    stop("Both arguments must be numeric")
  }
  if (b == 0) {
    stop("Cannot divide by zero")
  }
  a / b
}

divide(10, 2)
# [1] 5

divide("a", 2)
# Error in divide("a", 2) : Both arguments must be numeric

The stop() function halts execution and displays an error. Use warning() for non-fatal issues and message() for informational output.
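
The three condition types can be combined in one function. The sketch below uses a hypothetical safe_mean() helper; the messages are illustrative:

```r
# Hypothetical helper showing stop(), warning(), and message() together
safe_mean <- function(x) {
  if (!is.numeric(x)) {
    stop("x must be numeric")                      # fatal: execution halts
  }
  if (anyNA(x)) {
    warning("x contains NA values; dropping them") # non-fatal: execution continues
    x <- x[!is.na(x)]
  }
  message("Averaging ", length(x), " values")      # informational only
  mean(x)
}

# Emits a warning and a message, then returns the mean of the remaining values
result <- safe_mean(c(1, NA, 3))
```

Callers who do not want the extra output can wrap the call in suppressWarnings() or suppressMessages(), which is not possible with output produced by print() or cat().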

Checking Types

Use is.* functions to validate input types:

process_vector <- function(x) {
  if (!is.numeric(x)) {
    stop("x must be a numeric vector")
  }
  if (length(x) == 0) {
    warning("x is empty, returning NA")
    return(NA_real_)
  }
  # processing logic
}
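
For simple checks, base R's stopifnot() is a more compact alternative to writing each if/stop pair by hand. A minimal sketch, using a hypothetical rescale_01() function:

```r
# Hypothetical function; stopifnot() raises an error if any condition is not TRUE
rescale_01 <- function(x) {
  stopifnot(is.numeric(x), length(x) > 0)
  # Assumes x is not constant; otherwise the denominator is zero
  (x - min(x)) / (max(x) - min(x))
}

rescale_01(c(2, 4, 6))
```

The trade-off is that stopifnot() generates its error message from the failed condition, so hand-written stop() calls remain the better choice when the message needs to explain how to fix the input.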

Documentation

Comment your functions to explain the what, why, and how:

# Calculate the geometric mean of a numeric vector
# Handles zero and negative values by returning NA
# 
# Args:
#   x: numeric vector
# Returns:
#   numeric value or NA for invalid input
geometric_mean <- function(x) {
  if (any(x <= 0, na.rm = TRUE)) {
    return(NA_real_)
  }
  exp(mean(log(x)))
}

For larger projects, consider roxygen2-style comments, from which the roxygen2 package generates manual pages automatically.
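
As a sketch, the geometric_mean() documentation could look like this in roxygen2 style. The wording of the tags is illustrative; the lines starting with #' are ordinary comments to R, so the code runs with or without roxygen2 installed:

```r
#' Geometric mean of a numeric vector
#'
#' @param x A numeric vector of positive values.
#' @return A single numeric value, or NA if any value is zero or negative.
#' @examples
#' geometric_mean(c(1, 10, 100))
#' @export
geometric_mean <- function(x) {
  if (any(x <= 0, na.rm = TRUE)) {
    return(NA_real_)
  }
  exp(mean(log(x)))
}
```

The @export tag only matters inside a package, where it marks the function as part of the public interface.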

Scope and Environments

Understanding variable scope helps avoid confusing bugs. Variables defined inside a function don’t affect the global environment:

modify_value <- function(x) {
  x <- x * 2
  message("Inside function, x is ", x)
}

x <- 10
modify_value(x)
# Inside function, x is 20
x
# [1] 10

The original x remains unchanged because the assignment inside the function creates a local variable; the caller’s x is never modified.
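
To change a value in the calling environment, the usual pattern is to return the new value and assign it at the call site. A minimal sketch with a hypothetical double_value() helper:

```r
# Hypothetical helper that returns the new value instead of printing it
double_value <- function(x) {
  x * 2
}

x <- 10
y <- double_value(x)  # x is still 10 here; y holds the doubled value
x <- double_value(x)  # reassigning at the call site is what changes x
```

Keeping functions free of side effects on outside variables makes them easier to test and reason about.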

When to Use Functions

Create a function when:

  • You repeat similar code three or more times
  • A logical unit can be named (e.g., calculate_metrics(), validate_input())
  • You need to test individual pieces of logic
  • The code might be reused across projects
  • The operation has configurable parameters

When Not to Over-Engineer

Avoid creating functions for one-off calculations or when the logic may change significantly between uses. Premature abstraction adds complexity without benefit. If you’re not sure whether code will be reused, write it inline first and refactor into a function when patterns emerge.

Summary

Good functions are small, focused, and well-documented. They validate inputs, provide clear error messages, and have sensible defaults. Apply these principles consistently to write R code that is easier to maintain and share.