Functional Programming in R

Introduction

Functional programming in R isn’t a niche trick—it’s how the language was designed to work. R treats functions as first-class citizens, which means you can pass them around, return them, and store them in data structures just like any other value. This opens up a different way of thinking about problems: decompose your task into transformations, then apply them systematically.

First-class functions

In R, functions are objects you can assign to variables, put in lists, and pass as arguments to other functions. This is the foundation everything else builds on. Unlike languages that separate “functions” from “data,” R blurs that line deliberately.

# Assign a function to a variable
square <- function(x) x^2

# Pass a function to another function
result <- sapply(1:5, square)
# [1]  1  4  9 16 25

You can also store multiple functions in a list for dynamic selection:

operations <- list(
  double = function(x) x * 2,
  triple = function(x) x * 3,
  square = function(x) x^2
)

choose_op <- function(name, x) {
  operations[[name]](x)
}

choose_op("double", 5)
# [1] 10

This pattern shows up constantly in R programming. You’re not just calling functions—you’re composing behavior. Anonymous functions (those without names) work the same way and are common in data transformation pipelines.
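As a small illustration of the anonymous functions mentioned above, the sketch below passes unnamed functions straight to sapply(). The \(x) shorthand assumes R 4.1.0 or later, and the variable names are made up for the example.

```r
# Anonymous function passed inline; it is never bound to a name
squares <- sapply(1:5, function(x) x^2)

# Shorthand lambda syntax (assumes R >= 4.1.0)
cubes <- sapply(1:5, \(x) x^3)

# An anonymous function can also be called on the spot
answer <- (function(x, y) x + y)(40, 2)
```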

Higher-order functions

A higher-order function is simply a function that takes other functions as input or returns them as output. R’s apply family is the most common example, but you can build your own. Understanding this concept lets you abstract away repetition and write more reusable code.

# A function that takes a function and applies it twice
apply_twice <- function(f, x) {
  f(f(x))
}

apply_twice(function(x) x + 1, 0)
# [1] 2

The lapply and sapply functions iterate over lists and vectors, applying a function to each element. This replaces explicit loops for many tasks:

data <- list(c(1,2,3), c(4,5,6), c(7,8,9))
means <- lapply(data, mean)
# [[1]]
# [1] 2
# [[2]]
# [1] 5
# [[3]]
# [1] 8

The key advantage is that you don’t manage iteration state yourself—the function handles it. This reduces bugs and makes your code easier to reason about.
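To make the comparison concrete, here is a minimal sketch that computes the same means twice: once with an explicit loop and once with vapply(), a stricter sibling of sapply() that also checks the type of each result. The variable names are illustrative.

```r
data <- list(c(1, 2, 3), c(4, 5, 6), c(7, 8, 9))

# Explicit loop: you allocate the result and manage the index yourself
means_loop <- numeric(length(data))
for (i in seq_along(data)) {
  means_loop[i] <- mean(data[[i]])
}

# Functional version: vapply() iterates for you and verifies that
# every call returns exactly one double
means_fn <- vapply(data, mean, numeric(1))
```

Both versions produce the same vector; the second has no index or preallocated result to get wrong.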

Closures

A closure is a function that captures variables from its creation environment. This sounds abstract but is incredibly useful for creating functions with persistent state. The captured variables live as long as the function itself, allowing you to maintain state without resorting to global variables.

# Create a counter function
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1
    count
  }
}

counter1 <- make_counter()
counter2 <- make_counter()

counter1()  # [1] 1
counter1()  # [1] 2
counter2()  # [1] 1
counter2()  # [1] 2

Each call to make_counter() creates a fresh environment with its own count variable. The inner function “closes over” that environment, preserving it between calls. R’s formulas rely on the same idea: a formula records the environment in which it was created. That is how purrr-style lambdas in dplyr work. In filter(df, if_all(everything(), ~ .x > 0)), the formula ~ .x > 0 is converted to a function whose enclosing environment is the one the formula captured; .x itself is that function’s first argument, not a captured variable.
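One way to see the captured state directly is to inspect the closure's environment with environment(). The sketch below repeats the counter factory so it runs on its own; the names captured, other, and shared are only for illustration.

```r
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1
    count
  }
}

counter <- make_counter()
counter()
counter()

# The captured variable lives in the closure's enclosing environment
captured <- environment(counter)$count

# A second counter gets a separate environment, so nothing is shared
other <- make_counter()
shared <- identical(environment(counter), environment(other))
```

After two calls, captured is 2 and shared is FALSE.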

Function factories

A function factory is a function that returns other functions. Use them when you need many similar functions that differ only in some parameter.

# Create a power function with a fixed exponent
power <- function(exp) {
  function(base) {
    base^exp
  }
}

square <- power(2)
cube <- power(3)

square(5)  # [1] 25
cube(5)   # [1] 125

Factories are useful for creating families of related functions:

# Create scalers for different transformations
scale_by <- function(factor) {
  function(x) x * factor
}

scale_10 <- scale_by(10)
scale_100 <- scale_by(100)

scale_10(c(1, 2, 3))  # [1] 10 20 30
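One caveat with factories is that R evaluates arguments lazily, so a factory argument is not evaluated until the returned function first uses it. The sketch below contrasts a factory without force() and one with it; scale_lazy and scale_eager are hypothetical names chosen for this comparison.

```r
# Without force(): factor stays an unevaluated promise until first use
scale_lazy <- function(factor) {
  function(x) x * factor
}

# With force(): factor is evaluated while the factory itself runs
scale_eager <- function(factor) {
  force(factor)
  function(x) x * factor
}

k <- 10
lazy <- scale_lazy(k)
eager <- scale_eager(k)
k <- 100   # changed before either function is called for the first time

lazy_result <- lazy(1)    # uses the value of k at first call
eager_result <- eager(1)  # uses the value of k when the factory ran
```

Here lazy_result is 100 and eager_result is 10, which is why calling force() on factory arguments is a common defensive habit.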

Practical examples

Combining these concepts lets you write concise, expressive code:

# Example 1: Using closures with lapply
make_multipliers <- function(vals) {
  lapply(vals, function(x) function(y) x * y)
}

mults <- make_multipliers(c(2, 3, 4))
mults[[2]](10)  # [1] 30 (10 * 3)

# Example 2: Custom higher-order function
filter_by <- function(data, condition_fn) {
  # vapply() guarantees one logical per element, even for empty input
  data[vapply(data, condition_fn, logical(1))]
}

numbers <- list(1, 5, 10, 15, 20, 25)
is_even <- function(x) x %% 2 == 0
filter_by(numbers, is_even)
# [[1]]
# [1] 10
# [[2]]
# [1] 20

These patterns appear throughout R’s ecosystem. The tidyverse builds heavily on them—purrr’s functional programming tools, dplyr’s scoped functions, and ggplot2’s layer specifications all use closures and function factories under the hood.

Composing functions

purrr::compose() creates a pipeline of functions applied right-to-left by default: f <- compose(sqrt, abs) creates a function that applies abs() then sqrt(). magrittr’s %>% and R’s native |> pipe are the more common alternative, but compose() is useful when the combined function itself needs to be passed as an argument or stored. partial() creates partially applied functions: add5 <- partial(`+`, 5) creates a function that adds 5 to its argument. Pass the function itself, in backticks, rather than its name as a string.
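To show what right-to-left composition amounts to, here is a base R sketch that needs no packages. compose_rtl is a hypothetical helper written for this example, not purrr's implementation.

```r
# Hypothetical helper: compose functions right-to-left using base R only
compose_rtl <- function(...) {
  fns <- rev(list(...))   # reverse so the last function listed runs first
  function(x) Reduce(function(acc, f) f(acc), fns, x)
}

root_abs <- compose_rtl(sqrt, abs)   # abs() first, then sqrt()
single <- root_abs(-16)

# The composed function is an ordinary value, so it can be passed along
roots <- sapply(c(-4, -9, 25), root_abs)
```

single is 4, and roots holds 2, 3, and 5.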

Avoiding side effects

Side effects in R include modifying global state, printing output, writing files, and modifying objects in place. Functions that print (cat(), message()) or modify global state (<<-, options(), setwd()) are harder to test and reuse. Separate pure computation from side effects: compute the result, return it, and let the caller decide what to do with it. This separation makes unit testing straightforward, because the logic can be checked in isolation with no setup.
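A minimal sketch of that separation, using made-up function names: the computation and the formatting return values, and only the final line produces output.

```r
# Pure: compute a value and return it
summarise_mean <- function(x) mean(x)

# Pure: turn the value into text without printing it
format_mean <- function(m) sprintf("Mean: %.2f", m)

values <- c(2, 4, 9)
m <- summarise_mean(values)
line <- format_mean(m)

# Side effect kept at the edge: the caller decides to print
message(line)
```

Both helper functions can be tested by comparing return values, with no need to capture console output.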

Pure functions

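A pure function has no side effects and always returns the same output for the same input. A function is impure if it modifies global state, writes to files, prints to the console, draws from a random number generator, or reads the system time. The first three are side effects; the last two make the result depend on something other than the arguments (drawing random numbers also advances the generator’s state). Pure functions are easier to test (no setup/teardown), easier to parallelize (no shared state), and easier to reason about.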

Most data analysis involves impure operations at the boundaries (reading files, querying databases) with pure transformations in the middle. The functional programming principle is to keep impure operations small and isolated, and make the transformation logic pure.
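The shape described above might look like the following sketch. The inline CSV text stands in for a real input file, and drop_missing and add_share are hypothetical transformation steps.

```r
# Impure boundary: read the input (inline text stands in for a real file)
raw <- read.csv(text = "id,score\n1,10\n2,NA\n3,30")

# Pure middle: each step takes a data frame and returns a new one
drop_missing <- function(df) df[!is.na(df$score), , drop = FALSE]
add_share <- function(df) {
  df$share <- df$score / sum(df$score)
  df
}

result <- add_share(drop_missing(raw))

# Impure boundary: write the output
out_file <- tempfile(fileext = ".csv")
write.csv(result, out_file, row.names = FALSE)
```

The two middle functions can be tested with small hand-built data frames, with no files involved.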

purrr::map(), keep(), and reduce() are higher-order functions that do not modify their inputs; they return new values, and they are pure as long as the function you pass in is pure. dplyr::mutate(), filter(), and summarise() follow the same principle: they return new data frames without modifying the input.
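Base R has close analogues that behave the same way with respect to their inputs. The sketch below uses lapply(), Filter(), and Reduce() so that it runs without any packages; the comments note the rough purrr counterpart of each.

```r
xs <- list(1, 2, 3, 4)

doubled <- lapply(xs, function(x) x * 2)         # roughly purrr::map()
evens   <- Filter(function(x) x %% 2 == 0, xs)   # roughly purrr::keep()
total   <- Reduce(`+`, xs)                       # roughly purrr::reduce()

# The input list is untouched by all three calls
unchanged <- identical(xs, list(1, 2, 3, 4))
```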

Function composition

Composing functions means building complex operations from simple ones. compose(f, g) creates a function where the output of g feeds into f. purrr::compose(toupper, trimws) creates a function that trims whitespace then converts to uppercase.

The pipe (|> or %>%) is syntax for function composition applied to a specific argument. x |> f() |> g() is equivalent to g(f(x)). For reusable composition, compose(g, f) builds the same chain as a function you can assign to a name and pass around.

purrr::partial(fn, arg = val) fixes some arguments, returning a new function. partial(round, digits = 2) creates a rounding function for two decimal places. partial(dplyr::filter, .data = df) fixes the data argument, creating a filter function specific to one data frame.
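A short sketch of partial() and compose() used together, assuming the purrr package is installed. The names round2 and clean_label are invented for the example.

```r
library(purrr)

# Fix the digits argument of round()
round2 <- partial(round, digits = 2)
rounded <- round2(3.14159)

# compose() runs right-to-left: trimws() first, then toupper()
clean_label <- compose(toupper, trimws)
label <- clean_label("  total sales ")

# A partially applied function drops into other higher-order functions
all_rounded <- map_dbl(c(2.71828, 10), round2)
```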

Immutable data

R uses copy-on-modify semantics: modifying a vector that is bound to more than one name creates a copy rather than changing the shared data. The effect is close to immutability. A function that “modifies” a vector passed to it actually works on its own copy, leaving the caller’s original unchanged.
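The behavior is easy to check in a few lines. set_first is a hypothetical function that appears to overwrite the first element of its argument.

```r
set_first <- function(v, value) {
  v[1] <- value   # changes the function's own copy of v
  v
}

x <- c(1, 2, 3)
y <- set_first(x, 99)
```

After the call, x still holds 1, 2, 3, while y holds 99, 2, 3.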

lobstr::obj_addr(x) returns the memory address of an object. Compare addresses before and after a modification to confirm that no state is shared. lobstr::ref(x, y) prints the references held by each object, which shows whether two names point to the same underlying data.

The exception: environments and objects built on them, such as Reference Classes (sometimes called R5) and R6 classes, use mutable reference semantics. e$x <- 1 modifies the environment in place, affecting all references. Avoid mutable state unless you specifically need shared mutable objects.
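The contrast with ordinary values shows up when two names refer to the same object. In this sketch, the environment is shared between e1 and e2, while the list is copied on modification.

```r
# Environments: both names refer to the same mutable object
e1 <- new.env()
e2 <- e1
e1$x <- 1
e2$x <- 2
env_value <- e1$x    # the change made through e2 is visible through e1

# Lists: modifying l2 copies it, so l1 is unaffected
l1 <- list(x = 1)
l2 <- l1
l2$x <- 2
list_value <- l1$x
```

env_value is 2, while list_value is still 1.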

Pipelines as data transformations

A tidy pipeline is a sequence of pure transformations: data %>% step1() %>% step2() %>% step3(). Each step receives data, transforms it, returns new data. No state is modified. The pipeline is easy to test (test each step independently), easy to debug (inspect output after each step), and easy to parallelize (map the pipeline over partitions of the data).
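As a minimal sketch of such a pipeline, the steps below are small pure functions chained with the native pipe (which assumes R 4.1.0 or later). The step names are hypothetical.

```r
drop_negative <- function(x) x[x >= 0]
take_sqrt     <- function(x) sqrt(x)
round_1       <- function(x) round(x, 1)

raw <- c(4, -1, 2.25, 9)

# Full pipeline
result <- raw |> drop_negative() |> take_sqrt() |> round_1()

# Debugging: stop after any step and inspect the intermediate value
after_filter <- raw |> drop_negative()
```

result holds 2, 1.5, and 3, and raw itself is unchanged.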

purrr::compose() builds a pipeline as a named function: clean_data <- compose(step3, step2, step1). clean_data(raw) applies all three steps. This is useful when the same pipeline applies to multiple datasets.

For pipelines with branching (same data needs different downstream transformations), use intermediate variables rather than forking the pipe. Assign the shared intermediate result, then apply each downstream transformation separately.
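A sketch of the branching pattern with base R only: the shared cleaning step is stored once, and each branch starts from that intermediate result. The data frame and column names are invented for the example.

```r
raw <- data.frame(
  group = c("a", "a", "b", "b"),
  value = c(1, NA, 3, 5)
)

# Shared intermediate result
cleaned <- raw |> subset(!is.na(value))

# Branch 1: mean per group
by_group <- aggregate(value ~ group, data = cleaned, FUN = mean)

# Branch 2: overall mean
overall <- mean(cleaned$value)
```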

Functors and applicatives

In category theory, a functor is a mapping that preserves structure. purrr’s map() plays that role for lists and vectors: it applies a function to every element of a container and returns a list of the same length. map(list(1, 2, 3), f) returns a list of three elements, preserving the list structure.

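This abstraction is why map_dbl(), map_chr(), and map_lgl() exist as type-specialized variants: they specify not just the operation but the output type, which is checked at run time. The promise is: if f returns a single double for each element, map_dbl(x, f) returns a double vector of the same length as x. If f returns something that cannot be safely converted, the call fails with an error instead of silently returning a different type.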
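The type guarantee can be seen by giving map_dbl() one function that returns doubles and one that does not. This sketch assumes purrr is installed; the exact error message varies between purrr versions, so it is caught and replaced with a placeholder string.

```r
library(purrr)

x <- list(1, 4, 9)

# sqrt() returns one double per element, so the result is a double vector
roots <- map_dbl(x, sqrt)

# as.character() returns strings, so map_dbl() raises an error
outcome <- tryCatch(
  map_dbl(x, as.character),
  error = function(e) "type error"
)
```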

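purrr::lmap() works like map(), but passes each element to the function as a length-one list rather than as a bare value and expects a list back; the returned lists are concatenated, so one input element can produce zero, one, or several output elements. purrr::accumulate() builds up a result step by step, returning all intermediate values, which is useful for simulating sequential processes.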
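Two small accumulate() calls illustrate the idea of keeping every intermediate value, assuming purrr is installed. The growth rate and starting balance are illustrative numbers only.

```r
library(purrr)

# Running total: every intermediate sum is kept
running <- accumulate(1:5, `+`)

# Sequential process: a balance growing by 10% at each of three steps
balance <- accumulate(1:3, function(bal, step) bal * 1.1, .init = 100)
```

running holds the partial sums 1, 3, 6, 10, 15. balance has four values, because supplying .init adds the starting value to the front of the result.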

See also