R Metaprogramming with rlang: Tidy Evaluation and DSLs
Metaprogramming is the art of writing code that manipulates code. In R, metaprogramming with rlang is particularly relevant when building packages that interface with the tidyverse or when creating domain-specific languages (DSLs). The rlang package provides the foundational primitives for tidy evaluation, making it the backbone of modern R metaprogramming.
This guide covers the core concepts you need to build sophisticated reactive interfaces, custom dplyr verbs, or embedded DSLs.
Understanding tidy evaluation
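Tidy evaluation (tidy eval) is a framework for non-standard evaluation built on data masking. An argument’s code is captured (defused) instead of being evaluated right away, and is later evaluated with the data frame’s columns in scope. This lets users pass unquoted column names that are resolved against the data first, and then against the environment where the code was written.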
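As a minimal sketch of what evaluating in a data context means, the snippet below captures an expression with expr() and only later evaluates it with eval_tidy(), using mtcars as stand-in data. The variable names are illustrative.

```r
library(rlang)

# Defuse an expression: capture the code without running it
doubled_expr <- expr(cyl * 2)

# Evaluate it later with mtcars as the data mask,
# so `cyl` resolves to the column rather than to a global variable
doubled <- eval_tidy(doubled_expr, data = mtcars)
head(doubled, 3)
```

Nothing named cyl has to exist in the calling environment; the lookup succeeds because the data frame is consulted first.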
The bang-bang operator !!
The !! (bang-bang) operator is your primary tool for unquoting. It forces immediate evaluation of a single expression within a quoting context.
library(rlang)
# A simple function using tidy evaluation
filter_by_column <- function(data, col_name, value) {
  col_expr <- ensym(col_name) # Capture a string or bare name as a symbol
  data |>
    dplyr::filter(!!col_expr == value)
}
# Usage
mtcars |> filter_by_column("cyl", 4)
mtcars |> filter_by_column(cyl, 4) # Works with bare column name too
The key insight: ensym() captures the user’s input as a symbol, then !! injects it into the dplyr pipeline where it’s evaluated against the data frame. You can pass either a string ("cyl") or a bare name (cyl), and ensym() handles both cases transparently.
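To see that both input styles end up as the same symbol, here is a small sketch. capture_col() is a hypothetical helper that does nothing but capture its argument; it also shows that ensym() refuses compound expressions.

```r
library(rlang)

# Hypothetical helper that only captures its argument
capture_col <- function(col) ensym(col)

from_string <- capture_col("cyl")
from_name <- capture_col(cyl)

# Both calls produce the symbol `cyl`
identical(from_string, from_name)

# ensym() accepts only a single name or string, so this call fails
outcome <- tryCatch(
  capture_col(cyl + 1),
  error = function(e) "rejected"
)
outcome
```

If your function should accept full expressions such as cyl + 1, enquo() is the more appropriate capture function.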
Unquote-splicing with !!!
When you have a list of expressions to inject, !!! splices them in. Unlike !! which injects a single expression, !!! unpacks a list into multiple arguments. This is essential for building filter conditions or column selections dynamically from a vector of names:
# Building a complex filter programmatically
filter_conditions <- list(
  quote(cyl == 6),
  quote(disp > 200)
)
# Splicing multiple conditions
mtcars |> dplyr::filter(!!!filter_conditions)
# Practical example: dynamic column selection
select_args <- c("mpg", "cyl", "disp") |> rlang::syms()
mtcars |> dplyr::select(!!!select_args)
Splicing is what makes programmatic dplyr possible. Instead of hard-coding column names, you can generate them from configuration, user input, or column-name discovery. The syms() helper converts a character vector into a list of symbols ready for splicing by !!!.
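As an illustration of column-name discovery, the sketch below finds the numeric columns of the built-in iris data set and splices them into select(). It assumes dplyr is installed; the variable names are illustrative.

```r
library(rlang)
library(dplyr)

# Discover the numeric columns instead of hard-coding them
numeric_cols <- names(iris)[vapply(iris, is.numeric, logical(1))]

# Convert the names to symbols and splice them into select()
numeric_only <- iris |> select(!!!syms(numeric_cols))
names(numeric_only)
```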
The definition operator :=
The := operator stands in for = when the name on the left-hand side has to be computed. In standard dplyr, you write mutate(new_col = expression), but R’s parser only accepts a literal name to the left of =, so !!new_name = expression is a syntax error. With :=, the left-hand side can be unquoted, which is what you need when the column name is held in a variable:
# Creating new columns with dynamic names
create_column <- function(data, new_name, values) {
  data |>
    # {{ }} forwards the user's expression so it is evaluated in the data mask
    dplyr::mutate(!!new_name := {{ values }})
}
mtcars |> create_column("log_mpg", log(mpg))
# Building column definitions on the fly
col_defs <- list(
  mpg_sq = quote(mpg^2),
  cyl_factor = quote(factor(cyl))
)
mtcars |> dplyr::mutate(!!!col_defs)
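The need for := can be checked without dplyr. The sketch below first shows that an unquoted name followed by = does not even parse, then uses rlang’s exprs() to build a named expression with :=. The variable names are illustrative.

```r
library(rlang)

# `!!name = value` is not valid R syntax, so parsing fails before rlang sees it
parse_outcome <- tryCatch(
  parse(text = "list(!!new_name = 1)"),
  error = function(e) "syntax error"
)
parse_outcome

# With `:=`, the left-hand side can be unquoted
new_name <- "log_mpg"
defs <- exprs(!!new_name := log(mpg))
names(defs)

# The named expression can then be evaluated against the data
log_mpg <- eval_tidy(defs[[1]], data = mtcars)
```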
Quosures: capturing expressions with their environments
A quosure is a pair: an expression plus its environment. This is crucial because an expression on its own is just code, and the variables it mentions only have meaning in the environment where the expression was written.
Capturing quosures with enquo()
The enquo() function captures both the expression and its environment:
calculate_something <- function(x) {
  captured <- enquo(x)
  print(quo_get_expr(captured))
  print(quo_get_env(captured))
  captured
}
# When called with a variable from another environment
my_var <- 10
calculate_something(my_var + 5)
# First print(): my_var + 5
# Second print(): <environment: R_GlobalEnv> (when called from the global environment)
This matters when your function needs to pass the captured expression to another function that will evaluate it later, perhaps in a different environment (like inside a dplyr pipeline).
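The difference becomes visible when the expression mentions a variable that is local to the function that created it. In this sketch (the function and variable names are hypothetical), the quosure still finds threshold later on, while the bare expression does not.

```r
library(rlang)

# Returns a quosure: the expression plus the function's environment
make_threshold_quo <- function() {
  threshold <- 20 # local variable, invisible to the caller
  quo(mpg > threshold)
}

# Returns a bare expression: the environment is dropped
make_threshold_expr <- function() {
  threshold <- 20
  expr(mpg > threshold)
}

# The quosure remembers where `threshold` lives
kept <- eval_tidy(make_threshold_quo(), data = mtcars)

# The bare expression is evaluated from here, where `threshold` is unknown
bare_outcome <- tryCatch(
  eval_tidy(make_threshold_expr(), data = mtcars),
  error = function(e) "threshold not found"
)
bare_outcome
```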
Working with quosures
Once you have captured a quosure with enquo(), you often need to inspect or modify it. quo_get_expr() retrieves the expression part, and quo_get_env() retrieves the environment. These accessors let you decompose a quosure into its two constituent parts and reassemble them in a different form:
# Extracting components
capture_expr <- function(x) {
  q <- enquo(x)
  expr <- quo_get_expr(q)
  env <- quo_get_env(q)
  list(expr = expr, env = env)
}
# Building new expressions from quosures
build_expr <- function(x, op = "+") {
  x <- enquo(x)
  new_expr <- call(op, quo_get_expr(x), 1)
  # Re-wrap the new call with the original quosure's environment
  new_quosure(new_expr, quo_get_env(x))
}
# Using the modified quosure
build_expr(mpg)
# Returns a quosure wrapping mpg + 1
Modifying quosures without losing their environment is the key to metaprogramming with rlang. A call built with call() is a bare expression with no environment of its own, so wrap it with new_quosure() (or use quo_set_expr()) to reattach the original quosure’s environment; variable references then continue to resolve correctly. Writing quo(new_expr) instead would capture the symbol new_expr rather than the call it holds.
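One way to change a quosure’s expression while leaving its environment untouched is quo_set_expr(). The sketch below uses two hypothetical functions, add_one() and local_caller(), to show that a variable local to the caller still resolves after the expression has been rewritten.

```r
library(rlang)

# Hypothetical helper: wrap the captured expression in `<expr> + 1`
add_one <- function(x) {
  q <- enquo(x)
  quo_set_expr(q, call2("+", quo_get_expr(q), 1))
}

# `bump` exists only inside this function
local_caller <- function() {
  bump <- 10
  add_one(mpg * bump)
}

q <- local_caller()

# `mpg` comes from the data mask, `bump` from the quosure's environment
result <- eval_tidy(q, data = mtcars)
head(result, 3)
```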
Symbols and expressions
Converting between strings and symbols
sym() and syms() convert character strings into R symbols (unevaluated names), and as_string() converts a symbol back into a string. This conversion is fundamental to programmatic code generation because it lets you take user-supplied strings and treat them as column names or variable references in tidy evaluation contexts:
# String to symbol
sym("mpg")
#> mpg
# Symbol to string
as_string(quote(mpg))
#> [1] "mpg"
# Multiple at once
syms(c("mpg", "cyl", "disp"))
#> [[1]]
#> mpg
#>
#> [[2]]
#> cyl
#>
#> [[3]]
#> disp
Building calls programmatically
Base R’s call() function constructs unevaluated function calls, and rlang’s call2() does the same with added support for splicing and namespaces. Think of it as building R code without executing it: you specify the function name and its arguments, and get back an expression object that can be inspected, modified, and later evaluated. This is the foundation of code generation in R:
# Building a call from pieces
call("filter", quote(mtcars), quote(cyl == 4))
# Returns: filter(mtcars, cyl == 4)
# With interpolated values
col <- sym("cyl")
value <- 6
call("filter", quote(mtcars), call("==", col, value))
# Returns: filter(mtcars, cyl == 6)
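# Evaluating the constructed call. A bare "filter" resolves to stats::filter()
# unless dplyr is attached, so call2() with .ns pins the namespace
eval(call2("filter", quote(mtcars), quote(cyl == 4), .ns = "dplyr"))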
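As a dependency-free sketch of the same idea, the example below assembles a call to base R’s subset() from a column name and a value, inspects it as text, and then evaluates it. The variable names are illustrative.

```r
library(rlang)

# Pieces that could come from configuration or user input
col <- sym("cyl")
cond <- call2("==", col, 6)

# Assemble the full call without running it
full_call <- call2("subset", quote(mtcars), cond)
call_text <- deparse(full_call)
call_text

# Run it once the call looks right
six_cyl <- eval(full_call)
nrow(six_cyl)
```

Inspecting the deparsed call before evaluating it is a cheap way to debug generated code.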
Building a simple DSL
Domain-specific languages let you create expressive mini-languages for specific problems. Let’s build a simple filter DSL to demonstrate rlang’s power:
library(rlang)
library(dplyr)
# Define our DSL grammar
filter_dsl <- function(.data, ...) {
  conditions <- quos(...)
  for (i in seq_along(conditions)) {
    cond <- conditions[[i]]
    .data <- dplyr::filter(.data, !!cond)
  }
  .data
}
# Use our DSL
mtcars |>
  filter_dsl(cyl == 6, disp > 150)
The quos(...) call captures each filter condition as a separate quosure, preserving the user’s intended column references with their calling environment. This is the same mechanism dplyr uses internally, where each argument to filter() is captured as a quosure and evaluated within the data mask.
A more complex DSL: formula-based transformations
A filter DSL is useful, but the real power of rlang emerges when you build DSLs that transform data. Formula notation (lhs ~ rhs) provides a natural syntax for specifying column transformations, where the left side names the output column and the right side defines the computation. Using rlang’s formula inspection tools, you can extract both sides and feed them into mutate() with := for dynamic column naming:
# DSL for transforming columns with formulas
transform_dsl <- function(.data, ...) {
  transformations <- quos(...)
  for (t in transformations) {
    formula <- quo_get_expr(t)
    # formula is something like: log_mpg ~ log(mpg)
    # Left side: the target column name
    target <- rlang::f_lhs(formula)
    # Right side: the computation, re-wrapped with the caller's environment
    value <- new_quosure(rlang::f_rhs(formula), quo_get_env(t))
    .data <- dplyr::mutate(.data, !!target := !!value)
  }
  .data
}
mtcars |>
  transform_dsl(
    log_mpg ~ log(mpg),
    mpg_doubled ~ mpg * 2
  )
Quoting and evaluating in your DSL
The real power emerges when you control evaluation explicitly. By capturing operands instead of evaluating them, you can delay evaluation until the DSL has assembled the full expression. The custom %in_dsl% operator below captures both its left and right sides as quosures, then builds a single quosure that combines their expressions with %in%. Because the operator returns a quosure rather than a logical vector, the result has to be injected into filter() with !!; once injected, it is evaluated in the data mask:
# DSL with custom operators
`%in_dsl%` <- function(lhs, rhs) {
  # Capture both sides with their environments
  lhs <- enquo(lhs)
  rhs <- enquo(rhs)
  # Build the combined call and reattach the caller's environment
  new_quosure(
    call2("%in%", quo_get_expr(lhs), quo_get_expr(rhs)),
    quo_get_env(lhs)
  )
}
# Usage in a filter context: inject the quosure with !!
mtcars |>
  filter(!!(cyl %in_dsl% c(4, 6)))
Practical patterns
Defusing and re-evaluating
Sometimes you need to capture an expression without evaluating it, then evaluate it later in a different context:
# Defuse an expression
defuse_expr <- function(x) {
  expr <- enexpr(x)
  # Pair the expression with the caller's environment, not this function's own
  env <- caller_env()
  new_quosure(expr, env)
}
# Re-evaluate with different data
recompute <- function(quosure, new_data) {
  # Evaluate the quosure with new_data as the data mask
  eval_tidy(quosure, data = new_data)
}
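When a column and an environment variable share a name, the .data and .env pronouns make the intended source explicit during evaluation. The sketch below uses a small made-up data frame and a deliberately clashing variable.

```r
library(rlang)

# A variable that has the same name as a column
cyl <- 100
df <- data.frame(cyl = c(4, 6, 8))

# .data$cyl reads the column; .env$cyl reads the surrounding variable
q <- quo(.data$cyl + .env$cyl)
result <- eval_tidy(q, data = df)
result

# Without pronouns, the data mask wins for the bare name
masked <- eval_tidy(quo(cyl), data = df)
```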
Working with dots ...
The ... argument requires special handling because it captures a variable number of arguments, each with its own environment. quos(...) captures all dots as a list of quosures, preserving each argument’s expression and environment independently. This pattern is used by dplyr verbs like select() and mutate() to accept multiple column references. The example below demonstrates capturing dots and inspecting each quosure’s contents:
library(purrr)
handle_dots <- function(...) {
  # Capture all arguments as quosures
  dots <- quos(...)
  # Each element is a quosure
  purrr::map(dots, function(q) {
    list(
      expr = quo_get_expr(q),
      env = quo_get_env(q)
    )
  })
}
handle_dots(mpg, cyl + 1, "literal")
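A related sketch: enquos() accepts a .named argument that gives every captured argument a name, which is convenient when the names become output columns. name_dots() is a hypothetical helper used only for illustration.

```r
library(rlang)

# Hypothetical helper: capture dots and report the names they would get
name_dots <- function(...) {
  dots <- enquos(..., .named = TRUE)
  names(dots)
}

# Unnamed arguments are labelled from their code; explicit names are kept
dot_names <- name_dots(mpg, total = cyl + 1)
dot_names
```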
Error handling with input validation
Validate inputs early to provide clear errors. When a user passes a column name that doesn’t exist, rlang’s abort() can signal a structured error with a helpful message and a custom error class. This is far better than letting the error bubble up from deep inside dplyr with a cryptic message about missing columns:
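validate_column <- function(x, data) {
  x <- enquo(x)
  col_name <- as_name(x) # Expects a bare column name, not a compound expression
  if (!col_name %in% names(data)) {
    abort(
      paste0("Column '", col_name, "' not found in data"),
      class = "column_not_found" # Illustrative class name for tryCatch() handlers
    )
  }
  x
}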
# Safe usage
safe_filter <- function(data, col) {
  col <- validate_column({{col}}, data)
  dplyr::filter(data, !!col > median(!!col))
}
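A custom error class pays off on the calling side, where a handler can target that one failure and let everything else propagate. The sketch below is self-contained: check_column() and the class name column_not_found are illustrative, not part of rlang.

```r
library(rlang)

# Hypothetical validator that signals a classed error
check_column <- function(data, name) {
  if (!name %in% names(data)) {
    abort(
      paste0("Column '", name, "' not found in data"),
      class = "column_not_found" # illustrative class name
    )
  }
  invisible(name)
}

# Handle only the missing-column case; other errors are not caught here
msg <- tryCatch(
  check_column(mtcars, "weight"),
  column_not_found = function(e) conditionMessage(e)
)
msg
```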
Common pitfalls
- Forgetting !!: Passing a captured symbol without unquoting just passes the symbol as data, not as a column reference.
- Environment mismatches: Quosures carry their environment. If you build expressions in one context and evaluate in another, variables may resolve differently than expected.
- Non-standard evaluation in base R: Functions like subset() use non-standard evaluation that differs from tidy evaluation. Be explicit about which framework you’re using.
- Lazy evaluation traps: Function arguments are promises that are evaluated only when first used. enquo() captures the expression the user typed without evaluating it, so capture each argument before your function reassigns it.
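The first pitfall is easy to reproduce with expr(), which shows exactly what code is built with and without the operator. The variable names in this sketch are illustrative.

```r
library(rlang)

col <- sym("cyl")

# Without !!, the expression refers to the variable `col` itself
without_bang <- expr(col == 4)
without_bang

# With !!, the symbol stored in `col` is injected into the expression
with_bang <- expr(!!col == 4)
with_bang
```

Printing the built expression like this is a quick check whenever a generated filter or mutate call behaves unexpectedly.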
See also
- /guides/r-error-handling, Error handling patterns in R
- /guides/purrr-functional-programming, Functional programming with purrr
- /guides/r-apply-family, The apply family of functions