
Programming with Tidy Evaluation in R

Tidy evaluation is a powerful paradigm in R that allows you to write functions and code that work naturally with the tidyverse. If you’ve used dplyr, ggplot2, or other tidyverse packages, you’ve already benefited from tidy evaluation—even if you didn’t realize it. Understanding how to program with tidy evaluation opens up the ability to create reusable data manipulation functions that feel as natural as writing dplyr pipelines directly.

What is tidy evaluation?

Standard evaluation in R works straightforwardly: you pass an object, and R evaluates it. But the tidyverse uses non-standard evaluation (NSE), where functions capture expressions rather than just their values. This is what allows you to write filter(df, x > 5) instead of filter(df, df$x > 5).

The problem arises when you want to program with these functions. If you try to pass a variable containing a column name:

my_col <- "mpg"
filter(mtcars, my_col > 20)

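Because mtcars has no column named my_col, filter() looks the name up in the calling environment and finds the string "mpg". The condition becomes "mpg" > 20, a comparison between two strings, not a filter on the mpg column. This is where tidy evaluation techniques come in.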
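The failure is silent, which makes it easy to miss. The following is a minimal sketch of the problem and of one fix that uses the .data pronoun described later in this guide. It assumes dplyr 1.0 or later, where filter() accepts a single TRUE value and keeps every row.

```r
library(dplyr)

my_col <- "mpg"

# mtcars has no column called my_col, so dplyr finds the string "mpg" in the
# calling environment. The condition is "mpg" > 20, a string comparison that
# yields a single TRUE, so every row is kept.
wrong <- filter(mtcars, my_col > 20)

# .data[[my_col]] treats the string as a column name in the data instead.
right <- filter(mtcars, .data[[my_col]] > 20)

nrow(wrong)  # 32: nothing was filtered
nrow(right)  # 14: only cars with mpg above 20
```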

Tidy evaluation provides mechanisms to capture user input as expressions and then evaluate them in the right context. The key concepts are quasiquotation and quosures, which work together to make programming with tidyverse functions possible.

Quasiquotation: the !! operator

The bang-bang operator (!!) is the primary tool for injecting values into captured expressions. Think of it as “unquote”: it forces early evaluation of part of an expression before the rest of the expression is evaluated.

Here’s a simple example:

library(dplyr)

col_name <- sym("mpg")
mtcars %>% filter(!!col_name > 20)

The sym() function creates a symbol from a string, and !! injects that symbol into the expression. The expression becomes filter(mtcars, mpg > 20) before evaluation.
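To see the injected expression without running the filter, you can build the call with rlang::expr(), which applies the same quasiquotation rules. A short sketch:

```r
library(rlang)

col_name <- sym("mpg")

# expr() defuses the code and performs injection: !!col_name is replaced by
# the symbol mpg, while the rest of the call stays unevaluated.
built <- expr(filter(mtcars, !!col_name > 20))
built
#> filter(mtcars, mpg > 20)
```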

In practice, you’ll more often use enquo() to capture user-provided arguments:

filter_by_column <- function(df, col, threshold) {
  col_expr <- enquo(col)
  df %>% filter(!!col_expr > threshold)
}

mtcars %>% filter_by_column(mpg, 20)

The enquo() function captures the expression supplied by the user and wraps it in a quosure, which preserves both the expression and its environment.
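A small sketch of what a quosure carries, using rlang's accessors quo_get_expr() and quo_get_env(). The helpers capture() and make_quo() are hypothetical names made up for this illustration.

```r
library(rlang)

# Hypothetical helper: returns a quosure for whatever the caller supplied.
capture <- function(x) enquo(x)

# Hypothetical wrapper so the quosure is created inside a known environment.
make_quo <- function() {
  multiplier <- 10
  list(q = capture(mpg * multiplier), env = environment())
}

res <- make_quo()

quo_get_expr(res$q)                     # the expression: mpg * multiplier
identical(quo_get_env(res$q), res$env)  # TRUE: the caller's environment

# eval_tidy() finds mpg in the data and multiplier in the quosure's
# environment, even though make_quo() has already returned.
head(eval_tidy(res$q, data = mtcars), 3)
#> [1] 210 210 228
```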

The !!! operator for list injection

Sometimes you have multiple arguments stored in a list and want to inject them all at once. The splice operator (!!!) does exactly this: it unpacks a list of expressions into the arguments of a function call.

args <- list(
  sym("mpg"),
  sym("cyl")
)

mtcars %>% select(!!!args)
# Equivalent to: select(mtcars, mpg, cyl)

This is particularly useful when building up arguments dynamically, or when you need !!! in a function that does not support it natively, in which case you can wrap the call in rlang::inject().
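A sketch of both uses. syms() converts a character vector into a list of symbols, and rlang::inject() performs the splicing for base paste(), which knows nothing about !!!. The column names and strings are only examples.

```r
library(dplyr)
library(rlang)

# Build the argument list dynamically from strings.
wanted <- c("mpg", "cyl", "hp")
cols <- syms(wanted)

selected <- mtcars %>% select(!!!cols)
names(selected)
#> [1] "mpg" "cyl" "hp"

# inject() splices the list into paste() before paste() is called.
parts <- list("a", "b", "c")
inject(paste(!!!parts, sep = "-"))
#> [1] "a-b-c"
```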

A note on other unquoting forms

rlang does not provide a “bang-bang-under” (!!_) operator, and there is no separate unquoting variant for large datasets. Early versions of rlang offered the functional forms UQ() and UQS() as alternatives to !! and !!!; these are deprecated. In everyday programming, !!, !!!, and {{ }} cover the unquoting you need.

The {{ }} syntax for column injection

The curly brace syntax {{ }} (sometimes called “curly-curly”) provides a more readable way to inject column names. Introduced in rlang 0.4.0, it works directly with bare column names:

group_and_summarize <- function(df, group_col, sum_col) {
  df %>%
    group_by({{ group_col }}) %>%
    summarise(avg = mean({{ sum_col }}), .groups = "drop")
}

mtcars %>% group_and_summarize(cyl, mpg)

The {{ }} operator is syntactic sugar that handles the enquo() and !! pattern automatically. It captures the user’s input and injects it into the expression. This makes your function interfaces cleaner and more intuitive for users.
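One way to check this is to write the same function both ways and compare the results. A sketch, with hypothetical function names:

```r
library(dplyr)
library(rlang)

# Embracing version.
avg_by_curly <- function(df, group_col, sum_col) {
  df %>%
    group_by({{ group_col }}) %>%
    summarise(avg = mean({{ sum_col }}), .groups = "drop")
}

# Explicit defuse-and-inject version.
avg_by_bang <- function(df, group_col, sum_col) {
  group_q <- enquo(group_col)
  sum_q <- enquo(sum_col)
  df %>%
    group_by(!!group_q) %>%
    summarise(avg = mean(!!sum_q), .groups = "drop")
}

identical(avg_by_curly(mtcars, cyl, mpg), avg_by_bang(mtcars, cyl, mpg))
#> [1] TRUE
```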

Practical examples with dplyr

Let’s put these concepts together to build a reusable summary function:

library(dplyr)
library(rlang)

summarise_group <- function(data, group_var, summary_var) {
  group_expr <- enquo(group_var)
  summary_expr <- enquo(summary_var)
  
  data %>%
    group_by(!!group_expr) %>%
    summarise(
      n = n(),
      mean = mean(!!summary_expr, na.rm = TRUE),
      sd = sd(!!summary_expr, na.rm = TRUE),
      .groups = "drop"
    )
}

mtcars %>% summarise_group(cyl, mpg)

Output:

# A tibble: 3 × 4
    cyl     n  mean    sd
  <dbl> <int> <dbl> <dbl>
1     4    11  26.7  4.51
2     6     7  19.7  1.45
3     8    14  15.1  2.56

Here’s another example using the {{ }} syntax with ggplot2:

library(ggplot2)

plot_by_group <- function(data, x_var, y_var, colour_var) {
  ggplot(data, aes(x = {{ x_var }}, y = {{ y_var }}, colour = {{ colour_var }})) +
    geom_point() +
    # as_label() turns each captured argument into a readable label
    labs(x = rlang::as_label(enquo(x_var)),
         y = rlang::as_label(enquo(y_var)),
         colour = rlang::as_label(enquo(colour_var)))
}

mtcars %>% plot_by_group(wt, mpg, cyl)

Defusing and injecting expressions

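Tidy evaluation uses two main operations: defusing (capturing code without evaluating it) and injecting (inserting captured code into another expression before that expression is evaluated). rlang::enquo() defuses a single argument into a quosure, and rlang::enquos() defuses several arguments into a list of quosures; rlang::enexpr() and rlang::enexprs() do the same but return bare expressions without an environment. Injection is done with !! (bang-bang) for a single expression or !!! (bang-bang-bang) for a list.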

dplyr::filter(df, !!my_col > 0) injects my_col as an expression, allowing my_col to be a quosure captured from a user’s code. group_by(df, !!!group_vars) injects a list of grouping variables.
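The same two steps can be seen outside dplyr. In this sketch, the hypothetical helper build_sum() defuses all of its arguments with enexprs() and injects them into a new call with !!!. Nothing is evaluated while the call is built, so a and b do not need to exist at that point.

```r
library(rlang)

# Hypothetical helper: defuse every argument, then inject them into sum().
build_sum <- function(...) {
  args <- enexprs(...)
  expr(sum(!!!args))
}

built_call <- build_sum(a, b + 1)
built_call
#> sum(a, b + 1)

# Evaluate the built call later, supplying values for a and b.
eval(built_call, list(a = 1, b = 2))
#> [1] 4
```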

The .data and .env pronouns

.data$column_name accesses a column by name in a data-masking context without ambiguity. .env$variable_name accesses a variable from the calling environment, distinguishing it from a column with the same name. These pronouns are important in package code where a variable name might accidentally shadow a column name.

dplyr::filter(df, .data[[col_name]] > .env$threshold) uses both: col_name is a character variable holding the column name (looked up with .data[[ ]]), and threshold is a variable in the calling environment (looked up with .env$), so a column that happens to be called threshold cannot shadow it.
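A sketch of why .env matters. The data frame and the helpers above_threshold() and above_ambiguous() are hypothetical; the data deliberately includes a column named threshold that collides with the function argument.

```r
library(dplyr)

df <- data.frame(x = c(1, 5, 10), threshold = c(100, 100, 100))

# Hypothetical helper: col_name is a string, threshold is a plain value.
above_threshold <- function(df, col_name, threshold) {
  filter(df, .data[[col_name]] > .env$threshold)
}

# Without .env, the threshold column masks the argument.
above_ambiguous <- function(df, col_name, threshold) {
  filter(df, .data[[col_name]] > threshold)
}

nrow(above_threshold(df, "x", 4))  # 2: compares x with the argument 4
nrow(above_ambiguous(df, "x", 4))  # 0: compares x with the column (100)
```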

Writing functions with dplyr

The {{ }} (curly-curly) operator is the most concise way to write dplyr functions: my_filter <- function(df, col, val) filter(df, {{ col }} > val). The {{ }} operator combines enquo() and !! in one step. For character column names, across(all_of(char_vector)) and .data[[char_name]] work in most contexts.

rlang::arg_match() validates that a string argument matches one of the allowed values and produces a clear error message for invalid inputs. It is a common way to validate enum-like arguments in tidy-evaluation functions.
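A minimal sketch combining arg_match() with an embraced column. The function summarise_stat() and its stat choices are hypothetical. When stat is left at its default, arg_match() returns the first allowed value, much like base match.arg().

```r
library(dplyr)
library(rlang)

summarise_stat <- function(df, col, stat = c("mean", "median")) {
  # Errors with a message listing the allowed values if stat is invalid.
  stat <- arg_match(stat)
  fn <- switch(stat, mean = mean, median = median)
  summarise(df, value = fn({{ col }}))
}

summarise_stat(mtcars, mpg)               # uses "mean"
summarise_stat(mtcars, mpg, "median")
try(summarise_stat(mtcars, mpg, "mode"))  # informative error
```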

Injecting multiple columns

When a function needs to accept multiple columns, capture them with ... and use enquos() (plural) to capture all arguments: cols <- enquos(...). Pass them to dplyr verbs with !!!cols for injection. This pattern powers functions like group_by(df, !!!cols) where the grouping columns come from a variable rather than literal names.
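A sketch of the enquos() pattern; count_by() is a hypothetical name. group_by() also accepts ... directly, so the explicit capture mainly pays off when you want to inspect or modify the captured expressions before injecting them.

```r
library(dplyr)
library(rlang)

count_by <- function(df, ...) {
  # Defuse every grouping argument into a list of quosures.
  cols <- enquos(...)
  df %>%
    group_by(!!!cols) %>%
    summarise(n = n(), .groups = "drop")
}

count_by(mtcars, cyl, gear)
```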

String-based column selection

For functions that receive column names as character strings rather than bare symbols, use all_of() in tidy-select verbs and the .data pronoun in data-masking verbs: df |> select(all_of(char_vec)) or df |> mutate(value = .data[[col_name]]). The all_of() and any_of() helpers accept character vectors and work inside any tidy-select context; all_of() errors if a name is missing, while any_of() skips it. This approach is simpler than quasiquotation when column names are already strings.
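A sketch of both contexts, assuming R 4.1 or later for the native |> pipe. The column names are only example strings.

```r
library(dplyr)

char_vec <- c("mpg", "wt")
col_name <- "hp"

# Tidy-select context: all_of() is strict, any_of() skips unknown names.
strict <- mtcars |> select(all_of(char_vec))
lenient <- mtcars |> select(any_of(c(char_vec, "not_a_column")))

# Data-masking context: .data[[ ]] turns the string into a column reference.
doubled <- mtcars |> mutate(value = .data[[col_name]] * 2)
```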

Debugging NSE

NSE code is harder to debug than standard R because errors may reference expression trees rather than concrete values. Use rlang::qq_show(expr) to preview how an expression will be injected before evaluating it. rlang::last_error() and rlang::last_trace() give detailed context for errors inside tidy evaluation. When a dplyr verb fails with a confusing message, simplify the expression to its smallest failing case to isolate the issue.
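A sketch of previewing injections with qq_show(), which prints the expression after !! and !!! have been processed but does not evaluate it, so filter() and group_by() need not even be loaded:

```r
library(rlang)

col <- sym("mpg")
vars <- syms(c("cyl", "gear"))

qq_show(filter(mtcars, !!col > 20))
#> filter(mtcars, mpg > 20)
qq_show(group_by(mtcars, !!!vars))
#> group_by(mtcars, cyl, gear)
```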

The problem tidyeval solves

dplyr’s column-selection syntax is convenient for interactive use: you type column names without quotes, and dplyr finds them in the data. But this convenience creates a problem for programming: how do you write a function where the column name is a variable? If a function simply forwards an unquoted argument, R ends up looking for an object with that name in the environment, not for a column in the data frame.

Tidyeval is the system that makes data masking work and provides tools for extending it. Data masking is what allows column names to be used as if they were variables: the data frame’s columns are temporarily placed in front of the environment, so they resolve without the dollar-sign prefix. The programming tools in rlang and dplyr let you pass, modify, and build expressions that participate in this data-masking system.
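A sketch of data masking using rlang::eval_tidy(), rlang's general-purpose evaluator with a data mask. The data frame and variable names are made up. When a name exists both as a column and as an environment variable, the column wins, and the pronouns let you choose explicitly.

```r
library(rlang)

x <- 100
y <- 1000
df <- data.frame(x = 1:3)

# x resolves to the column; y is not a column, so it comes from the environment.
eval_tidy(quote(x + y), data = df)
#> [1] 1001 1002 1003

# The pronouns make the intent explicit.
eval_tidy(quote(.data$x + .env$x), data = df)
#> [1] 101 102 103
```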

Embracing for tidy selection

The double-brace operator, called “embrace,” is the simplest tidyeval tool for function authors. When you write a function that calls dplyr verbs and want to pass column names as arguments, wrap the argument in double braces inside the dplyr call. The embrace operator defuses the argument, capturing it as an expression rather than evaluating it, and then injects it into the dplyr verb’s data-masking context.

The embrace operator handles the most common case: accepting a single column as a function argument. It is not the natural tool for accepting an arbitrary number of columns, and it cannot build column names programmatically from strings. Those cases call for the .data pronoun for single columns named by strings, or the any_of() and all_of() selectors for multiple columns from character vectors.

Programming with multiple columns

When a function should accept multiple column names, use the dots mechanism with dplyr’s selection syntax. A function that takes dots can forward them directly to select(), group_by(), or other tidyselect-using functions, as in select(df, ...). This approach works naturally for interactive use, where callers specify columns with bare names, and also when callers pass any_of() with a character vector.
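A sketch of forwarding dots; pick_cols() is a hypothetical name. Because the dots go straight to select(), callers can use bare names, tidyselect helpers, or any_of() with a character vector.

```r
library(dplyr)

pick_cols <- function(df, ...) {
  # The dots are passed through untouched; select() does the tidy selection.
  df %>% select(...)
}

names(pick_cols(mtcars, mpg, cyl))
#> [1] "mpg" "cyl"
names(pick_cols(mtcars, any_of(c("wt", "qsec", "missing"))))
#> [1] "wt"   "qsec"
names(pick_cols(mtcars, starts_with("d")))
#> [1] "disp" "drat"
```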

To generate column names programmatically, whether by concatenating prefixes and suffixes or by iterating over a list of column specifications, build the names as strings and reference them in dplyr verbs with the .data pronoun. Combining ordinary string manipulation with .data covers most programmatic data manipulation needs without requiring deep knowledge of rlang internals.
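A sketch of building column names from strings; add_scaled() and its suffix argument are hypothetical. Each source column is read through .data[[ ]], and the new column is named by injecting a string with !! on the left-hand side of :=, which dplyr verbs accept in place of a bare name.

```r
library(dplyr)

add_scaled <- function(df, cols, suffix = "_scaled") {
  for (col in cols) {
    new_name <- paste0(col, suffix)
    # := lets an injected string serve as the new column's name.
    df <- df %>% mutate(!!new_name := .data[[col]] / max(.data[[col]]))
  }
  df
}

scaled <- add_scaled(mtcars, c("mpg", "hp"))
head(scaled[c("mpg_scaled", "hp_scaled")], 3)
```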

When to use each approach

Operator   Use case
!!         Inject a single symbol or expression
!!!        Inject multiple arguments from a list
{{ }}      Clean syntax for column injection in function arguments

The {{ }} syntax is now the recommended approach for most dplyr programming tasks because it reduces boilerplate and makes your intent clearer. However, understanding !! and !!! helps you debug issues and work with more advanced patterns.

See also

Explore the tidyverse reference pages for more examples of how the functions that use tidy evaluation work together.