Non-Standard Evaluation (NSE) in R

Non-Standard Evaluation (NSE) is one of those R concepts that trips up newcomers but becomes essential once you understand it. If you’ve ever used dplyr::filter() or ggplot2::aes(), you’ve benefited from NSE, even if you didn’t know it.

What is non-standard evaluation?

In most programming languages, when you pass an argument to a function, it’s evaluated first, then the value is passed in. Standard evaluation works like this:

# Standard evaluation - 1 + 4 is evaluated to 5, and my_func() only sees the value
my_func <- function(x) x
my_func(1 + 4)  # receives 5

But R allows functions to capture the unevaluated expression itself. This is NSE. When you write:

filter(df, age > 30)

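The filter() function doesn’t just receive the value of age > 30. R passes each argument as a promise, which holds the unevaluated expression together with the environment it came from, so filter() can capture the expression age > 30 and decide how to evaluate it. That is what makes the tidyverse interface work.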

How NSE works in R

R has several mechanisms for working with unevaluated expressions:

  • quote() - Captures an expression without evaluating it
  • substitute() - Captures the expression that was supplied for a function argument, without evaluating it
  • eval() - Evaluates an expression in a specified environment

Here’s a quick demo:

# quote() captures the expression
expr <- quote(1 + 2)
expr
# 1 + 2

# eval() evaluates it
eval(expr)
# [1] 3

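The key difference between quote() and substitute(): quote() returns its argument exactly as written, while substitute(), when called inside a function, also replaces each argument name with the expression the caller supplied for it. That replacement is what NSE functions rely on.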
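
The difference is easiest to see inside a function. The sketch below uses two hypothetical helpers, show_quote() and show_substitute(), which exist only for this illustration:

```r
# Hypothetical helpers, for illustration only
show_quote <- function(arg) quote(arg)            # always returns the symbol `arg`
show_substitute <- function(arg) substitute(arg)  # returns what the caller typed

q <- show_quote(a + b)
s <- show_substitute(a + b)
print(q)  # arg
print(s)  # a + b

# substitute() also accepts an explicit list of replacements
r <- substitute(x + y, list(x = 1, y = quote(z)))
print(r)  # 1 + z
```

Neither a nor b has to exist here, because the expression is never evaluated.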

Capturing expressions with substitute()

substitute() is the workhorse of base R NSE. Inside a function, it returns the expression the caller typed for an argument rather than the argument’s value:

capture_expr <- function(x) {
  substitute(x)
}

capture_expr(1 + 2)
# 1 + 2

capture_expr(mean(x, na.rm = TRUE))
# mean(x, na.rm = TRUE)

This is exactly what base R functions like subset() use:

# This works because subset() uses NSE
subset(mtcars, cyl == 4)
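
As a rough sketch of the idea, a subset()-like function can be written with substitute() and eval(). The helper my_subset() below is hypothetical and much simpler than the real subset(), which also handles column selection:

```r
# my_subset() is a hypothetical, simplified stand-in for subset()
my_subset <- function(df, condition) {
  # Capture the caller's expression, e.g. cyl == 4
  cond_expr <- substitute(condition)

  # Evaluate it with the data frame's columns in scope; names that are not
  # columns are looked up in the caller's environment
  keep <- eval(cond_expr, df, parent.frame())

  # Keep only rows where the condition is TRUE (dropping FALSE and NA)
  df[keep & !is.na(keep), , drop = FALSE]
}

four_cyl <- my_subset(mtcars, cyl == 4)
nrow(four_cyl)
```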

Building NSE functions

Let’s build a simple NSE function to see how it works. We’ll create a filter_gt() function that filters a data frame where a column is greater than a threshold:

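filter_gt <- function(df, column, threshold) {
  # Capture the unevaluated column name and convert it to a string
  col_name <- as.character(substitute(column))

  # Build the expression, substituting the actual column name.
  # The expression must be passed to substitute() directly: a variable
  # holding a quoted expression would not be substituted into.
  filter_expr <- substitute(
    df[which(df[[col_name]] > threshold), ],
    list(col_name = col_name)
  )

  # Evaluate in this function's frame, where df and threshold are defined
  eval(filter_expr)
}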

# Usage
filter_gt(mtcars, cyl, 6)

This is a simplified example, but it shows the pattern: capture the expression, build a new expression, then evaluate it.

NSE in the tidyverse

The tidyverse uses a more sophisticated system called tidyeval, built on top of the rlang package. The key functions are:

  • enquo() - Capture an argument as a quosure (quoted expression + environment)
  • enquos() - Capture multiple arguments
  • !! (bang-bang) - Unquote an expression into its surrounding context
  • {{ }} (curly-curly, or embrace) - Capture a function argument and unquote it in one step, equivalent to enquo() followed by !!

Here’s a practical example:

library(dplyr)
library(rlang)

filter_above_threshold <- function(df, col, threshold) {
  # Capture the column expression
  col <- enquo(col)
  
  df %>%
    filter(!!col > threshold)
}

# Usage
mtcars %>% filter_above_threshold(cyl, 6)

The !! operator (called “bang-bang”) inserts the captured expression into the filter() call, so dplyr sees the condition as if you had typed the column name directly.

To apply the same calculation to several columns at once, no capturing is needed; use across() with where() to select columns by type:

summarise_all_above <- function(df, threshold) {
  df %>%
    summarise(across(where(is.numeric), ~ mean(.x[.x > threshold])))
}

Common pitfalls

Forgetting to unquote - If you capture with enquo() but forget !!, you’ll get unexpected results:

# WRONG - after col <- enquo(col), col is the quosure object, not the column it refers to
filter(df, col > threshold)

# RIGHT - !! unquotes the captured column expression
filter(df, !!col > threshold)

Environment issues - Expressions carry their environment. If you build an expression in one function and eval in another, you might get scoping bugs. Quosures (from enquo()) help by bundling expression + environment.
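
A small sketch of the scoping problem, assuming two hypothetical helpers (make_expr() and make_quo()) that each define a local variable named threshold:

```r
library(rlang)

# Both helpers are hypothetical; each defines its own local `threshold`
make_expr <- function() {
  threshold <- 100
  expr(threshold * 2)   # bare expression: no environment attached
}
make_quo <- function() {
  threshold <- 100
  quo(threshold * 2)    # quosure: remembers where `threshold` lives
}

# A separate environment with a different `threshold`
other_env <- new.env()
other_env$threshold <- 1

# The bare expression picks up whatever `threshold` it finds at evaluation time
from_expr <- eval(make_expr(), other_env)

# The quosure is evaluated in the environment it was created in
from_quo <- eval_tidy(make_quo(), env = other_env)

from_expr  # 2
from_quo   # 200
```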

Mixing quoted and unquoted - Be consistent. Either your function takes strings (easy but verbose) or bare expressions (concise but requires understanding NSE).

Quoting and unquoting

The fundamental operations in tidy evaluation: rlang::quo() captures an expression along with its environment, creating a quosure. rlang::expr() captures an expression without its environment (for building code programmatically). rlang::enquo() captures the expression passed by a function caller.

Unquoting with !! evaluates its operand immediately and injects the result into the surrounding expression: in filter(df, !!my_col == value), my_col is looked up in the surrounding environment, and the expression or quosure it holds is inserted into the filter condition before filter() evaluates it. !!! splices a list: group_by(df, !!!group_list) inserts each element of group_list as a separate grouping variable.
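
To see what injection produces without running any dplyr code, the expressions can be built with rlang::expr() and inspected. The variable names below are placeholders for illustration:

```r
library(rlang)

my_col <- sym("cyl")                     # a symbol stored in a variable
e1 <- expr(!!my_col == 4)                # builds: cyl == 4

group_list <- syms(c("cyl", "gear"))     # a list of symbols
e2 <- expr(group_by(df, !!!group_list))  # builds: group_by(df, cyl, gear)

print(e1)
print(e2)
```

Nothing is evaluated here: df and group_by() only appear as names inside the constructed call.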

Base R equivalents

Before tidy evaluation, R already had quote(), substitute(), and eval() for metaprogramming. quote(expr) captures an expression without evaluating it, similar to rlang::expr(). substitute(arg), used inside a function, captures the expression the caller supplied for arg, similar to rlang::enexpr(). eval(expr, envir) evaluates a captured expression in a specified environment. These are still used in base R and package code that predates the rlang ecosystem.

match.call() captures the current function call as an expression, useful for constructing informative error messages. sys.call() and sys.function() provide access to the call stack.
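
As an illustration of the error-message use case, here is a minimal sketch with a hypothetical function scale_by():

```r
# scale_by() is a hypothetical function, for illustration only
scale_by <- function(x, factor = 2) {
  if (!is.numeric(x)) {
    # match.call() returns the call with argument names filled in
    stop("non-numeric input in call: ", deparse(match.call()), call. = FALSE)
  }
  x * factor
}

# Capture the error message instead of stopping the script
msg <- tryCatch(
  scale_by("a", factor = 3),
  error = function(e) conditionMessage(e)
)
print(msg)
```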

Practical applications

NSE enables the tidyverse’s clean syntax: column names without quotes, expressions in filter conditions, and formulas for model specification. When writing packages that use dplyr and ggplot2 functions with programmatic column selection, {{ col }} (curly-curly for user-supplied columns) and .data[[col_name]] (for character column names) are the two patterns that cover most use cases.

Quasiquotation

The tidyverse uses rlang’s quasiquotation for NSE. rlang::enquo() captures an expression passed to a function without evaluating it. !! (bang-bang) unquotes an expression into a surrounding quote. !!! unquotes and splices a list of expressions. This system allows functions to accept column names as bare symbols and forward them to dplyr verbs.

The .data pronoun

The .data pronoun from rlang allows programmatic column access inside dplyr verbs without quasiquotation: mutate(df, x = .data[[col_name]]) where col_name is a character variable. This is simpler than enquo() + !! for the common case of selecting a column by a string name at runtime.

Base R NSE

Base R NSE uses substitute() to capture an expression and eval() to evaluate it in an environment. deparse(substitute(x)) converts an unevaluated expression to a string, the standard pattern for functions that use the argument name as a label (like axis labels in base plotting functions). Understanding base R NSE helps when reading older code and packages that predate rlang.

Understanding non-standard evaluation

Under standard evaluation, a function only ever works with the value of its argument: in f(x + 1), x + 1 is computed and f sees the result. Non-standard evaluation (NSE) delays or redirects evaluation: the function gets hold of the expression itself, not just its value, and may evaluate it in a different environment.

NSE is what makes dplyr’s verbs work the way they do. filter(df, age > 30) does not evaluate age in the calling environment (where age might not exist). Instead, it captures the expression age > 30 and evaluates it in the context of df’s columns. This is NSE: the function, not R’s standard rules, controls how the expression is evaluated.
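
The idea can be sketched in base R by evaluating a quoted condition with a data frame as the lookup scope. This is only an analogy: dplyr itself uses rlang’s data masking, which is more elaborate. The data frame and variable names are made up for the example:

```r
df <- data.frame(age = c(25, 35, 45))
min_age <- 30

cond <- quote(age > min_age)

# `age` is found among df's columns; `min_age` is not a column, so it is
# looked up in the enclosing environment instead
keep <- eval(cond, df, environment())

df[keep, , drop = FALSE]
```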

NSE makes code more concise and readable for interactive use. It enables functions that feel like the data is their natural namespace, where column names are directly accessible without quoting or prefixing with the data frame name. This is why dplyr code reads naturally.

The tradeoff is complexity in programming. Writing functions that call dplyr functions with variable column names requires explicit handling of NSE. Base R functions that use NSE (subset(), transform(), with()) have similar issues. The rlang package provides principled tools for handling NSE in functions.

Quoting and unquoting explained

Quoting captures an expression before R evaluates it. quote(x + 1) returns the expression itself as a language object, not the value of x + 1. substitute(x + 1) does the same at the top level, but inside a function it also replaces any symbol that names a function argument with the expression the caller supplied for it (and any other local variable with its value). This is the mechanism that lets functions capture their calling expressions.

The distinction between quote() and substitute() matters for functions. Inside a function: substitute(arg) captures the expression the caller passed for arg. quote(arg) would capture the symbol arg itself, not the caller’s expression. Functions that want to capture their argument expressions use substitute() or rlang::enquo(); deparse(substitute(arg)) goes one step further and turns the captured expression into a string.

Unquoting inserts a captured expression into a new expression. In rlang: !! (bang-bang) unquotes a quosure inside rlang’s expr() or in tidy evaluation contexts. base::bquote(.(x) + 1) substitutes the value of x into the expression. These mechanisms enable building expressions programmatically.
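
A short base R sketch of the difference between quoting a symbol and injecting its value with bquote():

```r
x <- 10

# quote() keeps the symbol x; bquote() replaces .(x) with its current value
e_quote <- quote(x + 1)        # x + 1
e_bquote <- bquote(.(x) + 1)   # 10 + 1

x <- 99

r_quote <- eval(e_quote)    # looks up x now, so uses 99
r_bquote <- eval(e_bquote)  # the value 10 was fixed when the expression was built

r_quote   # 100
r_bquote  # 11
```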

Writing functions with dplyr

The most common NSE challenge in practice: writing a function that uses dplyr verbs with column names specified as function arguments.

Using the embracing operator ({{}}):

summarise_by <- function(df, group_col, value_col) {
  df %>%
    group_by({{ group_col }}) %>%
    summarise(mean = mean({{ value_col }}), .groups = "drop")
}
summarise_by(mtcars, cyl, mpg)

This is the recommended approach for most cases. {{ }} embraces the argument, performing enquo and !! in one step.
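
To make the equivalence concrete, the following sketch defines the same hypothetical helper twice, once with {{ }} and once with enquo() plus !!, and checks that both give the same result:

```r
library(dplyr)

# Two hypothetical helpers that are expected to behave identically
mean_embrace <- function(df, col) {
  summarise(df, mean = mean({{ col }}))
}

mean_enquo <- function(df, col) {
  col <- rlang::enquo(col)             # capture the argument
  summarise(df, mean = mean(!!col))    # unquote it inside the dplyr verb
}

a <- mean_embrace(mtcars, mpg)
b <- mean_enquo(mtcars, mpg)
all.equal(a, b)
```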

For string column names:

summarise_by_string <- function(df, group_col, value_col) {
  df %>%
    group_by(.data[[group_col]]) %>%
    summarise(mean = mean(.data[[value_col]]), .groups = "drop")
}
summarise_by_string(mtcars, "cyl", "mpg")

The .data pronoun with double-bracket subsetting accesses columns by string name in data-masked contexts.

Dynamic variable names in mutate

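Creating new columns with dynamic names: mutate("{name}" := expression) uses the walrus operator (:=) with a glue-style string on the left-hand side. The string can interpolate a variable ("{name}") or an embraced argument ("{{ col }}").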

add_scaled <- function(df, col, suffix = "_scaled") {
  new_name <- paste0(rlang::as_name(rlang::enquo(col)), suffix)
  df %>% mutate("{new_name}" := scale({{ col }})[, 1])
}

The glue-string "{variable}" on the left side of := substitutes the variable’s value as the new column name. This pattern enables functions that create columns with names derived from their arguments.
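
The name can also be derived directly from an embraced argument, which avoids building the string by hand. The sketch below assumes dplyr 1.0 or later with a matching rlang; add_scaled2() is a hypothetical variant of add_scaled():

```r
library(dplyr)

# add_scaled2() is a hypothetical variant that names the column with "{{ col }}"
add_scaled2 <- function(df, col) {
  df %>% mutate("{{ col }}_scaled" := scale({{ col }})[, 1])
}

res <- add_scaled2(mtcars, mpg)
"mpg_scaled" %in% names(res)
```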

Best practices

  1. Offer a standard-evaluation route alongside the NSE one - Accept bare column names with {{ }} for interactive use, and give callers who hold column names as strings a way in, such as .data[[name]] or tidyselect helpers like all_of() in dplyr::select().

  2. Document your NSE clearly - Users need to know they can pass bare column names.

  3. Test with non-standard inputs - Pass a symbol, a string, and an expression to see how your function handles each.

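  4. Use tidyeval for new code - Quosures keep each captured expression together with its environment, which avoids many of the scoping bugs that hand-written substitute() and eval() code runs into.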

  5. Consider deparse(substitute()) for messages - This gives you the user-facing name:

my_function <- function(x) {
  x_name <- deparse(substitute(x))
  message(paste("Operating on:", x_name))
}

See also

  • rlang - Metaprogramming with rlang for advanced tidyeval
  • dplyr-filter - How dplyr’s filter() uses NSE internally
  • purrr-map - Functional iteration with purrr