R Environments and Scoping: A Practical Guide

What is an R environment?

An R environment is a bag of name-to-value bindings plus a pointer to a parent environment. That’s it. No order, no duplicate names, no copy-on-modify. Every time you type x at the prompt, R starts in the global environment and walks the parent chain looking for a binding called x, and the first match wins.

This bag-plus-parent shape is what powers R’s scoping rules. Once you internalise that one fact, lexical scoping, closures, and the attach() debate all snap into place.

Environments look like lists, and you’ll often see code use $ and [[ ]] on them. Treat that as syntax sugar. The four real differences from lists matter more:

  1. Unique names only. Two bindings with the same name in the same frame collapse; the second silently overwrites the first.
  2. No order. ls() returns a sorted character vector for display, but the structure is unordered.
  3. A parent environment. Lists don’t have one. Environments do, and the parent chain is what lexical scoping walks.
  4. Reference semantics. e1 <- e2 does not copy. e1$x <- 1 is visible through e2. Lists copy on modify; environments don’t.

That fourth point catches everyone who arrives with R’s usual copy-on-modify mental model. (Python programmers will find it familiar, since Python objects behave the same way.) Hold onto it.
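A minimal sketch of the difference, using only base R (the variable names are illustrative):

```r
# Lists follow copy-on-modify: changing the copy leaves the original alone.
l1 <- list(x = 0)
l2 <- l1
l2$x <- 1
l1$x               # still 0

# Environments are references: both names point at the same object.
e2 <- new.env()
e1 <- e2
e1$x <- 1
e2$x               # 1, the write through e1 is visible here
identical(e1, e2)  # TRUE
```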

Creating and inspecting environments

new.env() makes one. You can also set the parent and a starting size, but Advanced R’s advice is to ignore the hash and size parameters. The defaults are fine.

e <- new.env()
e$x <- 1:3
e$y <- "hello"

ls(e, all.names = TRUE)
# [1] "x" "y"

is.environment(e)
# [1] TRUE

environmentName(e)
# [1] ""
environmentName(globalenv())
# [1] "R_GlobalEnv"

ls() hides names starting with . by default. Pass all.names = TRUE to see them. Most user-created environments come back from environmentName() as "", since R normally sets a name only for system environments like R_GlobalEnv and package:stats.

A few scopes come pre-built and are worth knowing by name:

  • globalenv() (a.k.a. .GlobalEnv) is your workspace at the prompt.
  • baseenv() is the base package’s frame, where c(), mean(), etc. live. Its parent is the empty environment.
  • emptyenv() is the chain terminator. It has no parent.

Reading, writing, and removing bindings

The string-taking trio handles dynamic names:

e <- new.env()
assign("a", 10, envir = e)
assign("b", 20, envir = e)

get("a", envir = e)
# [1] 10
exists("a", envir = e)
# [1] TRUE
exists("a", envir = e, inherits = FALSE)
# [1] TRUE

inherits = TRUE (the default for get() and exists()) walks the parent chain. inherits = FALSE looks only at the named frame. For rm(), the default is inherits = FALSE, which is the right default for a destructive verb. You don’t want rm("x") to reach up the chain and delete x from globalenv().
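To see the inherits switch do something, you need a chain with more than one level. A small sketch, assuming a hypothetical two-level chain (base_frame and child are illustrative names):

```r
# Hypothetical two-level chain: child sits on top of base_frame.
base_frame <- new.env(parent = emptyenv())
child <- new.env(parent = base_frame)
assign("a", 10, envir = base_frame)

exists("a", envir = child)                    # TRUE: found via the parent
exists("a", envir = child, inherits = FALSE)  # FALSE: not bound in child itself

# rm() does not climb by default, so this only warns that a is not in child.
suppressWarnings(rm("a", envir = child))
exists("a", envir = base_frame, inherits = FALSE)  # TRUE: untouched
```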

To clear an environment:

rm(list = ls(e, all.names = TRUE), envir = e)
ls(e, all.names = TRUE)
# character(0)

If you want a friendlier view than the bare address that print(e) shows, rlang::env_print(e) dumps the name, parent, and bindings with their types. Tidyverse-style package code often reaches for rlang::env() and rlang::env_bind() rather than new.env() and assign().

Environments have parents

Every environment (except emptyenv()) has a parent. parent.env(e) returns it, and a chain of parent.env() calls from any frame will eventually reach emptyenv().

outer <- new.env()
outer$greeting <- "hi"

inner <- new.env(parent = outer)
inner$name <- "world"

get("greeting", envir = inner, inherits = TRUE)
# [1] "hi"
get("name", envir = inner, inherits = FALSE)
# [1] "world"

greeting isn’t bound in inner, so lookup walks up to outer and finds it. name is found locally. This is the same lookup rule R uses inside function calls: lookup starts in the call’s execution environment, whose parent is the function’s enclosing environment, and continues up the chain from there.

You can set the parent with parent.env(e) <- new_parent, but the manual flags this as “extremely dangerous” and reserves the right to remove it. The honest advice: don’t. If you need a fresh frame with a specific parent, build it that way in new.env(parent = ...) to begin with.

The search path

The search path is the chain of environments R consults for bare names. search() returns it as a character vector.

search()
# [1] ".GlobalEnv"        "package:stats"     "package:graphics"
# [4] "package:grDevices" "package:utils"     "package:datasets"
# [7] "package:methods"   "Autoloads"         "package:base"

The global environment is first, then the attached package exports, then the base package, then (logically) the empty environment. library(pkg) inserts the package’s exports environment onto this list. Package namespaces are not on the search path. Only the package’s exports environment is.
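The search path is just the parent chain starting at globalenv(), and you can check that by walking it yourself. A sketch with a hypothetical helper, env_chain() (not a base function):

```r
# Hypothetical helper: collect environment names from env up to emptyenv().
env_chain <- function(env) {
  out <- character()
  while (!identical(env, emptyenv())) {
    out <- c(out, environmentName(env))
    env <- parent.env(env)
  }
  c(out, environmentName(emptyenv()))
}

chain <- env_chain(globalenv())
head(chain, 1)                         # "R_GlobalEnv"
tail(chain, 2)                         # "base" "R_EmptyEnv"
length(chain) == length(search()) + 1  # TRUE: search() omits the empty env
```

The labels differ slightly from search() (for example "R_GlobalEnv" versus ".GlobalEnv"), but the entries line up one to one.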

attach() does the same thing but for a list, data frame, or saved R image. The R Inferno warns about its pitfalls, and they’re real:

  • The attached entry is a copy, not a live view. attach() copies the columns into a new environment on the search path, so later changes to the original data frame are not reflected there.
  • You can replace a column in the attached entry with assign("col", value, envir = as.environment("...whatever...")), and the original data frame won’t change. Then something else reads from the original, and suddenly you and a colleague are looking at different versions of the data.
  • Search path state is a frequent source of “works on my machine” bugs. Scripts that depend on what the user has attached are not reproducible.

The modern answer is to pass data frames explicitly to functions, or use with() / within() for one-off expressions. If you feel the urge to attach(), reach for those first.
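A short sketch of how an attached data frame and the original drift apart, and what with() does instead. It assumes a clean session with no other object called a in the global environment:

```r
df <- data.frame(a = 1:3)

attach(df)            # puts the columns on the search path
df$a <- df$a * 10     # changes the original data frame only
seen_via_attach <- a  # the attached entry still holds 1, 2, 3
detach(df)            # always clean up the search path

seen_via_attach       # 1 2 3
df$a                  # 10 20 30
with(df, a + 1)       # 11 21 31: reads the current df, nothing is attached
```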

Masking is the related footgun: library(MASS) after library(dplyr) masks dplyr’s select() with MASS::select(). The fix is to write dplyr::select(...) to disambiguate, or not to attach conflicting packages. The double-colon form costs you next to nothing.
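You can reproduce masking without installing anything. In this sketch a user-defined sum in the global environment stands in for a masking package export (that stand-in is an assumption made to keep the example self-contained):

```r
# Assumption: a global definition plays the role of the masking package.
sum <- function(...) "masked version"

masked   <- sum(1, 2)        # bare name: the global definition is found first
explicit <- base::sum(1, 2)  # :: goes straight to the package

rm(sum)                      # remove the mask again

masked    # "masked version"
explicit  # 3
```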

Lexical vs dynamic scoping

R uses lexical scoping (a.k.a. static scoping). Free variables in a function body resolve by walking the chain of enclosing environments starting at environment(f), not the chain of calling environments.

The two functions that look similar but do different things:

  • environment(f) is the enclosure of f: where f was defined, and the start of the lexical chain.
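  • parent.frame(n = 1) is the caller’s frame, n steps up the call chain. This is dynamic lookup.

who_am_i <- function() {
  # environment(who_am_i) is the enclosure. A bare environment() call in here
  # would return the unnamed execution frame instead.
  cat("enclosing:", environmentName(environment(who_am_i)), "\n")
  cat("caller:    ", environmentName(parent.frame()), "\n")
}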

who_am_i()
# enclosing: R_GlobalEnv
# caller:     R_GlobalEnv

When who_am_i() is defined and called at the top level, those two scopes happen to be the same. Called from inside another function, they diverge: environment(f) is still the place the function was defined, while parent.frame() is whoever called it. parent.frame() is what eval() (as its default envir), lm() (when it builds the model frame), and the various non-standard-evaluation functions use to peek at the caller’s variables. Use it deliberately, not as a substitute for arguments.
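A minimal sketch of the two lookups diverging (all function names here are illustrative):

```r
x <- "global"

# Lexical: the free variable x resolves where lexical_x was defined.
lexical_x <- function() x

# Dynamic-style: deliberately look in the caller's frame.
caller_x <- function() get("x", envir = parent.frame())

run <- function() {
  x <- "local to run()"
  from_enclosure <- lexical_x()
  from_caller    <- caller_x()
  c(lexical = from_enclosure, caller = from_caller)
}

result <- run()
result[["lexical"]]  # "global"
result[["caller"]]   # "local to run()"
```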

sys.call(), sys.frame(), and friends let you walk the call stack:

trace_me <- function() {
  cat("call stack:\n")
  for (i in seq_along(sys.calls())) {
    cat("  [", i, "] ", deparse(sys.call(i)), "\n", sep = "")
  }
}
trace_me()
# call stack:
#   [1] trace_me()

Each call on the stack has its own execution environment. That’s not a metaphor: sys.frames() returns them as a pairlist of environments, one per active call.
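A small sketch of stack depth, using sys.nframe() to report how deep each call sits. The absolute numbers depend on how the code is run, so only the difference is checked (function names are illustrative):

```r
inner_fn <- function() sys.nframe()

outer_fn <- function() {
  own   <- sys.nframe()  # depth of this call
  inner <- inner_fn()    # depth of the nested call
  c(own = own, inner = inner)
}

depths <- outer_fn()
depths[["inner"]] - depths[["own"]]  # 1: the nested call is one frame deeper
```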

Closures: functions capture their enclosing environment

A closure is a function paired with its enclosing environment. When you create a function inside another function, the inner function’s environment() is the execution frame of the outer call. If the inner function escapes (returned, stored, or otherwise outlives the outer call), the captured frame stays alive. The inner function can read those captured variables, and with <<- it can write to them as well.

make_power <- function(p) {
  function(x) x^p
}

square <- make_power(2)
cube   <- make_power(3)

square(4)
# [1] 16
cube(2)
# [1] 8

environment(square)$p
# [1] 2
environment(cube)$p
# [1] 3

square and cube look like the same function, but environment() reveals they each have a different captured p. Closures are the mechanism behind R6 classes, purrr-style function factories, and most of the “this function returns a function” patterns in tidyverse code.

<<- starts in the parent of the current environment and walks the lexical chain looking for an existing binding to rebind. If it doesn’t find one, it silently creates the binding in the global environment. Usually a bug, sometimes a feature. For anything that needs to mutate captured state, prefer an explicit environment or assign(..., inherits = TRUE) so the intent is obvious.
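Two ways to keep mutable state in a closure, sketched side by side. The first uses <<- against a variable that is known to exist in the enclosure; the second keeps the state in an explicit environment. Both factories are illustrative, not a recommended API:

```r
# Version 1: <<- rebinds n in the enclosing frame created by make_counter().
make_counter <- function() {
  n <- 0
  function() {
    n <<- n + 1
    n
  }
}

# Version 2: state lives in an explicit environment, so no <<- is needed.
make_counter_env <- function() {
  state <- new.env(parent = emptyenv())
  state$n <- 0
  function() {
    state$n <- state$n + 1  # mutates the shared environment by reference
    state$n
  }
}

count_a <- make_counter()
count_a()
second_a <- count_a()       # 2

count_b <- make_counter_env()
count_b()
second_b <- count_b()       # 2

first_fresh <- make_counter()()  # 1: each factory call gets its own n
```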

Temporary scope with with(), within(), and local()

These exist specifically to avoid attach() for short-lived scope work:

df <- data.frame(a = 1:3, b = 4:6)
with(df, a + b)
# [1] 5 7 9

within(df, c <- a * b)
#   a b  c
# 1 1 4  4
# 2 2 5 10
# 3 3 6 18

with() evaluates an expression with column-like lookups resolved first in data, then in the caller’s scope. within() is the same idea but returns a copy of data with the assignments applied. Useful for derived columns.

local() evaluates an expression in a brand-new frame whose parent is the calling environment. Anything assigned inside it does not leak out:

x <- 1
local({
  x <- 99
  x
})
# [1] 99
x
# [1] 1

local() is the base-R idiom for private scratch variables: wrap setup code in it and only the final value escapes. Return a function from local() and that function keeps the scratch variables as private state.
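A sketch of local() used to give a function private state (next_id is a hypothetical name):

```r
next_id <- local({
  id <- 0            # lives only in the frame that local() created
  function() {
    id <<- id + 1    # rebinds the id captured above, not a global
    id
  }
})

next_id()
latest <- next_id()  # 2
exists("id", envir = environment(next_id), inherits = FALSE)  # TRUE
```

The id variable is reachable only through next_id, which is the point of the pattern.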

Locked and active bindings

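The functions documented under ?bindenv (lockBinding(), lockEnvironment(), makeActiveBinding(), and friends) are in base R and give you a bit more control: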

counter <- new.env()
counter$n <- 0

makeActiveBinding("count", function(v) {
  if (missing(v)) {
    get("n", envir = counter)
  } else {
    assign("n", v, envir = counter)
    invisible(v)
  }
}, counter)

counter$count
# [1] 0
counter$count <- 5
counter$count
# [1] 5
counter$n
# [1] 5

makeActiveBinding() installs a function that runs every time the binding is read or written. Active bindings back R6 active fields and lazily computed or cached values. Don’t count on them surviving save() / load() as active bindings. For long-lived active state, define them in a package’s .onLoad.

lockBinding() prevents the value of a single binding from being changed. lockEnvironment() prevents adding or removing bindings. Package namespaces are locked at load time, which is why assign("foo", 1, envir = asNamespace("base")) errors. You almost never want to lock things yourself, but you will sometimes need unlockBinding() to release a binding so you can monkey-patch while developing.
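A sketch of both kinds of lock on a throwaway environment (cfg and its fields are made-up names). Errors are caught with tryCatch() so the example runs to the end:

```r
cfg <- new.env()
cfg$level <- "info"

# Lock one binding: its value can no longer be changed.
lockBinding("level", cfg)
write_locked <- tryCatch(
  { cfg$level <- "debug"; "changed" },
  error = function(e) "blocked"
)

# Unlock it again and the write goes through.
unlockBinding("level", cfg)
cfg$level <- "debug"

# Lock the environment: no new bindings, but existing ones stay writable
# because bindings = TRUE was not requested.
lockEnvironment(cfg)
add_new <- tryCatch(
  { cfg$extra <- 1; "added" },
  error = function(e) "blocked"
)
cfg$level <- "warn"

write_locked  # "blocked"
add_new       # "blocked"
cfg$level     # "warn"
```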

Common pitfalls

A short list of bugs that bite everyone at least once:

  • Reaching for parent.frame() when you meant parent.env(). Free variables resolve through the enclosing chain, not the call chain. If your function “can’t see” a variable defined by its caller, that’s by design.
  • Search path masking. library(MASS) shadows dplyr::select. Use dplyr::select(...) or just don’t attach conflicting packages.
  • attach() surprises. Stale copies, edits that never reach the original data frame, non-reproducible scripts. The R Inferno covers them; read it once and you won’t get bitten.
  • Copy-on-modify doesn’t apply to environments. e1 <- e2; e1$x <- 1 mutates the same object e2 sees. This is correct R behaviour, but it surprises people used to the way R treats vectors and lists.
  • <<- creating globals. A function with a stray <<- for a name that doesn’t exist up the chain silently binds it in globalenv(). R CMD check reports this as a NOTE (“no visible binding for '<<-' assignment”); lintr::object_usage_linter() catches some cases.
  • Comparing environments with ==. Use identical(env1, env2). The == operator is defined only for atomic and list types, so it throws an error when given environments.
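The last pitfall is quick to check for yourself. A sketch with illustrative names, catching the error so the script keeps running:

```r
env_a <- new.env()
env_b <- env_a      # same object, second name
env_c <- new.env()  # different object

same      <- identical(env_a, env_b)  # TRUE
different <- identical(env_a, env_c)  # FALSE

# == is not defined for environments, so this raises an error.
eq_attempt <- tryCatch(env_a == env_b, error = function(e) e)
inherits(eq_attempt, "error")         # TRUE
```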

After a few months of R, you stop fighting R environments and start using them. The mental model (bags of bindings plus a parent pointer) is small enough to hold in your head, and once it’s there, scoping stops being mysterious.
