Modular R Code with box — A Modern Module System
R projects become unwieldy as they grow: functions get scattered across files, naming conflicts emerge, and tracking dependencies becomes a nightmare. Modular R code, organised into self-contained files with explicit imports, makes a real difference at scale. The box package provides a clean solution: a modern module system that lets you treat files and folders of R code as independent, nestable modules without the overhead of creating formal R packages.
This guide walks you through everything you need to write modular, maintainable R code with box.
Why use modules?
Before diving into box, consider the problem it solves. In traditional R scripts, you either load everything with library(), which attaches every exported function of a package to the search path, or you qualify calls by hand with pkg::fun(). Neither approach scales well for large projects.
The box package brings module-based development to R:
- Local scoping: Imports don’t clutter your global environment
- Explicit dependencies: Every import is visible in your code
- Nested modules: Organize code in folders that mirror your project structure
- No package build required: Use modules directly from files and directories
Think of it like JavaScript’s import statements or Python’s modules, but for R.
Installation
Install box from CRAN the way you would any R package:
install.packages("box")
The package requires R 3.6.0 or later. You can check your R version with R.version.string.
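If you want a script to fail early on an older interpreter, a small guard along these lines is one option. It uses only base R; the 3.6.0 threshold is the requirement quoted above, and the variable names are illustrative.

```r
# Compare the running R version against the minimum that box requires.
min_version <- "3.6.0"
version_ok <- getRversion() >= min_version
if (!version_ok) {
  stop("box needs R >= ", min_version, "; found ", R.version.string)
}
```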
Your first module
Let’s build a simple module to see how this works in practice. Suppose you have a project with some utility functions for data cleaning. Instead of scattering these across your scripts, put them in a module.
Create a folder called utils in your project directory, then create a file called cleaning.R inside it:
# File: utils/cleaning.R
# Modules only see base R by default, so import quantile() explicitly
box::use(stats[quantile])
#' Remove outliers using IQR method
#' @param x Numeric vector
#' @param multiplier IQR multiplier (default 1.5)
#' @export
remove_outliers <- function(x, multiplier = 1.5) {
  q <- quantile(x, c(0.25, 0.75))
  iqr <- q[2] - q[1]
  lower <- q[1] - multiplier * iqr
  upper <- q[2] + multiplier * iqr
  x[x >= lower & x <= upper]
}
#' Normalize vector to 0-1 range
#' @param x Numeric vector
#' @export
normalize <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}
This is your first module. It lives in utils/cleaning.R. The #' @export tags mark functions that other files can import, exactly like you would in a formal R package.
Loading modules and packages
Now let’s use this module in your main script. The box::use() function is your primary tool for both modules and packages. It supports several import styles (full module access, selective imports, aliases, and bulk attachment), each suited to different situations depending on how many functions you need and whether naming conflicts are a concern.
Loading a module
The simplest import loads the entire module behind a namespace. Access exported functions with $, the same way you would access list elements or package namespaces. This keeps the origin of each function visible in your code: you can always trace cleaning$remove_outliers back to the utils/cleaning.R file.
box::use(utils/cleaning)
data <- c(1, 2, 3, 4, 100) # 100 is an outlier
clean_data <- cleaning$remove_outliers(data)
normalized <- cleaning$normalize(clean_data)
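To check what those two calls return, the function bodies can be pasted into a plain session without box. This is only a sanity check of the arithmetic, not a replacement for the module import.

```r
# Same logic as utils/cleaning.R, defined directly for a quick check.
remove_outliers <- function(x, multiplier = 1.5) {
  q <- stats::quantile(x, c(0.25, 0.75))
  iqr <- q[2] - q[1]
  lower <- q[1] - multiplier * iqr
  upper <- q[2] + multiplier * iqr
  x[x >= lower & x <= upper]
}
normalize <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

data <- c(1, 2, 3, 4, 100)
# The quartiles are 2 and 4, so the fences sit at -1 and 7 and 100 is dropped.
clean_data <- remove_outliers(data)
normalized <- normalize(clean_data)
print(clean_data)  # 1 2 3 4
print(normalized)  # 0, 1/3, 2/3, 1
```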
Loading with an alias and selective imports
When module paths are long or two modules export the same function name, aliases and selective imports prevent conflicts. An alias shortens the namespace prefix; selective imports bring specific functions into the current scope without exposing the rest of the module. Both approaches keep your code explicit about what it uses.
# Alias: shortens the namespace
box::use(clean = utils/cleaning)
clean_data <- clean$remove_outliers(data)
# Selective import: functions available directly
box::use(utils/cleaning[remove_outliers, normalize])
clean_data <- remove_outliers(data)
normalized <- normalize(clean_data)
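Selective imports can also rename individual functions, which helps when two sources export the same name. The sketch below uses the stats package rather than a local module so that it runs anywhere box is installed; std_dev is a hypothetical alias chosen for illustration.

```r
# Guarded so the sketch is skipped when box is not installed.
if (requireNamespace("box", quietly = TRUE)) {
  # Import sd() from stats under a different local name.
  box::use(stats[std_dev = sd])
  spread <- std_dev(c(2, 4, 6))
  print(spread)  # 2
}
```

The original name sd is not part of this import; only the alias is attached by this call.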
Loading packages and bulk attachment
The same box::use() syntax replaces library() for packages too, with the advantage of local scoping: imports do not attach to the global search path. This eliminates namespace pollution in large projects where multiple scripts load different sets of packages. When you want the convenience of having all exported functions available directly (like traditional library()), the [...] attachment syntax imports everything marked with #' @export.
# Package imports: locally scoped, no global side effects
box::use(stats)
box::use(ggplot2)
box::use(dplyr)
df <- data.frame(x = 1:5, y = c(2, 4, 3, 5, 7)) # Example data
ggplot2$ggplot(df, ggplot2$aes(x = x, y = y)) + ggplot2$geom_point()
result <- stats$sd(df$y)
# Bulk attachment: all exports available directly
box::use(utils/cleaning[...])
clean_data <- remove_outliers(data)
Writing and exporting from modules
You have two ways to mark functions for export from a module.
Method 1: roxygen2 tags
Use #' @export before each function you want to export, just as you would when building a formal R package. The box system parses these roxygen2-style tags to determine which functions are public. This approach works well if you already document your code with roxygen2 and want to keep your modules consistent with the package development workflow you are familiar with:
#' Sum two numbers
#' @param a First number
#' @param b Second number
#' @export
add <- function(a, b) {
a + b
}
If you prefer a more explicit approach that does not rely on comment-based markup, you can call the export function directly from the module body.
Method 2: box::export()
Alternatively, call box::export() explicitly. This method keeps the export declaration in executable R code rather than in comments, which some developers prefer for its visibility: you can see at a glance, at the bottom of the file, exactly what the module exposes:
add <- function(a, b) {
a + b
}
box::export(add)
This is useful when you want conditional exports, since the declaration is ordinary R code that runs when the module loads. After you have decided which functions to make public, you also need to think about what stays hidden. A well-designed module exposes only the functions callers need, keeping internal helpers private so the module’s interface stays small and stable.
Helper functions that stay private
Functions without #' @export or a box::export() call remain private to the module, inaccessible to external callers but fully usable within the module itself. This is the same public-private boundary that formal packages enforce: the difference is you get it without writing a NAMESPACE file:
# File: utils/internal.R
# This function is private: only usable within the module
helper <- function(x) {
x * 2
}
#' This one is exported
#' @export
public_function <- function(y) {
helper(y) # Uses the private helper internally
}
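One way to see this boundary in action is to write a throwaway copy of the module to a temporary directory and inspect what an importer can reach. This is a sketch under assumptions: the demo_root directory and the box.path setting exist only for the demonstration, and the block is skipped when box is not installed.

```r
if (requireNamespace("box", quietly = TRUE)) {
  # Hypothetical scratch location standing in for a project root.
  demo_root <- file.path(tempdir(), "box_private_demo")
  dir.create(file.path(demo_root, "utils"), recursive = TRUE, showWarnings = FALSE)
  writeLines(c(
    "helper <- function(x) {",
    "  x * 2",
    "}",
    "#' @export",
    "public_function <- function(y) {",
    "  helper(y)",
    "}"
  ), file.path(demo_root, "utils", "internal.R"))

  # Point box's module search path at the scratch directory.
  old_opts <- options(box.path = demo_root)
  box::use(utils/internal)

  exported <- ls(internal)                 # names visible to importers
  doubled <- internal$public_function(21)  # the private helper runs inside the module
  print(exported)
  print(doubled)  # 42

  options(old_opts)  # restore the previous setting
}
```

Listing the module shows only what it exports, so helper should be absent from exported even though public_function relies on it.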
Key features
Nested modules
Create folders within folders to organize related functionality. By mirroring your project’s logical structure in the filesystem, nested modules let teams divide work across subdirectories without stepping on each other’s code. A submodule developer only needs to know the parent module’s export contract:
project/
├── app.R
├── models/
│ ├── __init__.R # Module entry point
│ ├── linear.R
│ └── tree.R
└── utils/
├── __init__.R
├── cleaning.R
└── validation.R
The __init__.R file is the module entry point, similar to Python’s __init__.py. When you import the folder, box loads __init__.R as the module. Here it re-exports the submodules, presenting a single unified interface to the rest of the application. Placing #' @export before a box::use() call re-exports what that call imports; any submodule not re-exported this way stays internal to the module directory:
# File: models/__init__.R
# Re-export both submodules as models$linear and models$tree
#' @export
box::use(
  ./linear,
  ./tree
)
Now importing models gives you access to both submodules through a single namespace. Both submodules export a function named fit, and the submodule prefix keeps the two apart. Callers depend only on the models entry point rather than on individual file paths, so you can reorganise the files later and adjust __init__.R instead of every import site:
box::use(models)
models$linear$fit(data)
Reloading modules during development
When you’re actively developing a module, use box::reload() to pick up changes without restarting R. This eliminates the slow edit-restart cycle that makes module development tedious and lets you experiment with function logic quickly:
box::use(utils/cleaning)
# Make changes to utils/cleaning.R, then reload via the module object
box::reload(cleaning)
This dramatically speeds up development cycles. Once you have a module loaded and working, you may need to inspect its metadata at runtime — verifying that the correct file was loaded or including the module name in log messages are two common tasks. The introspection helpers that follow give you that runtime visibility without any configuration or extra setup steps needed.
Module information helpers
Two functions help you understand your module’s context at runtime. Called from inside a module, box::name() returns the module’s name, and box::file() returns the absolute path of the directory that contains the module source. Both are useful for debugging and introspection within a running module:
# File: utils/cleaning.R (these calls run inside the module)
box::name() # "cleaning"
box::file() # "/path/to/project/utils"
Initialization hooks
Use .on_load() for module initialization code that runs when the module first loads. This hook is the right place for opening database connections, loading configuration files, or setting up package-level options — anything that should happen exactly once when the module comes into scope, not every time a function is called:
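# File: utils/database.R
# box calls .on_load(ns) with the module namespace when the module first loads
.on_load <- function(ns) {
  message("Initializing database connection...")
  # Set up connections, load configs, etc.
}
# query() and connect() are assumed to be defined elsewhere in this file
box::export(
  query,
  connect
)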
Common pitfalls
Understanding what box does differently from traditional R loading will save you debugging time. The most common surprise for developers coming from library()-based workflows is that box does not attach packages to the search path, so standard package functions are not automatically available.
Only base package is attached
Inside a module, only the base package is automatically available. If you need functions from standard packages, import them explicitly — box treats every dependency as opt-in rather than inherited from the global session:
# This won't work inside a module, because median() lives in stats, not base:
median(x) # Error: could not find function "median"
# Do this instead:
box::use(stats[median])
median(x) # Works
Module vs package syntax
Remember the difference between modules and packages, the most common source of confusing errors when you first adopt box. The shape of the name tells box::use() whether to look for an installed package or a module file:
box::use(stats) # Bare name: loads the stats PACKAGE
box::use(./utils) # Leading ./: loads the utils MODULE next to the current file
box::use(../utils) # Leading ../: loads the utils MODULE from the parent directory
A bare name with no slash always loads a package, never a module. A slash-separated name without a leading dot, such as utils/cleaning, is a module that box looks up on its module search path, which you can set with the box.path option. If you get a “not found” error, double-check whether you meant a local module or a package from your library. The same box::use() call serves both roles, so the form of the name is the only signal the system has to decide which lookup to perform.
Case sensitivity
Module paths are case-sensitive. This catches developers who work on case-insensitive file systems (macOS, Windows) and then deploy to Linux servers where the same box::use() call suddenly fails. Always match the exact letter case of the file on disk:
box::use(utils/cleaning) # Correct
box::use(utils/Cleaning) # Wrong: check your file names
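A small pre-deployment check can catch this on a case-insensitive machine. The helper below is a hypothetical utility (has_exact_case is not part of box). It compares the requested file name against the directory listing, which preserves the on-disk spelling, and it only checks the final path component.

```r
# Returns TRUE only if the file name exists with exactly this spelling.
has_exact_case <- function(path) {
  basename(path) %in% list.files(dirname(path), all.files = TRUE)
}

# Demonstration with a scratch file named cleaning.R.
scratch <- file.path(tempdir(), "case_demo")
dir.create(scratch, showWarnings = FALSE)
file.create(file.path(scratch, "cleaning.R"))

has_exact_case(file.path(scratch, "cleaning.R"))  # TRUE
has_exact_case(file.path(scratch, "Cleaning.R"))  # FALSE
```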
Relative paths for local modules
Prefix a module path with ./ or ../ to resolve it relative to the file that contains the box::use() call:
box::use(./my_module) # Same directory
box::use(../shared) # Parent directory
box::use(./project/utils) # Subdirectory of the current directory
Namespace differences
A key gotcha for newcomers: box::use(pkg) does not attach the package to the search path — it makes it available exclusively via namespace access with $. This is a deliberate design choice that keeps your global environment clean but means you cannot call package functions without the $ prefix unless you use the [...] attachment syntax shown in the earlier loading examples:
box::use(ggplot2)
ggplot(df) # Error: could not find function "ggplot"
ggplot2$ggplot(df) # Works correctly
Comparison with alternatives
Here is how box stacks up against other approaches you might consider. Each alternative trades off different levels of isolation, setup overhead, and namespace control.
vs library()
The library() function attaches packages to your search path, which can cause name conflicts and makes dependencies implicit. box::use() is explicit about what you’re importing and keeps your environment clean:
# Traditional approach: hidden dependencies
library(dplyr)
library(ggplot2)
filter(data, x > 0) # Which package's filter?
# Box approach: explicit
box::use(dplyr[filter])
box::use(ggplot2)
filter(data, x > 0) # Unambiguously from dplyr
vs devtools::load_all()
The devtools::load_all() function is designed for package development; it simulates installing and loading a package you’re working on. box, by contrast, is designed for production modularity in non-package projects. Use load_all() while building formal packages; use box for modular scripts and applications.
vs namespace manipulation
Traditional R namespaces (:: access) require you to create a formal package with a NAMESPACE file. box gives you namespace-like isolation without the package build process. You get the benefits of proper modularity with a fraction of the overhead.
Testing box modules
Testing box modules uses the same testthat framework as packages. In a test file, box::use(./mymodule) imports the module under test, and its exported functions are then available as mymodule$function_name(). This keeps test code close to the module it covers and makes the imports explicit. Run tests with testthat::test_dir("tests/") from the project root. Because box modules are plain R files with no build or install step, tests can run without first calling devtools::load_all().
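As a rough illustration, the sketch below builds a one-function module in a temporary directory and checks it with testthat. The scratch directory, the box.path setting, and the test description are assumptions made so the example is self-contained; a real test file would import the module under test from the project itself. The block is skipped when box or testthat is missing.

```r
if (requireNamespace("box", quietly = TRUE) &&
    requireNamespace("testthat", quietly = TRUE)) {
  # Hypothetical scratch project containing utils/cleaning.R.
  test_root <- file.path(tempdir(), "box_test_demo")
  dir.create(file.path(test_root, "utils"), recursive = TRUE, showWarnings = FALSE)
  writeLines(c(
    "#' @export",
    "normalize <- function(x) {",
    "  (x - min(x)) / (max(x) - min(x))",
    "}"
  ), file.path(test_root, "utils", "cleaning.R"))

  # Make the scratch project visible to box, then import the module under test.
  old_opts <- options(box.path = test_root)
  box::use(utils/cleaning)

  testthat::test_that("normalize maps values onto the 0-1 range", {
    testthat::expect_equal(cleaning$normalize(c(2, 4, 6)), c(0, 0.5, 1))
  })

  options(old_opts)  # restore the previous setting
}
```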
When to use box modules
box modules are most valuable in large R projects where the flat namespace of standard R code becomes a maintenance problem. When a single R script runs to hundreds of lines and imports dozens of packages, box allows organizing the code into logical units with explicit exports and controlled imports. This scales better than sourcing multiple scripts with source(), which has no concept of public vs. private functions.
For package development, box complements rather than replaces the standard package structure. Some authors write the core logic as box modules during development (for fast iteration without devtools::load_all()) and convert to package format for distribution. Others use box throughout, packaging their modules as installable packages that use box’s module loading internally.
Summary
The box package brings modern module thinking to R. Here’s what you learned:
- Modules are just files: Any R file in a folder can be a module
- Export with #' @export: Like roxygen2, mark functions as available to importers
- Import explicitly: Use box::use() for modules and packages
- Access via $: Imported modules work like named lists
- Stay local: Imports don’t pollute your global environment
- Reload during dev: box::reload() picks up changes instantly
- Nest folders: Organize code in directory hierarchies
For projects too small to warrant a full package but too large for a single script, box provides the structure you need. Your future self, and your collaborators, will thank you for writing modular code that’s easy to understand, test, and maintain.
See also
- Functions and Control Flow in R: Master function writing before organizing them into modules.
- Building R Packages: When your project grows beyond modules, graduate to a formal R package.
- Lists and Environments: Understand R’s scope rules that box modules use for local imports.
- Data Wrangling with dplyr: Apply modular data cleaning functions inside dplyr pipelines.