
High-Performance Vectors with vctrs

The vctrs package provides a consistent framework for working with vectors in R. It solves a fundamental problem: R’s base functions handle different vector types inconsistently. The package gives you type-safe operations, automatic recycling, and a system for creating custom vector types that integrate smoothly with the tidyverse.

What vctrs solves

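Base R treats vectors differently depending on their type. length() returns the number of elements for an atomic vector but the number of columns for a data frame, while nrow() returns NULL for anything without dimensions. Recycling rules vary between functions. Type coercion is hard to predict: c() silently converts mixed inputs to a common type. These inconsistencies force you to write defensive code that checks vector types at runtime.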

vctrs establishes a unified vector protocol. It defines operations that work identically across all vector types, including your own custom classes. The package powers dplyr, tidyr, and ggplot2 under the hood.

Core functions

vec_size and vec_size_common

vec_size() returns the length of any vector, treating data frames as if they were vectors of rows:

library(vctrs)

vec_size(1:10)
#> [1] 10

vec_size(mtcars)
#> [1] 32

vec_size_common() computes the common size of several vectors. Only size-1 vectors recycle; any other size mismatch is an error:

vec_size_common(1:10, TRUE, "a")
#> [1] 10

vec_slice

vec_slice() extracts a subset using integer indices. It preserves vector attributes:

x <- c(a = 1, b = 2, c = 3)
vec_slice(x, c(1, 3))
#> a c 
#> 1 3

For data frames, vec_slice() selects rows while preserving columns:

vec_slice(mtcars, 1:3)
#>                mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> Mazda RX4     21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#> Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> Datsun 710    22.8   4  108  93 3.85 2.320 18.61  1  1    4    1

vec_recycle

vec_recycle() recycles a single vector to a given size. Only size-1 vectors recycle; any other mismatch is an error. To recycle several vectors to their common size in one call, use vec_recycle_common():

vec_recycle(1L, 3)
#> [1] 1 1 1

# This throws an error
try(vec_recycle(1:3, 4))
#> Error: Can't recycle input of size 3 to size 4.

vec_cast

vec_cast() converts a vector to a target type and throws an error if the conversion is unsupported or would lose information:

# Successful cast
vec_cast(1:3, double())
#> [1] 1 2 3

# Failed cast - throws error
try(vec_cast(c("a", "b"), integer()))
#> Error: Can't convert `c("a", "b")` <character> to <integer>.

vec_cast() has no argument for a fallback value. To substitute one when a conversion fails, catch the error yourself:

tryCatch(
  vec_cast("a", integer()),
  error = function(e) NA_integer_
)
#> [1] NA

The vctr class

The vctr class lets you create custom vector types that behave like built-in vectors. You define a class, then implement the required methods.

library(vctrs)

# Define a new vector type
new_percent <- function(x) {
  stopifnot(is.numeric(x), x >= 0, x <= 1)
  new_vctr(x, class = "percent")
}

# Print method
format.percent <- function(x, ...) {
  paste0(round(vec_data(x) * 100, 1), "%")
}

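# Coercion and casting use double dispatch.
# Cast methods are named vec_cast.<to>.<from>.
vec_ptype2.percent.percent <- function(x, y, ...) new_percent(double())
vec_cast.percent.percent <- function(x, to, ...) x
vec_cast.percent.double <- function(x, to, ...) new_percent(x)
vec_cast.double.percent <- function(x, to, ...) vec_data(x)
vec_cast.character.percent <- function(x, to, ...) format(x)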

# Test it
p <- new_percent(c(0.25, 0.5, 0.75))
p
#> <percent[3]>
#> [1] 25% 50% 75%

vec_size(p)
#> [1] 3

The vctr class automatically gets length(), subsetting, and printing behaviors. You implement methods like format(), vec_cast(), and vec_math() to customize behavior.

Performance comparison

vctrs operations are designed for performance. They minimize type checking at runtime by enforcing type consistency during construction. They also fail loudly where base R only warns. Here’s a quick comparison of how the two handle mismatched lengths:

# Base R - implicit recycling with warning
x <- 1:3
y <- 1:2
x + y
#> [1] 2 4 4
#> Warning message:
#> In x + y : longer object length is not a multiple of shorter object length

# vctrs - explicit recycling with error on mismatch
vec_recycle_common(x, y)
#> Error: Can't recycle `..1` (size 3) to match `..2` (size 2).

The difference matters most when working with data frames in pipelines. vctrs-powered functions like dplyr::mutate() check the size and type of each new column once, using the same rules everywhere, and then operate on whole columns:

library(dplyr)

# vctrs validates once, then operates efficiently
result <- mtcars %>%
  mutate(
    disp_l = disp / 61.0237,
    wt_kg = wt * 453.592
  ) %>%
  head(3)

# Equivalent base R approach
base_result <- transform(mtcars,
  disp_l = disp / 61.0237,
  wt_kg = wt * 453.592
)
head(base_result, 3)

The tidyverse uses vctrs to ensure that type conversions happen at the right time—explicitly when you request them, not silently when you least expect it.

Practical examples

Validating input in a function

Use vctrs to validate and normalize function inputs:

normalize <- function(x) {
  # Fail early with a clear error if x cannot be cast to double
  x <- vec_cast(x, double())
  x / sum(x, na.rm = TRUE)
}

normalize(c(1, 2, 3))
#> [1] 0.1666667 0.3333333 0.5000000

# A single element normalizes to 1
normalize(1)
#> [1] 1

Creating a date vector type

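new_fiscal_year <- function(year, quarter) {
  stopifnot(
    is.integer(year),
    is.integer(quarter),
    all(quarter >= 1L),
    all(quarter <= 4L)
  )
  # Recycle year and quarter to a common size (errors on mismatch)
  fields <- vec_recycle_common(year = year, quarter = quarter)
  # A record vector stores one field per component
  new_rcrd(fields, class = "fiscal_year")
}

# Record vectors need a format() method to print
format.fiscal_year <- function(x, ...) {
  paste0(field(x, "year"), "-Q", field(x, "quarter"))
}

fy <- new_fiscal_year(2024L, 1:4)
fy
#> <fiscal_year[4]>
#> [1] 2024-Q1 2024-Q2 2024-Q3 2024-Q4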

Defining invariants

vctrs classes should define their invariants explicitly. A ratio vector that wraps doubles must ensure values stay in [0, 1]. Override vec_arith() to either enforce this or throw when arithmetic would violate it. This makes the type self-documenting: users cannot accidentally create invalid states through standard operations.
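
A minimal sketch of this idea follows. The class name ratio and the constructor new_ratio are hypothetical, chosen only for illustration. The constructor checks the range, and the vec_arith() methods keep the class for multiplication (which cannot leave [0, 1]) while returning plain doubles for addition and subtraction:

```r
library(vctrs)

# Hypothetical constructor: validates the [0, 1] invariant up front
new_ratio <- function(x = double()) {
  x <- vec_cast(x, double())
  if (any(x < 0 | x > 1, na.rm = TRUE)) {
    stop("ratio values must lie in [0, 1]")
  }
  new_vctr(x, class = "ratio")
}

# vec_arith() uses double dispatch: first on x, then on y
vec_arith.ratio <- function(op, x, y, ...) {
  UseMethod("vec_arith.ratio", y)
}
vec_arith.ratio.default <- function(op, x, y, ...) {
  stop_incompatible_op(op, x, y)
}
vec_arith.ratio.ratio <- function(op, x, y, ...) {
  switch(op,
    # A product of two values in [0, 1] stays in [0, 1]
    "*" = new_ratio(vec_arith_base(op, x, y)),
    # Sums and differences can leave the range, so return bare doubles
    "+" = ,
    "-" = vec_arith_base(op, x, y),
    stop_incompatible_op(op, x, y)
  )
}

r <- new_ratio(c(0.5, 0.25))
prod_r <- r * r   # still a ratio vector
sum_r <- r + r    # plain double vector
```

Whether out-of-range results should degrade to a plain double, as here, or raise an error is a design choice; either way the invariant is stated in one place.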

Casting and coercion

vec_cast() converts a vector from one type to another, throwing if the conversion is lossy. vec_ptype2() computes the common type for a pair of types: the type that both can be safely converted to when they are combined. Implementing these methods for a custom type determines how it behaves in vec_c(), c(), dplyr::bind_rows(), and comparisons. Without them, vctrs treats the class as incompatible with every other type, so combining it with anything but itself raises an error instead of coercing silently.
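
The sketch below shows one way these methods might look for a double-backed class. The class name meters and the constructor new_meters are hypothetical. The choice that meters combined with a bare double yields a double is an assumption made for the example, not a rule imposed by vctrs:

```r
library(vctrs)

# Hypothetical double-backed class used only for illustration
new_meters <- function(x = double()) {
  stopifnot(is.double(x))
  new_vctr(x, class = "meters")
}

# Common type: meters with meters stays meters; meters with double gives double
vec_ptype2.meters.meters <- function(x, y, ...) new_meters()
vec_ptype2.meters.double <- function(x, y, ...) double()
vec_ptype2.double.meters <- function(x, y, ...) double()

# Casts: method names follow vec_cast.<to>.<from>
vec_cast.meters.meters <- function(x, to, ...) x
vec_cast.meters.double <- function(x, to, ...) new_meters(x)
vec_cast.double.meters <- function(x, to, ...) vec_data(x)

m <- new_meters(c(1.5, 2))
both <- vec_c(m, m)                 # still a meters vector
mixed <- vec_c(m, 3)                # common type is double
back <- vec_cast(3, new_meters())   # explicit cast from double to meters
```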

Integration with the tidyverse

Any vector that implements vctrs methods integrates automatically with dplyr and tidyr operations. mutate(), filter(), and joins use vec_ptype2() and vec_cast() internally. This means a well-implemented vctrs type works correctly in all tidyverse contexts without any dplyr-specific code. The pillar package uses vctrs metadata to format custom types correctly in tibble output.
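
As a small illustration, the sketch below puts a bare vctrs class into a tibble and runs two dplyr verbs on it. The class name score is hypothetical, and no dplyr-specific methods are defined for it:

```r
library(vctrs)
library(dplyr)

# Hypothetical class for illustration; only new_vctr() is used to build it
score <- new_vctr(c(0.2, 0.9, 0.5), class = "score")

df <- tibble(id = 1:3, s = score)

kept <- filter(df, id > 1)      # row selection keeps the column class
stacked <- bind_rows(df, df)    # combining two score columns keeps it too

class(kept$s)
class(stacked$s)
```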

Proxy and restoration

For performance-critical vctrs types, implement the vec_proxy() and vec_restore() pair. vec_proxy() converts the vector to a simple representation (like a plain double) before slicing, combining, or comparing; vec_restore() reconstructs the custom type from the proxy after the operation. This avoids the overhead of running method dispatch for every element-level operation, which matters for long vectors in tight loops.
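
The round trip can be seen with a record vector, whose built-in proxy is a data frame of its fields. This sketch uses only the methods vctrs already provides for records; a custom class would supply its own vec_proxy() and vec_restore() methods in the same spirit:

```r
library(vctrs)

rec <- new_rcrd(list(x = 1:3, y = c(1.1, 2.2, 3.3)))

# The proxy of a record is a plain data frame of its fields
proxy <- vec_proxy(rec)
is.data.frame(proxy)

# Slice the proxy, then rebuild the record from the sliced proxy
sliced <- vec_restore(vec_slice(proxy, 2:3), rec)
field(sliced, "x")
```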

Type coercion rules

vec_ptype_common(x, y) returns the common type of its inputs without converting them. vec_cast(x, to = double()) performs the actual conversion. vec_cast(1L, double()) succeeds. vec_cast("1", double()) errors; pass x_arg, as in vec_cast("1", double(), x_arg = "x"), so that the error message refers to the input by name.

The coercion hierarchy for numeric types: logical < integer < double < complex. A vector can always be cast to a higher type; casting to a lower type errors unless the values are representable (e.g., vec_cast(1.0, integer()) succeeds, but vec_cast(1.5, integer()) errors). Character sits outside this hierarchy: vctrs does not convert between numbers and strings, so vec_cast(1L, character()) errors.
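
A short sketch of these rules for the numeric types. The lossy cast is caught by its condition class so that the script keeps running:

```r
library(vctrs)

# The common type is computed without converting the inputs
ptype_int_dbl <- vec_ptype_common(1L, 2.5)    # double prototype
ptype_lgl_int <- vec_ptype_common(TRUE, 1L)   # integer prototype

# Lossless narrowing succeeds
whole <- vec_cast(1.0, integer())

# Lossy narrowing signals a vctrs_error_cast_lossy condition
lossy <- tryCatch(
  vec_cast(1.5, integer()),
  vctrs_error_cast_lossy = function(e) "lossy cast rejected"
)
```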

new_vctr(x, class = "my_class") creates a new vector class. The class inherits from vctrs_vctr, which provides default implementations of arithmetic, comparison, formatting, and combining methods. Override these methods to customize behavior.

Record and list-of vectors

new_rcrd(list(x = 1:3, y = c(1.1, 2.2, 3.3))) creates a record vector — a vector where each element has multiple fields. This models types like complex numbers or date-times with sub-second precision. field(rec, "x") extracts a field.

list_of(1:3, 4:6, .ptype = integer()) creates a typed list in which every element must be castable to an integer vector. To convert an existing list, use as_list_of(list(1:3, 4:6), .ptype = integer()). This enforces type consistency for list-columns, preventing the common problem of list-columns containing heterogeneous types.
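
Both constructors can be tried directly. The sketch below builds a record and a typed list, then shows that an element that cannot be cast to the prototype is rejected:

```r
library(vctrs)

# Record vector: three elements, each with an x and a y field
rec <- new_rcrd(list(x = 1:3, y = c(1.1, 2.2, 3.3)))
vec_size(rec)
field(rec, "y")

# Typed list: every element is cast to the integer prototype
lo <- list_of(1:3, 4:6, .ptype = integer())
is_list_of(lo)

# A character element cannot be cast to integer, so construction fails
bad <- tryCatch(
  list_of(1:3, "a", .ptype = integer()),
  error = function(e) "rejected"
)
```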

vctrs in package development

Packages that define custom data structures (units, currencies, geometric types) should be built on vctrs to inherit consistent tidyverse behavior. Operations like dplyr::bind_rows(), dplyr::mutate(), and tidyr::pivot_longer() use vctrs internally. Custom classes built on vctrs work correctly in these contexts without special-casing each operation.

vctrs::is_list_of(x) tests whether x is a list_of vector, and vctrs::vec_assert(x, ptype = double()) checks that x is a vector of the given type. Use these in function argument validation instead of stopifnot(is.numeric(x)) for clearer error messages and better tidyverse integration.
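
A minimal sketch of argument validation with vec_assert(). The helper scale_to_unit is hypothetical. Note that the check is strict: an integer vector does not satisfy a double prototype:

```r
library(vctrs)

# Hypothetical helper: requires a double vector and fails early otherwise
scale_to_unit <- function(x) {
  vec_assert(x, ptype = double())
  x / max(abs(x))
}

ok <- scale_to_unit(c(2, -4))

# An integer input does not match the double prototype
msg <- tryCatch(
  scale_to_unit(1:3),
  error = function(e) conditionMessage(e)
)
```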

What vctrs provides

vctrs is the package underlying the tidyverse’s type system. It defines how vectors combine — when you bind rows of two data frames with different column types, what type does the combined column have? Without vctrs, this was determined by base R’s coercion rules, which are inconsistent across operations. vctrs establishes consistent coercion rules and provides the infrastructure for building custom vector classes that participate in these rules.

For package authors, vctrs provides the building blocks for creating vector classes that behave predictably in tidyverse operations. A custom class that implements vec_ptype2() and vec_cast() methods participates in type coercion correctly: it knows how to combine with other types and what types it can be converted to. Without these methods, custom classes fail when combined with standard vectors or with other classes in dplyr operations.

Building custom vector classes

A custom vctrs class wraps a base vector and attaches attributes or constraints. The vec_ptype_abbr method returns the short type name displayed in tibble printing. The format method controls how individual values are displayed. The vec_arith and vec_math methods define arithmetic behavior. Together these methods make the custom class behave correctly in tidyverse operations.

The rcrd (record) type is useful for structured values where multiple components belong together. A date range is one logical value with start and end components. A geographic coordinate has latitude and longitude. Storing these as separate columns loses the semantic unity; storing as a single rcrd column preserves it while remaining compatible with dplyr’s column operations.
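
The date range case might be sketched as follows. The class name date_range, the constructor new_date_range, and the abbreviation "rng" are all hypothetical choices for the example:

```r
library(vctrs)

# Hypothetical record class: one value with a start and an end date
new_date_range <- function(start, end) {
  stopifnot(inherits(start, "Date"), inherits(end, "Date"))
  new_rcrd(list(start = start, end = end), class = "date_range")
}

# format() controls how each element is displayed
format.date_range <- function(x, ...) {
  paste(format(field(x, "start")), "to", format(field(x, "end")))
}

# Short type label used in tibble column headers
vec_ptype_abbr.date_range <- function(x, ...) "rng"

dr <- new_date_range(
  as.Date(c("2024-01-01", "2024-04-01")),
  as.Date(c("2024-03-31", "2024-06-30"))
)
labels <- format(dr)
second <- vec_slice(dr, 2)   # start and end stay together when slicing
```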
