str_sub

Overview

str_sub() extracts a substring from a character vector using inclusive start and end positions. It is part of the stringr package (bundled with the tidyverse). The function is a thin wrapper around stringi::stri_sub() with a simpler interface.

Both start and end are inclusive (matching substr() in base R, not Python slice semantics). Negative indices count backwards from the end of the string.

Signature

str_sub(string, start = 1L, end = -1L)
str_sub(string, start = 1L, end = -1L, omit_na = FALSE) <- value

Parameters

| Parameter | Type | Default | Description |
|-----------|-----------|---------|-------------|
| string | character | | Input character vector |
| start | integer | 1L | Start position. Positive counts from start, negative from end |
| end | integer | -1L | End position (inclusive). Positive counts from start, negative from end |

Return Value

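A character vector as long as the longest of string, start, and end (shorter arguments are recycled); when start and end are scalars, this is the same length as string. Returns NA when string is NA. Returns an empty string "" when the positions select nothing (end comes before start, or start exceeds the string length).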

Positive and Negative Indices

Positive indices count from the left (first character = 1). Negative indices count from the right (-1 = last character, -2 = second-to-last).

library(stringr)

x <- "abcdef"

str_sub(x, 1, 3)
#> [1] "abc"

str_sub(x, -3, -1)
#> [1] "def"

# -4 is the fourth character from the end, "c"
str_sub(x, 1, -4)
#> [1] "abc"

Both start and end are inclusive. This matches base R’s substr() but differs from Python’s slice notation.
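To make the inclusive-end convention concrete, the sketch below compares str_sub() with base R's substr() on the same positions. The variable names are chosen only for illustration.

```r
library(stringr)

x <- "abcdef"

# Positions 2 through 4, both ends included: "b", "c", "d"
from_stringr <- str_sub(x, 2, 4)
from_base <- substr(x, 2, 4)
identical(from_stringr, from_base)
#> [1] TRUE

# The number of characters returned is end - start + 1
nchar(from_stringr)
#> [1] 3

# substr() has no notion of counting from the end,
# so the negative form is specific to str_sub()
last_three <- str_sub(x, -3, -1)
last_three
#> [1] "def"
```

The equivalent Python slice for the first call would be x[1:4], because Python indexing is zero-based and excludes the end index.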

Out-of-Bounds Behaviour

str_sub() silently clips out-of-range values to the string boundaries. It does not throw an error.

x <- "abc"

str_sub(x, 1, 100)
#> [1] "abc"

str_sub(x, -100, -1)
#> [1] "abc"

str_sub(x, 5, 3)
#> [1] ""
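One practical consequence of the clipping is that a vector of strings with mixed lengths can be truncated to a fixed width without checking nchar() first. A minimal sketch, using made-up labels:

```r
library(stringr)

# Hypothetical labels of varying length
labels <- c("id", "name", "description")

# Keep at most the first four characters of each label;
# shorter strings come back unchanged rather than raising an error
short <- str_sub(labels, 1, 4)
short
#> [1] "id"   "name" "desc"

# Keep at most the last three characters
tails <- str_sub(labels, -3, -1)
tails
#> [1] "id"  "ame" "ion"
```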

Assignment Variant

Use str_sub(x, start, end) <- value to replace a substring in place. The replacement can be shorter, the same length, or longer than the extracted portion. The omit_na argument (default FALSE) controls what happens when any of the arguments, including value, is NA: with FALSE the affected element becomes NA, with TRUE it is left unchanged.

x <- "ABCDEF"

str_sub(x, 1, 3) <- "X"
x
#> [1] "XDEF"

str_sub(x, -1, -1) <- "K"
x
#> [1] "XDEK"

# Replacement can extend the string
str_sub(x, 2, 2) <- "GHIJ"
x
#> [1] "XGHIJEK"
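The effect of omit_na is easiest to see when one of the arguments is missing. The sketch below follows the pattern used in the stringr documentation; the variable names are arbitrary.

```r
library(stringr)

x1 <- x2 <- x3 <- x4 <- "AAA"

# Default (omit_na = FALSE): a missing end position or a missing
# replacement value turns the whole element into NA
str_sub(x1, 1, NA) <- "B"
str_sub(x2, 1, 2) <- NA
c(x1, x2)
#> [1] NA NA

# omit_na = TRUE: the element is left as it was
str_sub(x3, 1, NA, omit_na = TRUE) <- "B"
str_sub(x4, 1, 2, omit_na = TRUE) <- NA
c(x3, x4)
#> [1] "AAA" "AAA"
```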

NA Handling

A missing value (NA) in the input produces NA in the output. The string "NA" is an ordinary two-character value and is sliced like any other string.

str_sub(c("foo", NA, "bar"), 1, 2)
#> [1] "fo" NA  "ba"
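The difference between a missing value and the literal string "NA" can be checked with is.na(). The input vector below is invented for the example.

```r
library(stringr)

# "NA" (second element) is a two-letter string;
# the third element is a real missing value
vals <- c("NATO", "NA", NA)

firsts <- str_sub(vals, 1, 2)
firsts
#> [1] "NA" "NA" NA

# is.na() tells the two apart
is.na(firsts)
#> [1] FALSE FALSE  TRUE
```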

Vectorisation

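str_sub() is fully vectorised over string, start, and end. In place of separate start and end vectors, start also accepts a two-column matrix of start and end positions, such as one element of the list returned by str_locate_all(), so several segments can be extracted in one call.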

hw <- "Hadley Wickham"

str_sub(hw, c(1, 8), c(6, 14))
#> [1] "Hadley"  "Wickham"

pos <- str_locate_all(hw, "[aeio]")[[1]]
str_sub(hw, pos)
#> [1] "a" "e" "i" "a"
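The matrix form also works across several strings at once. As a sketch, the example below uses str_locate(), which returns one row of start and end positions per input string; the identifiers and the digit pattern are hypothetical.

```r
library(stringr)

# Hypothetical inputs; the pattern matches the first run of digits
ids <- c("order-1042-A", "ref 77", "none")

# One row per string, with columns for start and end
loc <- str_locate(ids, "[0-9]+")

# Passing the matrix as `start` extracts each located span;
# a string with no match has NA positions and yields NA
digits <- str_sub(ids, loc)
digits
#> [1] "1042" "77"   NA
```

For this particular task str_extract() would be more direct. The point here is only to show how a position matrix lines up row by row with the input vector.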

See Also