str_sub
Overview
str_sub() extracts a substring from a character vector using inclusive start and end positions. It is part of the stringr package (bundled with the tidyverse). The function is a thin wrapper around stringi::stri_sub() with a simpler interface.
Both start and end are inclusive (matching substr() in base R, not Python slice semantics). Negative indices count backwards from the end of the string.
Signature
str_sub(string, start = 1L, end = -1L)
str_sub(string, start = 1L, end = -1L, omit_na = FALSE) <- value
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
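| string | character | none | Input character vector |
| start | integer | 1L | Start position (inclusive). Positive values count from the start, negative values from the end. Alternatively, a two-column matrix of start and end positions |
| end | integer | -1L | End position (inclusive). Positive values count from the start, negative values from the end |
| omit_na | logical | FALSE | Replacement form only. If TRUE, an NA in any argument leaves that element of string unchanged |
| value | character | none | Replacement form only. Text that replaces the selected substring |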
Return Value
A character vector whose length is the common length of string, start, and end. Returns NA where string, start, or end is NA. Returns an empty string "" when the range is empty, that is, when end falls before start or start exceeds the string length.
Positive and Negative Indices
Positive indices count from the left (first character = 1). Negative indices count from the right (-1 = last character, -2 = second-to-last).
library(stringr)
x <- "abcdef"
str_sub(x, 1, 3)
#> [1] "abc"
str_sub(x, -3, -1)
#> [1] "def"
str_sub(x, 1, -4)  # drop the last three characters
#> [1] "abc"
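For a string of n characters, a negative index -k refers to position n - k + 1, so -1 is position n and -4 in a six-character string is position 3. The sketch below makes that arithmetic explicit. to_positive() is a hypothetical helper written for illustration, not part of stringr; the example also shows that base R's substr() agrees once the negative index has been converted.

```r
library(stringr)

# Hypothetical helper (not part of stringr): map a possibly negative
# index onto a 1-based position in a string of n characters.
# A negative index -k refers to position n - k + 1.
to_positive <- function(idx, n) {
  ifelse(idx < 0, n + idx + 1, idx)
}

x <- "abcdef"
n <- nchar(x)

to_positive(-4, n)
#> [1] 3

# substr() does not accept negative indices, but it matches str_sub()
# once the end position has been converted to a positive one.
substr(x, 1, to_positive(-4, n))
#> [1] "abc"
str_sub(x, 1, -4)
#> [1] "abc"
```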
Out-of-Bounds Behaviour
str_sub() silently clips out-of-range values to the string boundaries. It does not throw an error.
x <- "abc"
str_sub(x, 1, 100)
#> [1] "abc"
str_sub(x, -100, -1)
#> [1] "abc"
str_sub(x, 5, 3)
#> [1] ""
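The clipping rules can be written out in a few lines of base R. sub_sketch() below is a hypothetical, scalar-only approximation of how str_sub() treats positions: it ignores NA inputs and vectorisation, and it is not how stringi implements the function. It is mainly useful for predicting what a given pair of positions will return.

```r
library(stringr)

# Hypothetical approximation of str_sub() for a single, non-NA string.
sub_sketch <- function(s, start = 1L, end = -1L) {
  n <- nchar(s)
  # Resolve negative indices relative to the end of the string
  if (start < 0) start <- n + start + 1
  if (end < 0) end <- n + end + 1
  # Clip out-of-range positions to the string boundaries
  start <- max(start, 1)
  end <- min(end, n)
  # An empty range yields "" rather than an error
  if (start > end) return("")
  substr(s, start, end)
}

x <- "abc"
sub_sketch(x, 1, 100)
#> [1] "abc"
sub_sketch(x, -100, -1)
#> [1] "abc"
sub_sketch(x, 5, 3)
#> [1] ""

# Agrees with str_sub() on a mixed negative / out-of-range case
identical(sub_sketch(x, -2, 10), str_sub(x, -2, 10))
#> [1] TRUE
```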
Assignment Variant
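Use str_sub(x, start, end) <- value to overwrite part of a string; x is updated with the result. The replacement can be shorter than, the same length as, or longer than the portion it replaces. The omit_na argument (default FALSE) controls missing values: with FALSE, an NA in string, start, end, or value produces NA; with TRUE, that element of string is left unchanged.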
x <- "ABCDEF"
str_sub(x, 1, 3) <- "X"
x
#> [1] "XDEF"
str_sub(x, -1, -1) <- "K"
x
#> [1] "XDEK"
# Replacement can extend the string
str_sub(x, 2, 2) <- "GHIJ"
x
#> [1] "XGHIJEK"
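The omit_na argument only matters when a position or the replacement value is missing. The sketch below contrasts the default with omit_na = TRUE on a placeholder string "AAA"; typed missing values (NA_integer_, NA_character_) are used so that no coercion is involved.

```r
library(stringr)

x1 <- x2 <- x3 <- x4 <- "AAA"

# Default (omit_na = FALSE): a missing end position or a missing
# replacement turns the result into NA
str_sub(x1, 1, NA_integer_) <- "B"
str_sub(x2, 1, 2) <- NA_character_

# omit_na = TRUE: the same calls leave the string unchanged
str_sub(x3, 1, NA_integer_, omit_na = TRUE) <- "B"
str_sub(x4, 1, 2, omit_na = TRUE) <- NA_character_

c(x1, x2, x3, x4)
#> [1] NA    NA    "AAA" "AAA"
```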
NA Handling
A missing value (NA) in string produces NA in the output. The literal string "NA" is an ordinary two-character string and is sliced like any other value.
str_sub(c("foo", NA, "bar"), 1, 2)
#> [1] "fo" NA   "ba"
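To confirm the difference between a missing value and the literal text "NA", check the results with is.na(). The same NA propagation applies when start or end is missing.

```r
library(stringr)

# A literal "NA" string versus a genuine missing value
res <- str_sub(c("NA", NA), 1, 1)
is.na(res)
#> [1] FALSE  TRUE
res[1]
#> [1] "N"

# A missing start position also yields NA
is.na(str_sub("abc", NA_integer_, 2))
#> [1] TRUE
```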
Vectorisation
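str_sub() is fully vectorised over string, start, and end. Instead of separate start and end vectors, you can pass a two-column matrix of start and end positions as start, for example one element of the list returned by str_locate_all(), to extract several segments in one call.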
hw <- "Hadley Wickham"
str_sub(hw, c(1, 8), c(6, 14))
#> [1] "Hadley" "Wickham"
pos <- str_locate_all(hw, "[aeio]")[[1]]
str_sub(hw, pos)
#> [1] "a" "e" "i" "a"
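str_locate_all() returns one matrix per input string, so extracting matches from several strings means pairing each string with its own matrix. The sketch below does this with base R's Map(); the input strings are illustrative only.

```r
library(stringr)

x <- c("Hadley Wickham", "stringr")

# One two-column (start, end) matrix per input string
locs <- str_locate_all(x, "[aeio]")

# Calls str_sub(x[[i]], locs[[i]]) for each i; Map() names the
# result after the input strings
vowels <- Map(str_sub, x, locs)
vowels[["Hadley Wickham"]]
#> [1] "a" "e" "i" "a"
vowels[["stringr"]]
#> [1] "i"
```

Map() returns a list rather than a flat vector because each string can contain a different number of matches.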
See Also
- str_sub<- : assignment variant
- str_locate_all() : find the positions of all matches of a pattern
- substr() : base R counterpart; also uses inclusive positions, but does not support negative indices