str_sub
Overview
str_sub() extracts a substring from a character vector using inclusive start and end positions. It is part of the stringr package (bundled with the tidyverse). The function is a thin wrapper around stringi::stri_sub() with a simpler interface.
Both start and end are inclusive (matching substr() in base R, not Python slice semantics). Negative indices count backwards from the end of the string.
Signature
str_sub(string, start = 1L, end = -1L)
str_sub(string, start = 1L, end = -1L, omit_na = FALSE) <- value
Parameters
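| Parameter | Type | Default | Description |
|---|---|---|---|
| string | character | (none) | Input character vector |
| start | integer | 1L | Start position (inclusive). Positive counts from start, negative from end. May also be a two-column matrix of start and end positions |
| end | integer | -1L | End position (inclusive). Positive counts from start, negative from end |
| omit_na | logical | FALSE | Assignment form only. If TRUE, a missing value in any argument leaves the input unchanged |
| value | character | (none) | Assignment form only. Replacement string(s) |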
Return value
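A character vector as long as the longest of string, start, and end (shorter arguments are recycled). An element is NA when the corresponding element of string is NA. An element is the empty string "" when the positions select nothing (end comes before start, or start exceeds the string length).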
Positive and negative indices
Positive indices count from the left (first character = 1). Negative indices count from the right (-1 = last character, -2 = second-to-last).
x <- "abcdef"
str_sub(x, 1, 3)
#> [1] "abc"
str_sub(x, -3, -1)
#> [1] "def"
str_sub(x, 1, -4)   # -4 is "c", the fourth character from the end
#> [1] "abc"
Both start and end are inclusive. This matches base R’s substr() but differs from Python’s slice notation, which is zero-based with an exclusive end index. For example, Python’s "abcdef"[0:3] and R’s str_sub("abcdef", 1, 3) both return "abc". Keep this offset in mind when translating substring logic between R and other languages.
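To make the index translation concrete, here is a minimal sketch of a wrapper that accepts zero-based, end-exclusive positions. The helper name slice0() is hypothetical (it is not part of stringr), and the sketch assumes non-negative indices only.

```r
library(stringr)

# Hypothetical helper (not part of stringr): emulate a zero-based,
# end-exclusive slice [from, to) with str_sub()'s one-based,
# end-inclusive positions. Assumes from and to are non-negative.
slice0 <- function(x, from, to) {
  # Zero-based `from` becomes one-based `from + 1`; the exclusive bound
  # `to` is numerically equal to the inclusive one-based end position.
  str_sub(x, from + 1, to)
}

x <- "abcdef"
slice0(x, 0, 3)   # same characters as str_sub(x, 1, 3)
#> [1] "abc"
slice0(x, 2, 2)   # empty slice: start ends up after end
#> [1] ""
```

Only the start position needs shifting; the exclusive end of a zero-based slice and the inclusive end of a one-based range are the same number.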
Out-of-bounds behaviour
str_sub() silently clips out-of-range values to the string boundaries. It does not throw an error.
x <- "abc"
str_sub(x, 1, 100)
#> [1] "abc"
str_sub(x, -100, -1)
#> [1] "abc"
str_sub(x, 5, 3)
#> [1] ""
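Because of this clipping, the same call can be applied to strings of mixed length without checking each length first. The vector codes below is made-up illustrative data.

```r
library(stringr)

# Strings of different lengths; some are shorter than the requested width.
codes <- c("a", "abc", "abcdef")

# Take at most the first three characters of each element.
# Short strings are returned whole because `end` is clipped.
first3 <- str_sub(codes, 1, 3)
first3
#> [1] "a"   "abc" "abc"

# A start position past the end of a string gives "" rather than an error.
past_end <- str_sub(codes, 4, 6)
past_end
#> [1] ""    ""    "def"
```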
Assignment variant
Use str_sub(x, start, end) <- value to replace a substring in place. The replacement can be shorter than, the same length as, or longer than the span it replaces. The omit_na argument (default FALSE) controls what happens when any argument is NA: with FALSE the affected element becomes NA, and with TRUE it is left unchanged.
x <- "ABCDEF"
str_sub(x, 1, 3) <- "X"
x
#> [1] "XDEF"
str_sub(x, -1, -1) <- "K"
x
#> [1] "XDEK"
# Replacement can extend the string
str_sub(x, 2, 2) <- "GHIJ"
x
#> [1] "XGHIJEK"
The assignment form str_sub()<- updates only the selected positions and leaves the rest of each string untouched, so you can patch parts of a character vector without rebuilding it by hand. When the replacement is longer or shorter than the span it replaces, the string grows or shrinks accordingly.
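The effect of omit_na is easiest to see with a missing replacement value. The sketch below follows the behaviour described in the stringr documentation as I understand it; the variable names are illustrative.

```r
library(stringr)

x1 <- x2 <- "AAA"

# Default (omit_na = FALSE): a missing replacement makes the result NA.
str_sub(x1, 1, 2) <- NA
x1
#> [1] NA

# omit_na = TRUE: a missing argument leaves the input unchanged.
str_sub(x2, 1, 2, omit_na = TRUE) <- NA
x2
#> [1] "AAA"
```

Setting omit_na = TRUE is a reasonable choice when the replacement values come from data that may contain gaps and the original text should survive.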
NA handling
A missing value (NA) in string produces NA in the output. The literal string "NA" is an ordinary two-character value and is treated like any other string.
str_sub(c("foo", NA, "bar"), 1, 2)
#> [1] "fo" NA "ba"
A missing element stays missing in the corresponding output position; it is never converted to the string "NA". Most stringr functions propagate NA in the same way, which simplifies downstream handling of missing data.
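The distinction between a missing value and the literal string "NA" can be checked directly. The vector v below is illustrative.

```r
library(stringr)

# One genuinely missing element and one literal two-character string "NA".
v <- c("foo", NA, "NA")

out <- str_sub(v, 1, 1)
out
#> [1] "f" NA  "N"

# Only the missing element is flagged by is.na().
is.na(out)
#> [1] FALSE  TRUE FALSE
```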
Vectorisation
str_sub() is fully vectorised over string, start, and end. Pass a matrix from str_locate_all() to extract multiple segments in one call.
hw <- "Hadley Wickham"
str_sub(hw, c(1, 8), c(6, 14))
#> [1] "Hadley" "Wickham"
pos <- str_locate_all(hw, "[aeio]")[[1]]
str_sub(hw, pos)
#> [1] "a" "e" "i" "a"
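Vectorisation also works across several input strings at once. The sketch below uses made-up file names; ext_len is a hand-written illustrative vector, not something stringr computes.

```r
library(stringr)

files <- c("report.csv", "notes.txt", "data.json")

# A single start/end pair is recycled across every element of `files`.
last3 <- str_sub(files, -3, -1)
last3
#> [1] "csv" "txt" "son"

# Per-element positions: drop the extension by giving each string its own end.
ext_len <- c(4, 4, 5)              # lengths of ".csv", ".txt", ".json"
stems <- str_sub(files, 1, -(ext_len + 1))
stems
#> [1] "report" "notes"  "data"
```

The first call shows why fixed positions are a poor fit for variable-length suffixes ("son" rather than "json"); supplying one end position per element handles that case.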
In summary, str_sub() both extracts and assigns: str_sub(x, 2, 3) <- "XX" replaces characters 2 and 3, and str_sub(x, -3, -1) extracts the last three characters. When start is greater than end, it returns an empty string rather than raising an error, so it is safe to use on inputs of varying length.
See also
- stringr::str_extract(): Extract matching patterns from strings
- stringr::str_replace(): Replace matched patterns in strings
- stringr::str_length(): Get the character length of strings