substr()
substr() is a base R function that extracts or replaces substrings of a character vector using 1-indexed positions. Unlike substring(), whose last argument defaults to the end of the string, substr() requires both boundaries to be specified explicitly.
Signature
substr(x, start, stop)
substr(x, start, stop) <- value
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| x | character | (required) | A character vector |
| start | integer | (required) | Starting position (1-indexed) |
| stop | integer | (required) | Ending position (1-indexed, inclusive) |
| value | character | (required) | Replacement string (assignment form only) |
Return value
Returns a character vector of the same length as x with the extracted or replaced substring.
Key differences: substr() vs substring()
- Indexing: both use 1-based indexing (the first character is position 1).
- Parameters: substr(x, start, stop) vs substring(text, first, last = 1000000L).
- substring() default: when last is omitted, substring() extracts to the end of the string.
- substr() requirement: both start and stop must be provided.
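The two functions also differ in how they recycle arguments. The short sketch below illustrates this with the documented base R behavior: substring() recycles its arguments to the length of the longest one, while substr() always returns one result per element of x. The variable names are illustrative only.

```r
x <- "abcdef"

# substring() recycles x against the position vectors: three results
sub_many <- substring(x, 1:3, 3:5)
print(sub_many)
#> [1] "abc" "bcd" "cde"

# substr() returns length(x) results, so only the first start/stop pair is used
sub_one <- substr(x, 1:3, 3:5)
print(sub_one)
#> [1] "abc"
```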
Examples
Example 1: basic string extraction
# Extract characters from position 2 to 4
x <- "abcdefg"
substr(x, 2, 4)
#> [1] "bcd"
Extracting from position 2 to 4 returns the substring that starts at the second character and runs through the fourth, inclusive. Both start and stop are 1-indexed, meaning the first character is at position 1 — a convention consistent with the rest of base R. You can extract a single character by setting both arguments to the same position.
# First character
substr(x, 1, 1)
#> [1] "a"
Setting start equal to stop extracts a single character. This pattern is common when you need to pull one letter at a known index, such as the first letter of a code whose prefix indicates a category. For extracting the tail of a string, compute the stop position from the string length to ensure the extraction always reaches the end.
# Last three characters — use nchar() to find the end
substr(x, 5, nchar(x))
#> [1] "efg"
Using nchar(x) for the stop argument ensures the extraction always goes to the end regardless of string length. This is the safe pattern when string lengths vary across a vector — hardcoding a stop position that works for one element may miss characters in a longer element.
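The same pattern applies element by element across a vector, because nchar() returns one length per string. A minimal sketch, with a made-up codes vector:

```r
# Strings of different lengths; drop the first two characters of each
codes <- c("ab", "abcdef", "abcdefghij")

# nchar(codes) supplies a separate stop position for every element
tails <- substr(codes, 3, nchar(codes))
print(tails)
#> [1] ""         "cdef"     "cdefghij"
```

The first element is shorter than the start position, so it yields an empty string rather than an error.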
Example 2: out-of-bounds behavior
# Start beyond string length returns empty string
x <- "hello"
substr(x, 10, 15)
#> [1] ""
When start exceeds the string length, substr() returns an empty string rather than throwing an error. This silent handling means you should validate indices if you expect non-empty results — checking that start <= nchar(x) before calling substr() prevents silent failures in data processing pipelines.
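One way to surface these silent empties is to mark them as missing. The sketch below assumes that an out-of-range start should yield NA rather than an empty string; safe_substr is a hypothetical helper, not part of base R.

```r
# Hypothetical helper: return NA where start lies beyond the end of the string
safe_substr <- function(x, start, stop) {
  out <- substr(x, start, stop)
  # which() drops NA comparisons, so missing inputs are left as substr() returned them
  out[which(start > nchar(x))] <- NA_character_
  out
}

checked <- safe_substr(c("hello", "hi"), 3, 5)
print(checked)
#> [1] "llo" NA
```

Returning NA keeps the result the same length as the input while making the out-of-range cases easy to find with is.na().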
# Stop beyond string length is silently truncated to string end
substr(x, 2, 100)
#> [1] "ello"
A stop value beyond the string length is clamped to nchar(x), so substr(x, 2, 100) behaves identically to substr(x, 2, nchar(x)). This is convenient for extracting from a fixed position to the end without computing the length, but it means you cannot detect the discrepancy between the requested and actual extraction range.
# Start > Stop returns empty string
substr(x, 5, 2)
#> [1] ""
When start is greater than stop, the result is an empty string. This can happen accidentally when computing positions from data — for instance, if a derived start position overshoots a derived stop. Wrapping the call in a check that ensures start <= stop avoids silently empty results.
# Negative indices are treated as 1 (NOT as offsets from the end)
substr(x, -1, 3)
#> [1] "hel"
Negative start values are silently treated as 1, not as offsets from the end of the string. This contrasts with stringr::str_sub(), where negative indices count backward from the end. If you need end-relative positions, str_sub() is the safer choice — substr() treats any negative value as if it were 1 without warning.
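If a dependency on stringr is not an option, end-relative extraction can be approximated in base R by deriving positions from nchar(). A minimal sketch, where last_n is a hypothetical helper name:

```r
# Hypothetical helper: extract the last n characters of each string
last_n <- function(x, n) {
  len <- nchar(x)
  # A start below 1 (string shorter than n) is treated as 1 by substr()
  substr(x, len - n + 1, len)
}

endings <- last_n(c("hello", "hi"), 3)
print(endings)
#> [1] "llo" "hi"
```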
Example 3: string replacement with assignment
# Replace substring on left-hand side of <-
x <- "abcdef"
substr(x, 2, 4) <- "NEW"
x
#> [1] "aNEWef"
The assignment form overwrites characters at the specified positions with the replacement string. Here the replacement has exactly as many characters as the target range, so all three positions are overwritten. The assignment form never changes the length of the string: a longer replacement is truncated to fit the range, and a shorter one overwrites only as many characters as it contains.
# Works with vectors — applies the same replacement to every element
words <- c("apple", "banana", "cherry")
substr(words, 1, 2) <- "XX"
words
#> [1] "XXple" "XXnana" "XXerry"
The assignment form vectorizes naturally over the character input, applying the same replacement to every element. This is useful for prefix or suffix replacements across an entire column of data — for instance, replacing the first two characters of all identifiers in a batch without writing an explicit loop.
# Replacement longer than the target range (extra characters are dropped)
x <- "abcdef"
substr(x, 2, 3) <- "MMMMM"
x
#> [1] "aMMdef"
When the replacement string is longer than the range being replaced, only as many characters as fit in the range are used and the rest of the replacement is discarded. The string keeps its original length, and the characters after the range stay where they are.
# Replacement shorter than the target range (only part of the range is overwritten)
x <- "abcdef"
substr(x, 2, 4) <- "X"
x
#> [1] "aXcdef"
A replacement shorter than the target range overwrites only as many characters as it contains, starting at start. The remaining characters in the range are left untouched, so the string keeps its original length. Because the assignment form of substr() never resizes a string, use a pattern-based function such as sub(), or the assignment form of stringr::str_sub(), when the replacement should change the string length.
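When a replacement should change the length of the string, one base R option is to splice the pieces together explicitly with substr() and paste0(). A minimal sketch, where replace_range is a hypothetical helper name:

```r
# Hypothetical helper: replace positions start..stop with value,
# letting the result grow or shrink as needed
replace_range <- function(x, start, stop, value) {
  paste0(
    substr(x, 1, start - 1),        # everything before the range
    value,                          # the full replacement, untruncated
    substr(x, stop + 1, nchar(x))   # everything after the range
  )
}

grown <- replace_range("abcdef", 2, 3, "MMMMM")
shrunk <- replace_range("abcdef", 2, 4, "X")
print(grown)
#> [1] "aMMMMMdef"
print(shrunk)
#> [1] "aXef"
```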
Example 4: comparison with substring()
# substr() requires both start and stop — both callers produce the same output
x <- "abcdef"
substr(x, 2, 6)
#> [1] "bcdef"
When both boundaries are provided, substr() and substring() produce identical results. The difference emerges when you omit the stop or last argument — substr() requires it, while substring() defaults to the end of the string. This is the primary design distinction between the two functions.
# substring() can use just start — goes to the end automatically
substring(x, 2)
#> [1] "bcdef"
Omitting the last argument in substring() extracts from first to the end. This is the most common reason to choose substring() over substr(): fewer keystrokes when you know you want the tail of a string without computing its length first.
# Both handle vectors the same way for equivalent calls
strings <- c("first", "second", "third")
substr(strings, 2, 4)
#> [1] "irs" "eco" "hir"
substring(strings, 2, 4)
#> [1] "irs" "eco" "hir"
Both functions are vectorized over the string argument, returning one result per element. This means you can extract the same position range from every string in a character vector without writing a loop — a key advantage over procedural approaches to string processing that would require iterating with for or lapply.
# Key difference: substring with one arg goes to end by default
substring(x, 3)
#> [1] "cdef"
substr(x, 3, nchar(x)) # equivalent
#> [1] "cdef"
Using nchar(x) as the stop argument in substr() replicates substring()’s default behavior. This is a useful pattern when you need to extract from a known starting position to the end but the string length varies across elements and you want to be explicit about the full range.
Common patterns
Character-by-character processing:
# Replace first character of each string with its uppercase version
names <- c("apple", "banana", "cherry")
substr(names, 1, 1) <- toupper(substr(names, 1, 1))
names
#> [1] "Apple" "Banana" "Cherry"
This pattern combines extraction and assignment in one line: extract the first character, uppercase it, then write it back to the same positions. The nesting works because substr() on the right side extracts and substr<- on the left side replaces — both operating on the same range. For larger text transformations, stringr::str_to_title() is a more direct alternative that handles edge cases like multi-word strings.
Masking sensitive data:
# Mask the last two groups of a card number: positions 11-19
card <- "1234-5678-9012-3456"
substr(card, 11, 19) <- "NNNN-NNNN"
card
#> [1] "1234-5678-NNNN-NNNN"
Masking overwrites a span of characters with placeholder text, keeping the surrounding structure intact. Because the assignment form never changes the string length, the placeholder should have exactly as many characters as the masked range: a longer placeholder is truncated, and a shorter one leaves part of the original value exposed. For production use, consider a dedicated data anonymization tool that handles edge cases like variable-length identifiers and guards against accidental unmasking through index errors.
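For identifiers of varying length, fixed positions are fragile. The sketch below builds the masked value from the string length instead; mask_keep_last and its keep and mask_char parameters are hypothetical names chosen for illustration, and the inputs are made-up values.

```r
# Hypothetical helper: mask everything except the last `keep` characters
mask_keep_last <- function(x, keep = 4, mask_char = "*") {
  n <- nchar(x)
  n_mask <- pmax(n - keep, 0)  # never negative for short strings
  paste0(strrep(mask_char, n_mask), substr(x, n_mask + 1, n))
}

masked <- mask_keep_last(c("1234567890123456", "12345"))
print(masked)
#> [1] "************3456" "*2345"
```

Strings shorter than keep are returned unmasked, which may or may not be acceptable depending on the data.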
substr() vs str_sub()
str_sub() from the stringr package accepts negative indices that count from the end. str_sub(x, -3, -1) extracts the last three characters, which is not directly possible with substr() without computing the length first. For negative index support, str_sub() is more convenient. For code that must avoid non-base dependencies, substr() is sufficient for fixed positive positions. Both functions are vectorized over the string argument and the start/stop arguments, enabling extraction of different substrings from each element.
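As an illustration of per-element positions in base R, the sketch below derives start and stop from the location of the first hyphen in each string. The ids vector and its layout are invented for the example.

```r
# Made-up identifiers with a variable-length country prefix
ids <- c("US-2021-001", "CAN-2022-017")

# Position of the first hyphen in each element
dash <- as.integer(regexpr("-", ids, fixed = TRUE))

# start and stop are vectors, applied element by element
country <- substr(ids, 1, dash - 1)
year <- substr(ids, dash + 1, dash + 4)
print(country)
#> [1] "US"  "CAN"
print(year)
#> [1] "2021" "2022"
```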
See also
- substring(): A generalization of substr() that accepts vectors for first and last, allowing different extraction positions per element.
- nchar(): Counts the number of characters in a string; used with substr() when extracting to the end.
- str_sub(): The stringr equivalent that supports negative indices for end-relative positions and is assignment-capable.
- grep() and regexpr(): Pattern-based extraction via regmatches() and gregexpr(), for when match positions vary by string content rather than fixed index.