substr()

substr(x, start, stop)

Returns character · Updated May 28, 2026 · Base Functions

Tags: substr, substring, strings, text, base

substr() is a base R function that extracts or replaces substrings of a character vector using 1-indexed, inclusive positions. Unlike substring(), whose last argument defaults to a value large enough to reach the end of the string, substr() requires both start and stop to be supplied.

Signature

substr(x, start, stop)
substr(x, start, stop) <- value

Parameters

Parameter	Type	Default	Description
`x`	character	—	A character vector
`start`	integer	—	Starting position (1-indexed)
`stop`	integer	—	Ending position (1-indexed)
`value`	character	—	Replacement string (for assignment)

Return value

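Extraction returns a character vector of the same length as x, with one substring per element. The assignment form modifies x in place, and each element keeps its original number of characters.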

Key differences: `substr()` vs `substring()`

1-indexed: Both use 1-based indexing (first character is position 1)
Parameters: substr(x, start, stop) vs substring(text, first, last = 1000000L)
substring() default: When last is omitted, substring() extracts through the end of the string (for any string of up to 1,000,000 characters)
substr() requirement: Both start and stop must be provided

Examples

Example 1: basic string extraction

# Extract characters from position 2 to 4
x <- "abcdefg"
substr(x, 2, 4)
#> [1] "bcd"

Extracting from position 2 to 4 returns the substring that starts at the second character and runs through the fourth, inclusive. Both start and stop are 1-indexed, meaning the first character is at position 1 — a convention consistent with the rest of base R. You can extract a single character by setting both arguments to the same position.

# First character
substr(x, 1, 1)
#> [1] "a"

Setting start equal to stop extracts a single character. This pattern is common when you need to pull one letter at a known index, such as the first letter of a code whose prefix indicates a category. For extracting the tail of a string, compute the stop position from the string length to ensure the extraction always reaches the end.

# Last three characters — use nchar() to find the end
substr(x, 5, nchar(x))
#> [1] "efg"

Using nchar(x) for the stop argument ensures the extraction always goes to the end regardless of string length. This is the safe pattern when string lengths vary across a vector — hardcoding a stop position that works for one element may miss characters in a longer element.

Example 2: out-of-bounds behavior

# Start beyond string length returns empty string
x <- "hello"
substr(x, 10, 15)
#> [1] ""

When start exceeds the string length, substr() returns an empty string rather than throwing an error. This silent handling means you should validate indices if you expect non-empty results — checking that start <= nchar(x) before calling substr() prevents silent failures in data processing pipelines.
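One way to make that check explicit is a small wrapper that marks out-of-range extractions as NA instead of returning an empty string. The helper name substr_or_na() is hypothetical and not part of base R; this is a minimal sketch, assuming NA is an acceptable signal for "start lies past the end of the string".

```r
# Hypothetical helper (not part of base R): return NA instead of ""
# when start lies past the end of the string.
substr_or_na <- function(x, start, stop) {
  out <- substr(x, start, stop)
  # nchar() is NA for NA input, so guard with !is.na(x) to keep the index clean
  out[!is.na(x) & start > nchar(x)] <- NA_character_
  out
}

ids <- c("hello", "hi", "world!")
substr(ids, 4, 6)
#> [1] "lo"  ""    "ld!"
substr_or_na(ids, 4, 6)
#> [1] "lo"  NA    "ld!"
```

Downstream code can then use is.na() to find the elements that were too short, which is harder to do reliably with empty strings.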

# Stop beyond string length is silently truncated to string end
substr(x, 2, 100)
#> [1] "ello"

A stop value beyond the string length is clamped to nchar(x), so substr(x, 2, 100) behaves identically to substr(x, 2, nchar(x)). This is convenient for extracting from a fixed position to the end without computing the length, but it means you cannot detect the discrepancy between the requested and actual extraction range.

# Start > Stop returns empty string
substr(x, 5, 2)
#> [1] ""

When start is greater than stop, the result is an empty string. This can happen accidentally when computing positions from data — for instance, if a derived start position overshoots a derived stop. Wrapping the call in a check that ensures start <= stop avoids silently empty results.

# Negative indices are treated as 1 (NOT as offsets from the end)
substr(x, -1, 3)
#> [1] "hel"

Negative start values are silently treated as 1, not as offsets from the end of the string. This contrasts with stringr::str_sub(), where negative indices count backward from the end. If you need end-relative positions, str_sub() is the safer choice — substr() treats any negative value as if it were 1 without warning.

Example 3: string replacement with assignment

# Replace substring on left-hand side of <-
x <- "abcdef"
substr(x, 2, 4) <- "NEW"
x
#> [1] "aNEWef"

The assignment form overwrites the characters at the specified positions with the replacement string. Here the replacement is exactly as long as the target range, so all three characters are swapped. The assignment form never changes the number of characters in an element: a replacement longer than the range is truncated, and a shorter one overwrites only as many characters as it supplies.

# Works with vectors — applies the same replacement to every element
words <- c("apple", "banana", "cherry")
substr(words, 1, 2) <- "XX"
words
#> [1] "XXple"  "XXnana" "XXerry"

The assignment form vectorizes naturally over the character input, applying the same replacement to every element. This is useful for prefix or suffix replacements across an entire column of data — for instance, replacing the first two characters of all identifiers in a batch without writing an explicit loop.
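The replacement value can also be a vector, in which case each element of the input receives its own replacement. The short sketch below illustrates this with made-up data; the variable name codes is only for illustration.

```r
# Each element gets the replacement at the matching position in the value vector
codes <- c("apple", "banana", "cherry")
substr(codes, 1, 2) <- c("AA", "BB", "CC")
codes
#> [1] "AAple"  "BBnana" "CCerry"
```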

# Replacement longer than the range: only the part that fits is used
x <- "abcdef"
substr(x, 2, 3) <- "MMMMM"
x
#> [1] "aMMdef"

When the replacement string is longer than the range being replaced, only its leading characters are used. The range 2 to 3 holds two characters, so just "MM" is written and the rest of the replacement is discarded. The string keeps its original length, and the characters after the range are untouched.

# Replacement shorter than the range: only the leading positions are overwritten
x <- "abcdef"
substr(x, 2, 4) <- "X"
x
#> [1] "aXcdef"

A replacement shorter than the target range overwrites only as many characters as it supplies, beginning at start; the remaining positions in the range keep their original characters. Taken together, these rules mean substr()<- can never grow or shrink a string. To substitute text of a different length, use a pattern-based function such as sub() or stringr::str_replace(), or rebuild the string with paste0().
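Because the assignment form keeps the string length fixed, a positional replacement that changes the length has to be assembled from pieces. The helper below, splice(), is a hypothetical name rather than a base R function; it is a minimal sketch that assumes 1 <= start <= stop + 1 and that x contains no NA values.

```r
# Hypothetical helper (not part of base R): replace positions start..stop
# with `value`, letting the result be longer or shorter than the input.
splice <- function(x, start, stop, value) {
  paste0(
    substr(x, 1, start - 1),       # everything before the range
    value,                         # the new text, of any length
    substr(x, stop + 1, nchar(x))  # everything after the range
  )
}

x <- "abcdef"
splice(x, 2, 3, "MMMMM")
#> [1] "aMMMMMdef"
splice(x, 2, 4, "X")
#> [1] "aXef"
```

The out-of-bounds rules of substr() work in the helper's favor here: when start is 1 or stop is the last position, the corresponding piece is simply an empty string.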

Example 4: comparison with `substring()`

# substr() requires both start and stop
x <- "abcdef"
substr(x, 2, 6)
#> [1] "bcdef"

When both boundaries are provided, substr() and substring() produce identical results. The difference emerges when you omit the stop or last argument — substr() requires it, while substring() defaults to the end of the string. This is the primary design distinction between the two functions.

# substring() can use just start — goes to the end automatically
substring(x, 2)
#> [1] "bcdef"

Omitting the last argument in substring() extracts from first to the end. This is the most common reason to choose substring() over substr(): fewer keystrokes when you know you want the tail of a string without computing its length first.

# Both handle vectors the same way for equivalent calls
strings <- c("first", "second", "third")
substr(strings, 2, 4)
#> [1] "irs" "eco" "hir"
substring(strings, 2, 4)
#> [1] "irs" "eco" "hir"

Both functions are vectorized over the string argument, returning one result per element. This means you can extract the same position range from every string in a character vector without writing a loop — a key advantage over procedural approaches to string processing that would require iterating with for or lapply.

# Key difference: substring with one arg goes to end by default
substring(x, 3)
#> [1] "cdef"
substr(x, 3, nchar(x))  # equivalent
#> [1] "cdef"

Using nchar(x) as the stop argument in substr() replicates substring()’s default behavior. This is a useful pattern when you need to extract from a known starting position to the end but the string length varies across elements and you want to be explicit about the full range.

Common patterns

Character-by-character processing:

# Replace first character of each string with its uppercase version
names <- c("apple", "banana", "cherry")
substr(names, 1, 1) <- toupper(substr(names, 1, 1))
names
#> [1] "Apple"  "Banana" "Cherry"

This pattern combines extraction and assignment in one line: extract the first character, uppercase it, then write it back to the same positions. The nesting works because substr() on the right side extracts and substr<- on the left side replaces, both operating on the same range, and because the replacement value is a vector that is matched to the elements of names one by one. To capitalize every word in a string, stringr::str_to_title() is a more direct alternative, though it also lowercases the remaining letters of each word.

Masking sensitive data:

# Mask the middle two groups of a card number: positions 6 to 14
card <- "1234-5678-9012-3456"
substr(card, 6, 14) <- "NNNN-NNNN"
card
#> [1] "1234-NNNN-NNNN-3456"

Masking overwrites a span of characters with placeholder text, keeping the surrounding structure intact. The replacement should be exactly as long as the masked range: a longer replacement is truncated, and a shorter one leaves the tail of the range unmasked. For production use, consider a dedicated data anonymization library that handles edge cases like variable-length identifiers and prevents accidental unmasking through index errors.
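For identifiers of varying length, the positions and the placeholder can be computed from nchar() rather than hardcoded. The helper mask_left() below is a hypothetical name, not a base R function; it is a minimal sketch that assumes a single-character mask symbol and masks everything except the last few characters, separators included.

```r
# Hypothetical helper (not part of base R): mask all but the last `keep`
# characters. Assumes `symbol` is a single character.
mask_left <- function(x, keep = 4, symbol = "*") {
  n_mask <- pmax(nchar(x) - keep, 0)               # characters to hide per element
  substr(x, 1, n_mask) <- strrep(symbol, n_mask)   # placeholder of matching length
  x
}

mask_left("1234-5678-9012-3456")
#> [1] "***************3456"
mask_left("A-991")
#> [1] "*-991"
```

Strings no longer than keep are returned unchanged, because a range whose start exceeds its stop is left alone by the assignment form.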

substr() vs str_sub()

str_sub() from the stringr package accepts negative indices that count from the end. str_sub(x, -3, -1) extracts the last three characters, which is not directly possible with substr() without computing the length first. For negative index support, str_sub() is more convenient. For code that must avoid non-base dependencies, substr() is sufficient for fixed positive positions. Both functions are vectorized over the string argument and the start/stop arguments, enabling extraction of different substrings from each element.
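In base R, the end-relative case can be handled by deriving positions from nchar(), and per-element ranges can be passed as vectors. The sketch below uses made-up input to show both.

```r
x <- c("hello", "statistics")

# Last three characters of each element, computed from the string length
substr(x, nchar(x) - 2, nchar(x))
#> [1] "llo" "ics"

# A different start and stop for each element
substr(x, c(1, 3), c(2, 6))
#> [1] "he"   "atis"
```

For a string shorter than three characters, nchar(x) - 2 falls below 1 and is treated as 1, so the whole string is returned.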