String Manipulation with stringr
String manipulation is a fundamental skill for any R programmer. Whether you’re cleaning messy data, parsing text files, or building features from textual data, the stringr package provides a consistent, readable interface for common string operations. Part of the Tidyverse and built on top of the stringi package, stringr offers a unified API in place of base R’s less consistent string functions.
Installing and Loading stringr
If you’ve installed the Tidyverse, stringr is already available. Otherwise, install it separately:
install.packages("stringr")
library(stringr)
All stringr functions start with str_, making them easy to discover with autocompletion.
Basic String Operations
Measuring String Length
The str_length() function returns the number of characters in each string:
words <- c("hello", "world", "R programming")
str_length(words)
# [1]  5  5 13
This behaves like nchar(), but it returns NA for a missing value, whereas nchar(NA) returns 2.
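The difference in NA handling can be checked directly. The following is a minimal sketch; the variable names are illustrative only.

```r
library(stringr)

# A character vector with one missing value
x <- c("hello", NA)

# str_length() keeps the missing value missing
len_stringr <- str_length(x)
len_stringr
# [1]  5 NA

# nchar() coerces a bare logical NA to the string "NA" and counts 2 characters
len_logical_na <- nchar(NA)
len_logical_na
# [1] 2
```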
Extracting Substrings
Use str_sub() to extract or replace portions of a string:
text <- "R programming is fun"
# Extract characters from position 1 to 3
str_sub(text, 1, 3)
# [1] "R p"
# Replace the first character in place
str_sub(text, 1, 1) <- "Py"
text
# [1] "Py programming is fun"
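str_sub() also accepts negative positions, which count backwards from the end of the string. A short sketch, using a separate variable (`phrase`, chosen here for illustration) so it does not depend on the value of `text` above:

```r
library(stringr)

phrase <- "R programming is fun"

# Negative positions count from the end: -1 is the last character
last_three <- str_sub(phrase, -3, -1)
last_three
# [1] "fun"

# Omitting `end` extracts through to the end of the string
from_third <- str_sub(phrase, 3)
from_third
# [1] "programming is fun"
```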
Combining Strings
The str_c() function concatenates strings, handling vectors elegantly:
first_name <- "John"
last_name <- "Doe"
str_c(first_name, " ", last_name)
# [1] "John Doe"
# For vectors, use sep and collapse
str_c(c("a", "b", "c"), 1:3, sep = "-")
# [1] "a-1" "b-2" "c-3"
str_c(c("a", "b", "c"), collapse = ", ")
# [1] "a, b, c"
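One behaviour worth knowing is that str_c() propagates missing values: if any piece is NA, the combined result is NA. The sketch below shows this and one possible workaround with str_replace_na(); the "unknown" placeholder is an arbitrary choice for illustration.

```r
library(stringr)

codes <- c("a", NA)

# A missing piece makes the whole result missing
with_na <- str_c("id-", codes)
with_na
# [1] "id-a" NA

# Convert NA to a placeholder string first if a missing result is not wanted
without_na <- str_c("id-", str_replace_na(codes, "unknown"))
without_na
# [1] "id-a"       "id-unknown"
```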
Pattern Matching
Detecting Patterns
str_detect() returns a logical vector that is TRUE for each string containing the pattern:
fruits <- c("apple", "banana", "cherry", "date")
# Find strings containing 'a'
str_detect(fruits, "a")
# [1]  TRUE  TRUE FALSE  TRUE
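Because the result is a logical vector, it can be used for anchored matching, filtering, and counting. A minimal sketch, redefining `fruits` so the block runs on its own:

```r
library(stringr)

fruits <- c("apple", "banana", "cherry", "date")

# Anchor the pattern with ^ to match only at the start of the string
starts_with_a <- str_detect(fruits, "^a")
starts_with_a
# [1]  TRUE FALSE FALSE FALSE

# Use the logical vector to subset the original strings
with_an <- fruits[str_detect(fruits, "an")]
with_an
# [1] "banana"

# TRUE counts as 1, so sum() gives the number of matching strings
n_with_e <- sum(str_detect(fruits, "e"))
n_with_e
# [1] 3
```

str_subset(fruits, "an") is a shorthand for the subsetting step.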
Extracting Matches
Extract the actual matched text with str_extract():
emails <- c("user@domain.com", "test@example.org", "invalid")
str_extract(emails, "@\\w+\\.\\w+")
# [1] "@domain.com" "@example.org" NA
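str_extract() returns only the first match in each string. When a string may contain several matches, str_extract_all() returns all of them as a list. A small sketch with a made-up input string:

```r
library(stringr)

note <- "Order 12 shipped 3 items on day 45"

# First run of digits only
first_number <- str_extract(note, "\\d+")
first_number
# [1] "12"

# Every run of digits; the result is a list with one element per input string
all_numbers <- str_extract_all(note, "\\d+")[[1]]
all_numbers
# [1] "12" "3"  "45"
```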
Replacing Patterns
str_replace() replaces the first match in each string; str_replace_all() replaces every match:
text <- "The cat sat on the mat"
str_replace(text, "cat", "dog")
# [1] "The dog sat on the mat"
# Replace all matches
str_replace_all(text, "at", "ot")
# [1] "The cot sot on the mot"
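Two details often matter in practice: replacements can refer to capture groups with \\1, \\2, and so on, and patterns are treated as regular expressions unless wrapped in fixed(). The sketch below illustrates both with made-up inputs.

```r
library(stringr)

# Backreferences: swap two captured words
swapped <- str_replace("John Doe", "(\\w+) (\\w+)", "\\2, \\1")
swapped
# [1] "Doe, John"

dotted <- "a.b.c"

# As a regular expression, "." matches every character
as_regex <- str_replace_all(dotted, ".", "-")
as_regex
# [1] "-----"

# fixed() matches the pattern as literal text
as_literal <- str_replace_all(dotted, fixed("."), "-")
as_literal
# [1] "a-b-c"
```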
Splitting Strings
str_split() splits each string on a pattern and returns a list with one character vector per input string:
sentence <- "one,two,three,four"
str_split(sentence, ",")
# [[1]]
# [1] "one" "two" "three" "four"
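Since the result is a list, a single input usually needs `[[1]]` to get at the pieces. The `n` and `simplify` arguments change the shape of the output. A brief sketch:

```r
library(stringr)

sentence <- "one,two,three,four"

# Pull the character vector out of the list
pieces <- str_split(sentence, ",")[[1]]
length(pieces)
# [1] 4

# n = 2 splits at the first separator only
head_rest <- str_split(sentence, ",", n = 2)[[1]]
head_rest
# [1] "one"            "two,three,four"

# simplify = TRUE returns a character matrix, one row per input string
as_matrix <- str_split(sentence, ",", simplify = TRUE)
dim(as_matrix)
# [1] 1 4
```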
Working with Whitespace
Trimming Whitespace
Remove leading and trailing whitespace with str_trim():
messy <- " clean this "
str_trim(messy)
# [1] "clean this"
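By default str_trim() strips both ends. The `side` argument restricts it to one end, as in this short sketch (the input string is illustrative):

```r
library(stringr)

padded <- "  clean this  "

# Strip only the leading whitespace
left_only <- str_trim(padded, side = "left")
left_only
# [1] "clean this  "

# Strip only the trailing whitespace
right_only <- str_trim(padded, side = "right")
right_only
# [1] "  clean this"
```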
Squishing Whitespace
Collapse multiple whitespace characters into single spaces with str_squish():
ugly <- "  too   many    spaces  "
str_squish(ugly)
# [1] "too many spaces"
Padding Strings
Add padding to strings with str_pad() for consistent widths:
numbers <- c("1", "25", "300")
str_pad(numbers, width = 5, side = "left", pad = "0")
# [1] "00001" "00025" "00300"
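The `side` argument also accepts "right" and "both", and strings that are already at least `width` characters long are returned unchanged rather than truncated. A minimal sketch with made-up values:

```r
library(stringr)

# Pad on the right
right_padded <- str_pad("ab", width = 6, side = "right", pad = ".")
right_padded
# [1] "ab...."

# Pad on both sides to centre the text
centred <- str_pad("ab", width = 6, side = "both", pad = "*")
centred
# [1] "**ab**"

# Strings longer than `width` are left as they are
too_long <- str_pad("1234567", width = 5, pad = "0")
too_long
# [1] "1234567"
```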
Practical Examples
Validating Email Addresses
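Wrap str_detect() and a regular expression in a helper function for basic data validation:
is_valid_email <- function(email) {
  # local part, "@", domain, then a dot and a top-level domain of 2+ letters
  pattern <- "^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$"
  str_detect(email, pattern)
}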
emails <- c("user@domain.com", "invalid@", "test@site.org")
is_valid_email(emails)
# [1] TRUE FALSE TRUE
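Note that str_detect() returns NA for a missing input, which may not be what a validation step wants. One possible variant, sketched below under a hypothetical name (`is_valid_email_strict` is not part of stringr or the example above), treats missing values as invalid:

```r
library(stringr)

# Hypothetical helper: same pattern as above, but NA counts as invalid
is_valid_email_strict <- function(email) {
  pattern <- "^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$"
  # FALSE & NA evaluates to FALSE, so missing inputs come out as FALSE
  !is.na(email) & str_detect(email, pattern)
}

checked <- is_valid_email_strict(c("user@domain.com", NA, "invalid@"))
checked
# [1]  TRUE FALSE FALSE
```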
Extracting Numbers from Text
Pull numeric values from mixed text:
prices <- c("$19.99", "$25.50", "$9.99")
# Remove $ sign, convert to numeric
as.numeric(str_replace_all(prices, "\\$", ""))
# [1] 19.99 25.50  9.99
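When the number is surrounded by other text, stripping a single symbol is not enough. One approach is to extract the first number with str_extract() and convert the result. The sketch below assumes plain decimal numbers without thousands separators; the `labels` values are invented for illustration.

```r
library(stringr)

labels <- c("Price: $19.99", "about 25 dollars", "n/a")

# Match one or more digits, optionally followed by a decimal part
amounts <- as.numeric(str_extract(labels, "\\d+(\\.\\d+)?"))
amounts
# [1] 19.99 25.00    NA
```

Strings with no number yield NA, which makes unparseable entries easy to spot afterwards.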
Cleaning Names
Standardize names with consistent formatting:
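# Named raw_names to avoid masking base R's names() function
raw_names <- c("  john   doe ", "JANE SMITH", "Alice  Bob")
# The native pipe |> requires R 4.1 or later
raw_names |>
  str_squish() |>
  str_to_title()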
# [1] "John Doe" "Jane Smith" "Alice Bob"
Summary
The stringr package provides consistent, readable functions for string manipulation:
| Function | Purpose |
|---|---|
| str_length() | Count characters |
| str_sub() | Extract/replace substrings |
| str_c() | Concatenate strings |
| str_detect() | Find pattern matches |
| str_extract() | Pull matched text |
| str_replace() | Substitute patterns |
| str_trim() | Remove outer whitespace |
| str_squish() | Collapse internal whitespace |
| str_pad() | Add padding characters |
These functions form the foundation for text processing in R. Combined with regular expressions, they cover most of the string manipulation tasks you’ll encounter in everyday data cleaning.