String Manipulation with stringr
stringr is part of the tidyverse and provides a consistent interface for string operations. If you’ve ever struggled with R’s base string functions (paste(), substr(), grep(), gsub()), stringr’s uniform names and argument order make the equivalents easier to remember and use.
This guide covers the most useful stringr functions for everyday data work.
Installing stringr
install.packages("stringr")
library(stringr)
Or load the entire tidyverse:
library(tidyverse)
Creating Strings
str_c(): Combining Strings
str_c() combines strings element by element. Missing values propagate: if any input is NA, the corresponding result is NA.
str_c("Hello", " ", "World")
# [1] "Hello World"
str_c("x", 1:3, sep = "_")
# [1] "x_1" "x_2" "x_3"
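To make the missing-value behaviour concrete, here is a small sketch. The vector `parts` is made up for illustration; str_replace_na() is one way to keep a placeholder instead of losing the whole element.

```r
library(stringr)

parts <- c("a", NA, "c")  # hypothetical input with a missing value

# NA is contagious: an NA input gives NA in that position
str_c(parts, "_suffix")
# [1] "a_suffix" NA         "c_suffix"

# Turn NA into the literal string "NA" first if it should survive
str_c(str_replace_na(parts), "_suffix")
# [1] "a_suffix"  "NA_suffix" "c_suffix"

# collapse joins the resulting vector into a single string
str_c("x", 1:3, collapse = "+")
# [1] "x1+x2+x3"
```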
str_flatten() collapses a vector into a single string:
str_flatten(c("a", "b", "c"), collapse = ", ")
# [1] "a, b, c"
str_dup(): Repeating Strings
str_dup("ha", 3)
# [1] "hahaha"
String Length
str_length(): Counting Characters
str_length(c("apple", "banana", "cherry"))
# [1] 5 6 6
str_length() counts characters (code points), not bytes. The two differ for non-ASCII text, where a single character can occupy several bytes.
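A quick way to see the difference is to compare str_length() with base R's nchar(type = "bytes"). The sketch below assumes the string is UTF-8, which is what a \u escape produces.

```r
library(stringr)

word <- "caf\u00e9"  # "café", written with an escape so the source stays ASCII

str_length(word)             # characters (code points)
# [1] 4
nchar(word, type = "bytes")  # bytes in the UTF-8 encoding
# [1] 5
```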
Subsetting Strings
str_sub(): Extracting Parts
str_sub() extracts a substring by position.
x <- "abcdef"
str_sub(x, 1, 3)
# [1] "abc"
str_sub(x, -3, -1)
# [1] "def"
You can also use it to replace parts of a string:
x <- "apple"
str_sub(x, 1, 1) <- "A"
x
# [1] "Apple"
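str_sub() is vectorised, so the same positions can be applied to a whole column of fixed-width values. A minimal sketch, using made-up ISO-style date strings:

```r
library(stringr)

dates <- c("2023-01-15", "2024-12-31")  # hypothetical fixed-width values

str_sub(dates, 1, 4)  # characters 1 to 4: the year
# [1] "2023" "2024"
str_sub(dates, 6, 7)  # characters 6 to 7: the month
# [1] "01" "12"
str_sub(dates, -2)    # from the second-to-last character to the end
# [1] "15" "31"
```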
str_extract(): Pattern Extraction
str_extract() pulls out the first match to a pattern:
str_extract("The price is $50.00", "\\d+\\.\\d+")
# [1] "50.00"
str_extract_all() returns every match, as a list with one element per input string:
str_extract_all("abc123def456", "\\d+")
# [[1]]
# [1] "123" "456"
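The two functions behave differently when an input has no match or several. The sketch below uses a made-up vector `codes` to show both cases.

```r
library(stringr)

codes <- c("abc123def456", "no digits", "x7")  # hypothetical inputs

# str_extract() always returns one value per input; NA means no match
str_extract(codes, "\\d+")
# [1] "123" NA    "7"

# str_extract_all() returns a list; lengths() counts matches per input
all_matches <- str_extract_all(codes, "\\d+")
lengths(all_matches)
# [1] 2 0 1
```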
Pattern Detection
str_detect(): Finding Patterns
str_detect() returns TRUE if a pattern exists in a string:
fruits <- c("apple", "banana", "cherry", "apricot")
str_detect(fruits, "^a")
# [1] TRUE FALSE FALSE TRUE
Because the result is a logical vector, it works well with sum() to count matches or inside dplyr's filter() to keep matching rows:
# Count strings starting with 'a'
sum(str_detect(fruits, "^a"))
# [1] 2
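The same logical vector can drive subsetting directly. The sketch below sticks to base R indexing and stringr itself so it runs without dplyr; inside filter() the str_detect() call would be written the same way.

```r
library(stringr)

fruits <- c("apple", "banana", "cherry", "apricot")

# Keep only the matching elements
fruits[str_detect(fruits, "^a")]
# [1] "apple"   "apricot"

# str_subset() is a shortcut for the line above
str_subset(fruits, "^a")
# [1] "apple"   "apricot"

# negate = TRUE flips the test
str_detect(fruits, "^a", negate = TRUE)
# [1] FALSE  TRUE  TRUE FALSE

# mean() of a logical vector gives the proportion of matches
mean(str_detect(fruits, "an"))
# [1] 0.25
```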
str_starts() and str_ends()
Check if strings start or end with a pattern:
str_starts(fruits, "a")
# [1] TRUE FALSE FALSE TRUE
str_ends(fruits, "e")
# [1] TRUE FALSE FALSE FALSE
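One detail worth sketching: the pattern given to str_starts() and str_ends() is treated as a regular expression, so a character such as . acts as a wildcard unless the pattern is wrapped in fixed(). The file names below are made up for illustration.

```r
library(stringr)

files <- c("report.txt", "reporttxt")  # hypothetical file names

# As a regex, "." matches any character, so both strings match
str_ends(files, ".txt")
# [1] TRUE TRUE

# fixed() compares the pattern literally
str_ends(files, fixed(".txt"))
# [1]  TRUE FALSE
```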
String Replacement
str_replace(): Substituting Patterns
str_replace() replaces the first match:
str_replace("apple pie", "pie", "tart")
# [1] "apple tart"
str_replace_all() replaces all matches:
str_replace_all("aaa", "a", "b")
# [1] "bbb"
Use str_remove() and str_remove_all() as shorthand for replacing matches with an empty string:
str_remove_all("a-b-c-d", "-")
# [1] "abcd"
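Two further cases are easy to mix up, so here is a short sketch: the first-match versus all-matches distinction for the remove functions, and passing a named vector to str_replace_all() to apply several replacements in one call. The input strings are invented for the example.

```r
library(stringr)

# str_remove() drops only the first match; str_remove_all() drops every match
str_remove("a-b-c-d", "-")
# [1] "ab-c-d"
str_remove_all("a-b-c-d", "-")
# [1] "abcd"

# A named vector maps each pattern (name) to its replacement (value)
str_replace_all("one two three", c("one" = "1", "two" = "2", "three" = "3"))
# [1] "1 2 3"
```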
Splitting Strings
str_split(): Breaking Apart
str_split() splits a string into pieces:
str_split("a,b,c", ",")
# [[1]]
# [1] "a" "b" "c"
Add simplify = TRUE to get a matrix:
str_split("a,b,c", ",", simplify = TRUE)
#      [,1] [,2] [,3]
# [1,] "a"  "b"  "c"
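Two variations come up often in practice: limiting the number of pieces with n, and splitting inputs that produce different numbers of pieces. The sketch below uses invented inputs (`record`, `rows`).

```r
library(stringr)

record <- "key=value=with=equals"  # hypothetical key/value line

# n = 2 splits on the first "=" only and keeps the rest intact
str_split(record, "=", n = 2)[[1]]
# [1] "key"               "value=with=equals"

rows <- c("a,b,c", "d,e")  # inputs with different numbers of fields

# The list form keeps each input's own length
lengths(str_split(rows, ","))
# [1] 3 2

# The matrix form pads short rows with empty strings
m <- str_split(rows, ",", simplify = TRUE)
dim(m)
# [1] 2 3
m[2, 3]
# [1] ""
```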
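Going the other way, str_glue() builds a string by evaluating the R expressions inside {} and inserting the results; str_glue_data() does the same with values looked up in a data frame or list: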
name <- "Alice"
age <- 30
str_glue("My name is {name} and I am {age} years old.")
# My name is Alice and I am 30 years old.
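For completeness, here is a minimal sketch of str_glue_data(), which the example above does not show. The data frame `people` is made up; each {} expression is evaluated against its columns, producing one string per row.

```r
library(stringr)

people <- data.frame(name = c("Alice", "Bob"), age = c(30, 25))  # hypothetical data

str_glue_data(people, "{name} is {age} years old.")
# Alice is 30 years old.
# Bob is 25 years old.

# Values can also be passed to str_glue() as named arguments,
# and the braces can hold any R expression
str_glue("Next year {name} will be {age + 1}.", name = "Alice", age = 30)
# Next year Alice will be 31.
```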
Whitespace Handling
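str_trim() and str_squish(): Removing Extra Spaces
str_trim() removes whitespace from the start and end of a string; str_squish() also collapses repeated internal whitespace into a single space.
str_trim("  hello  ")
# [1] "hello"
str_squish("  hello   world  ")
# [1] "hello world"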
str_pad(): Adding Padding
str_pad("apple", width = 10, side = "left", pad = " ")
# [1] "     apple"
str_pad("5", width = 2, pad = "0")
# [1] "05"
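A few more cases, sketched with a made-up vector of IDs: str_pad() is vectorised, can pad on either or both sides, and leaves strings alone when they are already at least as wide as width.

```r
library(stringr)

ids <- c("7", "42", "123")  # hypothetical identifiers

# Left-pad (the default side) to a common width
str_pad(ids, width = 5, pad = "0")
# [1] "00007" "00042" "00123"

# Pad on the right or on both sides
str_pad("apple", width = 10, side = "right", pad = ".")
# [1] "apple....."
str_pad("apple", width = 9, side = "both", pad = "*")
# [1] "**apple**"

# Strings longer than width are returned unchanged, not truncated
str_pad("banana", width = 3)
# [1] "banana"
```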
Case Manipulation
str_to_upper(), str_to_lower(), str_to_title()
str_to_upper("Hello World")
# [1] "HELLO WORLD"
str_to_lower("Hello World")
# [1] "hello world"
str_to_title("hello world")
# [1] "Hello World"
Sorting Strings
str_order() and str_sort()
x <- c("banana", "Apple", "cherry")
str_sort(x)
# [1] "Apple" "banana" "cherry"
str_sort(x, locale = "en")
# [1] "Apple" "banana" "cherry"
The locale argument (default "en") sets the collation rules, which matters for non-English characters.
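The heading also names str_order(), which returns the sorting indices rather than the sorted values. The sketch below shows it alongside two str_sort() options, decreasing and numeric; the `files` vector is invented for the example.

```r
library(stringr)

x <- c("banana", "Apple", "cherry")

# str_order() gives the positions that would sort x
str_order(x)
# [1] 2 1 3
x[str_order(x)]  # equivalent to str_sort(x)
# [1] "Apple"  "banana" "cherry"

str_sort(x, decreasing = TRUE)
# [1] "cherry" "banana" "Apple"

files <- c("file10", "file2", "file1")  # hypothetical names with embedded numbers

str_sort(files)                  # character by character
# [1] "file1"  "file10" "file2"
str_sort(files, numeric = TRUE)  # digit runs compared as numbers
# [1] "file1"  "file2"  "file10"
```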
Common Patterns
Email Extraction
emails <- c("john@email.com", "jane.doe@company.org", "invalid")
str_extract(emails, "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}")
# [1] "john@email.com" "jane.doe@company.org" NA
Phone Number Formatting
phone <- "5551234567"
str_replace(phone, "(\\d{3})(\\d{3})(\\d{4})", "(\\1) \\2-\\3")
# [1] "(555) 123-4567"
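When the input may contain values that are not exactly ten digits, anchoring the pattern keeps them from being partially rewritten. This is a sketch under that assumption, with a deliberately malformed second value; str_replace() returns non-matching strings unchanged.

```r
library(stringr)

phones <- c("5551234567", "555-1234")  # second value is deliberately malformed

# ^ and $ mean only strings that are exactly 10 digits get reformatted
str_replace(phones, "^(\\d{3})(\\d{3})(\\d{4})$", "(\\1) \\2-\\3")
# [1] "(555) 123-4567" "555-1234"
```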
Extracting Numbers from Text
text <- "The temperature is 25 degrees"
str_extract(text, "-?\\d+")
# [1] "25"
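The extracted value is still a character string. A small extension, sketched with made-up readings, allows an optional decimal part in the pattern and converts the result with as.numeric(); inputs without a number become NA.

```r
library(stringr)

readings <- c("The temperature is 25 degrees",
              "Low of -3.5 overnight",
              "no reading")  # hypothetical inputs

# Optional minus sign, digits, then an optional decimal part
raw <- str_extract(readings, "-?\\d+(\\.\\d+)?")
raw
# [1] "25"   "-3.5" NA

as.numeric(raw)
# [1] 25.0 -3.5   NA
```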
When to Use stringr
stringr is ideal for most string manipulation tasks. The function names are intuitive: str_ prefix, then a verb (detect, extract, replace, split, etc.).
For very large text data, or for functionality stringr does not expose, consider using stringi directly. stringr is built on top of it, so regular expression patterns use the same syntax in both.
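As a rough illustration of how closely the two packages line up, the sketch below runs the same operations through stringr and through the corresponding stringi functions. It assumes stringi is available, which it normally is because stringr depends on it.

```r
library(stringr)
library(stringi)  # normally installed alongside stringr as a dependency

fruits <- c("apple", "banana", "cherry", "apricot")

# Same regex, same result, different function names
str_detect(fruits, "^a")
stri_detect_regex(fruits, "^a")
# [1]  TRUE FALSE FALSE  TRUE

str_length(fruits)
stri_length(fruits)
# [1] 5 6 6 7
```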
Summary
| Function | Purpose |
|---|---|
| str_c() | Combine strings |
| str_length() | Count characters |
| str_sub() | Extract by position |
| str_extract() | Extract by pattern |
| str_detect() | Check if pattern exists |
| str_replace() | Substitute patterns |
| str_split() | Split into pieces |
| str_trim() | Remove whitespace |
| str_to_upper() | Change case |
Master these functions, and you’ll handle the vast majority of string manipulation tasks in R.