
gsub()

gsub() is a base R function for finding and replacing text patterns in character strings. It replaces all occurrences of a pattern, whereas its companion sub() replaces only the first. Both support regular expressions, fixed matching, and case-insensitive options.

Syntax

gsub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE, fixed = FALSE)

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| pattern | character | (none) | Pattern to search for: a regular expression by default, or a literal string if fixed = TRUE |
| replacement | character | (none) | The replacement string. In regex mode, backreferences (\1, \2, etc., written "\\1", "\\2" in R code) refer to capture groups |
| x | character | (none) | A character vector in which the pattern is searched |
| ignore.case | logical | FALSE | If TRUE, the search is case-insensitive |
| perl | logical | FALSE | If TRUE, use Perl-compatible regular expressions |
| fixed | logical | FALSE | If TRUE, treat pattern as a literal string rather than a regex |
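
As an illustration of the perl argument, the sketch below uses two features that are only available with perl = TRUE: a lookahead in the pattern and the \\U case-conversion escape in the replacement. The input strings and variable names are made up for the example.

```r
# Lookahead: remove a run of digits only when it is followed by "px"
no_px <- gsub("[0-9]+(?=px)", "", "width 100px, height 50em", perl = TRUE)
no_px
# [1] "width px, height 50em"

# \\U upper-cases the text inserted after it (requires perl = TRUE)
title <- gsub("\\b(\\w)", "\\U\\1", "hello world", perl = TRUE)
title
# [1] "Hello World"
```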

Examples

Basic replacement

text <- "The cat sat on the mat"
gsub("cat", "dog", text)
# [1] "The dog sat on the mat"

phone <- "Call me at 555-123-4567"
gsub("[0-9]", "", phone)
# [1] "Call me at ---"

filename <- "my report final.pdf"
gsub(" ", "_", filename)
# [1] "my_report_final.pdf"

Case-insensitive matching

Passing ignore.case = TRUE makes gsub() treat uppercase and lowercase letters as equivalent during matching. Every "r" and "R" in the input is replaced, regardless of case. This flag works identically across gsub(), sub(), grep(), and grepl(), so once you learn it for one function you can apply the same pattern everywhere.

text <- "R is GREAT and r is great"
gsub("r", "X", text, ignore.case = TRUE)
# [1] "X is GXEAT and X is gXeat"
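
To show the same flag on the related functions, here is a small sketch using sub() and grepl(). The second input vector is invented for the illustration.

```r
text <- "R is GREAT and r is great"

# sub() with ignore.case = TRUE replaces only the first match, whatever its case
first_only <- sub("r", "X", text, ignore.case = TRUE)
first_only
# [1] "X is GREAT and r is great"

# grepl() accepts the same argument and reports which elements match
has_r <- grepl("r", c("R", "bar", "xyz"), ignore.case = TRUE)
has_r
# [1]  TRUE  TRUE FALSE
```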

Using backreferences

Capture groups let you rearrange matched text in the replacement. Parentheses in the pattern capture sub-matches, and \\1, \\2 etc. in the replacement string refer back to them. The name-reversal call, with pattern "(\\w+) (\\w+)" and replacement "\\2, \\1", is the canonical example: it swaps first and last names in one call. Phone number reformatting works the same way, capturing each digit group and reassembling them with the desired separators.

names <- c("John Doe", "Jane Smith", "Bob Wilson")
gsub("(\\w+) (\\w+)", "\\2, \\1", names)
# [1] "Doe, John"   "Smith, Jane" "Wilson, Bob"

phones <- c("555-123-4567", "987-654-3210")
gsub("([0-9]{3})-([0-9]{3})-([0-9]{4})", "(\\1) \\2-\\3", phones)
# [1] "(555) 123-4567" "(987) 654-3210"

Fixed matching (literal strings)

When the pattern contains regex metacharacters like . or * that should be taken literally, pass fixed = TRUE. Without this flag, gsub(".old", ".new", filename) treats . as “any character”, so the pattern also matches text such as "hold" or "bold" and can produce unexpected results. Fixed matching is also faster because it skips regex compilation, so prefer it when searching for a known literal substring.

filename <- "report.txt.old"
gsub(".old", ".new", filename, fixed = TRUE)
# [1] "report.txt.new"
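
The filename above happens to give the same result with or without fixed = TRUE. The sketch below uses a hypothetical input, "threshold.old", where the two modes visibly diverge, and also shows escaping the dot as an alternative.

```r
x <- "threshold.old"

# Regex mode: "." matches any character, so "hold" is replaced as well
as_regex <- gsub(".old", ".new", x)
as_regex
# [1] "thres.new.new"

# Literal mode: only the exact substring ".old" is replaced
as_fixed <- gsub(".old", ".new", x, fixed = TRUE)
as_fixed
# [1] "threshold.new"

# Escaping the dot keeps regex mode but makes the dot literal
as_escaped <- gsub("\\.old", ".new", x)
as_escaped
# [1] "threshold.new"
```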

Common patterns

Beyond single replacements, gsub() handles two of the most common text-cleaning tasks: normalizing whitespace and stripping markup. These patterns appear in nearly every data-import pipeline and are worth memorizing.

Text cleaning

Collapsing multiple spaces with gsub("\\s+", " ", x) and removing HTML tags with gsub("<[^>]+>", "", x) are the two workhorses of text preprocessing. The whitespace pattern replaces any run of one or more whitespace characters with a single space, while the tag-stripping pattern removes everything between angle brackets including the brackets themselves.

messy <- "  hello   world  "
gsub("\\s+", " ", messy)
# [1] " hello world "

html <- "<p>Hello <b>world</b></p>"
gsub("<[^>]+>", "", html)
# [1] "Hello world"
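
Note that collapsing whitespace leaves a single leading and trailing space in place. One way to remove those as well, sketched here with base R's trimws(), is to trim after collapsing. The variable names are illustrative.

```r
messy <- "  hello   world  "

# Collapse internal runs of whitespace, then trim both ends
clean <- trimws(gsub("\\s+", " ", messy))
clean
# [1] "hello world"

# Trimming alone with gsub(): anchors remove only leading/trailing whitespace
stripped <- gsub("^\\s+|\\s+$", "", messy)
stripped
# [1] "hello   world"
```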

gsub() vs sub() and performance notes

gsub() replaces every match in each string. sub() replaces only the first match per string. Choose based on whether you expect and want multiple replacements: for removing all HTML tags from a string, use gsub(); for replacing only the first occurrence of a delimiter, use sub().
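
A minimal side-by-side sketch of the difference, using made-up inputs:

```r
date <- "2024-01-15"

# sub() stops after the first match in each string
first <- sub("-", "/", date)
first
# [1] "2024/01-15"

# gsub() replaces every match
all_matches <- gsub("-", "/", date)
all_matches
# [1] "2024/01/15"

# Both are vectorised: "first match" means first match per element
per_element <- sub("-", "/", c("a-b-c", "d-e"))
per_element
# [1] "a/b-c" "d/e"
```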

For fixed-string replacements (no regex metacharacters), pass fixed = TRUE. This avoids regex compilation and is significantly faster on large character vectors. Always use fixed = TRUE when the pattern contains characters like ., *, (, ) that have special meaning in regex but you want treated as literals.

Back-references in the replacement string let you reuse captured groups: gsub("(\\w+) (\\w+)", "\\2 \\1", "hello world") returns "world hello". This is useful for reordering fields in structured text.

The stringr equivalent is str_replace_all(x, pattern, replacement). It uses the ICU regular expression engine (via stringi) rather than base R's TRE or PCRE, and has consistent NA handling, making it preferable in tidyverse pipelines.

gsub() does not accept a named character vector of replacements; pattern and replacement are each a single string. chartr() can translate several single characters at once, but for multi-character substitutions the standard approach is chaining calls: gsub("pattern1", "replacement1", gsub("pattern2", "replacement2", x)). For many substitutions, consider building a lookup table and using stringr::str_replace_all(x, lookup_vector), which accepts a named vector (names are patterns, values are replacements) and applies the replacements in order within a single call.
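
For a base R version of the lookup-table idea, one possible sketch is to loop over a named vector and apply gsub() once per entry. The lookup table and inputs here are hypothetical, and the replacements are applied sequentially, so a later pattern can match text produced by an earlier replacement. chartr() is shown separately because it only maps single characters to single characters.

```r
x <- c("cat and dog", "dog and bird")
lookup <- c(cat = "feline", dog = "canine")  # hypothetical: names are patterns

# Apply each substitution in turn; fixed = TRUE treats the names as literals
out <- x
for (p in names(lookup)) {
  out <- gsub(p, lookup[[p]], out, fixed = TRUE)
}
out
# [1] "feline and canine" "canine and bird"

# chartr() translates character by character: a -> x, b -> y, c -> z
translated <- chartr("abc", "xyz", "aabbcc")
translated
# [1] "xxyyzz"
```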

gsub() applies the replacement globally: it replaces every non-overlapping match in each string. Matches are found left to right and do not overlap: gsub("aa", "b", "aaaa") returns "bb", not "ba" or "ab", because after replacing the first aa, the engine resumes searching after the end of that match.
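
The non-overlapping behaviour can be checked with a few small cases. The inputs are chosen only to make the leftover characters visible.

```r
# Four a's: two non-overlapping "aa" matches
even <- gsub("aa", "b", "aaaa")
even
# [1] "bb"

# Three a's: one match, then a single "a" is left over
odd <- gsub("aa", "b", "aaa")
odd
# [1] "ba"

# "aba" occurs at positions 1 and 3, but the second overlaps the first
overlap <- gsub("aba", "X", "ababa")
overlap
# [1] "Xba"
```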

When the replacement string contains backreferences such as \\1, the referenced groups must exist in the pattern. A mismatched reference (\\2 when there is only one capture group) does not raise an error; it silently expands to an empty string. Before running a replacement on real data, use gregexpr() together with regmatches() to check what the pattern actually matches.
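
A minimal sketch of that check: gregexpr() locates every match and regmatches() extracts the matched text, giving one list element per input string. The sample vector is invented for the illustration.

```r
x <- c("a1b22", "none")

# Extract everything the pattern would match, per input string
matches <- regmatches(x, gregexpr("[0-9]+", x))
matches
# [[1]]
# [1] "1"  "22"
#
# [[2]]
# character(0)

# Strings with no match are returned unchanged by gsub()
replaced <- gsub("[0-9]+", "#", x)
replaced
# [1] "a#b#" "none"
```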

See also