grep()
grep() searches for matches to a pattern in a character vector and returns the indices of the elements that match.
Syntax
grep(pattern, x, value = FALSE, ignore.case = FALSE,
perl = FALSE, fixed = FALSE, useBytes = FALSE, invert = FALSE)
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| pattern | character | (none, required) | A regular expression pattern to match |
| x | character | (none, required) | A character vector to search in |
| value | logical | FALSE | If TRUE, return the matching strings instead of indices |
| ignore.case | logical | FALSE | If TRUE, ignore case when matching |
| perl | logical | FALSE | If TRUE, use Perl-compatible regular expressions |
| fixed | logical | FALSE | If TRUE, treat pattern as a literal string (faster) |
| useBytes | logical | FALSE | If TRUE, match byte-by-byte rather than character-by-character |
| invert | logical | FALSE | If TRUE, return indices of elements that do NOT match |
Examples
Basic usage
fruits <- c("apple", "banana", "cherry", "date", "elderberry")
grep("a", fruits)
# [1] 1 2 4
Using value = TRUE
Returning indices is useful for positional operations, but returning the matched strings directly is often more readable. When value = TRUE, grep() returns the actual matching elements instead of their positions — equivalent to x[grepl(pattern, x)] but as a single call. This is the right choice when you want to see which strings matched, not where they sit in the vector.
grep("a", fruits, value = TRUE)
# [1] "apple" "banana" "date"
Case-insensitive matching
Setting ignore.case = TRUE makes the pattern match regardless of letter case: "red" matches "Red", "RED", and "red" alike. This matters when searching user-supplied text or data from multiple sources where capitalization is inconsistent. Without this flag, grep("red", c("Red", "GREEN")) returns integer(0), which is rarely the desired behavior in exploratory analysis.
colors <- c("Red", "GREEN", "blue", "YELLOW")
grep("red", colors, ignore.case = TRUE, value = TRUE)
# [1] "Red"
Fixed string matching (faster)
When the pattern is a literal string, fixed = TRUE bypasses the regex engine and matches the pattern character for character. This is typically faster on large character vectors and also prevents accidental regex interpretation: with the default engine, "." matches any character, and an unbalanced "(" or "[" makes the pattern invalid. Use fixed = TRUE when searching for user-supplied strings that should be matched literally, since they might contain such metacharacters.
emails <- c("user@example.com", "test@domain.org", "invalid")
grep("@", emails, fixed = TRUE, value = TRUE)
# [1] "user@example.com" "test@domain.org"
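To make the metacharacter point concrete, here is a small sketch comparing the default regex engine with fixed = TRUE on a pattern that contains a dot. The file names are hypothetical and chosen only for illustration.

```r
# Hypothetical file names, used only to show the difference.
files <- c("report.csv", "report_csv", "notes.txt")

# Default regex: "." matches any single character,
# so "_csv" is accepted as well as ".csv".
regex_hits <- grep(".csv", files, value = TRUE)
regex_hits
# [1] "report.csv" "report_csv"

# fixed = TRUE: "." is treated as a literal dot.
literal_hits <- grep(".csv", files, value = TRUE, fixed = TRUE)
literal_hits
# [1] "report.csv"
```

The same literal match could be written as the regex "\\.csv", but fixed = TRUE avoids having to escape anything.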
Invert to find non-matches
The invert = TRUE argument flips the result: instead of returning the elements that match, grep() returns those that do not. This is useful for filtering out unwanted entries, such as dropping rows whose text field holds a missing-value marker or excluding test records from an analysis. Combined with value = TRUE, it returns the non-matching strings directly rather than their indices.
grep("a", fruits, invert = TRUE, value = TRUE)
# [1] "cherry" "elderberry"
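As a sketch of the "excluding test records" case, invert = TRUE can be combined with ignore.case = TRUE. The record names below are hypothetical.

```r
# Hypothetical record identifiers; some are test entries.
records <- c("order_001", "test_order", "order_002", "TEST_tmp")

# Keep everything that does NOT contain "test", in any letter case.
real_records <- grep("test", records, ignore.case = TRUE,
                     invert = TRUE, value = TRUE)
real_records
# [1] "order_001" "order_002"
```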
Common patterns
The examples above cover grep() in isolation. In practice, you combine it with other R functions to solve real data problems. These three patterns handle the most frequent grep-related tasks in data analysis.
Counting matches
Getting a count of matching elements is as simple as wrapping the result in length(). This pattern answers “how many rows contain this pattern?” and is commonly used in data quality checks — counting missing-value indicators, malformed entries, or records that match a validation rule.
length(grep("pattern", strings))
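A concrete version of the counting pattern might look like the following. The responses vector and the "N/A" marker are assumptions made for illustration.

```r
# Hypothetical survey responses; "N/A" marks a missing answer.
responses <- c("yes", "N/A", "no", "N/A", "yes")

# Count the elements that contain the marker (matched literally).
n_missing <- length(grep("N/A", responses, fixed = TRUE))
n_missing
# [1] 2

# sum(grepl(...)) gives the same count, since TRUE counts as 1.
n_missing_alt <- sum(grepl("N/A", responses, fixed = TRUE))
n_missing_alt
# [1] 2
```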
Filtering data frames
Feeding grep() output into the row index of a data frame subsets rows based on a text pattern. This is base R’s equivalent of dplyr::filter(df, grepl("pattern", column)). Row indexing by position works on matrices and data frames alike and does not require any additional packages.
df[grep("pattern", df$column), ]
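The placeholder call above can be tried on a small data frame. The orders table and its column names are hypothetical.

```r
# Hypothetical data frame with a free-text column.
orders <- data.frame(
  id = 1:4,
  item = c("apple pie", "banana bread", "apple juice", "cherry tart"),
  stringsAsFactors = FALSE
)

# Keep only the rows whose item mentions "apple".
apple_orders <- orders[grep("apple", orders$item), ]
apple_orders$id
# [1] 1 3
```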
Multiple patterns (OR logic)
Use the regex alternation operator | to match any of several patterns in a single grep() call. The pattern "apple|orange" matches strings containing either word. This is cleaner than chaining multiple grep() calls with union() and runs as a single regex pass. For a long list of alternatives, build the pattern programmatically with paste(terms, collapse = "|").
grep("apple|orange", fruits, value = TRUE)
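Building the alternation programmatically, as described above, can be sketched like this. The terms vector is a hypothetical list of search words; this approach assumes the terms contain no regex metacharacters.

```r
fruits <- c("apple", "banana", "cherry", "date", "elderberry")

# Hypothetical list of search terms (plain words, no metacharacters).
terms <- c("banana", "cherry", "orange")

# Join the terms with "|" to form a single alternation pattern.
pattern <- paste(terms, collapse = "|")
pattern
# [1] "banana|cherry|orange"

matched <- grep(pattern, fruits, value = TRUE)
matched
# [1] "banana" "cherry"
```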
grep() vs grepl(): which to use
grep() returns indices (integers) by default, or matched values when value = TRUE. grepl() returns a logical vector. Which to use depends on what you do next: if you need to subset a vector with [, grepl() gives you the mask directly; if you need the positions for further arithmetic or to pass to another function, grep() is more convenient.
grep(pattern, x, value = TRUE) is equivalent to x[grepl(pattern, x)] but written as a single call. Both are idiomatic; pick the one that reads more clearly for your use case.
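A short side-by-side sketch, reusing the fruits vector from the earlier examples, shows the two return types and how they relate.

```r
fruits <- c("apple", "banana", "cherry", "date", "elderberry")

idx  <- grep("rr", fruits)   # integer positions of the matches
mask <- grepl("rr", fruits)  # logical vector, one entry per element

idx
# [1] 3 5
mask
# [1] FALSE FALSE  TRUE FALSE  TRUE

# Both forms select the same elements.
same_subset <- identical(fruits[idx], fruits[mask])
same_subset
# [1] TRUE

# value = TRUE returns that subset directly.
same_values <- identical(grep("rr", fruits, value = TRUE), fruits[mask])
same_values
# [1] TRUE

# which() converts the mask back into positions.
same_index <- identical(which(mask), idx)
same_index
# [1] TRUE
```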
Like all the base R regex functions, grep() accepts ignore.case = TRUE, fixed = TRUE (literal matching), and perl = TRUE (PCRE engine). For fixed-string searches in large vectors, fixed = TRUE provides a meaningful speed improvement.
When building data pipelines with dplyr, filter(df, grepl(pattern, col)) is the more idiomatic form. grep() is best in base R contexts where you need indices or matched values directly, such as finding which column names contain a keyword or locating rows in a plain matrix. Use grep(pattern, names(df), value = TRUE) to find column names that match a pattern, a common exploratory step on wide data frames.
# Find all measurement columns in a wide data frame
wide_df <- data.frame(id = 1:3, measure_a = 4:6, measure_b = 7:9, category = letters[1:3])
grep("^measure", names(wide_df), value = TRUE)
# [1] "measure_a" "measure_b"
In short: use grep() when you need indices or, with value = TRUE, the matched strings themselves; use grepl() when you need a logical mask of the same length as x, which composes naturally with subsetting and which(). The index form is the natural fit for subsetting a data frame by matching row positions, as in df[grep(pattern, df$col), ]. Both functions support perl = TRUE for PCRE syntax, which enables lookaheads, lookbehinds, and other advanced constructs not available in the default TRE engine.
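As an illustration of a PCRE-only construct, the sketch below uses a lookahead with perl = TRUE. The file names are hypothetical, and the pattern is just one way to express the condition.

```r
# Hypothetical file names.
files <- c("data_2023.csv", "data_2023.txt", "notes_2024.csv")

# "^data_\\d+" must be followed by ".csv" at the end of the string.
# The lookahead (?=...) checks what follows without consuming it.
csv_data <- grep("^data_\\d+(?=\\.csv$)", files, perl = TRUE, value = TRUE)
csv_data
# [1] "data_2023.csv"
```

For a plain filter like this one, "^data_\\d+\\.csv$" would work without a lookahead. Lookarounds become more valuable in functions such as sub() or regmatches(), where the extent of the matched text matters.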