grep()

grep() searches for matches to a pattern in a character vector and returns the indices of elements that match.

Syntax

grep(pattern, x, ignore.case = FALSE, perl = FALSE, value = FALSE,
     fixed = FALSE, useBytes = FALSE, invert = FALSE)

Parameters

Parameter    Type       Default  Description
pattern      character  (none)   A regular expression pattern to match
x            character  (none)   A character vector to search in
value        logical    FALSE    If TRUE, return the matching strings instead of indices
ignore.case  logical    FALSE    If TRUE, ignore case when matching
perl         logical    FALSE    If TRUE, use Perl-compatible regular expressions
fixed        logical    FALSE    If TRUE, treat pattern as a literal string (faster)
useBytes     logical    FALSE    If TRUE, match byte-by-byte rather than character-by-character
invert       logical    FALSE    If TRUE, return indices of elements that do NOT match

Examples

Basic usage

fruits <- c("apple", "banana", "cherry", "date", "elderberry")

grep("a", fruits)
# [1] 1 2 4

Using value = TRUE

Returning indices is useful for positional operations, but returning the matched strings directly is often more readable. When value = TRUE, grep() returns the actual matching elements instead of their positions — equivalent to x[grepl(pattern, x)] but as a single call. This is the right choice when you want to see which strings matched, not where they sit in the vector.

grep("a", fruits, value = TRUE)
# [1] "apple"  "banana" "date"
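
The equivalence described above can be checked directly. The following is a minimal sketch that reuses the fruits vector from the basic example; the variable names by_value, by_index, and by_mask are chosen here for illustration only.

```r
fruits <- c("apple", "banana", "cherry", "date", "elderberry")

# Three ways to obtain the matching strings
by_value <- grep("a", fruits, value = TRUE)   # single call
by_index <- fruits[grep("a", fruits)]         # subset by integer positions
by_mask  <- fruits[grepl("a", fruits)]        # subset by logical mask

identical(by_value, by_index)
# [1] TRUE
identical(by_value, by_mask)
# [1] TRUE
```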

Case-insensitive matching

Setting ignore.case = TRUE makes the pattern match regardless of letter case: "red" matches "Red", "RED", and "red" equally. This is essential when searching user-supplied text or data from multiple sources where capitalization is inconsistent. Without this flag, grep("red", c("Red", "GREEN")) returns integer(0), that is, no matches, which is rarely the desired behavior in exploratory analysis.

colors <- c("Red", "GREEN", "blue", "YELLOW")

grep("red", colors, ignore.case = TRUE, value = TRUE)
# [1] "Red"

Fixed string matching (faster)

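When the pattern is a literal string, fixed = TRUE tells R to match it exactly as written instead of interpreting it as a regular expression. This is usually faster on large character vectors and also prevents accidental regex interpretation: with the default regex engine, "." matches any character, and characters such as "(", "[", or "+" have special meaning, so searching for them would match differently than expected or fail with an error. Prefer fixed = TRUE when searching for user-supplied strings that might contain special characters. Note that fixed = TRUE cannot be combined with ignore.case = TRUE; R ignores ignore.case in that case and issues a warning.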

emails <- c("user@example.com", "test@domain.org", "invalid")

grep("@", emails, fixed = TRUE, value = TRUE)
# [1] "user@example.com" "test@domain.org"
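
To make the risk of accidental regex interpretation concrete, here is a small sketch comparing the default engine with fixed = TRUE on a pattern that contains a dot. The files vector is hypothetical and exists only for this illustration.

```r
# Hypothetical file names for illustration
files <- c("report.csv", "report_csv_backup", "notes.txt")

# Default regex: "." matches any single character, so ".csv" also matches "_csv"
regex_hits <- grep(".csv", files, value = TRUE)
regex_hits
# [1] "report.csv"        "report_csv_backup"

# fixed = TRUE: "." is treated as a literal dot
literal_hits <- grep(".csv", files, value = TRUE, fixed = TRUE)
literal_hits
# [1] "report.csv"
```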

Invert to find non-matches

The invert = TRUE argument flips the result: instead of returning elements that match, it returns those that do not. This is useful for filtering out unwanted entries, such as dropping values that carry a placeholder like "N/A" or excluding test records from analysis. Combined with value = TRUE, it returns the non-matching strings directly rather than their indices.

grep("a", fruits, invert = TRUE, value = TRUE)
# [1] "cherry"     "elderberry"

Common patterns

The examples above cover grep() in isolation. In practice, you combine it with other R functions to solve real data problems. These three patterns handle the most frequent grep-related tasks in data analysis.

Counting matches

Getting a count of matching elements is as simple as wrapping the result in length(). This pattern answers “how many rows contain this pattern?” and is commonly used in data quality checks — counting missing-value indicators, malformed entries, or records that match a validation rule.

length(grep("pattern", strings))
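
A runnable version of the counting pattern might look like the sketch below. The responses vector and the "N/A" placeholder are assumptions made for illustration; sum(grepl(...)) is shown as an alternative way to get the same count.

```r
# Hypothetical survey responses; "N/A" marks a missing answer
responses <- c("yes", "N/A", "no", "N/A", "yes")

# Count the elements that contain the placeholder
n_missing <- length(grep("N/A", responses, fixed = TRUE))
n_missing
# [1] 2

# Same count from the logical form: TRUE values sum as 1
sum(grepl("N/A", responses, fixed = TRUE))
# [1] 2
```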

Filtering data frames

Feeding grep() output into the row index of a data frame subsets rows based on a text pattern: df[grep("pattern", df$column), ]. This is base R’s equivalent of dplyr::filter(df, grepl("pattern", column)). The index-based approach works on matrices and data frames alike and does not require any additional packages.

df[grep("pattern", df$column), ]
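
The following sketch applies the row-filtering pattern to a small, made-up data frame (the df contents and the "test_" prefix are assumptions). It also shows why invert = TRUE is usually safer than negative indexing when dropping rows: if nothing matches, grep() returns integer(0), and negating an empty index selects no rows at all.

```r
# Hypothetical data frame for illustration
df <- data.frame(
  id = 1:4,
  email = c("ann@example.com", "test_bob@example.com",
            "cy@example.com", "test_dee@example.com"),
  stringsAsFactors = FALSE
)

# Keep rows whose email starts with "test_"
test_rows <- df[grep("^test_", df$email), ]
test_rows$id
# [1] 2 4

# Drop those rows instead, using invert = TRUE
real_rows <- df[grep("^test_", df$email, invert = TRUE), ]
real_rows$id
# [1] 1 3

# Pitfall: with no matches, -integer(0) is still an empty index,
# so negative indexing returns zero rows rather than every row
nrow(df[-grep("^zzz", df$email), ])
# [1] 0
nrow(df[grep("^zzz", df$email, invert = TRUE), ])
# [1] 4
```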

Multiple patterns (OR logic)

Use the regex alternation operator | to match any of several patterns in a single grep() call. The pattern "apple|orange" matches strings containing either word. This is cleaner than chaining multiple grep() calls with union() and runs as a single regex pass. For a long list of alternatives, build the pattern programmatically with paste(terms, collapse = "|").

grep("apple|orange", fruits, value = TRUE)
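
Building the alternation programmatically could look like this sketch. The terms vector is hypothetical, and the approach assumes the terms contain no regex metacharacters; a term such as "a.b" would need escaping before being pasted into the pattern.

```r
fruits <- c("apple", "banana", "cherry", "date", "elderberry")

# Hypothetical list of search terms
terms <- c("apple", "cherry", "orange")

# Join the terms with the alternation operator
pattern <- paste(terms, collapse = "|")
pattern
# [1] "apple|cherry|orange"

grep(pattern, fruits, value = TRUE)
# [1] "apple"  "cherry"
```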

grep() vs grepl(): which to use

grep() returns indices (integers) by default, or matched values when value = TRUE. grepl() returns a logical vector. Which to use depends on what you do next: if you need to subset a vector with [, grepl() gives you the mask directly; if you need the positions for further arithmetic or to pass to another function, grep() is more convenient.

grep(pattern, x, value = TRUE) is equivalent to x[grepl(pattern, x)] but written as a single call. Both are idiomatic; pick the one that reads more clearly for your use case.
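
The difference in return types is easiest to see side by side. This is an illustrative sketch using the fruits vector from earlier; idx and mask are names introduced here, not part of any API.

```r
fruits <- c("apple", "banana", "cherry", "date", "elderberry")

idx  <- grep("rr", fruits)    # integer positions of the matches
mask <- grepl("rr", fruits)   # logical vector, same length as fruits

idx
# [1] 3 5
mask
# [1] FALSE FALSE  TRUE FALSE  TRUE

# which() converts the mask into the same indices
identical(which(mask), idx)
# [1] TRUE

# A mask combines with other conditions via & and |
fruits[mask & nchar(fruits) > 6]
# [1] "elderberry"
```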

Like grepl(), sub(), gsub(), and regexpr(), grep() accepts ignore.case = TRUE, fixed = TRUE (literal matching), and perl = TRUE (PCRE engine). For fixed-string searches in large vectors, fixed = TRUE typically provides a meaningful speed improvement.

When building data pipelines with dplyr, filter(df, grepl(pattern, col)) is the more idiomatic form. grep() is best in base R contexts where you need indices or matched values directly, such as finding which column names contain a keyword or locating rows in a plain matrix. Use grep(pattern, names(df), value = TRUE) to find column names that match a pattern, a common exploratory step on wide data frames.

# Find all measurement columns in a wide data frame
wide_df <- data.frame(id = 1:3, measure_a = 4:6, measure_b = 7:9, category = letters[1:3])
grep("^measure", names(wide_df), value = TRUE)
# [1] "measure_a" "measure_b"

To summarize: grep() returns the positions in x where the pattern matched, or the matching strings themselves when value = TRUE. The string form is frequently more useful for character vectors, while the index form is the one to use when you need to subset a data frame by matching row positions: df[grep(pattern, df$col), ].

grepl() returns a logical vector of the same length as x, which composes naturally with subsetting and which() and is the better fit for filters. Both functions accept ignore.case = TRUE, fixed = TRUE, and perl = TRUE with identical behavior. Setting perl = TRUE selects PCRE regex syntax, which enables lookaheads, lookbehinds, and other advanced patterns not available in the default TRE engine.
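
As a brief illustration of the lookahead syntax that perl = TRUE enables, consider the sketch below. The files vector is made up for this example, and with_digits and not_data are illustrative names.

```r
# Hypothetical file names for illustration
files <- c("data_2023.csv", "data_2023.bak", "notes_2024.csv", "data_old.csv")

# Positive lookahead: "data_" must be followed by a digit
with_digits <- grep("^data_(?=\\d)", files, perl = TRUE, value = TRUE)
with_digits
# [1] "data_2023.csv" "data_2023.bak"

# Negative lookahead: names that do not start with "data_"
not_data <- grep("^(?!data_)", files, perl = TRUE, value = TRUE)
not_data
# [1] "notes_2024.csv"
```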

See also