
grepl()

grepl(pattern, x, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE)

grepl() searches for matches to a pattern in a character vector and returns a logical vector indicating whether each element matches.

Syntax

grepl(pattern, x, ignore.case = FALSE, perl = FALSE, 
      fixed = FALSE, useBytes = FALSE)

Parameters

Parameter   | Type      | Default | Description
pattern     | character | (none)  | A regular expression pattern to match
x           | character | (none)  | A character vector to search in
ignore.case | logical   | FALSE   | If TRUE, ignore case when matching
perl        | logical   | FALSE   | If TRUE, use Perl-compatible regular expressions
fixed       | logical   | FALSE   | If TRUE, treat pattern as a literal string (faster)
useBytes    | logical   | FALSE   | If TRUE, match byte-by-byte rather than character-by-character

Examples

Basic usage

fruits <- c("apple", "banana", "cherry", "date", "elderberry")

grepl("a", fruits)
# [1]  TRUE  TRUE FALSE  TRUE FALSE

Case-insensitive matching

The ignore.case = TRUE argument makes grepl() treat uppercase and lowercase letters as equivalent. A search for "red" then matches "Red", "RED", or "red" without any pre-processing of the input vector. This is the simplest way to handle inconsistent capitalization in user-submitted data, log files, or merged datasets where the same value appears in different cases.

colors <- c("Red", "GREEN", "blue", "YELLOW")

# Ignore case when matching: only "Red" contains the letters r-e-d
grepl("red", colors, ignore.case = TRUE)
# [1]  TRUE FALSE FALSE FALSE

Fixed string matching (faster)

When you want to match a literal string, pass fixed = TRUE to skip regex compilation. This is measurably faster on large vectors and also prevents accidental regex interpretation: a "." in the pattern, for example, then matches only a literal dot instead of any character. Searching for "@" with the default regex engine would also work, but it adds unnecessary overhead. Use fixed = TRUE whenever the pattern is a known substring rather than a pattern that needs wildcards or alternation.

emails <- c("user@example.com", "test@domain.org", "invalid")

grepl("@", emails, fixed = TRUE)
# [1]  TRUE  TRUE FALSE
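To make the "accidental regex interpretation" point concrete, here is a minimal sketch comparing the two modes on a pattern that contains a metacharacter. The vector name files and its contents are made up for illustration.

```r
# Hypothetical file names; only the first contains the literal text ".csv"
files <- c("report.csv", "report_csv", "notes.txt")

# Default regex mode: "." matches any character, so "report_csv" also matches
regex_hits <- grepl(".csv", files)
regex_hits
# [1]  TRUE  TRUE FALSE

# fixed = TRUE: "." is treated as a literal dot
fixed_hits <- grepl(".csv", files, fixed = TRUE)
fixed_hits
# [1]  TRUE FALSE FALSE
```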

Common patterns

The examples above show grepl() in isolation. In practice, you combine the logical vector it returns with other R functions. These three patterns handle the most frequent grepl-related tasks you will encounter in data analysis workflows.

Counting matches

Since grepl() returns a logical vector where TRUE counts as 1 and FALSE as 0, wrapping the result in sum() gives you a count of matching elements. This is useful for data quality checks such as counting records that match a validation rule or entries that contain a specific keyword.

# Count how many elements match
strings <- c("error: disk full", "ok", "error: timeout")
sum(grepl("error", strings))
# [1] 2

Filtering data frames

The logical vector from grepl() feeds directly into R’s [ subset operator to filter rows: df[grepl("pattern", df$column), ]. This is base R’s equivalent of dplyr::filter(df, grepl("pattern", column)). The logical-vector approach works on data frames, matrices, and atomic vectors alike, making it a versatile pattern that does not rely on any external packages.

# Filter rows containing a pattern
df[grepl("pattern", df$column), ]
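The snippet above uses placeholder names. A runnable sketch with a small, made-up data frame (orders, with columns id and item, all hypothetical) might look like this:

```r
# Hypothetical data frame for illustration
orders <- data.frame(
  id = 1:4,
  item = c("apple pie", "banana bread", "cherry tart", "apple juice"),
  stringsAsFactors = FALSE
)

# Keep only the rows whose item contains "apple"
apple_orders <- orders[grepl("apple", orders$item), ]
apple_orders$id
# [1] 1 4
```

The trailing comma inside the brackets matters: it tells R to select rows and keep all columns.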

Multiple patterns (OR logic)

The regex alternation operator | lets you match any of several patterns in a single grepl() call. The pattern "apple|orange" matches strings containing either word. This is cleaner than combining multiple grepl() calls with R's | operator and runs as a single regex pass. For programmatic pattern construction, use paste(terms, collapse = "|") to build the alternation string from a vector of search terms.

# Match multiple patterns using alternation
fruits <- c("apple", "orange", "banana", "grape")
grepl("apple|orange", fruits)
# [1]  TRUE  TRUE FALSE FALSE
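The paste() approach mentioned above can be sketched as follows. The names terms, literal_terms, and labels are illustrative. The second half shows one possible workaround, an assumption rather than an established recipe, for search terms that contain regex metacharacters: alternation is unavailable with fixed = TRUE, so the per-term results are combined with R's | operator instead.

```r
# Build an alternation pattern from a vector of search terms
terms <- c("apple", "orange")
pattern <- paste(terms, collapse = "|")
pattern
# [1] "apple|orange"

fruits <- c("apple", "orange", "banana", "grape")
matches <- grepl(pattern, fruits)
matches
# [1]  TRUE  TRUE FALSE FALSE

# Terms with metacharacters: match each one literally, then OR the results
literal_terms <- c("1.5", "C++")
labels <- c("v1.5", "C++ guide", "145", "C guide")
literal_matches <- Reduce(`|`, lapply(literal_terms, grepl, x = labels, fixed = TRUE))
literal_matches
# [1]  TRUE  TRUE FALSE FALSE
```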

grepl() vs grep() and when to use each

grepl() returns a logical vector of the same length as x, making it natural for filtering: x[grepl(pattern, x)]. grep() returns indices or matched values directly and is useful when you need positions rather than a mask.
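A short side-by-side sketch of the difference, using a made-up vector x:

```r
x <- c("apple", "kiwi", "banana")

# grepl(): logical mask, one value per element
mask <- grepl("an", x)
mask
# [1] FALSE FALSE  TRUE

# grep(): integer positions of the matching elements
positions <- grep("an", x)
positions
# [1] 3

# grep(value = TRUE): the matching elements themselves
matched <- grep("an", x, value = TRUE)
matched
# [1] "banana"

# The mask can be used to subset directly
x[mask]
# [1] "banana"
```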

For fixed-string matching (no wildcards), pass fixed = TRUE — this skips regex compilation and is considerably faster when checking many strings against a literal pattern. For case-insensitive matching without converting case, use ignore.case = TRUE.

The perl = TRUE flag switches the regex engine to PCRE, which supports lookaheads, lookbehinds, and named capture groups that are not available in the default TRE engine. For most string-detection tasks, the default engine is sufficient.
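As an illustration of a PCRE-only feature, the sketch below uses two lookaheads to test whether a string contains both a lowercase letter and a digit. The vector candidates is hypothetical, and the pattern is just one way to express this check.

```r
candidates <- c("abc123", "abcdef", "123456")

# Each (?=...) lookahead checks a condition without consuming characters,
# so both conditions must hold starting from the beginning of the string
has_letter_and_digit <- grepl("^(?=.*[a-z])(?=.*[0-9])", candidates, perl = TRUE)
has_letter_and_digit
# [1]  TRUE FALSE FALSE
```

Without perl = TRUE, the default engine does not understand the (?= syntax, so this pattern is not expected to work there.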

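grepl() does not propagate NA values in x: a missing string returns FALSE, not NA, so missing elements are silently dropped when you filter. Only a missing pattern (pattern = NA) produces NA results. If you need to keep track of missing inputs, restore them explicitly with is.na(x).

# NA elements in x yield FALSE, not NA
vals <- c("apple", NA, "banana", "cherry")
grepl("a", vals)
# [1]  TRUE FALSE  TRUE FALSE

# Filtering drops NAs without an extra is.na() check
vals[grepl("a", vals)]
# [1] "apple"  "banana"

# Restore NA for missing inputs when you need to tell them apart
ifelse(is.na(vals), NA, grepl("a", vals))
# [1]  TRUE    NA  TRUE FALSE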

The stringr equivalent is str_detect(x, pattern), which uses the ICU regex engine (via stringi) rather than TRE or PCRE, and returns NA for missing inputs.

A practical note on regex performance: for simple literal substring checks, grepl() with fixed = TRUE is significantly faster than grepl() with the default TRE regex engine. For checking whether a column contains a fixed string across millions of rows, grepl(pattern, x, fixed = TRUE) can be several times faster than the regex variant (5–10x is plausible, though the exact ratio depends on the data and the R version). When pattern is a user-supplied string that might contain regex metacharacters, using fixed = TRUE also prevents accidental regex injection.
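A rough way to check the speed difference on your own machine is to time both modes with system.time(). This is only a sketch: the test vector x is made up, and the measured ratio will vary with the data, the hardware, and the R version.

```r
# Hypothetical test data: 300,000 strings, two thirds of which contain "@"
x <- rep(c("user@example.com", "invalid", "test@domain.org"), times = 1e5)

# Time the default regex engine and the fixed-string matcher
regex_time <- system.time(regex_result <- grepl("@", x))
fixed_time <- system.time(fixed_result <- grepl("@", x, fixed = TRUE))

# Both modes should agree on which elements match
identical(regex_result, fixed_result)

# Elapsed seconds for each mode
c(regex = regex_time[["elapsed"]], fixed = fixed_time[["elapsed"]])
```

For more reliable numbers, repeat the measurement several times rather than trusting a single run.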

Because grepl() returns a logical vector the same length as its input, it also fits naturally inside dplyr::filter() and other tidyverse verbs, where the indices returned by grep() would not work as a row mask.

See also

startsWith(x, prefix) and endsWith(x, suffix) are faster alternatives to grepl("^prefix", x) and grepl("suffix$", x) for simple prefix/suffix checks. Both treat their second argument as a literal string, not a regex.
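A brief sketch of the prefix/suffix alternatives next to their grepl() counterparts, using a made-up vector files:

```r
files <- c("data_2023.csv", "data_2024.txt", "notes.csv")

# Prefix check: literal comparison, no regex involved
prefix_hits <- startsWith(files, "data_")
prefix_hits
# [1]  TRUE  TRUE FALSE

# Same result with an anchored regex
identical(prefix_hits, grepl("^data_", files))
# [1] TRUE

# Suffix check: the dot needs no escaping because the suffix is literal
suffix_hits <- endsWith(files, ".csv")
suffix_hits
# [1]  TRUE FALSE  TRUE

# The regex version must escape the dot and anchor at the end
identical(suffix_hits, grepl("\\.csv$", files))
# [1] TRUE
```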

grepl(pattern, x) in base R is the counterpart of stringr's str_detect(x, pattern), with the first two arguments reversed. With perl = TRUE, grepl("(?i)pattern", x, perl = TRUE) performs case-insensitive PCRE matching. The (?i) modifier applies only to the part of the pattern that follows it, so placing it at the start makes the whole expression case-insensitive. str_detect() handles the ignore_case option more readably via regex("pattern", ignore_case = TRUE).
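The behavior of the inline (?i) modifier can be sketched with base R alone. The vectors colors and codes are illustrative.

```r
colors <- c("Red", "GREEN", "blue", "YELLOW")

# (?i) at the start: the whole pattern is case-insensitive
inline_flag <- grepl("(?i)red", colors, perl = TRUE)
inline_flag
# [1]  TRUE FALSE FALSE FALSE

# Same result as the ignore.case argument
identical(inline_flag, grepl("red", colors, ignore.case = TRUE))
# [1] TRUE

# (?i) in the middle: only the part after it ignores case
codes <- c("ab", "aB", "Ab")
partial_flag <- grepl("a(?i)b", codes, perl = TRUE)
partial_flag
# [1]  TRUE  TRUE FALSE
```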