startsWith()

startsWith(x, prefix)

The startsWith() function checks whether each element of a character vector begins with a specified substring. It returns a logical vector of the same length, where each element is TRUE if the string starts with the prefix and FALSE otherwise.

Syntax

startsWith(x, prefix)

Parameters

Parameter  Type       Default   Description
x          character  Required  A character vector to check
prefix     character  Required  The prefix to look for at the start of each string

Examples

Basic usage

# Check if strings start with a prefix
words <- c("apple", "apricot", "banana", "application")

startsWith(words, "app")
# [1]  TRUE  TRUE FALSE  TRUE

startsWith(words, "ban")
# [1] FALSE FALSE  TRUE FALSE

The function returns a logical vector where each position tells you whether the corresponding input element starts with the given prefix. Since startsWith() is vectorized over its first argument, you can test an entire column or vector in a single call without writing an explicit loop. The prefix is matched as a literal string — no regex patterns or wildcards are interpreted, which keeps the behavior predictable when the prefix contains characters like . or * that would have special meaning in a regular expression.
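To illustrate the literal matching described above, the following sketch compares startsWith() with an anchored grepl() call on a prefix that contains a dot. The file names are made up for the illustration.

```r
# Hypothetical file names, chosen so that the dot in the prefix matters
files <- c("v1.2-notes", "v1x2-notes", "v2.0-notes")

# Literal match: only the string that really begins with "v1." qualifies
literal <- startsWith(files, "v1.")

# Regex match: "." matches any character, so "v1x2-notes" also matches
regex <- grepl("^v1.", files)

# Regex match with the dot escaped, which agrees with startsWith() again
escaped <- grepl("^v1\\.", files)
```

Here literal and escaped flag only the first element, while regex also flags the second.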

Case sensitivity

# startsWith() is case-sensitive
startsWith("Hello", "hello")
# [1] FALSE

startsWith("Hello", "Hello")
# [1] TRUE

Because startsWith() does not support an ignore.case parameter, case-insensitive matching requires you to normalize both sides before calling the function. The idiomatic pattern is startsWith(tolower(x), tolower(prefix)), which converts both the target strings and the prefix to the same case before comparing. This approach is explicit about the normalization step and avoids hiding the case conversion inside a function option, making the code easier to audit when dealing with user-supplied input where capitalization is unpredictable.
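As a minimal sketch of that idiom, the normalization can be wrapped in a small helper. The name starts_with_ci is hypothetical and not part of base R.

```r
# Hypothetical helper (not part of base R): case-insensitive prefix test
starts_with_ci <- function(x, prefix) {
  startsWith(tolower(x), tolower(prefix))
}

greetings <- c("Hello world", "HELLO there", "hello", "Oh, hello")
starts_with_ci(greetings, "hello")
# [1]  TRUE  TRUE  TRUE FALSE
```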

Empty prefix

# Empty prefix matches everything
startsWith(c("a", "b", "c"), "")
# [1] TRUE TRUE TRUE

An empty string as the prefix matches every input, because every string technically starts with the zero-length prefix before its first character. While this behavior follows from the mathematical definition of string prefixes, it is worth keeping in mind when the prefix value comes from user input or configuration — an accidentally empty prefix will match all rows in a filter operation, potentially including records you intended to exclude. Defensive code can guard against this by checking nzchar(prefix) before calling startsWith() when the prefix is not a hard-coded literal.
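One possible shape for such a guard is sketched below. The helper name filter_by_prefix and the sample keys are assumptions made for the example. The explicit is.na() check is there because nzchar() returns TRUE for NA by default.

```r
# Hypothetical helper: refuse to filter on a missing or empty prefix
filter_by_prefix <- function(x, prefix) {
  stopifnot(is.character(prefix), length(prefix) == 1L)
  if (is.na(prefix) || !nzchar(prefix)) {
    stop("prefix must be a non-empty string")
  }
  x[startsWith(x, prefix)]
}

keys <- c("db.host", "db.port", "cache.ttl")
filter_by_prefix(keys, "db.")
# [1] "db.host" "db.port"
```

Calling filter_by_prefix(keys, "") stops with an error instead of silently returning every key.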

Common patterns

Filtering data

# Filter rows based on prefix
df <- data.frame(
  name = c("user_data.csv", "report.pdf", "user_profile.csv", "notes.txt"),
  size = c(100, 250, 150, 50)
)

# Keep only the rows whose name starts with "user_"
df[startsWith(df$name, "user_"), ]
#               name size
# 1    user_data.csv  100
# 3 user_profile.csv  150

Combining startsWith() with base R subsetting gives you a readable filter that selects rows based on a column prefix. This pattern is particularly useful when you have naming conventions encoded in string columns — for example, distinguishing survey questions by their ID prefix, separating log entries by severity label, or selecting configuration keys from a flat key-value table. The same logic works with dplyr::filter(): filter(df, startsWith(name, "user_")) reads naturally in a pipe chain and keeps the prefix condition close to the data it operates on.

Validation

# Check if column names start with a pattern
cols <- c("id", "name", "score_a", "score_b", "date")
startsWith(cols, "score_")
# [1] FALSE FALSE  TRUE  TRUE FALSE

When a dataset has systematically named columns, for instance survey responses prefixed with "Q1_" or "Q2_", or measurement variables prefixed with "meas_", startsWith() lets you identify related columns without hard-coding their full names. You can pass the resulting logical vector to which() to get column indices, or use it directly to subset the data frame. In dplyr::select(), the tidyselect helper starts_with() does a comparable literal prefix match, although it ignores case by default. This approach is more maintainable than listing column names individually because it adapts automatically when new columns following the same naming convention are added.
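A base R sketch of prefix-based column selection follows. The scores data frame is invented for the example.

```r
# Made-up data frame with two score columns
scores <- data.frame(
  id = 1:3,
  name = c("Ann", "Bob", "Cy"),
  score_a = c(90, 85, 70),
  score_b = c(88, 92, 65)
)

is_score <- startsWith(names(scores), "score_")

# Column positions of the matches
which(is_score)
# [1] 3 4

# Subset to the matching columns and compute a per-row mean
score_cols <- scores[, is_score, drop = FALSE]
row_means <- rowMeans(score_cols)
```

Adding a score_c column later would be picked up by the same code without changes.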

String parsing

# Extract values with specific prefixes
lines <- c("ERROR: failed", "WARN: retry", "INFO: ok", "ERROR: crash")
error_lines <- lines[startsWith(lines, "ERROR:")]
error_lines
# [1] "ERROR: failed" "ERROR: crash"

startsWith() vs grepl() vs substr()

startsWith() is typically faster than grepl() for prefix matching because it performs a direct string comparison instead of compiling and running a regular expression. Use startsWith() whenever you are matching a fixed string at the beginning of a value, and use grepl() only when you need a pattern (wildcards, character classes, alternation). For a fixed prefix, substr(x, 1, nchar(prefix)) == prefix gives the same answer, but it is more verbose.
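The three approaches named in the heading can be compared side by side. This is a sketch with made-up inputs. The grepl() version is only equivalent here because the prefix contains no regex metacharacters.

```r
x <- c("apple", "apricot", "banana", "app")
prefix <- "app"

# Direct literal comparison
by_startswith <- startsWith(x, prefix)

# Manual version: cut each string to the prefix length and compare
by_substr <- substr(x, 1, nchar(prefix)) == prefix

# Regex version: with fixed = TRUE the "^" anchor would be taken literally,
# so the prefix has to be a safe regex ("app" has no special characters)
by_grepl <- grepl(paste0("^", prefix), x)
```

All three vectors agree for this input: the first and last elements match, the middle two do not.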

startsWith() is case-sensitive and has no option to change that. For case-insensitive prefix matching, convert both sides to the same case first: startsWith(tolower(x), tolower(prefix)). A missing value (NA_character_) in x or prefix produces NA in the corresponding position rather than an error. Non-character input, including a bare logical NA, is an error.
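The NA behavior matters when the result is used for subsetting. A short sketch with an invented vector:

```r
x <- c("alpha", NA, "beta")

res <- startsWith(x, "al")
res
# [1]  TRUE    NA FALSE

# Logical subsetting keeps an NA element for the NA position
with_na <- x[res]

# which() drops NA and FALSE, leaving only the confirmed matches
matches_only <- x[which(res)]
```

Wrapping the condition in which() is one way to keep missing values out of the filtered result.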

The equivalent in stringr is str_starts(x, fixed(prefix)) for literal matching or str_starts(x, prefix) for regex matching.

A common use case is filtering a data frame to rows where a column starts with a known prefix: log levels, file path prefixes, identifier namespaces. The pattern df[startsWith(df$col, prefix), ] or its dplyr equivalent filter(df, startsWith(col, prefix)) is more readable than a regex for this purpose. When checking many prefixes at once, call startsWith() once per prefix with vapply(), or use grepl() with alternation instead.

startsWith() was added in R 3.3.0, so it is available in all modern R installations. It is vectorized over both x and prefix: if you provide a vector of prefixes, the two arguments are recycled to a common length and compared element-wise. This means startsWith(c("a1", "b2", "c3"), c("a", "b")) checks whether "a1" starts with "a" and "b2" starts with "b", then recycles the prefixes so that "c3" is checked against "a". Use outer() or a loop if you want to test all strings against all prefixes and get a matrix of results.
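The difference between element-wise recycling and an all-pairs comparison can be sketched as follows. The dimnames are added only to make the matrix easier to read.

```r
x <- c("a1", "b2", "c3")
prefixes <- c("a", "b")

# Element-wise with recycling: "c3" is compared against "a" again
pairwise <- startsWith(x, prefixes)

# All strings against all prefixes: rows are strings, columns are prefixes
all_pairs <- outer(x, prefixes, startsWith)
dimnames(all_pairs) <- list(x, prefixes)
```

pairwise has one value per string, while all_pairs is a 3 x 2 logical matrix with one cell per string and prefix combination.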

startsWith() does not support glob or regex patterns; it matches the literal prefix string only. If you prefer a regex for case-insensitive matching, use grepl() with ignore.case = TRUE and a ^ anchor. For matching any of several prefixes, use startsWith() inside sapply(), or use grepl() with an alternation pattern: grepl("^(prefix1|prefix2)", x).
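Both ways of matching any of several prefixes are sketched below, using the log lines from the earlier example. The sketch uses vapply() rather than sapply() so that the result shape is checked. It assumes more than one input string, so that vapply() returns a matrix.

```r
lines <- c("ERROR: failed", "WARN: retry", "INFO: ok", "ERROR: crash")
prefixes <- c("ERROR:", "WARN:")

# One logical column per prefix, then TRUE where any prefix matched
hits <- vapply(
  prefixes,
  function(p) startsWith(lines, p),
  logical(length(lines))
)
any_prefix <- rowSums(hits) > 0

# Regex alternative; equivalent here because these prefixes contain no
# regex metacharacters
any_prefix_regex <- grepl("^(ERROR:|WARN:)", lines)
```

The startsWith() route stays correct when a prefix contains characters such as . or (, whereas the regex route would need them escaped.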

endsWith(x, suffix) follows the same recycling rules as startsWith(x, prefix). When both arguments are character vectors of the same length, each element of x is tested against the corresponding element of the second argument; when that argument is a single string, every element of x is tested against it. This vectorization makes both functions useful in dplyr::filter() and other column operations.

See also