How to Filter Rows by a Condition in R

To filter rows by a condition in R, the cleanest approach is dplyr::filter(). It takes a data frame and a logical expression and returns only the rows where the expression evaluates to TRUE. Combine multiple conditions with & (AND) or | (OR). The result fits naturally inside a |> pipeline (the native pipe requires R 4.1 or later) or a magrittr %>% pipeline.

library(dplyr)

df <- data.frame(
  name = c("Alice", "Bob", "Carol", "Dave"),
  salary = c(55000, 45000, 72000, 39000),
  department = c("Engineering", "Sales", "Engineering", "Marketing")
)

# Single condition
high_earners <- df |> filter(salary > 50000)

# Multiple conditions
eng_high <- df |> filter(salary > 50000 & department == "Engineering")
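filter() also accepts several conditions separated by commas and combines them with AND, so the multiple-condition call above can be written without &. The sketch below redefines the same df so it runs on its own; the names eng_high_commas and high_or_sales are illustrative only. It also shows | keeping a row when either condition holds.

```r
library(dplyr)

df <- data.frame(
  name = c("Alice", "Bob", "Carol", "Dave"),
  salary = c(55000, 45000, 72000, 39000),
  department = c("Engineering", "Sales", "Engineering", "Marketing")
)

# Comma-separated conditions are combined with AND, same as using &
eng_high_commas <- df |> filter(salary > 50000, department == "Engineering")

# | keeps a row when either condition is TRUE
high_or_sales <- df |> filter(salary > 50000 | department == "Sales")

print(eng_high_commas$name)  # "Alice" "Carol"
print(high_or_sales$name)    # "Alice" "Bob" "Carol"
```

The comma form is often easier to read when a filter has many clauses, since each condition sits on its own argument.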

The %in% operator is useful for matching against a set of values: filter(df, department %in% c("Engineering", "Sales")).
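A small sketch of %in% on the same example data, plus its negation with ! to exclude a set of values instead of keeping it. The variable names eng_or_sales and other_depts are hypothetical; the parentheses around the %in% test are there for readability.

```r
library(dplyr)

df <- data.frame(
  name = c("Alice", "Bob", "Carol", "Dave"),
  salary = c(55000, 45000, 72000, 39000),
  department = c("Engineering", "Sales", "Engineering", "Marketing")
)

# Keep rows whose department is one of the listed values
eng_or_sales <- df |> filter(department %in% c("Engineering", "Sales"))

# Negate the membership test to drop those departments instead
other_depts <- df |> filter(!(department %in% c("Engineering", "Sales")))

print(eng_or_sales$name)  # "Alice" "Bob" "Carol"
print(other_depts$name)   # "Dave"
```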

NA values need care with filter(). A condition that evaluates to NA drops the row, because filter() keeps only rows where the condition is TRUE. Missing values in the filter column are therefore silently removed from the output, which is usually, but not always, what you want. To keep rows where the filter column is NA, test for it explicitly with is.na():

# Keep rows where salary > 50k OR salary is NA
df |> filter(salary > 50000 | is.na(salary))
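The example df above has no missing salaries, so the effect is easier to see on a small hypothetical data frame (df_na, with Bob's salary set to NA). This sketch compares the default behavior with the explicit is.na() version.

```r
library(dplyr)

# Hypothetical example data: Bob's salary is missing
df_na <- data.frame(
  name = c("Alice", "Bob", "Carol"),
  salary = c(55000, NA, 72000)
)

# NA > 50000 evaluates to NA, so filter() drops Bob's row
dropped <- df_na |> filter(salary > 50000)

# is.na(salary) is TRUE for Bob, and NA | TRUE is TRUE, so his row is kept
kept <- df_na |> filter(salary > 50000 | is.na(salary))

print(dropped$name)  # "Alice" "Carol"
print(kept$name)     # "Alice" "Bob" "Carol"
```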

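Base R uses df[df$salary > 50000, ] with the same logical operators, but the syntax is more repetitive, since every column reference needs the df$ prefix. NA also behaves differently: a row whose condition is NA is not dropped but returned as a row filled with NA values. For consistent behavior, prefer dplyr::filter(). When performance matters on large datasets, data.table offers dt[salary > 50000] (where dt is a data.table), which is often faster than dplyr, keeps the syntax minimal, and, like filter(), excludes rows where the condition is NA.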
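A base R sketch of the NA difference, using a hypothetical df_na with one missing salary. Plain bracket indexing returns an all-NA row for the NA condition; wrapping the condition in which() (which returns only the TRUE positions) or using subset() drops that row, matching filter().

```r
# Hypothetical example data: Bob's salary is missing
df_na <- data.frame(
  name = c("Alice", "Bob", "Carol"),
  salary = c(55000, NA, 72000)
)

# Logical indexing: the NA condition produces a row of NA values
bracket_rows <- df_na[df_na$salary > 50000, ]

# which() keeps only positions where the condition is TRUE
with_which <- df_na[which(df_na$salary > 50000), ]

# subset() also treats an NA condition as FALSE
with_subset <- subset(df_na, salary > 50000)

print(nrow(bracket_rows))  # 3
print(with_which$name)     # "Alice" "Carol"
print(with_subset$name)    # "Alice" "Carol"
```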
