dplyr::filter()
filter(.data, ...) tibble · Updated March 31, 2026 · Tidyverse
The filter() function from dplyr selects rows from a data frame or tibble based on one or more logical conditions. It is one of the core dplyr verbs and provides a readable, expressive alternative to base R’s [ subsetting. Each condition is a vectorized expression evaluated against the columns of the data, and only rows where every condition evaluates to TRUE are retained.
Syntax
filter(.data, ...)
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| .data | tibble / data.frame | Required | A tibble or data frame to filter |
| ... | logical expressions | Required | Conditions that must evaluate to TRUE for a row to be kept |
Basic Usage
Single condition
library(dplyr)
df <- tibble(
name = c("Alice", "Bob", "Charlie", "Diana", "Eve"),
age = c(25, 30, 35, 28, 22),
department = c("Sales", "Engineering", "Sales", "Marketing", "Engineering")
)
# Keep only rows where age is greater than 25
filter(df, age > 25)
# # A tibble: 3 × 3
# name age department
# <chr> <dbl> <chr>
# 1 Bob 30 Engineering
# 2 Charlie 35 Sales
# 3 Diana 28 Marketing
Multiple conditions (AND)
Separate conditions with a comma or & — both are equivalent and imply AND logic. All conditions must be TRUE.
# Filter where department is Sales AND age is above 30
filter(df, department == "Sales", age > 30)
# # A tibble: 1 × 3
# name age department
# <chr> <dbl> <chr>
# 1 Charlie 35 Sales
# Same result using & explicitly
filter(df, department == "Sales" & age > 30)
# # A tibble: 1 × 3
# name age department
# <chr> <dbl> <chr>
# 1 Charlie 35 Sales
OR conditions
Use | to keep rows where either condition is true.
# Keep rows where department is Sales OR Marketing
filter(df, department == "Sales" | department == "Marketing")
# # A tibble: 3 × 3
# name age department
# <chr> <dbl> <chr>
# 1 Alice 25 Sales
# 2 Charlie 35 Sales
# 3 Diana 28 Marketing
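When several OR comparisons test the same column, %in% is a common shorthand. The sketch below repeats the example data so it runs on its own and checks that both forms select the same rows; it is an illustration, not the only way to write the condition.

```r
library(dplyr)

# Same example data as above, repeated so the block is self-contained
df <- tibble(
  name = c("Alice", "Bob", "Charlie", "Diana", "Eve"),
  age = c(25, 30, 35, 28, 22),
  department = c("Sales", "Engineering", "Sales", "Marketing", "Engineering")
)

# Chained | comparisons on one column
with_or <- filter(df, department == "Sales" | department == "Marketing")

# Membership test with %in%
with_in <- filter(df, department %in% c("Sales", "Marketing"))

# Both approaches keep the same people
identical(with_or$name, with_in$name)
```

The %in% form tends to stay readable as the list of accepted values grows.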
Mixing AND and OR
Use parentheses to control operator precedence.
# Keep rows where age is above 30 AND (department is Sales OR name is Bob)
filter(df, age > 30 & (department == "Sales" | name == "Bob"))
# # A tibble: 1 × 3
# name age department
# <chr> <dbl> <chr>
# 1 Charlie 35 Sales
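To see why the parentheses matter: in R, & binds more tightly than |, so dropping the parentheses changes which rows are kept. A minimal sketch using the same example data:

```r
library(dplyr)

df <- tibble(
  name = c("Alice", "Bob", "Charlie", "Diana", "Eve"),
  age = c(25, 30, 35, 28, 22),
  department = c("Sales", "Engineering", "Sales", "Marketing", "Engineering")
)

# Without parentheses this parses as
# (age > 30 & department == "Sales") | name == "Bob"
no_parens <- filter(df, age > 30 & department == "Sales" | name == "Bob")

# With parentheses the OR is evaluated first, then combined with age > 30
with_parens <- filter(df, age > 30 & (department == "Sales" | name == "Bob"))

no_parens$name    # "Bob" "Charlie"
with_parens$name  # "Charlie"
```

Bob is 30, so he fails age > 30 and only appears when the name test is allowed to stand on its own.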
Helper functions
between()
between() checks if a numeric value falls within a range (inclusive). It is a shorthand for x >= left & x <= right.
# Keep rows where age is between 25 and 35 (inclusive)
filter(df, between(age, 25, 35))
# # A tibble: 4 × 3
# name age department
# <chr> <dbl> <chr>
# 1 Alice 25 Sales
# 2 Bob 30 Engineering
# 3 Charlie 35 Sales
# 4 Diana 28 Marketing
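The equivalence with the explicit comparison can be checked directly. This is a small sketch on the same example data:

```r
library(dplyr)

df <- tibble(
  name = c("Alice", "Bob", "Charlie", "Diana", "Eve"),
  age = c(25, 30, 35, 28, 22),
  department = c("Sales", "Engineering", "Sales", "Marketing", "Engineering")
)

# between() form
via_between <- filter(df, between(age, 25, 35))

# Explicit form with two inclusive comparisons
via_compare <- filter(df, age >= 25 & age <= 35)

# Both bounds are inclusive, so ages 25 and 35 are kept in either form
identical(via_between$name, via_compare$name)
```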
near()
near() compares floating-point numbers with a configurable tolerance. Direct equality comparison with == on floating-point values is unreliable due to rounding errors.
df_float <- tibble(
x = c(0.1, 0.2, 0.3, 0.1 + 0.2),
y = c(1, 2, 3, 4)
)
# == matches the literal 0.3 but misses 0.1 + 0.2, which differs in its last bits
filter(df_float, x == 0.3)
# # A tibble: 1 × 2
# x y
# <dbl> <dbl>
# 1 0.3 3
# Using near() with default tolerance (sqrt(.Machine$double.eps))
filter(df_float, near(x, 0.3))
# # A tibble: 2 × 2
# x y
# <dbl> <dbl>
# 1 0.3 3
# 2 0.3 4
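The tolerance is set through the tol argument of near(). The sketch below uses a deliberately loose tolerance of 0.15, chosen only for illustration, to show how widening it changes which rows match:

```r
library(dplyr)

df_float <- tibble(
  x = c(0.1, 0.2, 0.3, 0.1 + 0.2),
  y = c(1, 2, 3, 4)
)

# The third and fourth values print alike but are not exactly equal
df_float$x[3] == df_float$x[4]  # FALSE

# Default tolerance: sqrt(.Machine$double.eps), roughly 1.5e-8
default_tol <- filter(df_float, near(x, 0.3))
default_tol$y  # 3 4

# Illustrative loose tolerance: 0.2 is within 0.15 of 0.3, while 0.1 is not
loose_tol <- filter(df_float, near(x, 0.3, tol = 0.15))
loose_tol$y  # 2 3 4
```

In practice the tolerance should reflect the precision of the data rather than an arbitrary value like the one used here.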
Handling missing values
filter() keeps only rows where the condition is TRUE, so rows where it evaluates to NA are dropped along with rows where it is FALSE. To keep such rows, test for them explicitly with is.na().
df_na <- tibble(
name = c("Alice", "Bob", "Charlie", NA),
age = c(25, NA, 35, 28)
)
# Default: rows with NA in any condition are dropped
filter(df_na, age > 25)
# # A tibble: 2 × 2
# name age
# <chr> <dbl>
# 1 Charlie 35
# 2 NA 28
# Keep rows where age > 25 OR age is NA
filter(df_na, age > 25 | is.na(age))
# # A tibble: 3 × 2
# name age
# <chr> <dbl>
# 1 Bob NA
# 2 Charlie 35
# 3 NA 28
# Keep rows where age is not NA
filter(df_na, !is.na(age))
# # A tibble: 3 × 2
# name age
# <chr> <dbl>
# 1 Alice 25
# 2 Charlie 35
# 3 NA 28
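This behavior differs from base R’s [ subsetting, where an NA in the logical index yields a row filled with NA instead of dropping it. The following sketch contrasts the two on the same data; it converts to a plain data.frame first so that the base R behavior is what is being shown.

```r
library(dplyr)

df_na <- tibble(
  name = c("Alice", "Bob", "Charlie", NA),
  age = c(25, NA, 35, 28)
)

# Base R on a plain data.frame: the NA in the index becomes an all-NA row
base_df <- as.data.frame(df_na)
base_result <- base_df[base_df$age > 25, ]

# filter(): the row whose condition is NA is dropped
dplyr_result <- filter(df_na, age > 25)

# Base R returns one extra row, and that row contains only NA
nrow(base_result) - nrow(dplyr_result)  # 1
```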
Using filter() with group_by()
When filter() is used after group_by(), it operates within each group, not on the whole dataset. This lets you keep, for example, the top performer in each department.
df_grouped <- tibble(
department = c(rep("Sales", 3), rep("Engineering", 3)),
name = c("Alice", "Bob", "Charlie", "Diana", "Eve", "Frank"),
score = c(85, 92, 78, 88, 95, 83)
)
# Find the highest scorer in each department
df_grouped %>%
group_by(department) %>%
filter(score == max(score)) %>%
ungroup()
# # A tibble: 2 × 3
# department name score
# <chr> <chr> <dbl>
# 1 Sales Bob 92
# 2 Engineering Eve 95
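Assuming dplyr 1.1.0 or later, the same per-group filter can be written with the .by argument, which groups for a single call and returns an ungrouped result. A minimal sketch, with the grouped pipeline repeated for comparison:

```r
library(dplyr)

df_grouped <- tibble(
  department = c(rep("Sales", 3), rep("Engineering", 3)),
  name = c("Alice", "Bob", "Charlie", "Diana", "Eve", "Frank"),
  score = c(85, 92, 78, 88, 95, 83)
)

# Per-operation grouping (assumes dplyr >= 1.1.0): no group_by()/ungroup() pair
top_by <- filter(df_grouped, score == max(score), .by = department)

# The group_by() pipeline, for comparison
top_grouped <- df_grouped %>%
  group_by(department) %>%
  filter(score == max(score)) %>%
  ungroup()

# Both approaches select the same people
identical(top_by$name, top_grouped$name)
```

Note that .by cannot be combined with data that is already grouped; pick one style per call.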
Slice variants
The dplyr slice family offers position-based row selection that complements filter():
| Function | What it does |
|---|---|
| slice_head(n) | Keep the first n rows |
| slice_tail(n) | Keep the last n rows |
| slice_sample(n) | Keep n randomly sampled rows |
| slice_min(order_by, n) | Keep the n rows with the smallest values of order_by |
| slice_max(order_by, n) | Keep the n rows with the largest values of order_by |
# First 3 rows
slice_head(df, n = 3)
# # A tibble: 3 × 3
# name age department
# <chr> <dbl> <chr>
# 1 Alice 25 Sales
# 2 Bob 30 Engineering
# 3 Charlie 35 Sales
# Last 2 rows
slice_tail(df, n = 2)
# # A tibble: 2 × 3
# name age department
# <chr> <dbl> <chr>
# 1 Diana 28 Marketing
# 2 Eve 22 Engineering
# Random sample of 2 rows
slice_sample(df, n = 2)
# # A tibble: 2 × 3 (random)
# Top 2 by age
slice_max(df, age, n = 2)
# # A tibble: 2 × 3
# name age department
# <chr> <dbl> <chr>
# 1 Charlie 35 Sales
# 2 Bob 30 Engineering
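One detail worth knowing is how slice_min() and slice_max() treat ties. By default (with_ties = TRUE) they can return more than n rows when values are tied. The sketch below uses a small hypothetical scores table, invented for illustration, to show the difference:

```r
library(dplyr)

# Hypothetical data with a tie for the top score
scores <- tibble(
  name = c("Alice", "Bob", "Charlie", "Diana"),
  score = c(90, 95, 95, 80)
)

# Default with_ties = TRUE: both rows tied at 95 are returned
top_with_ties <- slice_max(scores, score, n = 1)
nrow(top_with_ties)  # 2

# with_ties = FALSE returns exactly n rows
top_no_ties <- slice_max(scores, score, n = 1, with_ties = FALSE)
nrow(top_no_ties)  # 1

# slice_min() works the same way from the other end
lowest <- slice_min(scores, score, n = 1)
lowest$name  # "Diana"
```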
Common Patterns
- Chaining with pipe: df %>% filter(condition) %>% select(col1, col2)
- Using %in%: filter(df, name %in% c("Alice", "Bob"))
- Negating conditions: filter(df, !is.na(column))
- Filtering across multiple columns: Use if_all() or if_any() with column ranges
- Combining with mutate: df %>% mutate(ratio = x / y) %>% filter(ratio > 0.5)
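As an illustration of the multi-column pattern, the sketch below applies one predicate to several columns with if_all() and if_any(). The quarters table and its column names are hypothetical, made up for this example.

```r
library(dplyr)

# Hypothetical per-quarter ratios, invented for illustration
quarters <- tibble(
  name = c("Alice", "Bob", "Charlie", "Diana"),
  q1 = c(0.9, 0.4, 0.7, 0.2),
  q2 = c(0.8, 0.6, 0.3, 0.1)
)

# if_all(): keep rows where every selected column passes the test
all_pass <- filter(quarters, if_all(c(q1, q2), ~ .x > 0.5))
all_pass$name  # "Alice"

# if_any(): keep rows where at least one selected column passes
any_pass <- filter(quarters, if_any(q1:q2, ~ .x > 0.5))
any_pass$name  # "Alice" "Bob" "Charlie"
```

Both helpers accept the same column selections as select(), so a range such as q1:q2 or a helper such as starts_with("q") can stand in for an explicit list.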