dplyr::filter()

filter(.data, ...)

Returns: object of the same type as `.data` (e.g. a tibble) · Updated March 31, 2026 · Tidyverse

r dplyr data-wrangling tidyverse

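The filter() function from dplyr selects rows from a data frame or tibble based on one or more logical conditions. It is one of the core dplyr verbs and provides a readable alternative to base R's [ subsetting: columns are referenced by name, without repeating the data frame's name. Each condition is a vectorized expression evaluated over whole columns that must return one logical value per row. Only rows where every condition is TRUE are kept; rows where a condition evaluates to FALSE or NA are dropped.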
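As a rough comparison, the sketch below selects the same rows with filter() and with base R's [ subsetting. The data frame `people` is hypothetical, introduced only for this illustration.

```r
library(dplyr)

# Hypothetical example data (a plain data.frame, not a tibble)
people <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age  = c(25, 30, 35)
)

# dplyr: columns are referenced directly inside filter()
kept_dplyr <- filter(people, age > 28)

# base R: the data frame name is repeated, and the trailing comma keeps all columns
kept_base <- people[people$age > 28, ]

# Both keep Bob and Charlie; filter() returns a data.frame here because the input is one
kept_dplyr$name  # returns c("Bob", "Charlie")
class(kept_dplyr) # returns "data.frame"
```

One visible difference: base R keeps the original row names ("2", "3"), while filter() renumbers the rows.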

Syntax

filter(.data, ..., .by = NULL, .preserve = FALSE)

Parameters

Parameter	Type	Default	Description
`.data`	tibble / data.frame	Required	A tibble or data frame to filter
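`...`	logical expressions	None	Conditions written in terms of the columns of `.data`; a row is kept only if all of them evaluate to `TRUE`
`.by`	tidy-select	`NULL`	Optional per-operation grouping, an alternative to `group_by()` (dplyr 1.1.0 and later)
`.preserve`	logical	`FALSE`	For grouped input, whether to keep the existing grouping structure (including groups left with no rows) instead of recalculating it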

Basic Usage

Single condition

library(dplyr)

df <- tibble(
  name = c("Alice", "Bob", "Charlie", "Diana", "Eve"),
  age = c(25, 30, 35, 28, 22),
  department = c("Sales", "Engineering", "Sales", "Marketing", "Engineering")
)

# Keep only rows where age is greater than 25
filter(df, age > 25)
# # A tibble: 3 × 3
#   name      age department
#   <chr>   <dbl> <chr>
# 1 Bob        30 Engineering
# 2 Charlie    35 Sales
# 3 Diana      28 Marketing

Multiple conditions (AND)

Separate conditions with a comma or join them with &. The two forms are equivalent: a row is kept only if every condition is TRUE.

# Filter where department is Sales AND age is above 30
filter(df, department == "Sales", age > 30)
# # A tibble: 1 × 3
#   name      age department
#   <chr>   <dbl> <chr>
# 1 Charlie    35 Sales

# Same result using & explicitly
filter(df, department == "Sales" & age > 30)
# # A tibble: 1 × 3
#   name      age department
#   <chr>   <dbl> <chr>
# 1 Charlie    35 Sales

OR conditions

Use | to keep rows where either condition is true.

# Keep rows where department is Sales OR Marketing
filter(df, department == "Sales" | department == "Marketing")
# # A tibble: 3 × 3
#   name      age department
#   <chr>   <dbl> <chr>
# 1 Alice      25 Sales
# 2 Charlie    35 Sales
# 3 Diana      28 Marketing
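When every OR branch tests the same column against a different value, %in% is a more compact way to write it. The sketch below rebuilds the example data and checks that both forms return the same rows.

```r
library(dplyr)

df <- tibble(
  name = c("Alice", "Bob", "Charlie", "Diana", "Eve"),
  age = c(25, 30, 35, 28, 22),
  department = c("Sales", "Engineering", "Sales", "Marketing", "Engineering")
)

# Chained == comparisons joined with |
by_or <- filter(df, department == "Sales" | department == "Marketing")

# Same selection with %in% and a vector of allowed values
by_in <- filter(df, department %in% c("Sales", "Marketing"))

identical(by_or, by_in)  # returns TRUE
```

The two forms differ only for missing values: `NA %in% c(...)` is FALSE rather than NA, but filter() drops the row either way.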

Mixing AND and OR

Use parentheses to control operator precedence. In R, & binds more tightly than |, so without parentheses the AND part is evaluated first.

# Keep rows where age is above 30 AND (department is Sales OR name is Bob)
filter(df, age > 30 & (department == "Sales" | name == "Bob"))
# # A tibble: 1 × 3
#   name      age department
#   <chr>   <dbl> <chr>
# 1 Charlie    35 Sales
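To see why the parentheses matter, the sketch below drops them from the previous condition. Because & is evaluated before |, the expression becomes (age > 30 & department == "Sales") | name == "Bob", which now also keeps Bob even though he is not older than 30.

```r
library(dplyr)

df <- tibble(
  name = c("Alice", "Bob", "Charlie", "Diana", "Eve"),
  age = c(25, 30, 35, 28, 22),
  department = c("Sales", "Engineering", "Sales", "Marketing", "Engineering")
)

# No parentheses: parsed as (age > 30 & department == "Sales") | name == "Bob"
no_parens <- filter(df, age > 30 & department == "Sales" | name == "Bob")

no_parens$name  # returns c("Bob", "Charlie")
```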

Helper functions

between()

between() checks if a numeric value falls within a range (inclusive). It is a shorthand for x >= left & x <= right.

# Keep rows where age is between 25 and 35 (inclusive)
filter(df, between(age, 25, 35))
# # A tibble: 4 × 3
#   name      age department
#   <chr>   <dbl> <chr>
# 1 Alice      25 Sales
# 2 Bob        30 Engineering
# 3 Charlie    35 Sales
# 4 Diana      28 Marketing
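Since between() is shorthand for two comparisons, the same rows can be selected by writing the bounds out explicitly. This sketch confirms that both versions agree on the example data.

```r
library(dplyr)

df <- tibble(
  name = c("Alice", "Bob", "Charlie", "Diana", "Eve"),
  age = c(25, 30, 35, 28, 22),
  department = c("Sales", "Engineering", "Sales", "Marketing", "Engineering")
)

# Inclusive range with between()
with_between <- filter(df, between(age, 25, 35))

# The same range as two comparisons (comma = AND)
with_compare <- filter(df, age >= 25, age <= 35)

identical(with_between, with_compare)  # returns TRUE
```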

near()

near() tests whether numeric values are equal within a tolerance set by its tol argument. Use it instead of == for floating-point values, because arithmetic introduces small rounding errors: 0.1 + 0.2, for example, is not stored as exactly the same value as 0.3.

df_float <- tibble(
  x = c(0.1, 0.2, 0.3, 0.1 + 0.2),
  y = c(1, 2, 3, 4)
)

# == matches the literal 0.3 but misses 0.1 + 0.2, which is stored as
# 0.30000000000000004: the classic floating-point trap
filter(df_float, x == 0.3)
# # A tibble: 1 × 2
#       x     y
#   <dbl> <dbl>
# 1   0.3     3

# Using near() with default tolerance (sqrt(.Machine$double.eps))
filter(df_float, near(x, 0.3))
# # A tibble: 2 × 2
#       x     y
#   <dbl> <dbl>
# 1   0.3     3
# 2   0.3     4
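The tolerance can be changed through the tol argument; near(x, y, tol) is TRUE when abs(x - y) < tol. In the sketch below the tolerance 0.15 is an arbitrary value chosen only to make the effect visible: it is loose enough that 0.2 also counts as "near" 0.3.

```r
library(dplyr)

df_float <- tibble(
  x = c(0.1, 0.2, 0.3, 0.1 + 0.2),
  y = c(1, 2, 3, 4)
)

# Illustrative, deliberately loose tolerance
loose <- filter(df_float, near(x, 0.3, tol = 0.15))

# 0.2 (y = 2) is within 0.15 of 0.3; 0.1 (y = 1) is not
loose$y  # returns c(2, 3, 4)
```

In practice the default tolerance is usually what you want; a custom tol mainly makes sense when values come from measurements with known precision.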

Handling missing values

filter() keeps only rows where every condition is TRUE, so a row whose condition evaluates to NA is dropped. To keep such rows, handle missing values explicitly, for example with is.na().

df_na <- tibble(
  name = c("Alice", "Bob", "Charlie", NA),
  age = c(25, NA, 35, 28)
)

# Default: rows with NA in any condition are dropped
filter(df_na, age > 25)
# # A tibble: 2 × 2
#   name      age
#   <chr>   <dbl>
# 1 Charlie    35
# 2 <NA>       28

# Keep rows where age > 25 OR age is NA
filter(df_na, age > 25 | is.na(age))
# # A tibble: 3 × 2
#   name      age
#   <chr>   <dbl>
# 1 Bob        NA
# 2 Charlie    35
# 3 <NA>       28

# Keep rows where age is not NA
filter(df_na, !is.na(age))
# # A tibble: 3 × 2
#   name      age
#   <chr>   <dbl>
# 1 Alice      25
# 2 Charlie    35
# 3 <NA>       28
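This NA handling is one of the practical differences from base R. In the sketch below, indexing a plain data.frame with [ turns the NA produced by Bob's missing age into a row filled with NA, while filter() simply leaves that row out.

```r
library(dplyr)

df_na <- tibble(
  name = c("Alice", "Bob", "Charlie", NA),
  age = c(25, NA, 35, 28)
)
df_na_base <- as.data.frame(df_na)  # plain data.frame for the base R comparison

# Base R [: the logical index is c(FALSE, NA, TRUE, TRUE), and the NA yields an all-NA row
base_rows <- df_na_base[df_na_base$age > 25, ]
nrow(base_rows)  # returns 3

# filter(): an NA condition is treated as "do not keep"
dplyr_rows <- filter(df_na, age > 25)
nrow(dplyr_rows)  # returns 2
```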

Using filter() with group_by()

When filter() follows group_by(), its conditions are evaluated separately within each group, so summary functions such as max() or mean() return per-group values. This makes it easy to keep, for example, the top scorer in each department.

df_grouped <- tibble(
  department = c(rep("Sales", 3), rep("Engineering", 3)),
  name = c("Alice", "Bob", "Charlie", "Diana", "Eve", "Frank"),
  score = c(85, 92, 78, 88, 95, 83)
)

# Find the highest scorer in each department
df_grouped %>%
  group_by(department) %>%
  filter(score == max(score)) %>%
  ungroup()
# # A tibble: 2 × 3
#   department  name  score
#   <chr>       <chr> <dbl>
# 1 Sales       Bob      92
# 2 Engineering Eve      95
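In dplyr 1.1.0 and later, filter() also accepts a .by argument that groups for a single call. The sketch below, assuming such a dplyr version, finds the same per-department top scorers without a group_by()/ungroup() pair; the result comes back ungrouped.

```r
library(dplyr)  # .by requires dplyr >= 1.1.0

df_grouped <- tibble(
  department = c(rep("Sales", 3), rep("Engineering", 3)),
  name = c("Alice", "Bob", "Charlie", "Diana", "Eve", "Frank"),
  score = c(85, 92, 78, 88, 95, 83)
)

# Grouping applies only to this call; max(score) is computed per department
top_by_dept <- filter(df_grouped, score == max(score), .by = department)

top_by_dept$name          # returns c("Bob", "Eve")
is_grouped_df(top_by_dept) # returns FALSE
```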

Slice variants

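The dplyr slice_*() family complements filter(): instead of a logical condition, it selects rows by position, at random, or by ranking a column. In every slice_*() function, n must be passed by name (for example n = 3).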

Function	What it does
`slice_head(n)`	Keep the first `n` rows
`slice_tail(n)`	Keep the last `n` rows
`slice_sample(n)`	Keep `n` randomly sampled rows
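`slice_min(order_by, n)`	Keep the `n` rows with the smallest values of `order_by` (ties are kept by default, so more than `n` rows can be returned)
`slice_max(order_by, n)`	Keep the `n` rows with the largest values of `order_by` (ties are kept by default)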

# First 3 rows
slice_head(df, n = 3)
# # A tibble: 3 × 3
#   name      age department
#   <chr>   <dbl> <chr>
# 1 Alice      25 Sales
# 2 Bob        30 Engineering
# 3 Charlie    35 Sales

# Last 2 rows
slice_tail(df, n = 2)
# # A tibble: 2 × 3
#   name      age department
#   <chr>   <dbl> <chr>
# 1 Diana      28 Marketing
# 2 Eve        22 Engineering

# Random sample of 2 rows; set a seed first if the result must be reproducible
set.seed(1)
slice_sample(df, n = 2)
# # A tibble: 2 × 3 (which rows appear depends on the seed)

# Top 2 by age
slice_max(df, age, n = 2)
# # A tibble: 2 × 3
#   name      age department
#   <chr>   <dbl> <chr>
# 1 Charlie    35 Sales
# 2 Bob        30 Engineering
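Because slice_min() and slice_max() keep ties by default, they can return more than n rows. The sketch below uses hypothetical scores with a tie at 85 to show the default behaviour and the with_ties = FALSE alternative, which returns exactly n rows.

```r
library(dplyr)

# Hypothetical scores with a tie for second place
scores <- tibble(
  name = c("Ann", "Ben", "Cara", "Dev"),
  score = c(90, 85, 85, 70)
)

# Default with_ties = TRUE: both rows tied at 85 are kept
with_ties <- slice_max(scores, score, n = 2)

# with_ties = FALSE: exactly n rows, ties broken by original row order
no_ties <- slice_max(scores, score, n = 2, with_ties = FALSE)

with_ties$name  # returns c("Ann", "Ben", "Cara")
no_ties$name    # returns c("Ann", "Ben")
```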

Common Patterns

Chaining with pipe: df %>% filter(condition) %>% select(col1, col2)
Using %in%: filter(df, name %in% c("Alice", "Bob"))
Negating conditions: filter(df, !is.na(column))
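Filtering across multiple columns: if_all() keeps rows where every selected column meets the condition and if_any() keeps rows where at least one does, e.g. filter(df, if_any(c(col1, col2), is.na))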
Combining with mutate: df %>% mutate(ratio = x / y) %>% filter(ratio > 0.5)
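The patterns above combine naturally in a single pipeline. The sketch below uses a hypothetical `survey` table (its columns are invented for illustration) to chain %in%, if_all(), mutate() and select(), and then uses if_any() to find the incomplete rows.

```r
library(dplyr)

# Hypothetical survey data with two score columns
survey <- tibble(
  name   = c("Alice", "Bob", "Charlie", "Diana"),
  team   = c("A", "B", "A", "C"),
  score1 = c(80, NA, 90, 70),
  score2 = c(85, 75, NA, 60)
)

result <- survey %>%
  filter(team %in% c("A", "B")) %>%                   # keep teams A and B
  filter(if_all(c(score1, score2), ~ !is.na(.x))) %>% # both scores present
  mutate(avg = (score1 + score2) / 2) %>%             # derive a new column
  filter(avg > 80) %>%                                # filter on the derived column
  select(name, avg)

# Rows with at least one missing score
incomplete <- filter(survey, if_any(c(score1, score2), is.na))

result$name      # returns "Alice" (average 82.5)
incomplete$name  # returns c("Bob", "Charlie")
```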