tidyr::drop_na()

Updated April 20, 2026 · Tidyverse
r tidyr data-wrangling data-cleaning na-values

Overview

drop_na() removes every row of a data frame that contains at least one NA. It is a quick way to clean a dataset before analysis when you know that incomplete rows are not informative.

By default every column is checked. You can optionally restrict the check to specific columns.

Signature

drop_na(data, ...)

Parameters

Parameter   Type                  Default   Description
data        tibble / data frame   -         Input data.
...         -                     -         Optional columns to restrict the check to. If not supplied, all columns are checked.

Basic Usage

Drop rows with any missing value

library(tidyr)

df <- tibble(
  name  = c("Alice", "Bob", "Carol"),
  age   = c(25, NA, 31),
  score = c(90, 85, NA)
)

df %>% drop_na()
# # A tibble: 1 x 3
#   name   age score
#   <chr> <dbl> <dbl>
# 1 Alice    25    90

Only Alice has no missing values in any column, so she is the only row that remains.

Drop rows with missing values in specific columns

Use column names as arguments to restrict the check to those columns:

df %>% drop_na(age)
# # A tibble: 2 x 3
#   name   age score
#   <chr> <dbl> <dbl>
# 1 Alice    25    90
# 2 Carol    31    NA    # age is present, kept even though score is NA

Only rows where age is missing are dropped, so Bob is removed. Carol stays because her age is not NA, even though her score is.

Drop across multiple specific columns

df %>% drop_na(age, score)
# # A tibble: 1 x 3
#   name   age score
#   <chr> <dbl> <dbl>
# 1 Alice    25    90

Both age and score must be non-missing. Bob is dropped because age is NA. Carol is dropped because score is NA.
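When the columns to check are held in a character vector rather than typed out, a selection helper can pass them to drop_na(). The sketch below assumes a hypothetical vector named required; all_of() is one of the selection helpers that tidyr re-exports.

```r
library(tidyr)

df <- tibble(
  name  = c("Alice", "Bob", "Carol"),
  age   = c(25, NA, 31),
  score = c(90, 85, NA)
)

# Hypothetical vector of required column names (for example, read from a config)
required <- c("age", "score")

# all_of() turns the character vector into a column selection
strict <- df %>% drop_na(all_of(required))

# The same selection written with bare column names, for comparison
same <- df %>% drop_na(age, score)
```

Both calls keep only Alice. all_of() raises an error if a listed column does not exist, which is usually what you want for required fields.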

Common Use Cases

Cleaning before modelling

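Many summary functions in R, such as mean() and sum(), return NA when their input contains missing values, while modelling functions such as lm() silently omit incomplete rows by default. drop_na() makes that handling explicit by removing incomplete rows before you summarise or fit a model: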

library(dplyr)

survey <- tibble(
  id     = 1:5,
  q1     = c("agree", NA, "neutral", "agree", NA),
  q2     = c("disagree", "agree", "neutral", NA, "agree"),
  result = c(10, 20, 30, 40, 50)
)

survey %>%
  drop_na() %>%
  summarise(mean_result = mean(result))
# # A tibble: 1 x 1
#   mean_result
#          <dbl>
# 1           20

Rows 2, 4, and 5 had at least one NA in the response columns and were dropped before summarising.
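In the survey example the result column itself has no missing values, so it does not show how an NA propagates through a summary. The following sketch uses a small hypothetical tibble named readings to compare three options: summarising as-is, dropping rows first, and using na.rm = TRUE.

```r
library(tidyr)
library(dplyr)

# Hypothetical data with one missing measurement
readings <- tibble(
  sensor = c("a", "b", "c"),
  value  = c(10, NA, 30)
)

# mean() propagates the missing value, so the result is NA
raw <- readings %>% summarise(mean_value = mean(value))

# Dropping incomplete rows first gives a usable mean (20)
dropped <- readings %>%
  drop_na(value) %>%
  summarise(mean_value = mean(value))

# na.rm = TRUE gives the same number without removing rows from the data
skipped <- readings %>% summarise(mean_value = mean(value, na.rm = TRUE))
```

drop_na() is the better fit when several later steps should all see the same set of complete rows; na.rm = TRUE is enough for a single summary.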

Removing incomplete observations from time series

prices <- tibble(
  date   = as.Date(c("2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04")),
  open   = c(100, NA, 102, 103),
  close  = c(101, 101, NA, 104)
)

prices %>% drop_na()
# # A tibble: 2 x 3
#   date       open close
#   <date>    <dbl> <dbl>
# 1 2024-01-01   100   101
# 2 2024-01-04   103   104

Using with dplyr pipelines

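drop_na() fits naturally into a %>% pipeline. The sketch below reuses the survey tibble from above:

survey %>%
  filter(result < 50) %>%                       # keep the rows of interest first
  drop_na(starts_with("q")) %>%                 # require an answer to every q* column
  mutate(result_share = result / sum(result))   # then compute on complete rows only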

Alternative Approaches

Base R

df[complete.cases(df), ]

complete.cases() returns a logical vector that is TRUE for rows with no NA. Subsetting with it keeps the same rows as drop_na(), but it is less readable in a pipeline.
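As a quick check, the two approaches can be run side by side on the example data. This is a minimal sketch; the variable names are illustrative only.

```r
library(tidyr)

df <- tibble(
  name  = c("Alice", "Bob", "Carol"),
  age   = c(25, NA, 31),
  score = c(90, 85, NA)
)

# Check every column
tidy_all <- drop_na(df)
base_all <- df[complete.cases(df), ]

# Check only the age column: pass a one-column subset to complete.cases()
tidy_age <- drop_na(df, age)
base_age <- df[complete.cases(df["age"]), ]
```

In both pairs the same rows survive: Alice alone when all columns are checked, Alice and Carol when only age is checked.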

Using tidyr::fill() before dropping

If missing values should be filled rather than dropped, use fill() first:

df %>%
  fill(age, .direction = "down") %>%
  drop_na()

fill() replaces each NA with the previous non-missing value (or the next one, depending on .direction). drop_na() then removes any rows that still contain an NA, whether in another column or in a filled column that had no value to copy from.
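Applied to the three-row example from Basic Usage, the two steps look like this. The sketch is only meant to show the mechanics; whether carrying a value forward is meaningful depends on the data.

```r
library(tidyr)

df <- tibble(
  name  = c("Alice", "Bob", "Carol"),
  age   = c(25, NA, 31),
  score = c(90, 85, NA)
)

# Step 1: Bob's missing age is filled with 25, copied from Alice's row above
filled <- df %>% fill(age, .direction = "down")

# Step 2: Carol is still dropped, because score was not filled and is NA
result <- filled %>% drop_na()
```

Note that Bob now carries Alice's age. That is reasonable for ordered data such as repeated measurements, but misleading for unrelated observations like these.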

Using dplyr::filter() with is.na()

For more control over which rows to keep:

df %>%
  filter(!is.na(age), !is.na(score))

This is equivalent to drop_na(age, score) but lets you apply different conditions to each column.
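One way to use that extra control is to require one column while tolerating missing values in another. The sketch below uses an arbitrary cut-off of 85 for illustration.

```r
library(tidyr)
library(dplyr)

df <- tibble(
  name  = c("Alice", "Bob", "Carol"),
  age   = c(25, NA, 31),
  score = c(90, 85, NA)
)

kept <- df %>%
  filter(
    !is.na(age),                  # age is required
    is.na(score) | score >= 85    # score may be missing, but not below 85
  )
```

Alice and Carol are kept: Bob fails the age condition, while Carol's missing score is explicitly allowed. drop_na() cannot express this kind of mixed rule.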

Gotchas

Dropping removes the whole row. drop_na() never removes individual cells: if any checked cell in a row is NA, the entire row goes. If one sparse column is causing most of the drops and you do not need it, remove that column with select() first:

df %>% select(-score) %>% drop_na()
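If the sparse column should stay in the result, it can be excluded from the check instead of being removed. The following sketch compares the two options on the example data, assuming that a negative selection is accepted in drop_na() as it is in select().

```r
library(tidyr)
library(dplyr)

df <- tibble(
  name  = c("Alice", "Bob", "Carol"),
  age   = c(25, NA, 31),
  score = c(90, 85, NA)
)

# Option 1: remove the column, then drop incomplete rows
without_score <- df %>% select(-score) %>% drop_na()

# Option 2: keep the column, but leave it out of the NA check
keeps_score <- df %>% drop_na(-score)
```

Both options keep Alice and Carol. Only the second still has the score column, including Carol's NA.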

Data-dependent dropping. The number of rows drop_na() removes depends on the data. If the pattern of missing values changes between runs (for example, when reading new data), the row count after drop_na() changes too. Check the row count after dropping to catch unexpected missingness.
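A row-count check can be as simple as comparing nrow() before and after. The sketch below reuses the survey data from Common Use Cases; the 80% threshold is a hypothetical value chosen for illustration.

```r
library(tidyr)

survey <- tibble(
  id     = 1:5,
  q1     = c("agree", NA, "neutral", "agree", NA),
  q2     = c("disagree", "agree", "neutral", NA, "agree"),
  result = c(10, 20, 30, 40, 50)
)

n_before  <- nrow(survey)
cleaned   <- drop_na(survey)
n_dropped <- n_before - nrow(cleaned)

# Missing values per column show which column drives the drops
na_per_column <- colSums(is.na(survey))

message(n_dropped, " of ", n_before, " rows dropped")

# Hypothetical guard: stop if more than 80% of the rows were removed
max_dropped_share <- 0.8
stopifnot(n_dropped / n_before <= max_dropped_share)
```

Here 3 of 5 rows are dropped, with q1 contributing two missing values and q2 one.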

NA in non-numeric columns. drop_na() checks all columns by default, not just numeric ones. Only a true R NA (such as NA_character_) counts as missing; a row whose character column holds the literal string "NA" is not dropped. In the example below, note contains a true NA:

df <- tibble(
  name = c("Alice", "Bob", "Carol"),
  note = c("active", NA_character_, "inactive")  # NA is R's NA, not string
)

df %>% drop_na()
# # A tibble: 2 x 2
#   name   note
#   <chr>  <chr>
# 1 Alice  active
# 2 Carol  inactive

Bob is dropped because his note is R’s NA, not the string "NA".
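The opposite case, where the file or source system wrote the text "NA" instead of a real missing value, can be sketched as follows. The tibble named notes is hypothetical; na_if() from dplyr converts the matching strings to a true NA before dropping.

```r
library(tidyr)
library(dplyr)

# Hypothetical data where "NA" is an ordinary string, not a missing value
notes <- tibble(
  name = c("Alice", "Bob", "Carol"),
  note = c("active", "NA", "inactive")
)

# Nothing is dropped: the string "NA" is not missing as far as R is concerned
kept <- notes %>% drop_na()

# Convert the string to a real NA first, then drop
cleaned <- notes %>%
  mutate(note = na_if(note, "NA")) %>%
  drop_na()
```

kept still has all three rows, while cleaned has two because Bob's note is now a true NA.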

See Also