tidyr::drop_na()

Updated May 29, 2026 · Tidyverse

Tags: r, tidyr, data-wrangling, data-cleaning, na-values

Overview

drop_na() removes every row of a data frame that contains at least one NA. It is a quick way to clean a dataset before analysis when you know that incomplete rows carry no useful information.

By default every column is checked, so a single missing value anywhere in a row removes that row. You can optionally restrict the check to specific columns.

Signature

drop_na(data, ...)

Parameters

Parameter	Type	Default	Description
`data`	tibble / data frame	(required)	Input data.
`...`	<tidy-select>	(none)	Optional columns to restrict the check to. If none are supplied, all columns are checked.
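Because `...` accepts tidyselect expressions, the columns to check can be written in more than one way. This sketch loads dplyr only for the `all_of()` helper; the three calls select the same two columns and should return the same rows.

```r
library(tidyr)
library(dplyr)  # loaded here for all_of()

df <- tibble(
  name  = c("Alice", "Bob", "Carol"),
  age   = c(25, NA, 31),
  score = c(90, 85, NA)
)

# Bare column names
by_names <- drop_na(df, age, score)

# A c() vector of bare names
by_vector <- drop_na(df, c(age, score))

# A character vector of names, wrapped in all_of()
cols <- c("age", "score")
by_strings <- drop_na(df, all_of(cols))

by_names
```

`all_of()` is useful when the column names are stored in a variable rather than typed directly.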

Basic usage

Called without arguments, drop_na() removes every row that contains at least one NA in any column. This aggressive filtering is appropriate when the analysis requires complete cases.

Drop rows with any missing value

library(tidyr)

df <- tibble(
  name  = c("Alice", "Bob", "Carol"),
  age   = c(25, NA, 31),
  score = c(90, 85, NA)
)

df %>% drop_na()
# # A tibble: 1 x 3
#   name   age score
#   <chr> <dbl> <dbl>
# 1 Alice    25    90

Only Alice has no missing values in any column, so she is the only row that remains.
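One way to picture the no-argument form is as a filter that keeps a row only when every column is non-missing. The sketch below expresses that with dplyr's `if_all()` (which assumes dplyr 1.0.4 or later). It illustrates the behaviour and is not how tidyr implements drop_na() internally.

```r
library(tidyr)
library(dplyr)

df <- tibble(
  name  = c("Alice", "Bob", "Carol"),
  age   = c(25, NA, 31),
  score = c(90, 85, NA)
)

# drop_na() with no arguments checks every column
via_drop_na <- drop_na(df)

# Equivalent idea: keep a row only if every column is non-missing
via_filter <- df %>% filter(if_all(everything(), ~ !is.na(.x)))

all.equal(as.data.frame(via_drop_na), as.data.frame(via_filter))
```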

Naming specific columns inside drop_na() limits the missing-value check to those columns, so a row is kept as long as it is complete in the columns that matter for the current step.

Drop rows with missing values in specific columns

Use column names as arguments to restrict the check to those columns:

df %>% drop_na(age)
# # A tibble: 2 x 3
#   name   age score
#   <chr> <dbl> <dbl>
# 1 Alice    25    90
# 2 Carol    31    NA    # age is present, kept even though score is NA

Only rows where age is missing are dropped. Carol stays because her age is present, even though her score is NA. Any later step that uses score still has to handle that NA.
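Because only age was checked, the NA in score survives and still affects any function that uses that column. A short sketch using base R's `mean()` on the result:

```r
library(tidyr)

df <- tibble(
  name  = c("Alice", "Bob", "Carol"),
  age   = c(25, NA, 31),
  score = c(90, 85, NA)
)

age_known <- drop_na(df, age)

# score was not part of the check, so Carol's NA is still present
anyNA(age_known$score)                # TRUE
mean(age_known$score)                 # NA
mean(age_known$score, na.rm = TRUE)   # 90
```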

Drop across multiple specific columns

df %>% drop_na(age, score)
# # A tibble: 1 x 3
#   name   age score
#   <chr> <dbl> <dbl>
# 1 Alice    25    90

Both age and score must be non-missing: Bob is dropped because his age is NA, and Carol is dropped because her score is NA.
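When most columns should be checked, it can be shorter to exclude the few that should not be. A sketch using tidyselect negation: in this three-column df, "every column except name" is the same set as age and score, so both calls should return the same rows.

```r
library(tidyr)

df <- tibble(
  name  = c("Alice", "Bob", "Carol"),
  age   = c(25, NA, 31),
  score = c(90, 85, NA)
)

# Check every column except name
all_but_name <- drop_na(df, -name)

# List the columns explicitly
explicit <- drop_na(df, age, score)

identical(all_but_name, explicit)
```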

Common use cases

Cleaning before modelling

Many summary functions in R, such as mean() and sum(), return NA when their input contains a missing value unless na.rm = TRUE is set. drop_na() is a quick way to remove incomplete rows before summarising or fitting a model:

library(dplyr)

survey <- tibble(
  id     = 1:5,
  q1     = c("agree", NA, "neutral", "agree", NA),
  q2     = c("disagree", "agree", "neutral", NA, "agree"),
  result = c(10, 20, 30, 40, 50)
)

survey %>%
  drop_na() %>%
  summarise(mean_result = mean(result))
# # A tibble: 1 x 1
#   mean_result
#          <dbl>
# 1           20

Rows 2, 4, and 5 each have an NA in q1 or q2 and are dropped, so the mean is taken over rows 1 and 3 only: (10 + 30) / 2 = 20.
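With no arguments, drop_na() also discards rows whose only NA is in a column the summary never uses. The sketch below assumes, purely for illustration, that the analysis involves only q1 and result. Restricting the check to q1 keeps row 4 as well and changes the mean.

```r
library(tidyr)
library(dplyr)

survey <- tibble(
  id     = 1:5,
  q1     = c("agree", NA, "neutral", "agree", NA),
  q2     = c("disagree", "agree", "neutral", NA, "agree"),
  result = c(10, 20, 30, 40, 50)
)

# Every column checked: rows 1 and 3 remain
all_complete <- drop_na(survey)

# Only q1 checked: rows 1, 3 and 4 remain
q1_complete <- drop_na(survey, q1)

q1_complete %>% summarise(mean_result = mean(result))  # (10 + 30 + 40) / 3
```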

Removing incomplete observations from time series

prices <- tibble(
  date   = as.Date(c("2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04")),
  open   = c(100, NA, 102, 103),
  close  = c(101, 101, NA, 104)
)

prices %>% drop_na()
# # A tibble: 2 x 3
#   date       open close
#   <date>    <dbl> <dbl>
# 1 2024-01-01   100   101
# 2 2024-01-04   103   104

The rows for 2024-01-02 (missing open) and 2024-01-03 (missing close) are removed, leaving only the days on which both prices were recorded.
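After dropping, the remaining dates are no longer consecutive. This matters if a later step assumes one row per day, for example a lagged difference. A quick check with base R's `diff()` on the date column makes the gap visible:

```r
library(tidyr)

prices <- tibble(
  date  = as.Date(c("2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04")),
  open  = c(100, NA, 102, 103),
  close = c(101, 101, NA, 104)
)

complete_days <- drop_na(prices)

# Days between consecutive remaining rows
diff(complete_days$date)   # Time difference of 3 days
```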

Using with dplyr pipelines

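drop_na() fits naturally into a %>% pipeline. Here, responses is a small illustrative table with a status column and numeric question columns q1 and q2:

responses <- tibble(status = c("complete", "complete", "partial"),
                    q1 = c(4, NA, 2), q2 = c(5, 3, NA))

responses %>%
  filter(status == "complete") %>%                    # keep finished responses
  drop_na(starts_with("q")) %>%                       # require every q* answer
  mutate(total = rowSums(across(where(is.numeric))))  # sum the numeric answers

Only the first response passes both steps, with total = 9.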

Alternative approaches

Base R

df[complete.cases(df), ]

complete.cases() returns a logical vector that is TRUE for rows with no NA, so indexing with it keeps the same rows as drop_na(). It is harder to read inside a pipeline, however.
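complete.cases() can also be limited to a subset of columns by passing only those columns. In this sketch, the base R results should keep the same rows as drop_na() and drop_na(age):

```r
library(tidyr)

df <- tibble(
  name  = c("Alice", "Bob", "Carol"),
  age   = c(25, NA, 31),
  score = c(90, 85, NA)
)

# All columns: same rows as drop_na()
base_all <- df[complete.cases(df), ]

# Only the age column: same rows as drop_na(age)
base_age <- df[complete.cases(df["age"]), ]

base_all$name   # "Alice"
base_age$name   # "Alice" "Carol"
```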

Using `tidyr::fill()` before dropping

If missing values should be filled rather than dropped, use fill() first:

df %>%
  fill(age, .direction = "down") %>%
  drop_na()

fill() replaces each NA with the previous non-missing value (or the next one, depending on .direction); drop_na() then removes any rows that still contain NA in other columns. Here Bob's missing age becomes 25, copied from Alice, while Carol is still dropped because her score is NA.
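The direction matters. This sketch compares "down" and "up" on the same column. In both cases drop_na() still removes Carol, whose missing value is in score rather than age.

```r
library(tidyr)

df <- tibble(
  name  = c("Alice", "Bob", "Carol"),
  age   = c(25, NA, 31),
  score = c(90, 85, NA)
)

filled_down <- fill(df, age, .direction = "down")  # Bob takes Alice's 25
filled_up   <- fill(df, age, .direction = "up")    # Bob takes Carol's 31

filled_down$age   # 25 25 31
filled_up$age     # 25 31 31

# Carol is still dropped because score is NA
drop_na(filled_up)$name   # "Alice" "Bob"
```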

Using `dplyr::filter()` with `is.na()`

For more control over which rows to keep:

df %>%
  filter(!is.na(age), !is.na(score))

This is equivalent to drop_na(age, score) but lets you apply different conditions to each column.
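For example, this sketch requires a known age but lets score be either missing or, when present, at least 90. The threshold of 90 is an arbitrary choice for illustration.

```r
library(dplyr)

df <- tibble(
  name  = c("Alice", "Bob", "Carol"),
  age   = c(25, NA, 31),
  score = c(90, 85, NA)
)

# Age must be known; score may be missing or must be >= 90
kept <- df %>% filter(!is.na(age), is.na(score) | score >= 90)

kept$name   # "Alice" "Carol"
```

For Carol, `is.na(score) | score >= 90` evaluates to `TRUE | NA`, which is TRUE, so her row is kept.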

Gotchas

Dropping removes the whole row. drop_na() never removes individual cells; it drops the entire row if any checked cell in that row is NA. If a column with many NAs is not needed for the analysis, remove it with select() first so that it cannot cause rows to be dropped:

df %>% select(-score) %>% drop_na()
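select() also removes the column from the result. If score should stay in the output but simply be ignored by the check, negation inside drop_na() is an alternative. This sketch compares the two:

```r
library(tidyr)
library(dplyr)

df <- tibble(
  name  = c("Alice", "Bob", "Carol"),
  age   = c(25, NA, 31),
  score = c(90, 85, NA)
)

# Remove score entirely, then check what is left
without_score <- df %>% select(-score) %>% drop_na()
names(without_score)   # "name" "age"

# Keep score in the output but ignore it when checking for NA
ignore_score <- df %>% drop_na(-score)
names(ignore_score)    # "name" "age" "score"
ignore_score$name      # "Alice" "Carol"
```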

Data-dependent dropping. drop_na() changes how many rows you have, and which ones. If different rows are missing on different runs (for example, when new data arrive), the number of rows after drop_na() will vary. Check the row count after dropping to catch unexpected missingness.
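One way to make that check routine is to wrap drop_na() in a small helper that reports how many rows were removed and warns above a chosen proportion. `drop_na_checked()` and its `max_prop` argument are hypothetical names for this sketch, not part of tidyr, and the 10% default is an arbitrary assumption.

```r
library(tidyr)

# Hypothetical helper (not part of tidyr): drop_na() plus a row-count report
# and a warning when more than `max_prop` of the rows are removed.
drop_na_checked <- function(data, ..., max_prop = 0.1) {
  out <- drop_na(data, ...)
  dropped <- nrow(data) - nrow(out)
  message(dropped, " of ", nrow(data), " rows dropped")
  if (nrow(data) > 0 && dropped / nrow(data) > max_prop) {
    warning("drop_na() removed ", dropped, " of ", nrow(data), " rows")
  }
  out
}

df <- tibble(
  name  = c("Alice", "Bob", "Carol"),
  age   = c(25, NA, 31),
  score = c(90, 85, NA)
)

# 2 of 3 rows are dropped, which exceeds 10% and triggers the warning
result <- drop_na_checked(df)
```

Column selections are passed through `...` to drop_na(), so `drop_na_checked(df, age)` checks only age.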

NA in non-numeric columns. drop_na() checks columns of every type, not just numeric ones, so an NA in a character column also drops the row. It only recognises R's missing value, though: a literal string "NA" is an ordinary value and is not dropped. Here Bob's note is a real NA:

df <- tibble(
  name = c("Alice", "Bob", "Carol"),
  note = c("active", NA_character_, "inactive")  # NA is R's NA, not string
)

df %>% drop_na()
# # A tibble: 2 x 2
#   name   note
#   <chr>  <chr>
# 1 Alice  active
# 2 Carol  inactive

Bob is dropped because his note is R’s NA, not the string "NA".
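If a column contains the literal string "NA", drop_na() keeps the row. One way to handle this case is to first convert the string into a real NA with dplyr's `na_if()`, as in this sketch:

```r
library(tidyr)
library(dplyr)

raw <- tibble(
  name = c("Alice", "Bob", "Carol"),
  note = c("active", "NA", "inactive")   # "NA" here is a string, not R's NA
)

nrow(drop_na(raw))   # 3: nothing is dropped

# Turn the literal string into R's NA, then drop
cleaned <- raw %>%
  mutate(note = na_if(note, "NA")) %>%
  drop_na()

cleaned$name   # "Alice" "Carol"
```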