tidyr::drop_na()
Overview
drop_na() removes every row of a data frame that contains at least one NA. It is a concise way to clean a dataset before analysis when incomplete rows carry no useful information.
By default every column is checked. You can restrict the check to specific columns by naming them.
Signature
drop_na(data, ...)
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| data | tibble / data frame | (required) | Input data. |
| ... | tidy-select | (none) | Optional columns to restrict the check to. If not supplied, all columns are checked. |
Basic Usage
Drop rows with any missing value
library(tidyr)
df <- tibble(
name = c("Alice", "Bob", "Carol"),
age = c(25, NA, 31),
score = c(90, 85, NA)
)
df %>% drop_na()
# # A tibble: 1 x 3
# name age score
# <chr> <dbl> <dbl>
# 1 Alice 25 90
Only Alice has no missing values in any column, so she is the only row that remains.
Drop rows with missing values in specific columns
Use column names as arguments to restrict the check to those columns:
df %>% drop_na(age)
# # A tibble: 2 x 3
# name age score
# <chr> <dbl> <dbl>
# 1 Alice 25 90
# 2 Carol 31 NA # age is present, kept even though score is NA
Only rows where age is missing are dropped, so Bob is removed. Carol stays because her age is present, even though her score is NA.
Drop across multiple specific columns
df %>% drop_na(age, score)
# # A tibble: 1 x 3
# name age score
# <chr> <dbl> <dbl>
# 1 Alice 25 90
Both age and score must be non-missing. Bob is dropped because age is NA. Carol is dropped because score is NA.
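Because ... accepts tidy-select expressions, column names held in a character vector can be passed through all_of() or any_of(). The following is a minimal sketch: required_cols and the "height" column name are hypothetical, and the example data is rebuilt so the block runs on its own.

```r
library(tidyr)

# Same example data as above, rebuilt so the block is self-contained
df <- tibble(
  name  = c("Alice", "Bob", "Carol"),
  age   = c(25, NA, 31),
  score = c(90, 85, NA)
)

# Hypothetical vector of column names, e.g. taken from a config file
required_cols <- c("age", "score")

# all_of() raises an error if a listed column does not exist in df
strict <- df %>% drop_na(all_of(required_cols))

# any_of() ignores names that are not in df ("height" here),
# so only age is checked
lenient <- df %>% drop_na(any_of(c("age", "height")))
```

Here strict keeps only Alice, while lenient keeps Alice and Carol because score is never inspected.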
Common Use Cases
Cleaning before modelling
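Many R functions return NA when their input contains missing values (mean() and sum() do so unless na.rm = TRUE), and model-fitting functions handle incomplete rows according to their na.action setting. drop_na() makes the removal of incomplete rows an explicit step before summarising or fitting a model: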
library(dplyr)
survey <- tibble(
id = 1:5,
q1 = c("agree", NA, "neutral", "agree", NA),
q2 = c("disagree", "agree", "neutral", NA, "agree"),
result = c(10, 20, 30, 40, 50)
)
survey %>%
drop_na() %>%
summarise(mean_result = mean(result))
# # A tibble: 1 x 1
# mean_result
# <dbl>
# 1 20
Rows 2, 4, and 5 had at least one NA in the response columns and were dropped before summarising.
Removing incomplete observations from time series
prices <- tibble(
date = as.Date(c("2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04")),
open = c(100, NA, 102, 103),
close = c(101, 101, NA, 104)
)
prices %>% drop_na()
# # A tibble: 2 x 3
# date open close
# <date> <dbl> <dbl>
# 1 2024-01-01 100 101
# 2 2024-01-04 103 104
Using with dplyr pipelines
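drop_na() fits naturally into a %>% pipeline. The snippet below is schematic: it assumes df has a status column and numeric question columns whose names start with "q", which the example data above does not have:
df %>%
filter(status == "complete") %>% # keep finished responses only
drop_na(starts_with("q")) %>% # require every q* column to be non-missing
mutate(total = rowSums(across(where(is.numeric)))) # row-wise sum of all numeric columns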
Alternative Approaches
Base R
df[complete.cases(df), ]
complete.cases() returns a logical vector that is TRUE for rows with no NA. Subsetting with it keeps the same rows as drop_na(), but it is less readable in a pipeline.
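The base R approach can also be restricted to particular columns by passing only those columns to complete.cases(). The sketch below uses a plain data.frame with the same values as the earlier example and needs no packages.

```r
# Plain data frame with the same values as the tibble example
df <- data.frame(
  name  = c("Alice", "Bob", "Carol"),
  age   = c(25, NA, 31),
  score = c(90, 85, NA)
)

# All columns checked: counterpart of drop_na(df)
all_complete <- df[complete.cases(df), ]

# Only age checked: counterpart of drop_na(df, age)
age_complete <- df[complete.cases(df["age"]), ]

# na.omit() also removes incomplete rows and records which ones
# were removed in the "na.action" attribute of the result
omitted <- na.omit(df)
```

Unlike drop_na(), subsetting a data.frame this way keeps the original row names, so age_complete has row names 1 and 3.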
Using tidyr::fill() before dropping
If missing values should be filled rather than dropped, use fill() first:
df %>%
fill(age, .direction = "down") %>%
drop_na()
fill() replaces NA values with the previous non-missing value (or next, depending on .direction), then drop_na() removes any remaining rows that still have NA in other columns.
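Applied to the three-row example data, the combination behaves as sketched below. The data is rebuilt here so the block is self-contained.

```r
library(tidyr)

df <- tibble(
  name  = c("Alice", "Bob", "Carol"),
  age   = c(25, NA, 31),
  score = c(90, 85, NA)
)

filled <- df %>%
  fill(age, .direction = "down") %>%  # Bob's age becomes 25, carried down from Alice
  drop_na()                           # Carol is still dropped because score is NA
```

filled contains Alice and Bob. Carrying a value from one person to the next is only done here to show the mechanics; fill() is usually appropriate for ordered data where the previous value is a sensible stand-in.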
Using dplyr::filter() with is.na()
For more control over which rows to keep:
df %>%
filter(!is.na(age), !is.na(score))
This is equivalent to drop_na(age, score) but lets you apply different conditions to each column.
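Two conditions that drop_na() cannot express are sketched below, assuming dplyr 1.0.4 or later for if_any(). The extra row for "Dan" and the score threshold of 50 are made up for illustration.

```r
library(dplyr)

df <- tibble(
  name  = c("Alice", "Bob", "Carol", "Dan"),
  age   = c(25, NA, 31, NA),
  score = c(90, 85, NA, NA)
)

# Keep rows where at least one of age/score is present.
# drop_na(age, score) would instead require both to be present.
some_data <- df %>%
  filter(if_any(c(age, score), ~ !is.na(.x)))

# Different rule per column: age must be present; score may be
# missing, but must be at least 50 when it is present
mixed <- df %>%
  filter(!is.na(age), is.na(score) | score >= 50)
```

some_data drops only Dan, whose age and score are both missing. mixed keeps Alice and Carol.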
Gotchas
Dropping drops the whole row. drop_na() never removes individual cells: if any checked cell in a row is NA, the entire row goes. To keep rows that are missing only in a column you do not need, remove that column with select() first:
df %>% select(-score) %>% drop_na()
Data-dependent dropping. How many rows drop_na() removes depends entirely on the data. If different rows are missing on different runs (for example, when reading new data), the number of rows that remain will vary. Check your row count after dropping to catch unexpected missingness.
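One way to make that check routine is to compare row counts before and after the call. This is a minimal sketch: the raw data and the 50% threshold are hypothetical and should be adapted to the dataset at hand.

```r
library(tidyr)

# Hypothetical input; in practice this would be freshly read data
raw <- tibble(
  id    = 1:5,
  value = c(1.2, NA, 3.4, NA, 5.6)
)

clean <- drop_na(raw)

# How many rows were lost, in absolute and relative terms
n_dropped     <- nrow(raw) - nrow(clean)
share_dropped <- n_dropped / nrow(raw)

message(n_dropped, " of ", nrow(raw), " rows dropped")

# Hypothetical guard: stop if more than half of the rows were removed
stopifnot(share_dropped <= 0.5)
```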
NA in non-numeric columns. drop_na() checks all columns by default, not just numeric ones. Only a true R NA (such as NA_character_) counts as missing; a row whose character column holds the literal string "NA" is not dropped. The example below uses a true NA:
df <- tibble(
name = c("Alice", "Bob", "Carol"),
note = c("active", NA_character_, "inactive") # NA is R's NA, not string
)
df %>% drop_na()
# # A tibble: 2 x 2
# name note
# <chr> <chr>
# 1 Alice active
# 2 Carol inactive
Bob is dropped because his note is R’s NA, not the string "NA".
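For contrast, the sketch below stores the literal string "NA" instead. It uses dplyr::na_if() to convert that placeholder into a real NA before dropping; df_str is a hypothetical variant of the data above.

```r
library(dplyr)
library(tidyr)

df_str <- tibble(
  name = c("Alice", "Bob", "Carol"),
  note = c("active", "NA", "inactive")  # "NA" here is ordinary text
)

# The literal string "NA" is not missing, so no row is dropped
kept <- df_str %>% drop_na()

# Convert the placeholder string to a real NA first, then drop
cleaned <- df_str %>%
  mutate(note = na_if(note, "NA")) %>%
  drop_na()
```

kept still has all three rows, while cleaned has two because Bob's note is now a true NA.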
See Also
- /cookbooks/how-to-remove-na-values/ — practical recipes for handling missing data
- /cookbooks/how-to-check-for-na-values/ — detect and count NA values before deciding whether to drop
- /reference/tidyverse/dplyr-filter/ — filter rows by condition, including missing value checks