tidyr::complete()

Updated April 20, 2026 · Tidyverse
r tidyr data-wrangling missing-data tidy-data

Overview

complete() finds combinations of values that are implied by the data but not present as rows. It is the opposite of drop_na() — where drop_na() removes rows with missing values, complete() adds them.

This comes up constantly in observational data. If you measure temperature every hour for three days but only recorded values on weekdays, your data has implicit gaps on weekends. complete() makes those gaps explicit by adding rows with NA in the measured columns.

Signature

complete(data, ..., fill = list())

Parameters

ParameterTypeDescription
datatibble / data frameInput data.
...Columns whose unique values define the grid of combinations to expand.
filllistOptional named list of values to fill for each column.

Basic Usage

Single column

library(tidyr)

df <- tibble(
  day   = c("Monday", "Tuesday", "Wednesday"),
  sales = c(100, 150, 90)
)

complete(df, day)
# # A tibble: 7 x 2
#   day       sales
#   <chr>     <dbl>
# 1 Friday      NA
# 2 Monday     100
# 3 Saturday   NA
# 4 Sunday     NA
# 5 Thursday   NA
# 6 Tuesday    150
# 7 Wednesday   90

complete(df, day) uses all unique values of day (the seven days of the week), generating rows for any day not in the original data.

Multiple columns — all combinations

df <- tibble(
  year  = c(2022, 2022, 2023),
  qtr   = c(1, 3, 1),
  sales = c(100, 90, 110)
)

complete(df, year, qtr)
# # A tibble: 8 x 3
#   year   qtr  sales
#   <dbl> <dbl> <dbl>
# 1  2022    1   100
# 2  2022    2    NA    # implicit gap
# 3  2022    3    90
# 4  2022    4    NA    # implicit gap
# 5  2023    1   110
# 6  2023    2    NA
# 7  2023    3    NA    # 2023 only had qtr 1 in original data
# 8  2023    4    NA

Every combination of year and qtr gets a row. NA in sales signals that no observation existed for that combination.

Using nesting() for specific combinations

nesting() limits the expansion to only the combinations that appear in the data, rather than all theoretical combinations:

df <- tibble(
  x = c(1, 1, 2),
  y = c("a", "b", "a"),
  value = c(10, 20, 30)
)

complete(df, nesting(x, y))
# # A tibble: 3 x 3
#   x     y     value
#   <dbl> <chr> <dbl>
# 1     1 a        10
# 2     1 b        20
# 3     2 a        30

This is useful when you only want to fill in missing combinations within an existing cross product, not expand to all possible combinations.

Filling Specific Values

Providing default fill values

Use fill to supply values for missing entries:

df <- tibble(
  product = c("A", "A", "B"),
  month   = c("Jan", "Feb", "Jan"),
  sales   = c(100, 150, 90)
)

complete(df, product, month, fill = list(sales = 0))

This fills missing sales with 0 instead of NA.

Filling with a computed value

df <- tibble(
  site = c("site1", "site1", "site2"),
  day  = c("Mon", "Tue", "Mon"),
  visits = c(200, 250, 180)
)

complete(df, site, day) %>%
  group_by(site) %>%
  mutate(visits = coalesce(visits, median(visits, na.rm = TRUE)))

Use coalesce() after complete() to fill NA with a computed value rather than a hardcoded default.

Common Use Cases

Time series with regular intervals

When your data should cover every period but doesn’t:

library(dplyr)

df <- tibble(
  date = as.Date(c("2024-01-01", "2024-01-03", "2024-01-05")),
  value = c(10, 12, 11)
)

df %>%
  complete(date = seq.Date(min(date), max(date), by = "day")) %>%
  arrange(date)
# # A tibble: 5 x 2
#   date       value
#   <date>     <dbl>
# 1 2024-01-01    10
# 2 2024-01-02    NA    # gap
# 3 2024-01-03    12
# 4 2024-01-04    NA    # gap
# 5 2024-01-05    11

Factorial experimental designs

df <- tibble(
  treatment = c("control", "control", "low"),
  dose      = c("low", "high", "low"),
  outcome   = c(5, 7, 6)
)

complete(df, treatment, dose)
# # A tibble: 6 x 3
#   treatment dose    outcome
#   <chr>     <chr>    <dbl>
# 1 control   high       7
# 2 control   low        5
# 3 high      high      NA    # treatment x dose combination not in data
# 4 high      low       NA
# 5 low       high     NA
# 6 low       low        6

This reveals which treatment-dose combinations were never tested.

Survey response grids

df <- tibble(
  respondent = c("A", "A", "B"),
  question   = c("q1", "q2", "q1"),
  score      = c(4, 3, 5)
)

complete(df, respondent, question)
# # A tibble: 6 x 3
#   respondent question score
#   <chr>      <chr>    <dbl>
# 1 A          q1          4
# 2 A          q2          3
# 3 B          q1          5
# 4 B          q2         NA    # B never answered q2

Alternative Approaches

Base R

# Create all combinations manually
expand.grid(year = 2022:2023, qtr = 1:4)

This is equivalent to complete(df, year, qtr) but requires more manual work and returns a data frame without the original data.

Using dplyr::full_join()

df <- tibble(year = c(2022, 2023), qtr = c(1, 1), sales = c(100, 110))
grid  <- tibble(year = c(2022, 2022, 2022, 2022, 2023, 2023, 2023, 2023),
               qtr  = c(1, 2, 3, 4, 1, 2, 3, 4))

df %>%
  full_join(grid, by = c("year", "qtr"))

This is more verbose but useful when you have an external grid defined independently.

Gotchas

Explosion with many columns. complete(a, b, c, d, e) with 5 columns each having 10 unique values produces 100,000 rows. Verify the cartesian product size before running complete() on high-cardinality columns.

Implicit vs explicit missing. complete() reveals implicit gaps, but it does not distinguish between a value that was never measured and one that was measured and found to be missing. Use fill() before complete() if you need to track which was which.

Dplyr grouping. complete() respects group_by() from dplyr, which can produce unexpected results:

df %>%
  group_by(site) %>%
  complete(date = full_dates, fill = list(value = 0))

Here the fill applies within each group, which is usually what you want. But if you forget the grouping, the fill might be wrong.

See Also