tidyr::complete()
Overview
complete() finds combinations of values that are implied by the data but not present as rows. It is the opposite of drop_na() — where drop_na() removes rows with missing values, complete() adds them.
This comes up constantly in observational data. If you measure temperature every hour for three days but only recorded values on weekdays, your data has implicit gaps on weekends. complete() makes those gaps explicit by adding rows with NA in the measured columns.
Signature
complete(data, ..., fill = list())
Parameters
| Parameter | Type | Description |
|---|---|---|
data | tibble / data frame | Input data. |
... | Columns whose unique values define the grid of combinations to expand. | |
fill | list | Optional named list of values to fill for each column. |
Basic Usage
Single column
library(tidyr)
df <- tibble(
day = c("Monday", "Tuesday", "Wednesday"),
sales = c(100, 150, 90)
)
complete(df, day)
# # A tibble: 7 x 2
# day sales
# <chr> <dbl>
# 1 Friday NA
# 2 Monday 100
# 3 Saturday NA
# 4 Sunday NA
# 5 Thursday NA
# 6 Tuesday 150
# 7 Wednesday 90
complete(df, day) uses all unique values of day (the seven days of the week), generating rows for any day not in the original data.
Multiple columns — all combinations
df <- tibble(
year = c(2022, 2022, 2023),
qtr = c(1, 3, 1),
sales = c(100, 90, 110)
)
complete(df, year, qtr)
# # A tibble: 8 x 3
# year qtr sales
# <dbl> <dbl> <dbl>
# 1 2022 1 100
# 2 2022 2 NA # implicit gap
# 3 2022 3 90
# 4 2022 4 NA # implicit gap
# 5 2023 1 110
# 6 2023 2 NA
# 7 2023 3 NA # 2023 only had qtr 1 in original data
# 8 2023 4 NA
Every combination of year and qtr gets a row. NA in sales signals that no observation existed for that combination.
Using nesting() for specific combinations
nesting() limits the expansion to only the combinations that appear in the data, rather than all theoretical combinations:
df <- tibble(
x = c(1, 1, 2),
y = c("a", "b", "a"),
value = c(10, 20, 30)
)
complete(df, nesting(x, y))
# # A tibble: 3 x 3
# x y value
# <dbl> <chr> <dbl>
# 1 1 a 10
# 2 1 b 20
# 3 2 a 30
This is useful when you only want to fill in missing combinations within an existing cross product, not expand to all possible combinations.
Filling Specific Values
Providing default fill values
Use fill to supply values for missing entries:
df <- tibble(
product = c("A", "A", "B"),
month = c("Jan", "Feb", "Jan"),
sales = c(100, 150, 90)
)
complete(df, product, month, fill = list(sales = 0))
This fills missing sales with 0 instead of NA.
Filling with a computed value
df <- tibble(
site = c("site1", "site1", "site2"),
day = c("Mon", "Tue", "Mon"),
visits = c(200, 250, 180)
)
complete(df, site, day) %>%
group_by(site) %>%
mutate(visits = coalesce(visits, median(visits, na.rm = TRUE)))
Use coalesce() after complete() to fill NA with a computed value rather than a hardcoded default.
Common Use Cases
Time series with regular intervals
When your data should cover every period but doesn’t:
library(dplyr)
df <- tibble(
date = as.Date(c("2024-01-01", "2024-01-03", "2024-01-05")),
value = c(10, 12, 11)
)
df %>%
complete(date = seq.Date(min(date), max(date), by = "day")) %>%
arrange(date)
# # A tibble: 5 x 2
# date value
# <date> <dbl>
# 1 2024-01-01 10
# 2 2024-01-02 NA # gap
# 3 2024-01-03 12
# 4 2024-01-04 NA # gap
# 5 2024-01-05 11
Factorial experimental designs
df <- tibble(
treatment = c("control", "control", "low"),
dose = c("low", "high", "low"),
outcome = c(5, 7, 6)
)
complete(df, treatment, dose)
# # A tibble: 6 x 3
# treatment dose outcome
# <chr> <chr> <dbl>
# 1 control high 7
# 2 control low 5
# 3 high high NA # treatment x dose combination not in data
# 4 high low NA
# 5 low high NA
# 6 low low 6
This reveals which treatment-dose combinations were never tested.
Survey response grids
df <- tibble(
respondent = c("A", "A", "B"),
question = c("q1", "q2", "q1"),
score = c(4, 3, 5)
)
complete(df, respondent, question)
# # A tibble: 6 x 3
# respondent question score
# <chr> <chr> <dbl>
# 1 A q1 4
# 2 A q2 3
# 3 B q1 5
# 4 B q2 NA # B never answered q2
Alternative Approaches
Base R
# Create all combinations manually
expand.grid(year = 2022:2023, qtr = 1:4)
This is equivalent to complete(df, year, qtr) but requires more manual work and returns a data frame without the original data.
Using dplyr::full_join()
df <- tibble(year = c(2022, 2023), qtr = c(1, 1), sales = c(100, 110))
grid <- tibble(year = c(2022, 2022, 2022, 2022, 2023, 2023, 2023, 2023),
qtr = c(1, 2, 3, 4, 1, 2, 3, 4))
df %>%
full_join(grid, by = c("year", "qtr"))
This is more verbose but useful when you have an external grid defined independently.
Gotchas
Explosion with many columns. complete(a, b, c, d, e) with 5 columns each having 10 unique values produces 100,000 rows. Verify the cartesian product size before running complete() on high-cardinality columns.
Implicit vs explicit missing. complete() reveals implicit gaps, but it does not distinguish between a value that was never measured and one that was measured and found to be missing. Use fill() before complete() if you need to track which was which.
Dplyr grouping. complete() respects group_by() from dplyr, which can produce unexpected results:
df %>%
group_by(site) %>%
complete(date = full_dates, fill = list(value = 0))
Here the fill applies within each group, which is usually what you want. But if you forget the grouping, the fill might be wrong.
See Also
- /reference/tidyverse/tidyr_pivot_longer/ — reshape wide data to long by gathering columns
- /reference/tidyverse/tidyr_pivot_wider/ — reshape long data to wide by spreading columns
- /reference/tidyverse/tidyr_fill/ — fill missing values in a column with nearby values