tidyr::fill
fill() replaces NA values in the selected columns with the nearest non-NA value in the chosen direction, by default the previous one. It’s the standard fix for “stacked” data where a value is recorded once and then left blank until it changes.
Signature
fill(data, ..., .by = NULL, .direction = c("down", "up", "downup", "updown"))
Parameters
| Parameter | Type | Description |
|---|---|---|
| data | data.frame | Input data frame or tibble |
| ... | tidy-select | Columns to fill. Accepts tidyselect helpers such as starts_with(), ends_with(), and everything(). |
| .by | tidy-select | Columns to group by. The fill operates within each group and does not cross group boundaries. |
| .direction | string | Direction: "down" (default), "up", "downup", or "updown" |
Direction
The .direction argument controls which neighboring non-NA value fills each gap. Choose it according to where the known values sit relative to the gaps: above, below, or on both sides.
library(tidyr)
# Default: carry the previous value downward (LOCF: last observation carried forward)
df <- tibble(
  quarter = c("Q1", "Q2", "Q3", "Q4", "Q1", "Q2"),
  year = c("2000", NA, NA, NA, "2001", NA),
  sales = c(66013, 69182, 53175, 21001, 46036, 58842)
)
df |> fill(year)
# # A tibble: 6 × 3
#   quarter year  sales
#   <chr>   <chr> <dbl>
# 1 Q1      2000  66013
# 2 Q2      2000  69182
# 3 Q3      2000  53175
# 4 Q4      2000  21001
# 5 Q1      2001  46036
# 6 Q2      2001  58842
The default direction, "down", carries the last observed non-NA value forward into the rows that follow. In longitudinal data analysis this is known as LOCF (last observation carried forward). NAs that come before the first non-NA value are left as NA.
# Fill upward: carry the next value backward (NOCB: next observation carried backward)
df |> fill(year, .direction = "up")
Filling upward pulls the next non-NA value back into the gaps above it. It suits layouts where a footer or summary row carries a value that applies to the rows above it. It is the wrong choice for df as defined above: rows 2 to 4 would be labelled "2001" and the final row would stay NA.
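As a minimal sketch of the footer case, the example below fills a label upward from the last row of each block. The report tibble and its column names are hypothetical and exist only for illustration.

```r
library(tidyr)

# Hypothetical report layout: the region label appears only on the
# closing "subtotal" row of each block
report <- tibble(
  item   = c("widgets", "gadgets", "subtotal", "widgets", "gadgets", "subtotal"),
  region = c(NA, NA, "North", NA, NA, "South")
)

# Pull each label up into the rows above it
filled_up <- report |> fill(region, .direction = "up")
filled_up$region
# "North" "North" "North" "South" "South" "South"
```

Each footer value spreads only as far as the previous non-NA value, so the two blocks stay separate without any grouping.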
# Fill down first, then fill remaining NAs upward
df |> fill(year, .direction = "downup")
The "downup" option makes two passes: it fills downward first, then fills upward to cover any leading NAs that the downward pass could not reach. As long as the column contains at least one non-NA value, no NAs remain afterward.
# Fill up first, then fill remaining NAs downward
df |> fill(year, .direction = "updown")
The "updown" option reverses the order: it fills upward first, then downward to cover any trailing NAs. The two options differ only for interior gaps, which take the value below them under "updown" and the value above them under "downup".
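To compare the four strategies side by side, the sketch below applies each direction to one toy column that has a leading, an interior, and a trailing gap. The gaps tibble and the results list are illustrative names, not part of tidyr.

```r
library(tidyr)

# Toy column with a leading gap, an interior gap, and a trailing gap
gaps <- tibble(value = c(NA, "a", NA, "b", NA))

# Fill the same column once per direction and collect the results by name
directions <- c("down", "up", "downup", "updown")
results <- lapply(directions, function(d) fill(gaps, value, .direction = d)$value)
names(results) <- directions

results$down    # NA  "a" "a" "b" "b"
results$up      # "a" "a" "b" "b" NA
results$downup  # "a" "a" "a" "b" "b"
results$updown  # "a" "a" "b" "b" "b"
```

The third element is the interior gap: it becomes "a" under "downup" and "b" under "updown", which is the only place the two-pass options disagree.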
Filling by group
The .by argument fills within groups without leaving the result grouped, so no ungroup() is needed afterward as it is with dplyr::group_by(). If your installed tidyr version does not accept .by, use group_by() and ungroup() instead:
library(tidyr)
library(dplyr)
df <- tibble(
  group = c("A", "A", "A", "B", "B", "B"),
  x = c(1, NA, NA, 2, NA, NA),
  y = c("p", NA, "r", "q", NA, "s")
)
df
# # A tibble: 6 × 3
#   group     x y
#   <chr> <dbl> <chr>
# 1 A         1 p
# 2 A        NA NA
# 3 A        NA r
# 4 B         2 q
# 5 B        NA NA
# 6 B        NA s
df |> fill(x, y, .by = group)
# # A tibble: 6 × 3
#   group     x y
#   <chr> <dbl> <chr>
# 1 A         1 p
# 2 A         1 p
# 3 A         1 r
# 4 B         2 q
# 5 B         2 q
# 6 B         2 s
Here every group starts with a non-NA value, so an ungrouped fill would return the same result. The grouping matters when a group begins with NA: without .by, fill() would carry the last value of group A into group B, which is almost never what you want. The .by argument is the concise alternative to group_by() followed by ungroup().
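The boundary effect is easiest to see when a group starts with a missing value. The sketch below uses a hypothetical visits tibble and the group_by() form, which works regardless of whether the installed tidyr supports .by.

```r
library(tidyr)
library(dplyr)

# Group B starts with a missing value, so an ungrouped fill
# leaks a value from group A into it
visits <- tibble(
  group = c("A", "A", "B", "B"),
  x     = c(1, NA, NA, 2)
)

ungrouped <- visits |> fill(x)
grouped   <- visits |> group_by(group) |> fill(x) |> ungroup()

ungrouped$x  # 1 1 1 2   (row 3 took its value from group A)
grouped$x    # 1 1 NA 2  (row 3 stays NA: nothing precedes it within group B)
```

The remaining NA in the grouped result is intentional. Use "downup" within the group if the leading gap should take the group's own first value.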
Filling all columns
Use everything() to fill the NAs in every column at once:
df <- tibble(
  x = c(1, NA, 3),
  y = c("a", NA, "c"),
  z = c(TRUE, NA, FALSE)
)
df |> fill(everything(), .direction = "down")
# # A tibble: 3 × 3
#       x y     z
#   <dbl> <chr> <lgl>
# 1     1 a     TRUE
# 2     1 a     TRUE
# 3     3 c     FALSE
Columns without NAs are returned unchanged, so everything() is convenient when missing values are scattered across many columns and the same fill direction suits all of them.
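When only some columns should be filled, a narrower tidyselect helper can replace everything(). The following sketch assumes a hypothetical survey export in which only the addr_ columns follow the stacked layout.

```r
library(tidyr)

# Hypothetical survey export: only the addr_* columns should be filled
survey <- tibble(
  id        = c(1, 2, 3),
  addr_city = c("Oslo", NA, NA),
  addr_zip  = c("0150", NA, "0250"),
  score     = c(10, NA, 30)
)

filled <- survey |> fill(starts_with("addr_"))
filled$addr_city  # "Oslo" "Oslo" "Oslo"
filled$addr_zip   # "0150" "0150" "0250"
filled$score      # 10 NA 30  (not selected, so left untouched)
```

Leaving score out of the selection keeps its NA as a genuine missing measurement rather than copying the previous respondent's value.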
Chaining directions for complex patterns
A single "down" or "up" pass leaves the NAs at one end of the column untouched. To cover both ends, chain two fill() calls:
df <- tibble(
  id = c(1, 1, 1, 1),
  value = c("a", NA, NA, NA)
)
# The down pass fills the NAs after "a"; the up pass would catch any NAs before the first value
df |>
  fill(value, .direction = "down") |>
  fill(value, .direction = "up")
# # A tibble: 4 × 2
#      id value
#   <dbl> <chr>
# 1     1 a
# 2     1 a
# 3     1 a
# 4     1 a     <- filled by the down pass
Chaining a downward fill() and then an upward one is equivalent to .direction = "downup". The explicit chain makes the two-pass logic visible in the code, which can be easier to debug when you are unsure which direction should take priority.
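The equivalence can be checked directly. This sketch uses a small illustrative column (both_ends is a hypothetical name) with a gap at each end, so that both passes have work to do.

```r
library(tidyr)

# Column with a leading and a trailing gap
both_ends <- tibble(value = c(NA, "a", NA))

# Two explicit passes versus the single two-pass option
chained  <- both_ends |>
  fill(value, .direction = "down") |>
  fill(value, .direction = "up")
one_call <- both_ends |> fill(value, .direction = "downup")

chained$value                             # "a" "a" "a"
identical(chained$value, one_call$value)  # TRUE
```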
fill() is commonly used alongside pivot_longer() on spreadsheet-style data where a category is recorded only on the first row of each block. When the data has logical groups, fill within them, using .by or group_by() on the relevant identifier columns, so that values do not spill from one group into the next.
Direction and scope
fill() works within each selected column and never takes values from other columns. Filling downward suits data where a header value applies to the detail rows below it. Filling upward suits formats where a footer value applies to the rows above it.
With group_by() or .by, filling does not cross group boundaries. Each group is filled independently, so a missing value at the start of a group is not filled from the last value of the previous group. This is the right behavior in most cases: census values for 2020 should fill rows within the 2020 group, not carry over into the 2021 group. Check the rows at group boundaries whenever the grouping matters to the analysis.
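One way to check the boundaries is to count the values still missing in each group after the fill. The sketch below assumes a hypothetical census extract in which the 2021 group has no recorded value at all, so nothing within that group can fill it.

```r
library(tidyr)
library(dplyr)

# Hypothetical census extract: the 2021 group has no recorded source at all
census <- tibble(
  year   = c(2020, 2020, 2021, 2021),
  source = c("census", NA, NA, NA)
)

filled <- census |>
  group_by(year) |>
  fill(source, .direction = "downup") |>
  ungroup()
filled$source  # "census" "census" NA NA

# Count the values that are still missing in each group after the fill
remaining <- filled |>
  group_by(year) |>
  summarise(n_missing = sum(is.na(source)))
remaining$n_missing  # 0 2
```

A non-zero count flags a group that contains no value to fill from, which usually calls for a decision outside fill(), such as replace_na() or dropping the rows.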
See also
- tidyr::pivot_longer(): reshape wide data to long format, often used together with fill()
- tidyr::replace_na(): replace NAs with a fixed value instead of adjacent values
- tidyr::separate(): split a single column into multiple columns (superseded by separate_wider_delim() and related functions)