unite
Overview
unite() combines multiple columns into a single new column by pasting the values together as strings. It is the inverse operation of separate(). The function lives in the tidyr package.
Signature
unite(data, col, ..., sep = "_", remove = TRUE, na.rm = FALSE)
Parameters
- data: A data frame or tibble. Required.
- col: The name of the new column to create, given as a string or unquoted symbol. Required.
- ...: Columns to unite, specified via tidy-select (e.g., starts_with("x"), contains("name"), everything(), or bare column names).
- sep: Separator inserted between values. Defaults to "_". Can be any string, including an empty string "".
- remove: If TRUE (the default), the original columns used in ... are removed from the output. If FALSE, those columns are kept alongside the new column.
- na.rm: If FALSE (the default), NA values are converted to the literal string "NA" and included in the output. If TRUE, NA values are removed entirely and no separator is added for the missing position.
Returns
A data frame or tibble with the same number of rows as the input. The new col column is a character vector. Columns not selected in ... pass through unchanged; the united columns are dropped unless remove = FALSE.
Examples
Basic usage
library(tidyverse)
df <- tibble(year = c(2020, 2021), month = c("Jan", "Feb"), day = c("01", "02"))
unite(df, "date", year, month, day)
# # A tibble: 2 × 1
# date
# <chr>
# 1 2020_Jan_01
# 2 2021_Feb_02
Custom separator
unite(df, "date", year, month, day, sep = "-")
# # A tibble: 2 × 1
# date
# <chr>
# 1 2020-Jan-01
# 2 2021-Feb-02
Keep original columns with remove = FALSE
unite(df, "date", year, month, day, sep = "-", remove = FALSE)
# # A tibble: 2 × 4
# date year month day
# <chr> <dbl> <chr> <chr>
# 1 2020-Jan-01 2020 Jan 01
# 2 2021-Feb-02 2021 Feb 02
Handle NA values with na.rm = TRUE
df_na <- tibble(a = c("x", NA), b = c("y", "z"))
unite(df_na, "combined", a, b, sep = "-")
# # A tibble: 2 × 1
# combined
# <chr>
# 1 x-y
# 2 NA-z
unite(df_na, "combined", a, b, sep = "-", na.rm = TRUE)
# # A tibble: 2 × 1
# combined
# <chr>
# 1 x-y
# 2 z
All-NA row with na.rm = TRUE
df_all_na <- tibble(a = c("x", NA), b = c(NA, NA))
unite(df_all_na, "combined", a, b, sep = "-", na.rm = TRUE)
# # A tibble: 2 × 1
# combined
# <chr>
# 1 x
# 2
When all united columns contain NA for a given row and na.rm = TRUE, the result is an empty string "" rather than NA.
Tidy-select helpers
df_multi <- tibble(x_a = 1, x_b = 2, y_a = 3, y_b = 4)
unite(df_multi, "x_cols", starts_with("x"), sep = "")
# # A tibble: 1 × 3
# x_cols y_a y_b
# <chr> <dbl> <dbl>
# 1 12 3 4
Behavior Details
NA conversion
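With na.rm = FALSE (the default), NA is pasted as the literal string "NA", the same way base paste() treats missing values. Be careful when chaining unite() and separate(): the round trip turns a missing value into the string "NA" rather than restoring NA. Setting na.rm = TRUE avoids the literal "NA", but it also drops that position from the united string, so the result can no longer be split back into the original columns reliably.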
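The round trip can be checked with a small sketch. The data frame and the column names a, b, and ab are made up for illustration, and separate() is called with its default convert = FALSE.

```r
library(tidyr)

# Hypothetical data: the second row has a missing value in column a.
df_na <- tibble(a = c("x", NA), b = c("y", "z"))

# Default na.rm = FALSE pastes the missing value as the string "NA".
united <- unite(df_na, "ab", a, b, sep = "-")

# Splitting again yields the string "NA", not a missing value.
round_trip <- separate(united, ab, into = c("a", "b"), sep = "-")

is.na(df_na$a)       # missing in the original second row
is.na(round_trip$a)  # no longer missing after the round trip
```

If the missing values matter downstream, one option is to convert the "NA" strings back explicitly after splitting, for example with dplyr::na_if().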
Type coercion
Non-character columns are coerced to character using as.character() before concatenation. Factors, dates, and numbers all work, but the resulting column is always a character vector.
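As an illustration of the coercion, the sketch below unites an integer, a factor, and a Date column. The data frame df_types and its column names are hypothetical.

```r
library(tidyr)

# Hypothetical data mixing integer, factor, and Date columns.
df_types <- tibble(
  id = 1:2,
  grp = factor(c("a", "b")),
  when = as.Date(c("2020-01-01", "2021-02-02"))
)

out <- unite(df_types, "key", id, grp, when, sep = "|")

# The united column is character regardless of the input types;
# factors contribute their labels and dates their ISO text form.
class(out$key)
out$key
```

Because the result is plain text, any numeric or date semantics have to be restored explicitly if the column is split again later.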
No into parameter
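Unlike separate(), which uses into = c("col_a", "col_b") to name multiple new columns, unite() takes a single column name via col and has no into parameter. Because unite() accepts ..., an into = ... argument is not rejected as an unused argument; it is absorbed into ... and treated as part of the column selection, which typically fails with a tidy-select error instead.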
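A small sketch contrasting col with a separate()-style into argument. The data frame is made up, and the mistaken call is wrapped in tryCatch() so the failure can be inspected; the exact error text depends on the installed tidyr and tidyselect versions, so none is shown here.

```r
library(tidyr)

# Hypothetical data for illustration.
df <- tibble(year = c(2020, 2021), month = c("Jan", "Feb"))

# Intended usage: a single new column name passed as col.
ok <- unite(df, "date", year, month, sep = "-")

# Mistaken separate()-style call; the error is captured, not raised.
res <- tryCatch(
  unite(df, "date", year, month, into = c("a", "b")),
  error = function(e) e
)

inherits(res, "error")
conditionMessage(res)
```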
Separator placement with na.rm = TRUE
When na.rm = TRUE and a value is NA, no separator is added for that position. If the first column in ... is NA and na.rm = TRUE, the result will start with the value from the second column with no leading separator.
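The placement rule can be seen with a missing value in the leading, middle, and trailing position. This is a minimal sketch with made-up column names first, second, and third.

```r
library(tidyr)

# Hypothetical data: one missing value per row, in a different position.
df_gaps <- tibble(
  first  = c(NA, "x", "x"),
  second = c("y", NA, "y"),
  third  = c("z", "z", NA)
)

out <- unite(df_gaps, "joined", first, second, third, sep = "-", na.rm = TRUE)

# No leading, doubled, or trailing separators appear.
out$joined
# "y-z" "x-z" "x-y"
```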
See Also
- separate(): the inverse operation, splitting one column into multiple columns.
- dplyr::mutate(): transform columns in combination with unite() in a pipeline.
- pivot_longer() and pivot_wider(): reshape data; unite() is often useful alongside pivoting.