unite

Overview

unite() combines multiple columns into a single new column by pasting the values together as strings. It is the inverse operation of separate(). The function lives in the tidyr package.

Signature

unite(data, col, ..., sep = "_", remove = TRUE, na.rm = FALSE)

Parameters

  • data — A data frame or tibble. Required.
  • col — The name of the new column to create, given as a string or unquoted symbol. Required.
  • ... — Columns to unite, specified via tidy-select helpers (e.g., starts_with("x"), contains("name"), everything(), or bare column names).
  • sep — Separator inserted between values. Defaults to "_". Can be any string, including an empty string "".
  • remove — If TRUE (the default), the original columns used in ... are removed from the output. If FALSE, those columns are kept alongside the new column.
  • na.rm — If FALSE (the default), NA values are converted to the literal string "NA" and included in the output. If TRUE, NA values are removed entirely and no separator is added for the missing position.
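
As a minimal sketch of the col and remove parameters described above, the snippet below builds a small illustrative tibble (the names df_params, x, y, and z are made up for this example) and checks that a quoted and an unquoted col give the same result.

```r
library(tidyr)

# Hypothetical data for illustration only.
df_params <- tibble(x = c("a", "b"), y = c("c", "d"), z = 1:2)

# col may be given as a string or as a bare symbol.
quoted <- unite(df_params, "xy", x, y)
bare   <- unite(df_params, xy, x, y)
identical(quoted, bare)
# TRUE

# remove = TRUE (the default) drops the inputs; remove = FALSE keeps them.
names(quoted)
# "xy" "z"
kept <- unite(df_params, "xy", x, y, remove = FALSE)
names(kept)
# "xy" "x" "y" "z"
```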

Returns

A data frame or tibble with the same number of rows as the input. The new col column is a character vector. All other columns pass through unchanged, subject to the remove argument.

Examples

Basic usage

library(tidyverse)

df <- tibble(year = c(2020, 2021), month = c("Jan", "Feb"), day = c("01", "02"))
unite(df, "date", year, month, day)
# # A tibble: 2 × 1
#   date
#   <chr>
# 1 2020_Jan_01
# 2 2021_Feb_02

Custom separator

unite(df, "date", year, month, day, sep = "-")
# # A tibble: 2 × 1
#   date
#   <chr>
# 1 2020-Jan-01
# 2 2021-Feb-02

Keep original columns with remove = FALSE

unite(df, "date", year, month, day, sep = "-", remove = FALSE)
# # A tibble: 2 × 4
#   date        year month day
#   <chr>      <dbl> <chr> <chr>
# 1 2020-Jan-01  2020 Jan   01
# 2 2021-Feb-02  2021 Feb   02

Handle NA values with na.rm = TRUE

df_na <- tibble(a = c("x", NA), b = c("y", "z"))
unite(df_na, "combined", a, b, sep = "-")
# # A tibble: 2 × 1
#   combined
#   <chr>
# 1 x-y
# 2 NA-z

unite(df_na, "combined", a, b, sep = "-", na.rm = TRUE)
# # A tibble: 2 × 1
#   combined
#   <chr>
# 1 x-y
# 2 z

All-NA row with na.rm = TRUE

df_all_na <- tibble(a = c("x", NA), b = c(NA, NA))
unite(df_all_na, "combined", a, b, sep = "-", na.rm = TRUE)
# # A tibble: 2 × 1
#   combined
#   <chr>
# 1 x
# 2 ""

When all united columns contain NA for a given row and na.rm = TRUE, the result is an empty string "" rather than NA.

Tidy-select helpers

df_multi <- tibble(x_a = 1, x_b = 2, y_a = 3, y_b = 4)
unite(df_multi, "x_cols", starts_with("x"), sep = "")
# # A tibble: 1 × 3
#   x_cols     y_a   y_b
#   <chr>   <dbl> <dbl>
# 1 12          3     4

Behavior Details

NA conversion

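na.rm = FALSE (the default) converts NA to the literal string "NA", the same way base R's paste() does. Be careful when chaining unite() and separate(): separate() returns that "NA" as ordinary text, so a missing value does not survive the round trip unless you convert it back, for example with convert = TRUE in separate(). Setting na.rm = TRUE in unite() avoids the "NA" text but drops the missing position altogether, so separate() can no longer tell which column was missing.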
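
The round trip can be sketched as follows. The tibble and the column names a, b, and ab are illustrative, and the last step assumes that convert = TRUE in separate() turns the text "NA" back into a missing value.

```r
library(tidyr)

# Hypothetical data for illustration only.
df_rt <- tibble(a = c("x", NA), b = c("y", "z"))

# Default na.rm = FALSE: the missing value becomes the text "NA".
united <- unite(df_rt, "ab", a, b, sep = "-")
united$ab
# returns c("x-y", "NA-z")

# separate() splits the text again, but "NA" is now an ordinary string,
# so is.na() no longer flags the second row of `a`.
round_trip <- separate(united, ab, into = c("a", "b"), sep = "-")
is.na(round_trip$a)
# returns c(FALSE, FALSE)

# Assumption: convert = TRUE runs type conversion on the new columns,
# which should restore the missing value.
recovered <- separate(united, ab, into = c("a", "b"), sep = "-", convert = TRUE)
is.na(recovered$a)
```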

Type coercion

Non-character columns are coerced to character using as.character() before concatenation. Factors, dates, and numbers all work, but the resulting column is always a character vector.
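
A small sketch of the coercion, using an illustrative tibble (df_types and its columns are made up for this example) that mixes an integer, a factor, and a Date column:

```r
library(tidyr)

# Hypothetical data for illustration only.
df_types <- tibble(
  id    = 1:2,
  grade = factor(c("low", "high")),
  day   = as.Date(c("2024-01-15", "2024-02-20"))
)

out_types <- unite(df_types, "key", id, grade, day, sep = "|")

# Factors contribute their labels and dates their ISO text form.
out_types$key
# returns c("1|low|2024-01-15", "2|high|2024-02-20")

# The united column is character regardless of the input types.
class(out_types$key)
# returns "character"
```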

No into parameter

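Unlike separate(), which uses into = c("col_a", "col_b") to name multiple new columns, unite() takes a single column name via col and has no into parameter. Because unite() accepts ..., a stray into = ... is not rejected as an unused argument; it is passed along as part of the column selection, which typically fails with a selection error or unites the wrong columns.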

Separator placement with na.rm = TRUE

When na.rm = TRUE and a value is NA, no separator is added for that position. If the first column in ... is NA, the result starts with the first non-missing value, with no leading separator; likewise, a missing last column leaves no trailing separator.
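
To illustrate separator placement, the sketch below uses a made-up tibble (df_gap) in which the missing value sits in the first, middle, and last position in turn:

```r
library(tidyr)

# Hypothetical data for illustration only: one NA per row,
# in the first, middle, and last column respectively.
df_gap <- tibble(
  first  = c(NA, "a", "a"),
  middle = c("b", NA, "b"),
  last   = c("c", "c", NA)
)

out_gap <- unite(df_gap, "joined", first, middle, last, sep = "-", na.rm = TRUE)

# No leading, doubled, or trailing separator appears.
out_gap$joined
# returns c("b-c", "a-c", "a-b")
```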

See Also