separate()

Overview

separate() splits one character column of a data frame into multiple columns along a delimiter pattern. It is the inverse of unite(). Each value is broken apart at the separator, and the pieces are distributed across new columns whose names you supply in advance through into.

The default sep pattern matches any run of non-alphanumeric characters as the split point. This means common delimiters like spaces, hyphens, underscores, slashes, and dots all work without you needing to specify them explicitly.
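As a small sketch of the default pattern at work, the snippet below runs separate() without sep on one value per delimiter style. The tibble `mixed` and the column names `key`, `left`, and `right` are made up for illustration.

```r
library(tidyr)

# Hypothetical data: one value for each delimiter mentioned above.
mixed <- tibble(key = c("a b", "c-d", "e_f", "g/h", "i.j"))

# No sep is given, so the default "[^[:alnum:]]+" applies to every row.
split_default <- separate(mixed, key, into = c("left", "right"))
split_default
```

Because a dot counts as a delimiter, a value such as "1.5" would also be split under the default. Passing sep explicitly is the safer choice when the pieces themselves may contain punctuation.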

Signature

separate(
  data,
  col,
  into,
  sep = "[^[:alnum:]]+",
  remove = TRUE,
  convert = FALSE,
  extra = "warn",
  fill = "warn",
  ...
)

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| data | tibble / data frame | none | Input data frame or tibble. |
| col | column name or position | none | Column to separate. Usually given as an unquoted column name. |
| into | character vector | none | Names for the new columns. Required; no default. Use NA to omit a piece from the output. |
| sep | character or numeric | "[^[:alnum:]]+" | If character, a regular expression defining the split point; the default matches any run of non-alphanumeric characters. If numeric, character positions to split at (negative values count from the right). |
| remove | logical | TRUE | If TRUE, remove the original col from the output. |
| convert | logical | FALSE | If TRUE, apply type.convert() to the new columns so they take appropriate R types (integer, numeric, logical, etc.). |
| extra | character | "warn" | What to do when a row has more pieces than length(into): "warn" (warn and drop the extra pieces), "drop" (drop the extra pieces without a warning), or "merge" (split at most length(into) times, keeping the remainder in the last column). |
| fill | character | "warn" | What to do when a row has fewer pieces than length(into): "warn" (warn and fill with NA on the right), "right" (fill with NA on the right), or "left" (fill with NA on the left). |
| ... | additional arguments | | Passed to methods. |
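The remove parameter is not exercised in the examples that follow, so here is a minimal sketch of remove = FALSE. The tibble `people` is illustrative data, not part of the tidyr API.

```r
library(tidyr)

# Illustrative data only.
people <- tibble(full_name = c("John Doe", "Jane Smith"))

# remove = FALSE keeps full_name alongside the new columns.
kept <- separate(people, full_name, into = c("first", "last"), remove = FALSE)
names(kept)
```

Keeping the source column can be handy while checking that a split behaved as intended.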

Basic Usage

Simple split on default separator

library(tidyr)

df <- tibble(full_name = c("John Doe", "Jane Smith", "Bob Wilson"))

separate(df, full_name, into = c("first", "last"))
# # A tibble: 3 × 2
#   first last
#   <chr> <chr>
# 1 John  Doe
# 2 Jane  Smith
# 3 Bob   Wilson

Custom separator

To split on one specific delimiter rather than on any run of non-alphanumeric characters, pass sep explicitly. The value is treated as a regular expression.

df <- tibble(date = c("2024-01-15", "2024-02-20", "2024-03-25"))

separate(df, date, into = c("year", "month", "day"), sep = "-")
# # A tibble: 3 × 3
#   year  month day
#   <chr> <chr> <chr>
# 1 2024  01    15
# 2 2024  02    20
# 3 2024  03    25

Convert types automatically

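Set convert = TRUE to run type.convert() on the new columns, so that pieces that look numeric become integers or doubles. Leading zeros are lost in the process: "01" becomes 1.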

df <- tibble(date = c("2024-01-15", "2024-02-20", "2024-03-25"))

separate(df, date, into = c("year", "month", "day"), sep = "-", convert = TRUE)
# # A tibble: 3 × 3
#    year month   day
#   <int> <int> <int>
# 1  2024     1    15
# 2  2024     2    20
# 3  2024     3    25
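Conversion is decided per column, which the following sketch tries to show with a mix of text, decimals, and the string "NA". The tibble `readings` and its values are invented for illustration.

```r
library(tidyr)

# Invented data: a label and a value joined by an underscore.
readings <- tibble(obs = c("temp_21.5", "hum_40", "wind_NA"))

typed <- separate(
  readings, obs,
  into = c("measure", "value"),
  sep = "_",
  convert = TRUE
)

# measure stays character; value becomes a double column,
# and the string "NA" is read as a missing value.
typed
```

If the literal text "NA" is meaningful in your data, leave convert = FALSE and convert the columns yourself.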

Gotchas and Advanced

Handling extra pieces

With the default extra = "warn", overflow pieces are discarded and a warning is emitted.

df <- tibble(id = c("a-b-c", "x-y"))

separate(df, id, into = c("first", "second"))
# Warning: Expected 2 pieces. Additional pieces discarded in 1 rows [1].
# # A tibble: 2 × 2
#   first second
#   <chr> <chr>
# 1 a     b
# 2 x     y

To discard extras without the warning, use extra = "drop". To keep them, use extra = "merge", which splits at most length(into) times and leaves the remainder intact in the last column.
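A side-by-side sketch of the two non-default settings, reusing the same shape of data as above (the names `ids`, `dropped`, and `merged` are illustrative):

```r
library(tidyr)

ids <- tibble(id = c("a-b-c", "x-y"))

# "drop": the third piece of "a-b-c" is discarded, with no warning.
dropped <- separate(ids, id, into = c("first", "second"), extra = "drop")

# "merge": only one split is made, so "b-c" stays together.
merged <- separate(ids, id, into = c("first", "second"), extra = "merge")

dropped$second
merged$second
```

"merge" is usually the better choice when the last field is free text that may itself contain the delimiter.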

Handling too few pieces

When a row has fewer pieces than length(into), the default fill = "warn" pads with NA on the right and emits a warning. Use fill = "right" or fill = "left" to choose the side explicitly; either value also suppresses the warning.

df <- tibble(id = c("onlyone", "alsoone"))

separate(df, id, into = c("first", "second"), fill = "left")
# # A tibble: 2 × 2
#   first second
#   <chr> <chr>
# 1 NA    onlyone
# 2 NA    alsoone
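To compare the two sides directly, the sketch below applies both settings to a column where only one row is short. The tibble `parts` is made-up data.

```r
library(tidyr)

# Made-up data: the second row has a single piece.
parts <- tibble(id = c("a-b", "c"))

# Pad on the right: the lone piece lands in the first column.
right <- separate(parts, id, into = c("first", "second"), fill = "right")

# Pad on the left: the lone piece lands in the last column.
left <- separate(parts, id, into = c("first", "second"), fill = "left")

right
left
```

Rows that already have enough pieces are unaffected by either setting.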

Negative separator positions

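When sep is numeric, it is interpreted as character positions to split at rather than as a pattern. Positive values count from the left, starting at 1; negative values count from the right, so -1 splits one position from the end. The length of sep should be one less than the length of into.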

df <- tibble(code = c("abc123def", "xyz789uvw"))

separate(df, code, into = c("prefix", "suffix"), sep = -3)
# # A tibble: 2 × 2
#   prefix suffix
#   <chr>  <chr>
# 1 abc123 def
# 2 xyz789 uvw
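Positive positions work the same way, and several can be given at once. The sketch below cuts the same strings into three fixed-width pieces; the column names `head`, `mid`, and `tail` are arbitrary.

```r
library(tidyr)

codes <- tibble(code = c("abc123def", "xyz789uvw"))

# Split after the 3rd and 6th characters: two positions, three columns.
three <- separate(codes, code, into = c("head", "mid", "tail"), sep = c(3, 6))
three
```

Positional splitting assumes every value has the same layout, so it suits fixed-width codes better than free-form text.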

NA values propagate to all output columns

If the input cell is NA, every new column receives NA in that row.

df <- tibble(pair = c("apple-orange", NA, "red-blue"))

separate(df, pair, into = c("a", "b"))
# # A tibble: 3 × 2
#   a     b
#   <chr> <chr>
# 1 apple orange
# 2 NA    NA
# 3 red   blue

Omitting a column with NA in into

If one of the names in into is NA, that column is silently dropped from the output.

df <- tibble(code = c("2024-01-15", "2024-02-20"))

separate(df, code, into = c("year", NA, "day"), sep = "-")
# # A tibble: 2 × 2
#   year  day
#   <chr> <chr>
# 1 2024  15
# 2 2024  20

See Also

  • stringr::str_sub() — extract or replace substrings by position; useful after separating to pull out specific parts of a string.
  • lubridate::ymd() — parse a date string in ymd format; often used alongside separate() when splitting date columns.