
separate()

Overview

separate() splits a character column into multiple columns along a delimiter pattern. It is the inverse of unite(). The function takes one column of the data frame and breaks each value apart at a separator, distributing the pieces across new columns that you name in advance.

The default sep pattern matches any run of non-alphanumeric characters as the split point. This means common delimiters like spaces, hyphens, underscores, slashes, and dots all work without you needing to specify them explicitly.
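The following minimal sketch shows the default pattern handling several delimiters in a single call. It assumes tidyr is installed, and the sample values are made up for illustration.

```r
library(tidyr)

# Each value uses a different non-alphanumeric delimiter: space, hyphen,
# underscore, slash, dot, and a multi-character run (" -/ ").
df <- tibble(raw = c("a b", "c-d", "e_f", "g/h", "i.j", "k -/ l"))

# No sep argument: the default "[^[:alnum:]]+" treats each delimiter,
# including the whole " -/ " run, as one split point.
out <- separate(df, raw, into = c("left", "right"))
out$left
# [1] "a" "c" "e" "g" "i" "k"
out$right
# [1] "b" "d" "f" "h" "j" "l"
```

Because the pattern ends in `+`, consecutive delimiter characters count as a single split point, so no empty pieces are produced.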

Signature

separate(
  data,
  col,
  into,
  sep = "[^[:alnum:]]+",
  remove = TRUE,
  convert = FALSE,
  extra = "warn",
  fill = "warn",
  ...
)

Parameters

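| Parameter | Type | Default | Description |
|---|---|---|---|
| data | data frame / tibble | none | Input data frame or tibble. |
| col | column (tidy-select) | none | Column to separate, given as a bare name or a string. |
| into | character vector | none | Names for the new columns. Required. Use NA to omit a piece from the output. |
| sep | character or numeric | "[^[:alnum:]]+" | If character, a regular expression defining the split point; the default matches any run of non-alphanumeric characters. If numeric, character positions to split at: positive values count from the left starting at 1, negative values count from the right starting at -1. |
| remove | logical | TRUE | If TRUE, remove the original col from the output. |
| convert | logical | FALSE | If TRUE, apply type.convert() to the new columns so they take appropriate R types (integer, double, logical, etc.). |
| extra | character | "warn" | What to do when a row has more pieces than length(into): "warn" (warn and drop the extra pieces), "drop" (drop extra pieces without a warning), or "merge" (split into at most length(into) pieces, keeping the remainder in the last column, without a warning). |
| fill | character | "warn" | What to do when a row has fewer pieces than length(into): "warn" (warn and fill with NA on the right), "right" (fill with NA on the right, no warning), or "left" (fill with NA on the left, no warning). |
| ... | | | Additional arguments passed on to methods. |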

Basic usage

Simple split on default separator

library(tidyr)

df <- tibble(full_name = c("John Doe", "Jane Smith", "Bob Wilson"))

separate(df, full_name, into = c("first", "last"))
# # A tibble: 3 × 2
#   first last
#   <chr> <chr>
# 1 John  Doe
# 2 Jane  Smith
# 3 Bob   Wilson

By default, separate() splits on any run of non-alphanumeric characters, which covers common delimiters such as spaces, commas, and semicolons. The into argument names the output columns and should contain one name per expected piece; rows that yield more or fewer pieces are handled by the extra and fill arguments.

Custom separator

Pass sep explicitly when you want to split on one specific delimiter rather than on any run of non-alphanumeric characters.

df <- tibble(date = c("2024-01-15", "2024-02-20", "2024-03-25"))

separate(df, date, into = c("year", "month", "day"), sep = "-")
# # A tibble: 3 × 3
#   year  month day
#   <chr> <chr> <chr>
# 1 2024  01    15
# 2 2024  02    20
# 3 2024  03    25

When your data uses a specific separator such as a hyphen or forward slash, passing sep explicitly restricts the split to that delimiter, so other punctuation inside a value (for example, the space and colon in "2024-01-15 10:30") does not create extra pieces. This matters for structured formats like dates, file paths, or encoded identifiers.
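Because sep is a regular expression, delimiters that are regex metacharacters need escaping. The sketch below splits hypothetical version strings on a literal dot; it assumes tidyr is installed.

```r
library(tidyr)

df <- tibble(version = c("1.2.3", "10.0.1"))

# sep is a regex: a bare "." would match any character, so the dot is
# escaped as "\\." to match only a literal dot.
out <- separate(df, version, into = c("major", "minor", "patch"), sep = "\\.")
out$major
# [1] "1"  "10"
out$patch
# [1] "3" "1"
```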

Convert types automatically

Set convert = TRUE to coerce the new columns to their natural types.

df <- tibble(date = c("2024-01-15", "2024-02-20", "2024-03-25"))

separate(df, date, into = c("year", "month", "day"), sep = "-", convert = TRUE)
# # A tibble: 3 × 3
#    year month   day
#   <int> <int> <int>
# 1  2024     1    15
# 2  2024     2    20
# 3  2024     3    25

Setting convert = TRUE tells separate() to detect and cast each new column to its natural type, turning numeric-looking strings into integers or doubles, "TRUE"/"FALSE" into logicals, and leaving other strings as character. This saves you from writing additional mutate() calls with as.numeric() after the split.
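To see that convert = TRUE picks a type per column rather than forcing everything to integer, here is a small sketch with made-up records that mix a decimal number, a logical flag, and a text label. It assumes tidyr is installed; sep = "_" is chosen so the decimal point is not treated as a delimiter.

```r
library(tidyr)

# Hypothetical records: measurement, flag, and label joined by underscores.
df <- tibble(rec = c("3.5_TRUE_x", "1.25_FALSE_y"))

out <- separate(df, rec, into = c("value", "flag", "label"),
                sep = "_", convert = TRUE)
class(out$value)
# [1] "numeric"
class(out$flag)
# [1] "logical"
class(out$label)
# [1] "character"
```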

Gotchas and advanced

Handling extra pieces

With the default extra = "warn", pieces beyond length(into) are dropped and a warning is emitted.

df <- tibble(id = c("a-b-c", "x-y"))

separate(df, id, into = c("first", "second"))
# Warning: Expected 2 pieces. Additional pieces discarded in 1 rows [1].
# # A tibble: 2 × 2
#   first second
#   <chr> <chr>
# 1 a     b
# 2 x     y

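To drop extras without a warning, use extra = "drop". To keep them instead, use extra = "merge", which splits each value into at most length(into) pieces so the remainder, delimiters included, lands in the last column.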
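The difference between the two quiet options can be seen side by side. This sketch reuses the data from the example above and assumes tidyr is installed.

```r
library(tidyr)

df <- tibble(id = c("a-b-c", "x-y"))

# "drop": keep the first two pieces and discard the rest, with no warning.
dropped <- separate(df, id, into = c("first", "second"), extra = "drop")
dropped$second
# [1] "b" "y"

# "merge": produce at most two pieces, so the leftover text, including
# its delimiter, stays in the last column.
merged <- separate(df, id, into = c("first", "second"), extra = "merge")
merged$second
# [1] "b-c" "y"
```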

When rows have fewer pieces than the columns you specified in into, the function can fill the missing positions with NA on either the left or right side of the output.

Handling too few pieces

Use fill = "right" or fill = "left" to control which side receives NA when there are not enough pieces.

df <- tibble(id = c("a-b", "c"))

separate(df, id, into = c("first", "second"), fill = "left")
# # A tibble: 2 × 2
#   first second
#   <chr> <chr>
# 1 a     b
# 2 NA    c

With the default fill = "warn", missing pieces are filled with NA on the right and a warning is emitted. fill = "right" and fill = "left" place the NA values on the chosen side without a warning.
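The sketch below contrasts the default with fill = "right": both put NA in the trailing column, but only the default warns. It assumes tidyr is installed; tryCatch() is used here only so the warning text can be inspected instead of printed.

```r
library(tidyr)

df <- tibble(id = c("a-b", "c"))

# Default fill = "warn": catch the warning and keep its message.
msg <- tryCatch(
  separate(df, id, into = c("first", "second")),
  warning = function(w) conditionMessage(w)
)
msg  # the warning text, which reports the expected number of pieces

# fill = "right": same NA placement, no warning.
out <- separate(df, id, into = c("first", "second"), fill = "right")
out$second
# [1] "b" NA
```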

Negative separator positions

When sep is numeric, it gives character positions to split at instead of a pattern. Negative values count from the right: -1 splits off the last character, and -3 splits off the last three.

df <- tibble(code = c("abc123def", "xyz789uvw"))

separate(df, code, into = c("prefix", "suffix"), sep = -3)
# # A tibble: 2 × 2
#   prefix suffix
#   <chr>  <chr>
# 1 abc123 def
# 2 xyz789 uvw

Using a negative integer for sep splits the string at a fixed number of characters from the right end, which is helpful when you want to extract a suffix of known length like a file extension or a two-letter country code.
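A numeric sep can also be a vector of positions, which is useful for fixed-width codes that have no delimiter at all. In this sketch the compact dates are made up, and tidyr is assumed to be installed. The vector needs one position fewer than the number of names in into.

```r
library(tidyr)

# Hypothetical compact dates with no delimiter.
df <- tibble(stamp = c("20240115", "20241231"))

# Positive positions count from the left: split after the 4th and the
# 6th characters, giving three pieces for three names.
out <- separate(df, stamp, into = c("year", "month", "day"), sep = c(4, 6))
out$month
# [1] "01" "12"
out$day
# [1] "15" "31"
```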

NA values propagate to all output columns

If the input cell is NA, every new column receives NA in that row.

df <- tibble(pair = c("apple-orange", NA, "red-blue"))

separate(df, pair, into = c("a", "b"))
# # A tibble: 3 × 2
#   a     b
#   <chr> <chr>
# 1 apple orange
# 2 NA    NA
# 3 red   blue

When the input cell contains NA, separate() propagates that missingness to every output column, preserving the information that the original value was unavailable in all derived fields.

Omitting a column with NA in into

If one of the names in into is NA, that column is silently dropped from the output.

df <- tibble(code = c("2024-01-15", "2024-02-20"))

separate(df, code, into = c("year", NA, "day"), sep = "-")
# # A tibble: 2 × 2
#   year  day
#   <chr> <chr>
# 1 2024  15
# 2 2024  20

In short, separate() splits a single character column into multiple columns, using a regular expression as the separator. When sep is a number rather than a string, it is interpreted as a character position to split at rather than a pattern. By default, the original column is removed; set remove = FALSE to keep it alongside the new columns.
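Here is a minimal sketch of remove = FALSE, reusing the name data from the basic example and assuming tidyr is installed. The check below only confirms that all three columns are present; it does not rely on their order.

```r
library(tidyr)

df <- tibble(full_name = c("John Doe", "Jane Smith"))

# remove = FALSE keeps full_name in the output next to the new columns.
out <- separate(df, full_name, into = c("first", "last"), remove = FALSE)
c("full_name", "first", "last") %in% names(out)
# [1] TRUE TRUE TRUE
```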

A common issue is that some rows contain more parts than expected (e.g., an address field with commas in the value). Use extra = "drop" to silently discard the extra pieces, or extra = "merge" to keep them in the last column. Conversely, rows with too few pieces get NA in the trailing columns by default, along with a warning; use fill = "right" to do the same without a warning, or fill = "left" to put the NA values in the leading columns instead. For the inverse operation, combining multiple columns into one, use unite().
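As a sketch of the inverse relationship, the round trip below splits dates with separate() and rebuilds them with unite() using the same delimiter. It assumes tidyr is installed and reuses the date data from the custom-separator example.

```r
library(tidyr)

df <- tibble(date = c("2024-01-15", "2024-02-20"))

parts <- separate(df, date, into = c("year", "month", "day"), sep = "-")

# unite() pastes the columns back together with the same delimiter.
back <- unite(parts, "date", year, month, day, sep = "-")
identical(back$date, df$date)
# [1] TRUE
```

The round trip is exact only because the pieces stay character. With convert = TRUE, "01" would become the integer 1, and unite() would then produce "2024-1-15".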

See also