separate()
Overview
separate() splits a character column into multiple columns along a delimiter. It is the inverse of unite(). It takes one column of a data frame, breaks each value apart at the separator, and distributes the pieces across new columns whose names you supply in into.
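Because separate() and unite() are inverses, a round trip through both shows what each one does. This is a minimal sketch; the column names and values are illustrative only.

```r
library(tidyr)  # tidyr re-exports tibble()

# Start from three separate character columns (illustrative data)
parts <- tibble(year = c("2024", "2024"), month = c("01", "02"), day = c("15", "20"))

# unite() glues them into one string column named "date"...
joined <- unite(parts, "date", year, month, day, sep = "-")

# ...and separate() splits that column back into the original three
split_back <- separate(joined, date, into = c("year", "month", "day"), sep = "-")
```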
By default, sep is the regular expression "[^[:alnum:]]+", which treats any run of non-alphanumeric characters as a split point. Common delimiters such as spaces, hyphens, underscores, slashes, and dots therefore work without setting sep explicitly.
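To see the default pattern at work, the sketch below splits values that each use a different delimiter, without passing sep at all. The data is illustrative.

```r
library(tidyr)

# Each value uses a different non-alphanumeric delimiter
df <- tibble(x = c("a b", "c-d", "e_f", "g/h", "i.j"))

# No sep argument: the default "[^[:alnum:]]+" splits on every one of them
out <- separate(df, x, into = c("left", "right"))
```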
Signature
separate(
  data,
  col,
  into,
  sep = "[^[:alnum:]]+",
  remove = TRUE,
  convert = FALSE,
  extra = "warn",
  fill = "warn",
  ...
)
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| data | data frame / tibble | none | Input data frame or tibble. |
| col | column (tidy-select) | none | Column to separate, given as a bare name, a string, or a position. |
| into | character vector | none | Names for the new columns. Required; there is no default. Use NA to omit a piece from the output. |
| sep | character or numeric | "[^[:alnum:]]+" | If character, a regular expression defining the split point; the default matches any run of non-alphanumeric characters. If numeric, character positions to split at: positive values count from the left starting at 1, negative values count from the right starting at -1. A numeric sep should have one element fewer than into. |
| remove | logical | TRUE | If TRUE, remove the original col from the output. |
| convert | logical | FALSE | If TRUE, run type.convert() (with as.is = TRUE) on the new columns so they take appropriate R types (integer, numeric, logical, etc.). |
| extra | character | "warn" | Applies when sep is character and a row has more pieces than length(into): "warn" (drop the extra pieces and emit a warning), "drop" (drop them silently), or "merge" (split at most length(into) times, keeping the remainder in the last column, with no warning). |
| fill | character | "warn" | Applies when sep is character and a row has fewer pieces than length(into): "warn" (fill with NA on the right and emit a warning), "right" (fill with NA on the right, no warning), or "left" (fill with NA on the left, no warning). |
| ... | | none | Additional arguments passed to methods. |
Basic Usage
Simple split on default separator
library(tidyr)
df <- tibble(full_name = c("John Doe", "Jane Smith", "Bob Wilson"))
separate(df, full_name, into = c("first", "last"))
# # A tibble: 3 × 2
# first last
# <chr> <chr>
# 1 John Doe
# 2 Jane Smith
# 3 Bob Wilson
Custom separator
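Passing sep explicitly restricts the split to that pattern, which is safer when values may contain other non-alphanumeric characters. Here sep = "-" splits on hyphens only.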
df <- tibble(date = c("2024-01-15", "2024-02-20", "2024-03-25"))
separate(df, date, into = c("year", "month", "day"), sep = "-")
# # A tibble: 3 × 3
# year month day
# <chr> <chr> <chr>
# 1 2024 01 15
# 2 2024 02 20
# 3 2024 03 25
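Because a character sep is a regular expression, characters with a special meaning in regex must be escaped. An unescaped dot matches any character, so a literal dot is written as "\\." in R source. A sketch with made-up version strings:

```r
library(tidyr)

df <- tibble(version = c("1.2.3", "4.5.6"))

# sep is a regex: "\\." matches a literal dot, whereas "." would match
# every character; convert = TRUE turns the pieces into integers
out <- separate(df, version, into = c("major", "minor", "patch"),
                sep = "\\.", convert = TRUE)
```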
Convert types automatically
Set convert = TRUE to coerce the new columns to their natural types.
df <- tibble(date = c("2024-01-15", "2024-02-20", "2024-03-25"))
separate(df, date, into = c("year", "month", "day"), sep = "-", convert = TRUE)
# # A tibble: 3 × 3
# year month day
# <int> <int> <int>
# 1 2024 1 15
# 2 2024 2 20
# 3 2024 3 25
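Setting remove = FALSE keeps the original column next to the new ones, which is handy for checking a split. A short sketch with illustrative data:

```r
library(tidyr)

df <- tibble(date = c("2024-01-15", "2024-02-20"))

# remove = FALSE: the original "date" column stays in the output
out <- separate(df, date, into = c("year", "month", "day"),
                sep = "-", remove = FALSE)
```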
Gotchas and Advanced
Handling extra pieces
With the default extra = "warn", pieces beyond length(into) are dropped and a warning is emitted.
df <- tibble(id = c("a-b-c", "x-y"))
separate(df, id, into = c("first", "second"))
# Warning: Expected 2 pieces. Additional pieces discarded in 1 rows [1].
# # A tibble: 2 × 2
# first second
# <chr> <chr>
# 1 a b
# 2 x y
To discard the extra pieces without a warning, use extra = "drop". To keep them instead, joined onto the last column (giving "b-c" for the first row above), use extra = "merge".
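The two alternatives side by side, using the same input as the example above:

```r
library(tidyr)

df <- tibble(id = c("a-b-c", "x-y"))

# "drop": pieces beyond length(into) are discarded without a warning
dropped <- separate(df, id, into = c("first", "second"), extra = "drop")

# "merge": split at most length(into) times, so the remainder
# stays together in the last column
merged <- separate(df, id, into = c("first", "second"), extra = "merge")
```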
Handling too few pieces
Use fill = "right" or fill = "left" to control which side receives NA when a row has too few pieces. Unlike the default fill = "warn", which fills on the right, neither option emits a warning.
df <- tibble(id = c("a-b", "c"))
separate(df, id, into = c("first", "second"), fill = "left")
# # A tibble: 2 × 2
# first second
# <chr> <chr>
# 1 a b
# 2 NA c
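fill = "right" yields the same values as the default fill = "warn", just without the warning. The sketch below checks both on illustrative data; capturing the warning with withCallingHandlers() is only one way to observe it.

```r
library(tidyr)

df <- tibble(id = c("a-b", "c"))

# fill = "right": the short row is padded with NA on the right, silently
right <- separate(df, id, into = c("first", "second"), fill = "right")

# Default fill = "warn": same values, but a warning is signalled.
# Record its message and muffle it so execution continues quietly.
msg <- NULL
with_warning <- withCallingHandlers(
  separate(df, id, into = c("first", "second")),
  warning = function(w) {
    msg <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  }
)
```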
Negative separator positions
A numeric sep is interpreted as character positions rather than as a regular expression. Negative values count from the right: sep = -1 splits off the last character, and sep = -3 splits off the last three.
df <- tibble(code = c("abc123def", "xyz789uvw"))
separate(df, code, into = c("prefix", "suffix"), sep = -3)
# # A tibble: 2 × 2
# prefix suffix
# <chr> <chr>
# 1 abc123 def
# 2 xyz789 uvw
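Positive positions count from the left, and several positions produce several pieces, so sep needs one element fewer than into. A sketch splitting compact YYYYMMDD stamps (illustrative data):

```r
library(tidyr)

df <- tibble(stamp = c("20240115", "20240220"))

# Split after character 4 and after character 6:
# pieces are characters 1-4, 5-6, and 7-8
out <- separate(df, stamp, into = c("year", "month", "day"),
                sep = c(4, 6), convert = TRUE)
```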
NA values propagate to all output columns
If the input cell is NA, every new column receives NA in that row.
df <- tibble(pair = c("apple-orange", NA, "red-blue"))
separate(df, pair, into = c("a", "b"))
# # A tibble: 3 × 2
# a b
# <chr> <chr>
# 1 apple orange
# 2 NA NA
# 3 red blue
Omitting a column with NA in into
If one of the names in into is NA, that column is silently dropped from the output.
df <- tibble(code = c("2024-01-15", "2024-02-20"))
separate(df, code, into = c("year", NA, "day"), sep = "-")
# # A tibble: 2 × 2
# year day
# <chr> <chr>
# 1 2024 15
# 2 2024 20
See Also
- stringr::str_sub(): extract or replace substrings by position; useful after separating to pull out specific parts of a string.
- lubridate::ymd(): parse a date string in year-month-day order; often used alongside separate() when splitting date columns.