separate()
Overview
separate() splits a character column into multiple columns along a delimiter pattern. It is the inverse of unite(). The function takes a single character vector and breaks it apart based on a separator, distributing the pieces across new columns you name in advance.
The default sep pattern matches any run of non-alphanumeric characters as the split point. This means common delimiters like spaces, hyphens, underscores, slashes, and dots all work without you needing to specify them explicitly.
Signature
separate(
  data,
  col,
  into,
  sep = "[^[:alnum:]]+",
  remove = TRUE,
  convert = FALSE,
  extra = "warn",
  fill = "warn",
  ...
)
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| data | tibble / data frame | (required) | Input data frame or tibble. |
| col | column name (tidy-select) | (required) | Column to separate, given as a bare name or position. |
| into | character vector | (required) | Names for the new columns. Use NA in place of a name to omit that piece from the output. |
| sep | character or numeric | "[^[:alnum:]]+" | If character, a regular expression defining the split point; the default matches any run of non-alphanumeric characters. If numeric, the character position(s) to split at. |
| remove | logical | TRUE | If TRUE, remove the original col from the output. |
| convert | logical | FALSE | If TRUE, apply type.convert() to the new columns so they take appropriate R types (integer, numeric, logical, etc.). |
| extra | character | "warn" | What to do when a row has more pieces than length(into): "warn" (warn and drop the extras), "drop" (drop the extras without a warning), or "merge" (split at most length(into) times, leaving the remainder in the last column). |
| fill | character | "warn" | What to do when a row has fewer pieces than length(into): "warn" (warn and fill with NA on the right), "right" (fill on the right without a warning), or "left" (fill on the left without a warning). |
| ... | additional arguments | (none) | Passed to methods. |
Basic usage
Simple split on default separator
library(tidyr)
df <- tibble(full_name = c("John Doe", "Jane Smith", "Bob Wilson"))
separate(df, full_name, into = c("first", "last"))
# # A tibble: 3 × 2
# first last
# <chr> <chr>
# 1 John Doe
# 2 Jane Smith
# 3 Bob Wilson
With the default sep, separate() splits on any run of non-alphanumeric characters, so spaces, commas, and semicolons all work without configuration. The into argument names the output columns and should contain one name per expected piece; rows that yield more or fewer pieces are handled according to extra and fill.
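As a quick check of the default pattern, the sketch below feeds five different delimiters through the same call. The column names key, left, and right are made up for illustration.

```r
library(tidyr)

# One row per delimiter: space, hyphen, underscore, slash, dot.
df <- tibble(key = c("a b", "c-d", "e_f", "g/h", "i.j"))

# No sep argument: the default "[^[:alnum:]]+" handles every row.
res <- separate(df, key, into = c("left", "right"))

res$left   # "a" "c" "e" "g" "i"
res$right  # "b" "d" "f" "h" "j"
```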
Custom separator
To split on one specific delimiter rather than on any non-alphanumeric run, pass sep explicitly.
df <- tibble(date = c("2024-01-15", "2024-02-20", "2024-03-25"))
separate(df, date, into = c("year", "month", "day"), sep = "-")
# # A tibble: 3 × 3
# year month day
# <chr> <chr> <chr>
# 1 2024 01 15
# 2 2024 02 20
# 3 2024 03 25
Passing sep explicitly restricts the split to that one delimiter, so other punctuation inside the values (a decimal point, a space, a colon) is left alone. This matters for structured formats like dates, file paths, or encoded identifiers. Because sep is a regular expression, metacharacters must be escaped: use "\\." to split on a literal dot.
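The following sketch shows the kind of ambiguity an explicit sep avoids, using made-up numeric ranges that contain decimal points. It also shows a literal dot being escaped, since sep is treated as a regular expression. All column names here are illustrative.

```r
library(tidyr)

df <- tibble(range = c("1.5-2.5", "3.0-4.5"))

# Default sep: the dots count as delimiters too, so each row yields four
# pieces and only the first two are kept (warning suppressed for the demo).
loose <- suppressWarnings(separate(df, range, into = c("low", "high")))
loose$low   # "1" "3"
loose$high  # "5" "0"

# Explicit sep: split on the hyphen only.
strict <- separate(df, range, into = c("low", "high"), sep = "-")
strict$low   # "1.5" "3.0"
strict$high  # "2.5" "4.5"

# Splitting on a literal dot requires escaping it in the regex.
ver <- tibble(v = "1.2.3")
parts <- separate(ver, v, into = c("major", "minor", "patch"), sep = "\\.")
parts$minor  # "2"
```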
Convert types automatically
Set convert = TRUE to coerce the new columns to their natural types.
df <- tibble(date = c("2024-01-15", "2024-02-20", "2024-03-25"))
separate(df, date, into = c("year", "month", "day"), sep = "-", convert = TRUE)
# # A tibble: 3 × 3
# year month day
# <int> <int> <int>
# 1 2024 1 15
# 2 2024 2 20
# 3 2024 3 25
Setting convert = TRUE tells separate() to detect and cast each new column to its natural type, turning numeric-looking strings into integers or doubles and leaving other strings as character. This saves you a follow-up mutate() with as.integer() or as.numeric() after the split. Note that leading zeros are lost in the conversion: "01" becomes 1.
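Conversion is decided per column, so one call can yield several types. The sketch below uses a made-up record format (id-flag-value) to show an integer, a logical, and a double column coming out of the same split; the column names are illustrative.

```r
library(tidyr)

df <- tibble(rec = c("007-TRUE-1.5", "042-FALSE-2"))

res <- separate(
  df, rec,
  into = c("id", "flag", "val"),
  sep = "-",
  convert = TRUE
)

res$id            # 7 42 (leading zeros are gone)
class(res$id)     # "integer"
class(res$flag)   # "logical"
class(res$val)    # "numeric"
```

If the leading zeros carry meaning (as in zero-padded identifiers), leave convert = FALSE and convert only the columns you need.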
Gotchas and advanced
Handling extra pieces
With the default extra = "warn", overflow pieces are dropped and a warning is emitted.
df <- tibble(id = c("a-b-c", "x-y"))
separate(df, id, into = c("first", "second"))
# Warning: Expected 2 pieces. Additional pieces discarded in 1 rows [1].
# # A tibble: 2 × 2
# first second
# <chr> <chr>
# 1 a b
# 2 x y
To discard extras silently, use extra = "drop". To keep the overflow instead, use extra = "merge", which splits at most length(into) times and leaves the remainder in the last column.
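A side-by-side sketch of the two non-warning options, reusing the same two-row input:

```r
library(tidyr)

df <- tibble(id = c("a-b-c", "x-y"))

# "drop": the third piece of "a-b-c" is discarded, no warning.
dropped <- separate(df, id, into = c("first", "second"), extra = "drop")
dropped$second  # "b" "y"

# "merge": only one split is made, so the remainder stays intact.
merged <- separate(df, id, into = c("first", "second"), extra = "merge")
merged$second   # "b-c" "y"
```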
When rows have fewer pieces than the columns you specified in into, the function can fill the missing positions with NA on either the left or right side of the output.
Handling too few pieces
Use fill = "right" or fill = "left" to choose which side receives NA when a row has too few pieces. Both suppress the warning that the default fill = "warn" emits before filling on the right.
df <- tibble(id = c("a-b", "c"))
separate(df, id, into = c("first", "second"), fill = "left")
# # A tibble: 2 × 2
# first second
# <chr> <chr>
# 1 a b
# 2 NA c
The fill argument controls where NA values are placed when a row does not contain enough separator-delimited pieces to populate all the output columns named in into.
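To compare the two sides directly, here is a small sketch in which one row has a single piece:

```r
library(tidyr)

df <- tibble(id = c("a-b", "c"))

# fill = "right": the lone piece goes first, NA is appended.
right <- separate(df, id, into = c("first", "second"), fill = "right")
right$first   # "a" "c"
right$second  # "b" NA

# fill = "left": NA is prepended, the lone piece goes last.
left <- separate(df, id, into = c("first", "second"), fill = "left")
left$first    # "a" NA
left$second   # "b" "c"
```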
Negative separator positions
You can specify sep as a negative integer to count from the right. A value of -1 splits one position from the end.
df <- tibble(code = c("abc123def", "xyz789uvw"))
separate(df, code, into = c("prefix", "suffix"), sep = -3)
# # A tibble: 2 × 2
# prefix suffix
# <chr> <chr>
# 1 abc123 def
# 2 xyz789 uvw
Using a negative integer for sep splits the string at a fixed number of characters from the right end, which is helpful when you want to extract a suffix of known length like a file extension or a two-letter country code.
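Numeric sep also accepts positive positions, counted from the left, and a vector of several positions. The sketch below cuts the same codes into three fixed-width fields; the output column names are illustrative, and the length of sep is one less than the length of into.

```r
library(tidyr)

df <- tibble(code = c("abc123def", "xyz789uvw"))

# Split after the 3rd and after the 6th character.
res <- separate(df, code, into = c("head", "mid", "tail"), sep = c(3, 6))

res$head  # "abc" "xyz"
res$mid   # "123" "789"
res$tail  # "def" "uvw"
```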
NA values propagate to all output columns
If the input cell is NA, every new column receives NA in that row.
df <- tibble(pair = c("apple-orange", NA, "red-blue"))
separate(df, pair, into = c("a", "b"))
# # A tibble: 3 × 2
# a b
# <chr> <chr>
# 1 apple orange
# 2 NA NA
# 3 red blue
When the input cell contains NA, separate() propagates that missingness to every output column, preserving the information that the original value was unavailable in all derived fields.
Omitting a column with NA in into
If one of the names in into is NA, that column is silently dropped from the output.
df <- tibble(code = c("2024-01-15", "2024-02-20"))
separate(df, code, into = c("year", NA, "day"), sep = "-")
# # A tibble: 2 × 2
# year day
# <chr> <chr>
# 1 2024 15
# 2 2024 20
separate() splits a single character column into multiple columns using a separator or a regex pattern. When sep is a number rather than a string, it is interpreted as a character position to split at rather than a separator character. By default, the original column is removed; set remove = FALSE to keep it alongside the new columns.
A common issue is that some rows contain more parts than expected (e.g., an address field with commas in the value). By default the extra pieces are dropped with a warning; use extra = "drop" to discard them silently or extra = "merge" to keep them in the last column. Conversely, rows with too few pieces produce a warning and NA in the trailing columns by default; use fill = "right" to keep that placement without the warning, or fill = "left" to put NA in the leading columns instead. For the inverse operation, combining multiple columns into one, use unite().
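Two of the points above, keeping the source column with remove = FALSE and undoing the split with unite(), can be sketched together. This assumes the same date strings used in the earlier examples.

```r
library(tidyr)

df <- tibble(date = c("2024-01-15", "2024-02-20"))

# remove = FALSE keeps the original column next to the new ones.
kept <- separate(df, date, into = c("year", "month", "day"),
                 sep = "-", remove = FALSE)
names(kept)  # "date" "year" "month" "day"

# Round trip: split, then reassemble with the same delimiter.
parts <- separate(df, date, into = c("year", "month", "day"), sep = "-")
back  <- unite(parts, "date", year, month, day, sep = "-")
identical(back$date, df$date)  # TRUE
```

The round trip only reproduces the input when the split used a single fixed delimiter and convert = FALSE; a regex that matches several delimiters, or a type conversion that strips leading zeros, loses information that unite() cannot restore.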
See also
- tidyr::unite(), the inverse operation, combining multiple columns into one
- stringr::str_sub(), extract or replace substrings by position
- tidyr::pivot_longer(), reshape wide data to long format