dplyr::select
select() picks columns from a data frame or tibble. It is one of the most frequently used verbs in the tidyverse because narrowing your data to the columns you need makes downstream code easier to read and faster to run.
Function signature
select(.data, ...)
The .data argument accepts a data frame or tibble. The ... argument takes one or more selection expressions: bare column names, tidyselect helper calls, or combinations of both. Comma-separated expressions are combined as a union, so a column is included if it matches any of them. To require that a column satisfy several conditions at once, join them with & inside a single expression.
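A minimal sketch of how several selection expressions combine. The tibble and its column names (note, n_visits) are made up for illustration and do not come from the examples on this page.

```r
library(dplyr)

# Hypothetical example data
visits <- tibble(
  id = 1:3,
  name = c("a", "b", "c"),
  note = c("x", "y", "z"),
  n_visits = c(2, 5, 1)
)

# Comma-separated expressions: a column is kept if it matches either one
union_cols <- visits %>%
  select(starts_with("n"), where(is.numeric)) %>%
  names()

# `&` inside one expression: a column must satisfy both conditions
both_cols <- visits %>%
  select(starts_with("n") & where(is.numeric)) %>%
  names()

print(union_cols)
print(both_cols)
```

Here union_cols contains all four columns, while both_cols contains only n_visits, the single column that starts with "n" and is numeric.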
Basic column selection
Name columns directly to select them:
library(dplyr)
df <- tibble(
  id = 1:4,
  name = c("Mia", "Jon", "Lea", "Sam"),
  age = c(29, 45, 33, 51),
  email = c("mia@example.com", "jon@example.com", "lea@example.com", "sam@example.com")
)
# Select three columns by name
df %>% select(id, name, age)
# # A tibble: 4 x 3
# id name age
# <int> <chr> <dbl>
# 1 1 Mia 29
# 2 2 Jon 45
# 3 3 Lea 33
# 4 4 Sam 51
Tidyselect helpers
Selection helpers extend select() with pattern matching and conditional logic. They work inside select() and other tidyselect-aware functions like relocate() and across().
| Helper | What it selects |
|---|---|
| starts_with(x) | Columns whose name starts with prefix x |
| ends_with(x) | Columns whose name ends with suffix x |
| contains(x) | Columns whose name contains the string x |
| matches(x) | Columns matching a regex (requires a quoted string) |
| where(fn) | Columns where fn(column) returns TRUE |
| all_of(x) | Columns named in a character vector (errors if any is missing) |
| any_of(x) | Columns named in a character vector (silently skips any that are missing) |
# Select columns whose name starts with "n"
df %>% select(starts_with("n"))
# # A tibble: 4 x 1
# name
# <chr>
# 1 Mia
# 2 Jon
# 3 Lea
# 4 Sam
# Combine selections: the name column plus every numeric column
df %>% select(name, where(is.numeric))
# # A tibble: 4 x 3
# name id age
# <chr> <int> <dbl>
# 1 Mia 1 29
# 2 Jon 2 45
# 3 Lea 3 33
# 4 Sam 4 51
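The suffix and substring helpers from the table follow the same pattern. The sketch below uses a hypothetical tibble (first_name, last_name and so on are invented for the example).

```r
library(dplyr)

# Hypothetical example data
people <- tibble(
  id = 1:2,
  first_name = c("Mia", "Jon"),
  last_name = c("Ito", "Roy"),
  email = c("mia@example.com", "jon@example.com")
)

# Columns whose name ends with "_name"
name_cols <- people %>% select(ends_with("_name")) %>% names()

# Columns whose name contains the literal string "mail"
mail_cols <- people %>% select(contains("mail")) %>% names()

print(name_cols)
print(mail_cols)
```

These helpers match case-insensitively by default; pass ignore.case = FALSE if the case of the column name matters.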
Excluding and reordering columns
Prefix a column name with - to deselect it:
# Drop the email column
df %>% select(-email)
# # A tibble: 4 x 3
# id name age
# <int> <chr> <dbl>
# 1 1 Mia 29
# 2 2 Jon 45
# 3 3 Lea 33
# 4 4 Sam 51
# Exclude multiple columns
df %>% select(-id, -email)
# # A tibble: 4 x 2
# name age
# <chr> <dbl>
# 1 Mia 29
# 2 Jon 45
# 3 Lea 33
# 4 Sam 51
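For reordering, one common idiom is to list the columns you want first and let everything() fill in the rest. The sketch below rebuilds a smaller version of the example tibble so that it runs on its own.

```r
library(dplyr)

df <- tibble(
  id = 1:2,
  name = c("Mia", "Jon"),
  age = c(29, 45),
  email = c("mia@example.com", "jon@example.com")
)

# Move email to the front; everything() adds the remaining columns
# in their original order, without duplicating email
email_first <- df %>% select(email, everything()) %>% names()

# Output columns follow the order in which they are listed
age_then_name <- df %>% select(age, name) %>% names()

print(email_first)
print(age_then_name)
```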
Programmatic selection with all_of() and any_of()
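When column names are stored in a character vector, writing the vector's name bare inside select() is ambiguous: tidyselect first looks for a column with that name, and recent versions warn when they have to fall back to the variable's value. Use all_of() or any_of() to pass column names programmatically and make the intent explicit: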
cols <- c("name", "age")
# all_of() errors if any column in cols is missing from df
df %>% select(all_of(cols))
# # A tibble: 4 x 2
# name age
# <chr> <dbl>
# 1 Mia 29
# 2 Jon 45
# 3 Lea 33
# 4 Sam 51
# any_of() silently skips columns that are not present
df %>% select(any_of(cols))
# # A tibble: 4 x 2
# name age
# <chr> <dbl>
# 1 Mia 29
# 2 Jon 45
# 3 Lea 33
# 4 Sam 51
This matters when you write functions that must handle data sets with varying columns. Prefer any_of() when some columns may legitimately be absent; keep all_of() when a missing column should stop the function with an error.
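The difference only shows up when a requested column is absent. The following sketch assumes a hypothetical helper, keep_known(), and a wanted vector that names a column ("phone") the data does not have.

```r
library(dplyr)

# Hypothetical helper: keep whichever of the wanted columns exist
keep_known <- function(data, wanted) {
  data %>% select(any_of(wanted))
}

df <- tibble(id = 1:2, name = c("Mia", "Jon"), age = c(29, 45))
wanted <- c("name", "age", "phone")  # "phone" is not a column of df

# any_of() skips the missing column without complaint
loose <- names(keep_known(df, wanted))

# all_of() is strict: the same vector raises an error, caught here
strict <- tryCatch(
  names(select(df, all_of(wanted))),
  error = function(e) "error"
)

print(loose)
print(strict)
```

With this input, loose holds "name" and "age", while strict holds the placeholder string "error" because all_of() refused the missing column.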
Renaming within select()
You can rename a column as part of a select() call using new_name = old_name syntax. Note that this drops all columns not explicitly mentioned:
# Rename and keep only the named columns
df %>% select(full_name = name, years = age)
# # A tibble: 4 x 2
# full_name years
# <chr> <dbl>
# 1 Mia 29
# 2 Jon 45
# 3 Lea 33
# 4 Sam 51
If you only want to rename without changing which columns are present, use rename() instead. If you only want to reorder columns without renaming, use relocate().
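A short sketch of both alternatives, using a reduced copy of the example tibble so the block is self-contained:

```r
library(dplyr)

df <- tibble(
  id = 1:2,
  name = c("Mia", "Jon"),
  age = c(29, 45),
  email = c("mia@example.com", "jon@example.com")
)

# rename() changes the name and keeps every column in place
renamed <- df %>% rename(full_name = name) %>% names()

# relocate() moves a column and leaves names untouched
reordered <- df %>% relocate(email, .before = name) %>% names()

print(renamed)
print(reordered)
```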
Common gotchas
select() always returns a data frame of the same type as its input (a tibble here), even when you select a single column. Use pull() instead of select() if you need a bare vector:
# select() returns a 1-column tibble, not a vector
df %>% select(age) %>% class()
# [1] "tbl_df" "tbl" "data.frame"
# pull() extracts the column as a vector
df %>% pull(age)
# [1] 29 45 33 51
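The container type follows the input, which is easy to check with a base data frame. The data below is a hypothetical stand-in for the tibble used elsewhere on this page.

```r
library(dplyr)

# A base data.frame rather than a tibble
base_df <- data.frame(id = 1:3, age = c(29, 45, 33))

# Still a one-column data frame, and still a base data.frame
one_col <- base_df %>% select(age)
one_col_class <- class(one_col)

# pull() drops the data frame wrapper and returns the column itself
ages <- base_df %>% pull(age)

print(one_col_class)
print(ages)
```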
Watch out for conflicts with MASS::select(). Whichever package is attached last masks the other, so loading MASS after dplyr makes a bare select() call resolve to the MASS version. Qualify the call explicitly as dplyr::select() to avoid ambiguity, or load dplyr after MASS.
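One way to see which package a bare select() currently resolves to is to inspect the function's environment. This is a diagnostic sketch only; it does not load MASS, and the tibble is a throwaway example.

```r
library(dplyr)

df <- tibble(id = 1:2, name = c("Mia", "Jon"))

# Name of the package that owns the unqualified `select` right now.
# With only dplyr attached this is "dplyr"; if MASS were attached
# afterwards, the same call would report "MASS".
owner <- environmentName(environment(select))

# A namespace-qualified call is unaffected by load order
safe <- dplyr::select(df, name)

print(owner)
print(names(safe))
```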
The matches() helper requires a quoted regex string: matches("^id$") works, while matches(^id$) is a syntax error.
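Anchoring the regex is what separates matches() from contains(). The sketch below uses hypothetical column names chosen so that "id" also appears inside other names.

```r
library(dplyr)

# Hypothetical example data: "id" is a substring of two other names
orders <- tibble(
  id = 1:2,
  paid = c(TRUE, FALSE),
  valid_until = c("2024", "2025")
)

# Anchored regex: only the column named exactly "id"
exact <- orders %>% select(matches("^id$")) %>% names()

# Literal substring: every column whose name contains "id"
loose <- orders %>% select(contains("id")) %>% names()

print(exact)
print(loose)
```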
See also
- /reference/tidyverse/dplyr-filter/: filter() picks rows; select() picks columns. They are often used together in a pipeline.
- /reference/tidyverse/dplyr-rename/: rename columns without dropping any others.
- /reference/tidyverse/dplyr-mutate/: add or transform columns, often used after narrowing data with select().