dplyr::select

Updated April 1, 2026 · Tidyverse

r dplyr tidyverse data-manipulation

select() picks columns from a data frame or tibble. It is one of the most frequently used verbs in the tidyverse because narrowing your data to the columns you need makes downstream code easier to read and faster to run.

Function signature

select(.data, ...)

The .data argument accepts a data frame or tibble. The ... argument takes one or more selection expressions: bare column names, tidyselect helper calls, or combinations of both. Expressions separated by commas are combined as a union, so a column is included if it matches any of them. To intersect or negate selections, use & or ! inside a single expression.

Basic column selection

Name columns directly to select them:

library(dplyr)

df <- tibble(
  id = 1:4,
  name = c("Mia", "Jon", "Lea", "Sam"),
  age = c(29, 45, 33, 51),
  email = c("mia@example.com", "jon@example.com", "lea@example.com", "sam@example.com")
)

# Select three columns by name
df %>% select(id, name, age)
# # A tibble: 4 x 3
#      id name    age
#   <int> <chr> <dbl>
# 1     1 Mia      29
# 2     2 Jon      45
# 3     3 Lea      33
# 4     4 Sam      51

Tidyselect helpers

Selection helpers extend select() with pattern matching and conditional logic. They work inside select() and other tidyselect-aware functions like relocate() and across().

Helper	What it selects
`starts_with(x)`	Columns whose name starts with prefix `x`
`ends_with(x)`	Columns whose name ends with suffix `x`
`contains(x)`	Columns whose name contains the string `x`
`matches(x)`	Columns matching a regex (requires a quoted string)
`where(fn)`	Columns where `fn(column)` returns `TRUE`
`all_of(x)`	Columns named in a character vector (errors if any is missing)
`any_of(x)`	Columns named in a character vector (silently skips any that are missing)

# Select columns whose name starts with "n"
df %>% select(starts_with("n"))
# # A tibble: 4 x 1
#   name
#   <chr>
# 1 Mia
# 2 Jon
# 3 Lea
# 4 Sam

# Combine selections: name plus every numeric column
df %>% select(name, where(is.numeric))
# # A tibble: 4 x 3
#   name     id   age
#   <chr> <int> <dbl>
# 1 Mia       1    29
# 2 Jon       2    45
# 3 Lea       3    33
# 4 Sam       4    51
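Selections can also be combined with set operators inside a single expression. The following is a minimal sketch, assuming a small hypothetical tibble called people that mirrors the example data; it contrasts the comma (union) with | (union) and & (intersection):

```r
library(dplyr)

# Hypothetical example data, mirroring the tibble used above
people <- tibble(
  id = 1:2,
  name = c("Mia", "Jon"),
  age = c(29, 45),
  email = c("mia@example.com", "jon@example.com")
)

# Comma-separated selections are unioned, in the order written
union_cols <- people %>% select(name, where(is.numeric))
names(union_cols)  # columns: name, id, age

# `|` also takes the union of two selections
either <- people %>% select(starts_with("e") | where(is.numeric))
names(either)

# `&` keeps only columns matched by both sides; `!` negates a selection
both <- people %>% select(where(is.numeric) & !starts_with("i"))
names(both)  # columns: age
```

Writing the intersection explicitly with & is usually clearer than chaining several select() calls.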

Excluding and reordering columns

Prefix a column name with - to deselect it:

# Drop the email column
df %>% select(-email)
# # A tibble: 4 x 3
#      id name    age
#   <int> <chr> <dbl>
# 1     1 Mia      29
# 2     2 Jon      45
# 3     3 Lea      33
# 4     4 Sam      51

# Exclude multiple columns
df %>% select(-id, -email)
# # A tibble: 4 x 2
#   name    age
#   <chr> <dbl>
# 1 Mia      29
# 2 Jon      45
# 3 Lea      33
# 4 Sam      51
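Because select() returns columns in the order they are selected, it can also reorder them. A short sketch, assuming a hypothetical tibble called people with the same columns as the example data, shows reordering with everything(), selecting a contiguous range with :, and dropping several columns at once:

```r
library(dplyr)

# Hypothetical example data, mirroring the tibble used above
people <- tibble(
  id = 1:2,
  name = c("Mia", "Jon"),
  age = c(29, 45),
  email = c("mia@example.com", "jon@example.com")
)

# Move email to the front; everything() adds the remaining columns
# in their original order
email_first <- people %>% select(email, everything())
names(email_first)  # columns: email, id, name, age

# Select a contiguous range of columns with `:`
id_to_age <- people %>% select(id:age)
names(id_to_age)  # columns: id, name, age

# Drop several columns in one expression with !c(...)
no_contact <- people %>% select(!c(id, email))
names(no_contact)  # columns: name, age
```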

Programmatic selection with `all_of()` and `any_of()`

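When column names are stored in a character vector, do not pass the vector to select() as a bare name. A bare name is looked up as a column first, so select(cols) is ambiguous: it picks a column called cols if one exists, and otherwise falls back to the variable's value with a warning in recent tidyselect versions. Wrap the vector in all_of() or any_of() to state the intent explicitly: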

cols <- c("name", "age")

# all_of() errors if any column in cols is missing from df
df %>% select(all_of(cols))
# # A tibble: 4 x 2
#   name    age
#   <chr> <dbl>
# 1 Mia      29
# 2 Jon      45
# 3 Lea      33
# 4 Sam      51

# any_of() silently skips names that are not in df ("phone" here)
df %>% select(any_of(c(cols, "phone")))
# # A tibble: 4 x 2
#   name    age
#   <chr> <dbl>
# 1 Mia      29
# 2 Jon      45
# 3 Lea      33
# 4 Sam      51

This matters when you write functions that must handle data sets with varying column sets. Use any_of() when the columns are optional, and all_of() when a missing column should stop the function with an error.
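As an illustration, here is a minimal sketch of such a function. The helper name keep_contact_cols(), its wanted argument, and both input tibbles are hypothetical and exist only for this example:

```r
library(dplyr)

# Hypothetical helper: keep whichever contact columns a data set happens to have
keep_contact_cols <- function(data, wanted = c("name", "email", "phone")) {
  data %>% select(any_of(wanted))
}

with_email <- tibble(
  id = 1:2,
  name = c("Mia", "Jon"),
  email = c("mia@example.com", "jon@example.com")
)
with_phone <- tibble(
  id = 1:2,
  name = c("Lea", "Sam"),
  phone = c("555-0100", "555-0101")
)

names(keep_contact_cols(with_email))  # columns: name, email
names(keep_contact_cols(with_phone))  # columns: name, phone

# all_of() treats the same vector as a strict requirement:
# the missing "phone" column raises an error, caught here for illustration
strict_result <- tryCatch(
  with_email %>% select(all_of(c("name", "email", "phone"))),
  error = function(e) "error: a required column is missing"
)
strict_result
```

Which helper fits depends on whether a missing column is an expected variation or a sign of bad input.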

Renaming within `select()`

You can rename a column as part of a select() call using new_name = old_name syntax. Note that this drops all columns not explicitly mentioned:

# Rename and keep only the named columns
df %>% select(full_name = name, years = age)
# # A tibble: 4 x 2
#   full_name years
#   <chr>     <dbl>
# 1 Mia          29
# 2 Jon          45
# 3 Lea          33
# 4 Sam          51

If you only want to rename without changing which columns are present, use rename() instead. If you only want to reorder columns without renaming, use relocate().
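For comparison, a brief sketch of both alternatives, assuming a hypothetical tibble called people with the same columns as the example data:

```r
library(dplyr)

# Hypothetical example data, mirroring the tibble used above
people <- tibble(
  id = 1:2,
  name = c("Mia", "Jon"),
  age = c(29, 45),
  email = c("mia@example.com", "jon@example.com")
)

# rename() changes the name but keeps every column in place
renamed <- people %>% rename(full_name = name)
names(renamed)  # columns: id, full_name, age, email

# relocate() moves a column without dropping or renaming anything
moved <- people %>% relocate(email, .after = id)
names(moved)  # columns: id, email, name, age
```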

Common gotchas

select() always returns a data frame of the same type as its input (a tibble here), even when you select a single column. Use pull() instead if you need a bare vector:

# select() returns a 1-column tibble, not a vector
df %>% select(age) %>% class()
# [1] "tbl_df"     "tbl"        "data.frame"

# pull() extracts the column as a vector
df %>% pull(age)
# [1] 29 45 33 51
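pull() has a few more options that are handy when a vector is needed. The sketch below assumes a small hypothetical tibble called people; it shows the name argument, which labels the resulting vector with values from another column, and the default of pulling the last column:

```r
library(dplyr)

# Hypothetical example data
people <- tibble(
  name = c("Mia", "Jon"),
  age = c(29, 45)
)

# select() keeps the data frame structure, even for one column
one_col <- people %>% select(age)
is.data.frame(one_col)  # TRUE

# pull() returns the column itself
ages <- people %>% pull(age)

# name = ... builds a named vector from a second column
named_ages <- people %>% pull(age, name = name)

# With no column given, pull() extracts the last column
last_col_values <- people %>% pull()
```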

Watch out for conflicts with MASS::select(). Whichever package is attached last masks the other, so if MASS is loaded after dplyr, a bare select() call resolves to the MASS version. Qualify the call explicitly as dplyr::select() to avoid ambiguity, or load dplyr after MASS.
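A namespace-qualified call works regardless of which packages are attached or in what order. A minimal sketch, assuming a hypothetical tibble called people:

```r
# No library() call is needed: the dplyr:: prefix names the function explicitly
people <- dplyr::tibble(
  id = 1:2,
  name = c("Mia", "Jon"),
  age = c(29, 45)
)

# Always dplyr's select(), even if another package defines its own select()
picked <- dplyr::select(people, id, name)
names(picked)  # columns: id, name
```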

The matches() helper requires a quoted regex string. matches("^id$") works; matches(^id$) is a syntax error because the regex must be a character string.
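The sketch below illustrates how anchoring and case sensitivity change what matches() selects. The tibble people and its user_id column are hypothetical, added so that an unanchored pattern has more than one match:

```r
library(dplyr)

# Hypothetical example data with two columns containing "id"
people <- tibble(
  id = 1:2,
  user_id = c("u1", "u2"),
  name = c("Mia", "Jon")
)

# Anchored regex: only the column named exactly "id"
exact_id <- people %>% select(matches("^id$"))
names(exact_id)  # columns: id

# Unanchored regex: every column whose name contains "id"
any_id <- people %>% select(matches("id"))
names(any_id)  # columns: id, user_id

# matches() ignores case by default; turn that off for a strict match
case_sensitive <- people %>% select(matches("ID", ignore.case = FALSE))
ncol(case_sensitive)  # 0, because no column name contains upper-case "ID"
```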