dplyr::pull

Returns vector · Updated May 29, 2026 · Tidyverse

dplyr · r · tidyverse · data-wrangling

pull() extracts a single column from a data frame or tibble and returns it as a vector. It’s the pipe-friendly way to get a column out when you need a vector rather than a one-column tibble.

Signature

pull(.data, var = -1, name = NULL, ...)

Parameters

.data

A data frame, tibble, or lazy data frame (from dbplyr or dtplyr).

var

The column to extract. Accepts:

A bare column name, such as pull(height). The argument uses tidy selection (it is matched against the column names), so a string such as pull("height") also works
A positive integer, giving the position counting from the left (1 = first column)
A negative integer, giving the position counting from the right (-1 = last column, -2 = second-to-last)

Defaults to -1, the last column.
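The accepted forms of var can be compared on a small tibble. This is a sketch with made-up data (the columns x, y and z are hypothetical); each call should pick out the same column, y.

```r
library(dplyr)

# Hypothetical three-column tibble, for illustration only
df <- tibble(x = c(1, 2, 3), y = c("a", "b", "c"), z = c(TRUE, FALSE, TRUE))

by_name     <- df |> pull(y)    # bare column name
by_string   <- df |> pull("y")  # string, matched by tidy selection
by_position <- df |> pull(2)    # 2nd column counting from the left
by_negative <- df |> pull(-2)   # 2nd column counting from the right

by_name
#> [1] "a" "b" "c"
```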

name

An optional column whose values become names in the output vector:

starwars |> pull(height, name = name)
#>        Luke Skywalker                 C-3PO                 R2-D2 
#>                   172                   167                    96

If NULL (the default), the output is unnamed.

...

Passed to methods for other classes (e.g., dbplyr remote tables).

Return value

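A vector: the column itself, exactly as .data[[var]] would return it. The type follows the column, so a numeric column gives a numeric vector and a character column gives a character vector. A list-column comes back as a list (which is itself a vector in R). Unlike select(), pull() never returns a data frame.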
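A quick way to see that the result's type simply follows the column is to pull a numeric column, a character column and a list-column from the same tibble. The tibble and its column names below are hypothetical.

```r
library(dplyr)

# Hypothetical tibble with a numeric, a character and a list-column
df <- tibble(
  num   = c(1.5, 2.5),
  chr   = c("a", "b"),
  items = list(1:3, letters[1:2])
)

class(pull(df, num))    # "numeric"
class(pull(df, chr))    # "character"
class(pull(df, items))  # "list": a list-column comes back as a list

# pull() returns the same object as [[ ]]
identical(pull(df, items), df[["items"]])
#> [1] TRUE
```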

Default behavior

pull() defaults to the last column (var = -1). This is useful when you’ve just created a new column with mutate() and want to extract it immediately:

df <- tibble(a = 1:5, b = 6:10, c = 11:15)

df |>
  mutate(total = a + b + c) |>
  pull()
#> [1] 18 21 24 27 30

The mutate() call creates the total column as the last column, then pull() extracts it as a vector without needing to refer to it by name. This convention saves you from typing the column name, which is especially convenient when the column name is generated dynamically or is long and unwieldy.
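The dynamically named case can be sketched as follows. The variable new_col and its value are hypothetical; the column name is injected with rlang's `!!name :=` syntax, which dplyr verbs accept, and pull() then grabs the new column without the code ever spelling out its name.

```r
library(dplyr)

df <- tibble(a = 1:3, b = 4:6)

# Hypothetical: the new column's name is only known at run time
new_col <- paste0("sum_", "ab")

totals <- df |>
  mutate(!!new_col := a + b) |>  # creates a column named "sum_ab" as the last column
  pull()                         # default var = -1 picks it up

totals
#> [1] 5 7 9
```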

Extracting by position

pull() accepts both positive and negative integers for column positions. A positive integer counts from the first column on the left, while a negative integer counts from the last column on the right. This makes pulling the penultimate column as simple as pull(-2):

# First column
df |> pull(1)

# Second column from the right
df |> pull(-2)

Base R's [ ] treats a negative integer as an exclusion: df[-1] returns every column except the first. pull() instead reads a negative index as a position counted from the right, so the default -1 (last column) extends naturally to -2 (second-to-last) and so on.
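The difference between the two readings of a negative index shows up on a small, hypothetical tibble: base R drops the column, while pull() selects it.

```r
library(dplyr)

df <- tibble(x = 1:3, y = 4:6, z = 7:9)

# Base R: a negative index in [ ] excludes that column
names(df[-1])
#> [1] "y" "z"

# pull(): a negative index counts from the right
df |> pull(-1)
#> [1] 7 8 9
```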

Producing named vectors

The name argument labels the output vector with the values of another column. Each element of the result is named after the corresponding value in that column, which gives a natural lookup structure:

# Height in cm, named by character
starwars |> pull(height, name = name)
#>        Luke Skywalker                 C-3PO                 R2-D2 
#>                   172                   167                    96

This is useful when you need a dictionary-style lookup. You can index the named vector by character name to retrieve the associated value in a single expression, which is cleaner than filtering the data frame every time you need one value:

height_map <- starwars |> pull(height, name = name)
height_map["Luke Skywalker"]
#> Luke Skywalker 
#>             172
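Because the result is an ordinary named vector, several keys can be looked up at once, and unname() strips the labels when only the values are needed. This sketch uses the starwars data shipped with dplyr.

```r
library(dplyr)

height_map <- starwars |> pull(height, name = name)

# Look up several characters at once; the result keeps the names
droids <- height_map[c("C-3PO", "R2-D2")]
droids

# Drop the labels when only the values are needed
unname(droids)
#> [1] 167  96
```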

pull vs $

pull() and $ both extract a column, but they differ in three ways:

	`pull()`	`$`
Column reference	Bare name, string, or position	Literal name only
Pipe usage	Designed for pipes	Needs the data frame on its left
Unknown column	Error	`NULL` (a tibble also warns)

In a pipe, df |> pull(x) reads more naturally than df$x or df[["x"]].
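The unknown-column difference matters when a name is mistyped. In this sketch the misspelling heigth is deliberate and the tibble is hypothetical: $ quietly yields NULL (a tibble also emits a warning, suppressed here), while pull() raises an error.

```r
library(dplyr)

df <- tibble(height = c(172, 167))

# $ with a misspelled name: NULL, plus a warning on tibbles
res_dollar <- suppressWarnings(df$heigth)
is.null(res_dollar)
#> [1] TRUE

# pull() with the same typo fails loudly instead
res_pull <- tryCatch(pull(df, heigth), error = function(e) "error")
res_pull
#> [1] "error"
```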

pull() extracts a single column as a vector, equivalent to df[[col]]. Unlike select(), which returns a data frame, pull() returns the underlying vector. It accepts column positions as well as names: pull(df, 1) extracts the first column. Use pull() at the end of a pipeline when you need the raw values for a function that does not accept a data frame.
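A typical case is a base function such as mean(), which expects a vector. This sketch uses the starwars data from dplyr; the last call shows that handing mean() the one-column tibble from select() does not work (mean() warns and returns NA, and the warning is suppressed here).

```r
library(dplyr)

# pull() hands mean() the raw numeric vector it expects
droid_mean <- starwars |>
  filter(species == "Droid") |>
  pull(height) |>
  mean(na.rm = TRUE)

# select() keeps a data frame, which mean() cannot average
bad_mean <- suppressWarnings(mean(select(starwars, height)))
bad_mean
#> [1] NA
```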