Reference

Tidyverse

Common dplyr, purrr, and tibble workflows.

dplyr::*_join()
Join two data frames by matching rows based on key columns. Learn how to use dplyr join functions to combine datasets in R.

left_join(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...)
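
As a minimal sketch of key-based joining, the example below keeps every row of the left table and fills unmatched columns with NA. The `orders` and `customers` tables and their column names are hypothetical, invented for illustration.

```r
library(dplyr)

# Hypothetical example data: orders refer to customers by customer_id.
orders <- tibble(order_id = 1:3, customer_id = c(10, 20, 30))
customers <- tibble(customer_id = c(10, 20), name = c("Ada", "Grace"))

# left_join() keeps all rows of `orders`; customer 30 has no match,
# so its `name` is filled with NA.
joined <- left_join(orders, customers, by = "customer_id")
joined
```

Passing `by` explicitly avoids relying on the default of joining on all shared column names.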
dplyr::across()
Apply functions across multiple columns in dplyr: across() modifies columns in place while where() selects columns by type or condition.

across(.cols = everything(), .fns = NULL, ..., .names = NULL)
dplyr::arrange()
Sort rows of a data frame by column values in ascending or descending order using dplyr. arrange() reorders rows of a data frame based on column values.

arrange(.data, ..., .by_group = FALSE)
dplyr::bind_cols
dplyr's bind_cols() combines data frames column-wise, matching rows by position rather than by key. Fast for aligned data, but risky when row orders differ.

bind_cols(..., .name_repair = "unique")
dplyr::bind_rows
Stack two or more data frames by row with dplyr, matching columns by name and filling mismatches with NA. The optional .id column records each row's source.

bind_rows(..., .id = NULL)
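
The name matching and the .id column can be illustrated with a small sketch. The data frames `a` and `b` and the labels `first` and `second` are hypothetical; the labels come from the argument names passed to bind_rows().

```r
library(dplyr)

# Two hypothetical data frames that share only the column `x`.
a <- tibble(x = 1:2, y = c("p", "q"))
b <- tibble(x = 3L, z = TRUE)

# Columns are matched by name; columns missing from one input are filled
# with NA. The `source` column records which input each row came from.
stacked <- bind_rows(first = a, second = b, .id = "source")
stacked
```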
dplyr::bind_rows() / dplyr::bind_cols()
Combine data frames by stacking rows or joining columns horizontally. bind_rows() and bind_cols() are dplyr functions for combining data frames.

bind_rows(..., .id = NULL)
dplyr::case_when
Create a new column using vectorized conditional logic with case_when(), handling multiple conditions in order, the dplyr equivalent of SQL's CASE WHEN.

case_when(...)
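
Because conditions are checked in order, the first one that is TRUE decides the result. The sketch below shows this with a hypothetical `scores` table; the grade boundaries are made up for illustration.

```r
library(dplyr)

# Hypothetical scores, including one missing value.
scores <- tibble(score = c(95, 72, 40, NA))

# Conditions are evaluated top to bottom; the first TRUE one wins.
# An NA comparison is not TRUE, so the NA row falls through to is.na().
graded <- mutate(
  scores,
  grade = case_when(
    score >= 90    ~ "A",
    score >= 70    ~ "B",
    is.na(score)   ~ "missing",
    TRUE           ~ "C"   # catch-all for everything else
  )
)
graded
```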
dplyr::count()
Count the number of observations in each group. These dplyr verbs provide a convenient way to summarise data by grouping variables.

count(x, ..., wt = NULL, sort = FALSE, .drop = TRUE)
dplyr::distinct
Remove duplicate rows from an R data frame with dplyr::distinct(), keeping only unique combinations of specified columns.

distinct(.data, ..., .keep_all = FALSE)
dplyr::filter()
Subset rows of a data frame based on logical conditions using expressive dplyr syntax. filter() keeps rows where all conditions are TRUE.

filter(.data, ...)
dplyr::group_by()
Group data by one or more columns with dplyr::group_by() and compute summary statistics using summarise(). Adds grouping metadata to a data frame.

group_by(.data, ..., .add = FALSE, .drop = group_by_drop_default(.data))
dplyr::if_else
Type-strict vectorized if-else for R via dplyr::if_else(). Stricter than base ifelse(), handles NAs explicitly, and preserves factor types.

if_else(condition, true, false, missing = NULL, ..., ptype = NULL, size = deprecated())
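
A short sketch of the explicit NA handling, using a hypothetical numeric vector `x`: the `missing` argument supplies the value used where the condition is NA.

```r
library(dplyr)

# Hypothetical input with one missing value.
x <- c(5, -2, NA)

# `true`, `false`, and `missing` must share a compatible type (character here).
sign_label <- if_else(x >= 0, "non-negative", "negative", missing = "unknown")
sign_label
```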
dplyr::mutate()
Create new columns or modify existing ones in a tibble using dplyr::mutate(). Adds or modifies columns through vectorised operations.

mutate(.data, ..., .by = NULL, .keep = c("all", "used", "unused", "none"), .before = NULL, .after = NULL)
dplyr::pull
Extract a column from a data frame as a vector with dplyr::pull(). Defaults to the last column, supports negative indexing, and can produce named vectors.

pull(.data, var = -1, name = NULL, ...)
dplyr::relocate
Move data frame columns to new positions with dplyr::relocate() using tidy-select syntax, including .before, .after, and renaming during the move.

relocate(.data, ..., .before = NULL, .after = NULL)
dplyr::select()
Select columns from a data frame by name, position, or pattern with dplyr::select(). Supports helpers like starts_with(), ends_with(), and where().

select(.data, ...)
dplyr::slice()
Select rows by position, head, tail, random sampling, or rank using dplyr slice functions. slice() is very fast; it uses integer indexing.

slice(.data, ...)
dplyr::summarise()
Collapse a tibble to one row per group using summary functions. Use .by (dplyr 1.1+) or group_by(). It accepts bare column names directly.

summarise(.data, ..., .by = NULL, .groups = NULL)
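
The two grouping styles can be compared side by side. This is a sketch on a hypothetical `sales` table and assumes dplyr 1.1 or later for the `.by` argument.

```r
library(dplyr)

# Hypothetical sales data.
sales <- tibble(
  region = c("north", "north", "south"),
  amount = c(10, 20, 5)
)

# Per-operation grouping with .by (dplyr 1.1+).
by_arg <- summarise(sales, total = sum(amount), .by = region)

# The same summary with an explicit group_by() step.
by_group <- summarise(group_by(sales, region), total = sum(amount))

by_arg
by_group
```

Both calls return one row per region; `.by` leaves the result ungrouped without a separate ungroup() step.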
fct_lump
Collapse uncommon factor levels into an Other category. Covers fct_lump_n, fct_lump_prop, fct_lump_min, and fct_lump_lowfreq.
fct_reorder
Reorder factor levels with fct_reorder() by a summary statistic of a second variable. Reorders levels according to the summary without changing values.

fct_reorder(.f, .x, .fun = median, ..., .na_rm = NULL, .default = Inf, .desc = FALSE)
ggplot2::aes()
Map variables to visual aesthetics in ggplot2. Covers aes(), aes_string(), aes_quosures(), and how column names become plot labels.

aes(x, y, ..., colour, fill, size, shape, alpha, linetype, linewidth)
ggplot2::coord_flip
Flip horizontal and vertical axes in ggplot2. Swaps x and y so horizontal bar charts, boxplots, and histograms display cleanly without re-coding aesthetics.
ggplot2::facet_wrap
Wrap a 1D sequence of panels into a 2D grid with ggplot2. Control nrow, ncol, scales, strip position, and which axes are displayed.
ggplot2::geom_bar()
Draw bars with ggplot2::geom_bar() where height is proportional to count or value. Uses stat_count by default, mapping x to categories and y to frequencies.

geom_bar(mapping = NULL, data = NULL, stat = "count", position = "stack", ...)
ggplot2::geom_boxplot
Create box-and-whisker plots in ggplot2 to visualise the distribution of a continuous variable across groups. Returns a layer object.
ggplot2::geom_histogram()
Use ggplot2 geom_histogram to show the distribution of a continuous variable. Bins the data and draws bars proportional to the count in each bin.

geom_histogram(mapping = NULL, data = NULL, stat = "bin", position = "stack", ...)
ggplot2::geom_line()
Use ggplot2 geom_line to connect observations in order with a line. Draws a line through data in sequence, suitable for time series.

geom_line(mapping = NULL, data = NULL, stat = "identity", position = "identity", ...)
ggplot2::geom_point()
Use ggplot2 geom_point to add a scatter plot layer. Covers position, size, colour, shape, alpha aesthetics, and position_jitter for overplotting.

geom_point(mapping = NULL, data = NULL, stat = "identity", position = "identity", ...)
ggplot2::labs
Use ggplot2 labs to set axis labels, legend title, plot title, subtitle, caption, and tag for a ggplot all in one place.
ggplot2::scale_color_manual
Define your own colour mappings in ggplot2 for discrete variables. Map factor levels to exact colours using a named or unnamed vector.
ggplot2::theme
Control non-data plot elements in ggplot2: titles, axis labels, legend, panel background, grid lines, and more.
glue()
Use the glue() function in R to format and interpolate strings with expressions inside braces. Evaluates R expressions inside braces for templating.
interval
Create an Interval object in lubridate, representing a time span between two specific datetime endpoints with calendar awareness.
keep() / discard() / compact()
Use purrr keep, discard, and compact to filter list or vector elements. Keep those matching a predicate, discard ones that don't, or remove NULL and empty elements.

keep(.x, .p, ...)
discard(.x, .p, ...)
compact(.x, .p = identity)
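
The three helpers can be sketched on one small list. The `values` list is hypothetical and mixes types so that each function has something to drop.

```r
library(purrr)

# Hypothetical mixed list with a NULL and a zero-length element.
values <- list(1, "a", NULL, 4, character(0))

numbers     <- keep(values, is.numeric)     # elements where the predicate is TRUE
non_numbers <- discard(values, is.numeric)  # elements where the predicate is FALSE
non_empty   <- compact(values)              # drops NULL and zero-length elements
```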
lubridate::ymd
Parse dates in lubridate using ymd() for year-month-day format from character or numeric input, with automatic separator detection and truncation support.

ymd(..., quiet = FALSE, tz = NULL, locale = Sys.getlocale('LC_TIME'), truncated = 0)
now
Use lubridate now() to get the current system time as a POSIXct object. Controls timezone via the tzone argument with IANA strings like 'UTC'.
purrr::discard
Use purrr discard() to drop elements from a list or vector that match a predicate. Removes items where predicate is TRUE, the opposite of keep().
purrr::map()
Use purrr map() to apply a function to each element of a list or vector, returning a list. Typed variants like map_chr() return atomic vectors.

map(.x, .f, ...)
purrr::map2
Use purrr map2 to iterate over two vectors in parallel, applying a function pairwise. Type-specific variants return atomic vectors.
purrr::pmap
Use purrr pmap to iterate over multiple inputs simultaneously with a list of parameters. Feeds corresponding elements from each list as arguments.
purrr::possibly
Use purrr possibly to wrap any function and return a default value instead of crashing. Suppress errors or let them surface. The counterpart to safely().
purrr::reduce
Apply a binary function cumulatively to a list or vector with purrr reduce. Fold left, fold right, provide an initial value, and inspect intermediate results.
purrr::safely() / purrr::possibly() / purrr::quietly()
Use purrr safely() to wrap functions, handle errors gracefully, and continue execution without failing. Returns a list with result and error components.

safely(.f, otherwise = NULL, quiet = TRUE)
possibly(.f, otherwise = NULL, quiet = TRUE)
quietly(.f)
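
A minimal sketch of the difference between safely() and possibly(), wrapping base R's log(). The names `safe_log` and `maybe_log` are made up for this example.

```r
library(purrr)

# safely() returns a list with `result` and `error` for every call.
safe_log <- safely(log)
ok  <- safe_log(10)    # ok$error is NULL
bad <- safe_log("a")   # bad$result is NULL, bad$error holds the condition

# possibly() returns the `otherwise` value instead of signalling the error.
maybe_log <- possibly(log, otherwise = NA_real_)
results <- map_dbl(list(1, "a", 100), maybe_log)
results
```

possibly() is convenient inside map() calls where a default value is enough; safely() is the better fit when the error itself needs to be inspected later.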
purrr::walk()
Use purrr walk to apply a function for its side effects, returning the input invisibly. Use walk() for single vectors and walk2() for parallel iteration over two vectors.

walk(.x, .f, ...)
read_csv
read_csv() reads a CSV file into a tibble with automatic type inference. Part of the readr package, supporting local files, URLs, and inline data.

read_csv(file, col_names = TRUE, col_types = NULL, na = c("", "NA"), skip = 0, n_max = Inf, guess_max = min(1000, n_max), name_repair = "unique", trim_ws = TRUE, progress = show_progress(), show_col_types = should_show_types())
readr::write_csv()
Write a data frame to a CSV file with readr. Covers parameters, NA handling, quoting, appending, compression, and common gotchas.

write_csv(x, file, na = "NA", append = FALSE, col_names = !append, quote = "needed", escape = "double", eol = "\n", num_threads = readr_threads(), progress = show_progress())
rename() / relocate()
Rename and reorder columns in a tibble or data frame using dplyr's rename() and relocate() functions. Both always return a new tibble.

rename(.data, ...)
relocate(.data, ..., .before = NULL, .after = NULL)
replace_na
Replace NA values in R vectors and data frames with tidyr::replace_na(). replace_na() replaces missing values (NA) with a specified replacement value.

replace_na(data, replace, ...)
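
The `replace` argument takes a single value for a vector and a named list for a data frame. The sketch below shows both forms on hypothetical data.

```r
library(tidyr)

# Vector input: one replacement value.
x <- c(1, NA, 3)
x_filled <- replace_na(x, 0)

# Data frame input: a named list with one replacement per column.
df <- tibble::tibble(n = c(1, NA), label = c(NA, "b"))
df_filled <- replace_na(df, list(n = 0, label = "unknown"))
df_filled
```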
separate()
Split a character column into multiple columns using separate() by splitting on a delimiter pattern. Part of the tidyr package in the tidyverse.
str_sub
Extract a substring from a character vector using str_sub with inclusive start/end positions, with support for negative indexing from the end of the string.
stringr::str_c()
Join multiple strings into one with stringr's str_c(), using an optional separator between pieces and an optional collapse string to merge the result into a single string.

str_c(..., sep = "", collapse = NULL)
stringr::str_detect()
Detect the presence or absence of a pattern in a string. str_detect() from stringr returns TRUE or FALSE for each input string.

str_detect(string, pattern, negate = FALSE)
stringr::str_extract()
Extract the first match of a pattern from each string with stringr's str_extract(). Strings with no match return NA.

str_extract(string, pattern, group = NULL)
stringr::str_length()
Get the length of a string in characters with stringr's str_length(), which returns the number of characters in each string.

str_length(string)
stringr::str_pad()
Pad a string to a specified width with stringr's str_pad(), adding filler characters on the left, right, or both sides.

str_pad(string, width, side = c("left", "right", "both"), pad = " ")
stringr::str_replace()
Replace matches of a pattern in a string with stringr. str_replace() replaces the first occurrence; str_replace_all() replaces every occurrence.

str_replace(string, pattern, replacement)
stringr::str_trim()
Remove leading and trailing whitespace from strings with stringr's str_trim().

str_trim(string, side = "both")
tibble
Create a tibble, a modern reimagining of the data frame in R, with better printing, stricter subsetting, and consistent behavior.

tibble(..., .rows = NULL, .name_repair = c('check_unique', 'unique', 'universal', 'minimal'))
tidyr::complete()
Complete missing combinations with tidyr::complete(). Expands data to include all variable combinations, turning implicit NAs into explicit ones.
tidyr::drop_na()
Drop rows containing any missing values from a data frame using tidyr::drop_na(). Use drop_na() to quickly clean data before analysis or modelling.
tidyr::fill
Fill missing values in selected columns using the previous or next value with tidyr::fill(). Supports down, up, and bidirectional filling within groups.

fill(data, ..., .by = NULL, .direction = c("down", "up", "downup", "updown"))
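
A small sketch of the default downward fill, using a hypothetical table where the year is recorded only on the first row of each block.

```r
library(tidyr)

# Hypothetical data: `year` appears once per block and is NA below it.
df <- tibble::tibble(
  year    = c(2020, NA, NA, 2021, NA),
  quarter = c(1, 2, 3, 1, 2)
)

# The default direction is "down": each NA takes the last non-missing value above it.
filled <- fill(df, year)
filled
```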
tidyr::nest()
Nest data frames into list-columns with tidyr. Groups rows into nested data frames stored within a single row per group.
tidyr::pivot_longer()
Pivot data from wide to long format with tidyr. Stacks multiple columns into key-value pairs for tidy data analysis.
tidyr::pivot_longer() / tidyr::pivot_wider()
Reshape data between long and wide formats using tidyr's pivot functions. pivot_wider() is the inverse: it spreads a key-value pair across multiple columns.

pivot_longer(data, cols, names_to = "name", values_to = "value")
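
The inverse relationship can be sketched as a round trip. The `wide` table and its column names are hypothetical.

```r
library(tidyr)

# Hypothetical wide data: one column per quarter.
wide <- tibble::tibble(id = 1:2, q1 = c(5, 7), q2 = c(6, 8))

# Wide to long: q1 and q2 become rows of (quarter, value) pairs.
long <- pivot_longer(wide, cols = c(q1, q2),
                     names_to = "quarter", values_to = "value")

# Long back to wide: each distinct `quarter` becomes its own column again.
back <- pivot_wider(long, names_from = quarter, values_from = value)
back
```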
tidyr::pivot_wider()
Pivot data from long to wide format with tidyr. Spreads key-value pairs across multiple columns for presentation or analysis.
tidyr::unnest()
Unnest list-columns in data frames with tidyr. Expands nested data frames stored in list-columns back into regular rows and columns.
tribble
Create a tibble using a readable row-by-row layout with tribble(), the tidyverse alternative to data.frame() for small, human-readable tables.

tribble(...)
unite
Unite multiple columns into one by pasting strings together. unite() combines multiple columns into a single new column by pasting the values together as strings.