Reference
Tidyverse
Common dplyr, purrr, and tibble workflows.
- dplyr::*_join()
Join two data frames by matching rows based on key columns. Learn how to use dplyr join functions to combine datasets in R.
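As a rough illustration of key-based joins, the sketch below compares three dplyr join verbs on two small tables. The tables `orders` and `customers` and their columns are made up for this example.

```r
library(dplyr)

# Hypothetical example data: table and column names are assumptions.
orders <- tibble(order_id = 1:4, customer_id = c(10, 11, 11, 13))
customers <- tibble(customer_id = c(10, 11, 12), name = c("Ana", "Ben", "Cy"))

# left_join() keeps every row of `orders`; unmatched keys get NA in `name`.
left <- left_join(orders, customers, by = "customer_id")

# inner_join() keeps only rows whose key appears in both tables.
inner <- inner_join(orders, customers, by = "customer_id")

# anti_join() returns the rows of `orders` that have no match in `customers`.
unmatched <- anti_join(orders, customers, by = "customer_id")
```

Here `left` has four rows, `inner` has three, and `unmatched` holds the single order for customer 13.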
- dplyr::across()
Apply functions across multiple columns in dplyr: across() modifies columns in place while where() selects columns by type or condition.
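A minimal sketch of across() combined with where(), assuming a small made-up tibble `df`. The column names are illustrative only.

```r
library(dplyr)

# Hypothetical data: one character column and two numeric columns.
df <- tibble(
  name   = c("a", "b"),
  height = c(1.234, 5.678),
  weight = c(70.5, 80.25)
)

# Round every numeric column; where(is.numeric) selects columns by type.
rounded <- mutate(df, across(where(is.numeric), ~ round(.x, 1)))

# Summarise a named set of columns with a single function.
means <- summarise(df, across(c(height, weight), mean))
```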
- dplyr::arrange
Sort rows of a data frame by column values with dplyr::arrange(), dplyr's row-ordering verb.
- dplyr::arrange()
Sort rows of a data frame by column values in ascending or descending order using dplyr.
- dplyr::bind_cols
Combine data frames by adding columns side by side. Matches rows by position, not by key.
- dplyr::bind_rows() / dplyr::bind_cols()
Combine data frames by stacking rows or joining columns horizontally.
- dplyr::case_when
Create a new column using vectorized conditional logic with case_when(), handling multiple conditions in order — the dplyr equivalent of SQL's CASE WHEN.
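The "conditions in order" behaviour can be sketched as follows. The `scores` tibble and the grade labels are invented for illustration; the first condition that evaluates to TRUE wins, and `TRUE ~ ...` acts as the fallback.

```r
library(dplyr)

# Hypothetical scores, including a missing value.
scores <- tibble(score = c(95, 72, 40, NA))

graded <- mutate(
  scores,
  grade = case_when(
    score >= 90 ~ "A",       # checked first
    score >= 60 ~ "pass",    # only reached when score < 90
    score < 60  ~ "fail",
    TRUE        ~ "missing"  # fallback, catches the NA row
  )
)
```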
- dplyr::count()
Count the number of observations in each group. count() provides a convenient way to summarise data by grouping variables.
- dplyr::distinct
Remove duplicate rows from an R data frame, keeping only unique combinations of specified columns.
- dplyr::filter()
Subset rows of a data frame based on logical conditions using expressive dplyr syntax.
- dplyr::filter()
Subset rows of a data frame or tibble using logical conditions with dplyr's filter() function.
- dplyr::group_by
Group a data frame by one or more columns for per-group operations with summarise() and mutate().
- dplyr::group_by()
Group data by one or more columns and compute summary statistics using summarise().
- dplyr::if_else
Type-strict vectorized if-else for R. Stricter than base ifelse(), handles NAs explicitly, preserves types. Used inside mutate() for conditional columns.
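A short sketch of the two properties mentioned above, explicit NA handling and type strictness. The vector `x` and the labels are made up for the example.

```r
library(dplyr)

x <- c(5, NA, -2)

# `missing` sets the value used where the condition is NA.
sign_label <- if_else(x > 0, "positive", "non-positive", missing = "unknown")

# Mixing a character and a numeric branch is an error in if_else(),
# whereas base ifelse() would silently coerce. The error is caught here
# so the script keeps running.
mixed <- tryCatch(
  if_else(x > 0, "positive", 0),
  error = function(e) "type error"
)
```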
- dplyr::mutate
Add or modify columns in a tibble or data frame with mutate, dplyr's column-wise transformation function.
- dplyr::mutate()
Create new columns or modify existing ones in a tibble or data frame using vectorised operations.
- dplyr::pull
Extract a single column from a data frame as a vector. Defaults to the last column, supports negative indexing from the right, and can produce named vectors.
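The defaults described above can be checked with a small sketch. The tibble `df` and its columns are assumptions made for illustration.

```r
library(dplyr)

df <- tibble(id = c("a", "b", "c"), score = c(90, 85, 70))

last_col <- pull(df)                     # no column given: last column (score)
by_name  <- pull(df, id)                 # select by name
from_end <- pull(df, -2)                 # second column counting from the right (id)
named    <- pull(df, score, name = id)   # scores named by id
```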
- dplyr::relocate
Move data frame columns to new positions using tidy-select syntax with relocate(), including .before, .after, and renaming during the move.
- dplyr::rename
Rename columns in a data frame using new_name = old_name syntax. Also covers rename_with() for batch renaming with functions.
- dplyr::rename() / dplyr::relocate()
Rename and reorder columns in a tibble or data frame using new_name = old_name pairs and tidy-select syntax.
- dplyr::select
Select specific columns from a data frame using flexible tidyselect helpers and syntax.
- dplyr::select()
Select columns from a data frame by name, position, or pattern.
- dplyr::slice()
Select rows by position, head, tail, random sampling, or rank using dplyr slice functions.
- dplyr::summarise()
Collapse a tibble to one row per group using summary functions. Use .by (dplyr 1.1+) or group_by().
- fct_lump
Collapse uncommon factor levels into an Other category. Covers fct_lump_n, fct_lump_prop, fct_lump_min, and fct_lump_lowfreq.
- fct_reorder
Reorder factor levels by a summary statistic of a second variable.
- ggplot2::aes()
Map variables to visual aesthetics in ggplot2. Covers aes(), aes_string(), aes_quosures(), and how column names become plot labels.
- ggplot2::coord_flip
Flip horizontal and vertical axes in ggplot2. Swaps x and y so horizontal bar charts, boxplots, and histograms display cleanly without re-coding aesthetics.
- ggplot2::facet_wrap
Wrap a 1D sequence of panels into a 2D grid with ggplot2. Control nrow, ncol, scales, strip position, and which axes are displayed.
- ggplot2::geom_bar()
Draw bars with height proportional to count or value. geom_bar uses stat_count by default, mapping x to categories and y to frequencies.
- ggplot2::geom_boxplot
Create box-and-whisker plots to visualise the distribution of a continuous variable across groups.
- ggplot2::geom_histogram()
Draw histograms to show the distribution of a continuous variable. geom_histogram bins the data and draws bars proportional to the count in each bin.
- ggplot2::geom_line()
Connect observations in order with a line. geom_line draws a line through the data in the sequence it appears, suitable for time series and sequential data.
- ggplot2::geom_point()
Add a scatter plot layer with geom_point(). Covers position, size, colour, shape, alpha aesthetics, and position_jitter for overplotting.
- ggplot2::labs
Set axis labels, legend title, plot title, subtitle, caption, and tag for a ggplot. All in one place.
- ggplot2::scale_color_manual
Define your own colour mappings for discrete variables. Map factor levels to exact colours using a named or unnamed vector.
- ggplot2::theme
Control non-data plot elements in ggplot2: titles, axis labels, legend, panel background, grid lines, and more.
- glue()
Format and interpolate strings in R with expressions inside braces.
- interval
Create an Interval object in lubridate, representing a time span between two specific datetime endpoints with calendar awareness.
- lubridate::ymd
Parse dates in year-month-day format from character or numeric input with automatic separator detection and flexible truncation support.
- now
Get the current system time in R as a POSIXct object. Control the timezone with the tzone argument using IANA timezone strings like 'UTC' or 'America/New_York'.
- purrr::discard
Drop elements from a list or vector that match a predicate. discard() removes items where the predicate is TRUE, the opposite of keep(). Works with pipes.
- purrr::keep
Keep elements of a list or vector that satisfy a predicate. discard() does the opposite and drops the elements that satisfy it. compact() removes NULL and empty elements. All three work with the pipe.
- purrr::keep() / purrr::discard() / purrr::compact()
Filter elements of a list or vector by keeping those that match a predicate, discarding those that match it, or removing NULL and empty elements.
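To make the direction of each function concrete, here is a minimal sketch on a made-up list `x`. keep() retains elements where the predicate is TRUE, discard() drops exactly those elements, and compact() removes NULL and zero-length entries.

```r
library(purrr)

# Hypothetical mixed list with a NULL and an empty vector.
x <- list(1, "a", NULL, 4, character(0))

nums     <- keep(x, is.numeric)     # the two numeric elements
non_nums <- discard(x, is.numeric)  # everything that is not numeric
no_empty <- compact(x)              # NULL and character(0) removed
```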
- purrr::map
Apply a function to each element of a vector or list. map returns a list; type-specific variants return atomic vectors directly.
- purrr::map()
Apply a function to each element of a list or vector, returning a list, vector, or other type.
- purrr::map2
Iterate over two vectors in parallel, applying a function pairwise. Type-specific variants return atomic vectors of the corresponding type.
- purrr::pmap
Iterate over multiple inputs simultaneously using a list of parameters. pmap feeds corresponding elements from each list as arguments to your function.
- purrr::possibly
Wrap any function with possibly() to return a default value instead of crashing. Quietly suppress errors or let them surface. The counterpart to safely().
- purrr::reduce
Apply a binary function cumulatively to a list or vector with purrr reduce. Fold left, fold right, provide an initial value, and inspect intermediate results.
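A brief sketch of folding with reduce(), with made-up inputs. It assumes that "inspect intermediate results" refers to purrr's accumulate(), which returns every running value rather than only the final one.

```r
library(purrr)

# Left fold: ((1 + 2) + 3) + 4
total <- reduce(1:4, `+`)

# Right fold: the reduction starts from the last element.
right <- reduce(1:4, `-`, .dir = "backward")

# .init supplies a starting value, which also makes empty input safe.
empty_total <- reduce(list(), `+`, .init = 0)

# accumulate() keeps each intermediate result of the fold.
running <- accumulate(1:4, `+`)
```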
- purrr::safely
Wrap any function with safely() to return a list with result and error components. Inspect errors without crashing. The counterpart to possibly().
- purrr::safely() / purrr::possibly() / purrr::quietly()
Wrap functions to handle errors gracefully, capture them, and continue execution without failing.
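The three wrappers differ in what they return, which the sketch below tries to show. The `inputs` list is invented for the example, and log() and sqrt() stand in for any function that may fail or warn.

```r
library(purrr)

inputs <- list(10, "oops", 100)

# safely(): every call returns a list with `result` and `error`.
safe_log <- safely(log)
results <- map(inputs, safe_log)

# possibly(): a failing call returns the `otherwise` value instead.
maybe_log <- possibly(log, otherwise = NA_real_)
values <- map_dbl(inputs, maybe_log)

# quietly(): captures output, messages and warnings rather than errors.
quiet_sqrt <- quietly(sqrt)
out <- quiet_sqrt(-1)
```

After this runs, `results[[2]]$error` holds the captured condition, `values` has NA in the second position, and `out$warnings` contains the warning text from sqrt(-1).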
- purrr::walk
Use purrr walk to perform side effects — save files, print output, plot — while keeping the pipe flowing. Returns input invisibly.
- purrr::walk()
Apply a function for its side effects, returning the input invisibly. Use walk() for single vectors and walk2() for parallel iteration over two vectors.
- read_csv
Read a CSV file into a tibble with automatic type inference. Part of the readr package in the tidyverse.
- readr::write_csv()
Write a data frame to a CSV file with readr. Covers parameters, NA handling, quoting, appending, compression, and common gotchas.
- replace_na
Replace NA values in R vectors and data frames with tidyr::replace_na().
- separate()
Split a character column into multiple columns by splitting on a delimiter pattern. Part of the tidyr package in the tidyverse.
- str_sub
Extract a substring from a character vector using inclusive start/end positions, with support for negative indexing from the end of the string.
- stringr::str_c()
Join multiple strings into one string with optional separators.
- stringr::str_detect()
Detect the presence or absence of a pattern in a string.
- stringr::str_extract()
Extract the first matching pattern from a string.
- stringr::str_length()
Get the length of a string in characters.
- stringr::str_pad()
Pad a string to a specified width by adding characters.
- stringr::str_replace()
Replace the first occurrence or all occurrences of a pattern in a string.
- stringr::str_trim()
Remove leading and trailing whitespace from strings.
- tibble
Create a tibble, a modern reimagining of the data frame in R, with better printing, stricter subsetting, and consistent behavior.
- tidyr::complete()
Fill in missing combinations of values in a data frame. Use complete() to expose implicit gaps in your data and turn them into explicit rows.
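As an illustration of turning an implicit gap into an explicit row, consider a made-up `sales` table in which store B has no record for month 2. The `fill` argument is used here to replace the resulting NA with 0.

```r
library(tidyr)

# Hypothetical data: store B is missing month 2.
sales <- tibble::tibble(
  store = c("A", "A", "B"),
  month = c(1, 2, 1),
  units = c(10, 12, 7)
)

# Expand to every store/month combination; new rows get units = 0.
full <- complete(sales, store, month, fill = list(units = 0))
```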
- tidyr::drop_na()
Drop rows containing any missing values from a data frame. Use drop_na() to quickly clean data before analysis or modelling.
- tidyr::fill
Fill missing values in selected columns using the previous or next value. Supports down, up, and bidirectional filling within groups.
- tidyr::nest()
Nest columns into a list-column of data frames. Use nest() to create nested tidy data for per-group operations.
- tidyr::pivot_longer()
Lengthen data by pivoting columns into rows. Transform wide data into tidy format where each row is a single observation.
- tidyr::pivot_longer() / tidyr::pivot_wider()
Reshape data between long and wide formats using tidyr's pivot functions.
- tidyr::pivot_wider()
Widen data by spreading key-value pairs across columns. The inverse of pivot_longer().
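A small round-trip sketch of the two pivot functions, using an invented `wide` table with one column per question. Pivoting longer and then wider should return the original layout.

```r
library(tidyr)

# Hypothetical wide data: one row per id, one column per question.
wide <- tibble::tibble(id = 1:2, q1 = c(5, 3), q2 = c(4, 2))

# Wide to long: one row per id/question pair.
long <- pivot_longer(
  wide,
  cols = c(q1, q2),
  names_to = "question",
  values_to = "score"
)

# Long back to wide: the inverse operation.
back <- pivot_wider(long, names_from = question, values_from = score)
```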
- tidyr::unnest()
Expand list-columns back into rows and regular columns. Use unnest() to flatten nested tidy data for analysis.
- tribble
Create a tibble using a readable row-by-row layout with tribble(), the tidyverse alternative to data.frame() for small, human-readable tables.
- unite
Unite multiple columns into one by pasting strings together.