The Best R Packages in 2026: A Curated List

The best R packages define how data scientists work in 2026. With thousands of packages on CRAN, knowing which ones to focus on can feel overwhelming. This guide cuts through the noise and highlights the packages that truly matter this year.

Whether you are new to R or a seasoned developer, these packages form the foundation of modern R programming.

Data manipulation

tidyverse

The tidyverse remains the undisputed king of data manipulation in R. This collection of packages sharing a common philosophy has transformed how R developers work with data.

install.packages("tidyverse")
library(tidyverse)

# The core tidyverse includes:
# - ggplot2 (visualization)
# - dplyr (data manipulation)
# - tidyr (data tidying)
# - readr (data import)
# - purrr (functional programming)
# - tibble (tibbles)
# - stringr (strings)
# - forcats (factors)
# - lubridate (dates and times, core since tidyverse 2.0.0)

# Example workflow
mtcars %>%
  filter(cyl == 4) %>%
  mutate(kw = hp * 0.7457) %>%  # approximate horsepower-to-kilowatt conversion
  arrange(desc(kw)) %>%
  select(mpg, hp, kw)

data.table

For those dealing with massive datasets, data.table remains the performance champion. Its DT[i, j, by] syntax replaces multiple dplyr verbs in a single expression, and in-place modification with := avoids the memory overhead of copy-on-modify operations. On datasets with millions of rows, data.table is often several times faster than dplyr, and sometimes faster by an order of magnitude depending on the operation, making it the go-to package when speed is the primary constraint.

install.packages("data.table")
library(data.table)

dt <- as.data.table(mtcars)
dt[cyl == 4, .(mpg, hp)][order(-mpg)]

The syntax is more compact than the tidyverse equivalent, and the speed advantage becomes noticeable once row counts exceed a few hundred thousand. The exact speedup depends on the operation, the data, and the hardware.
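
To make the DT[i, j, by] form and the := operator concrete, here is a minimal sketch on mtcars. The names summary_dt and power_to_weight are made up for illustration.

```r
library(data.table)

dt <- as.data.table(mtcars)

# i, j, and by in one expression:
# keep rows with hp > 100, compute two summaries, group by cylinder count
summary_dt <- dt[hp > 100, .(mean_mpg = mean(mpg), n = .N), by = cyl]

# := adds a column by reference, so the table is modified in place
# rather than copied
dt[, power_to_weight := hp / wt]

summary_dt
```

The equivalent dplyr code would chain filter(), group_by(), and summarise(); here all three happen inside one pair of brackets.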

arrow and duckdb

Modern data work often spans multiple formats and exceeds what fits comfortably in memory. Arrow provides cross-language support for columnar data formats like Parquet, letting you query files on disk without loading them entirely into RAM. DuckDB brings full SQL analytical capabilities to R as an embedded, in-process database that requires no server setup, making it ideal for ad-hoc queries on CSV and Parquet files too large for dplyr.

# Arrow: open_dataset() scans Parquet lazily;
# read_parquet() would load the whole file into memory
install.packages("arrow")
library(arrow)
ds <- open_dataset("large_file.parquet")

# DuckDB: SQL analytical queries embedded in R
install.packages("duckdb")
library(DBI)
con <- dbConnect(duckdb::duckdb())
dbGetQuery(con, "SELECT count(*) FROM 'large_file.parquet'")
dbDisconnect(con, shutdown = TRUE)
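
As a self-contained sketch of both approaches, the example below writes mtcars to a temporary Parquet file (a stand-in for a genuinely large file) and queries it first with Arrow and dplyr verbs, then with DuckDB SQL. The object names are illustrative only.

```r
library(arrow)
library(dplyr)
library(DBI)

# Stand-in for a large file: write mtcars to a temporary Parquet file
path <- tempfile(fileext = ".parquet")
write_parquet(mtcars, path)

# Arrow: the filter and select are pushed down to the file scan;
# nothing is read into R until collect()
result_arrow <- open_dataset(path) %>%
  filter(cyl == 4) %>%
  select(mpg, hp) %>%
  collect()

# DuckDB: query the same file with SQL, no server required
con <- dbConnect(duckdb::duckdb())
result_duckdb <- dbGetQuery(
  con,
  sprintf(
    "SELECT cyl, avg(mpg) AS mean_mpg FROM read_parquet('%s') GROUP BY cyl ORDER BY cyl",
    path
  )
)
dbDisconnect(con, shutdown = TRUE)
```

In both cases only the requested rows or aggregates end up as an R data frame.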

Visualization

Moving data into R is only half the job; communicating what you find is the other half. The visualization packages in R have matured considerably, and most data scientists now reach for two packages that complement each other: ggplot2 for static publication charts and plotly for interactive exploration.

ggplot2

The grammar of graphics has become the standard for data visualization:

library(ggplot2)

ggplot(mtcars, aes(mpg, hp, color = factor(cyl))) +
  geom_point(size = 3) +
  labs(title = "Cars: HP vs MPG",
       subtitle = "By cylinder count") +
  theme_minimal()

plotly

For interactive visualizations that let readers hover, zoom, and filter, plotly wraps ggplot2 objects with a single function call. The generated charts work in web browsers, Quarto documents, and Shiny apps without additional JavaScript code. This means you can build a static chart with ggplot2 first, then call ggplotly() on the finished plot to add interactivity without rewriting anything.

install.packages("plotly")
library(plotly)

ggplotly(
  ggplot(mtcars, aes(mpg, hp, color = factor(cyl))) +
    geom_point()
)

Statistical modeling

tidymodels

The tidymodels framework provides a consistent interface for defining, fitting, and evaluating models. In base R and standalone modeling packages, functions such as lm(), glm(), and randomForest() each have their own argument conventions; tidymodels instead uses one pattern: define a model specification, set the computational engine, and fit with a formula. The tidy() function then extracts coefficients into a standardized data frame.

install.packages("tidymodels")
library(tidymodels)

lm_spec <- linear_reg() %>%
  set_engine("lm")

lm_fit <- lm_spec %>%
  fit(mpg ~ hp + wt, data = mtcars)

tidy(lm_fit)

parsnip

Part of tidymodels, parsnip is the package that provides the unified model interface. The same workflow (a specification function such as linear_reg(), then set_engine(), then fit()) works across linear regression, logistic regression, random forests, XGBoost, and dozens of other algorithms, with the engine argument switching between R-native implementations and external libraries.

# Same syntax for completely different model types
log_spec <- logistic_reg() %>% set_engine("glm")
rf_spec <- rand_forest() %>% set_engine("ranger") %>% set_mode("classification")
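
A small sketch of the engine-switching idea: two different model types are fitted and used for prediction through identical calls. The rpart engine is chosen here only because rpart ships with standard R installations; the list names are illustrative.

```r
library(tidymodels)

# Two unrelated model types, declared with the same pattern
specs <- list(
  linear = linear_reg() %>% set_engine("lm"),
  tree   = decision_tree(mode = "regression") %>% set_engine("rpart")
)

# The same fit() and predict() calls work for every specification
fits  <- lapply(specs, fit, formula = mpg ~ hp + wt, data = mtcars)
preds <- lapply(fits, predict, new_data = mtcars[1:3, ])

preds$linear
preds$tree
```

Each prediction comes back as a tibble with a .pred column, regardless of the underlying engine.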

Machine learning

caret

Despite newer alternatives, caret is still a unified training interface for over 200 machine learning algorithms. Its train() function handles cross-validation, hyperparameter tuning, and preprocessing with a single formula-based call, which is useful for quick baseline models before moving to more specialized frameworks.

install.packages("caret")
library(caret)
# method = "rf" requires the randomForest package to be installed
model <- train(Species ~ ., data = iris, method = "rf")
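
To show cross-validation, tuning, and preprocessing in one train() call, here is a minimal sketch using k-nearest neighbours, a method caret implements itself. The fold count and tuneLength are arbitrary choices for illustration.

```r
library(caret)

set.seed(42)

# 5-fold cross-validation
ctrl <- trainControl(method = "cv", number = 5)

knn_fit <- train(
  Species ~ .,
  data = iris,
  method = "knn",
  preProcess = c("center", "scale"),  # standardize predictors inside each fold
  tuneLength = 5,                     # try 5 candidate values of k
  trControl = ctrl
)

knn_fit$bestTune   # the value of k selected by cross-validation
knn_fit$results    # accuracy for every candidate k
```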

xgboost and lightgbm

Gradient boosting dominates machine learning competitions and production systems alike. xgboost provides the battle-tested implementation with extensive hyperparameter control, while lightgbm offers faster training on large datasets through leaf-wise tree growth. Both work with tidymodels: xgboost is available as a parsnip engine directly, and lightgbm through the bonsai extension package, giving a consistent modeling workflow alongside linear models and random forests.

install.packages("xgboost")
library(xgboost)

dtrain <- xgb.DMatrix(data = as.matrix(mtcars[, -1]), label = mtcars$mpg)
# nrounds (the number of boosting iterations) must be supplied explicitly
model <- xgb.train(params = list(objective = "reg:squarederror"),
                   data = dtrain, nrounds = 50)

ranger and randomForest

For random forests, ranger provides a fast C++ implementation that handles high-dimensional data efficiently. It supports classification, regression, and survival analysis, with built-in variable importance measures and out-of-bag error estimates. The formula interface follows base R conventions, so it can replace the older randomForest package in most code with small changes to argument names (for example, num.trees instead of ntree).

install.packages("ranger")
library(ranger)
rf <- ranger(Species ~ ., data = iris, num.trees = 100)
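
The variable importance and out-of-bag error mentioned above can be read straight off the fitted object. A short sketch, where the impurity importance mode is just one of the available options:

```r
library(ranger)

set.seed(42)

# importance must be requested at fit time
rf <- ranger(Species ~ ., data = iris, num.trees = 100,
             importance = "impurity")

# Out-of-bag misclassification rate
rf$prediction.error

# Predictors ranked by importance
sort(rf$variable.importance, decreasing = TRUE)
```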

Reporting and documents

Quarto

Quarto has largely replaced R Markdown as the standard for literate programming in R. It supports multiple languages (R, Python, Julia) in the same document, renders to HTML, PDF, Word, and slides from a single .qmd source, and integrates with version control naturally since everything is plain text.

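# Quarto itself is a command-line tool installed separately (see quarto.org);
# the quarto R package is an interface to it
install.packages("quarto")
quarto::quarto_render("report.qmd")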

rmarkdown

While Quarto is the future, rmarkdown still powers thousands of existing workflows. The render() function compiles .Rmd files to finished reports, and many organizations have CI/CD pipelines built around it. Migrating existing projects to Quarto is usually straightforward, since Quarto can render most .Rmd files directly.

install.packages("rmarkdown")
library(rmarkdown)
render("report.Rmd")

gt and gtExtras

For publication-ready tables that look professional in HTML reports, journal submissions, and dashboards, gt provides a grammar of tables. Chain functions like tab_header(), tab_spanner(), and fmt_number() to build complex tables from data frames with full control over styling and formatting. The companion gtExtras package adds ready-made themes and inline plots on top of gt.

install.packages("gt")
library(gt)
mtcars %>% head() %>% gt() %>% tab_header(title = "Motor Trend Cars")
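
A slightly fuller sketch of the chaining style, combining the three functions named above. The spanner label and column choices are arbitrary.

```r
library(gt)

tbl <- head(mtcars[, c("mpg", "hp", "wt", "qsec")]) %>%
  gt(rownames_to_stub = TRUE) %>%                       # car names become row labels
  tab_header(title = "Motor Trend Cars") %>%
  tab_spanner(label = "Performance", columns = c(hp, qsec)) %>%
  fmt_number(columns = c(mpg, wt), decimals = 1)        # one decimal place

tbl
```

Printing tbl in an interactive session or a Quarto document renders the formatted table.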

Web and APIs

httr2

Modern HTTP requests in R use httr2, which provides a pipe-friendly API for building and performing requests. It handles authentication, retries, pagination, and streaming responses, covering everything needed to interact with REST APIs. httr2 is the successor to the older httr package with better error handling and a cleaner interface for request chaining.

install.packages("httr2")
library(httr2)

# req_perform() returns a response object, so name it accordingly
resp <- request("https://api.example.com") %>%
  req_headers(Authorization = "Bearer token") %>%
  req_perform()
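
Because a request is an ordinary object until it is performed, it can be built up and inspected without touching the network. The sketch below assumes a hypothetical v1/items endpoint on the placeholder host; the path, query parameter, and token are illustrative.

```r
library(httr2)

req <- request("https://api.example.com") %>%
  req_url_path_append("v1", "items") %>%   # hypothetical endpoint
  req_url_query(page = 1) %>%
  req_auth_bearer_token("token") %>%       # placeholder credential
  req_retry(max_tries = 3)                 # retry transient failures

# Nothing has been sent yet; inspect the URL that would be requested
req$url
```

Calling req_perform(req) would send the request, applying the retry policy if it fails transiently.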

plumber

Plumber turns R functions into REST API endpoints by parsing special comments (#* @get, #* @post) as route definitions. This lets you serve model predictions, data summaries, or any R computation over HTTP without leaving the R ecosystem. Plumber is widely used in production to wrap machine learning models behind lightweight REST APIs.

library(plumber)

#* @get /hello
function(name = "World") {
  list(greeting = paste0("Hello, ", name))
}

rvest

Web scraping in R starts with rvest, which provides CSS selector and XPath-based extraction from HTML pages. Combined with httr2 for request handling and purrr for iteration, rvest handles most scraping tasks without requiring Selenium or headless browsers. For pages that load content dynamically with JavaScript, pair rvest with the chromote package for headless browser automation.

install.packages("rvest")
library(rvest)

page <- read_html("https://example.com")
html_elements(page, "a") %>% html_text()
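
For experimenting with selectors without hitting a live site, rvest can parse an inline HTML snippet. A minimal sketch with made-up markup:

```r
library(rvest)

# Inline HTML stands in for a downloaded page
page <- minimal_html('
  <ul>
    <li><a href="/a">First</a></li>
    <li><a href="/b">Second</a></li>
  </ul>
')

links <- html_elements(page, "a")

# Pull both the visible text and the href attribute of each link
link_table <- data.frame(
  text = html_text2(links),
  href = html_attr(links, "href")
)
link_table
```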

Development tools

devtools

Package development in R is streamlined by devtools, which wraps common tasks like loading, checking, building, and documenting into single function calls. The load_all() function simulates installing the package without the overhead, enabling fast iteration during development. Most R package authors pair devtools with usethis for project scaffolding and roxygen2 for inline documentation.

install.packages("devtools")
library(devtools)
load_all(".")  # simulate package install
check()        # run R CMD check
build()        # create .tar.gz bundle

testthat

Automated testing catches regressions before they reach users. testthat’s test_that() / expect_*() pattern makes assertions readable: each test block describes an expectation in plain English, and failures report exactly which expectation broke and why. Combined with usethis::use_testthat(), it scaffolds a full test suite for any package.

install.packages("testthat")
library(testthat)

test_that("multiplication works", {
  expect_equal(2 * 2, 4)
})

renv

Reproducibility requires locking down package versions. renv creates project-specific libraries with renv::init(), captures the exact package versions in a lockfile with renv::snapshot(), and restores them on any machine with renv::restore(). This prevents the “it works on my machine” problem when sharing analysis code.

install.packages("renv")
library(renv)
renv::init()
renv::snapshot()

Specialized packages

lubridate

Dates and times are notoriously error-prone in base R. lubridate provides parsing functions (ymd(), mdy(), dmy()) that accept a wide range of separators and layouts once you state the order of the components, plus intuitive arithmetic (+ weeks(2), %--% intervals) that reads like natural language. lubridate also handles time zones and daylight saving transitions explicitly, which are a common source of subtle bugs when using base R date functions alone.

install.packages("lubridate")
library(lubridate)
ymd("2026-01-15") + weeks(2)
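
The sketch below illustrates the three ideas from this section: order-based parsing, intervals, and daylight saving handling. The dates and the time zone are chosen for illustration; US clocks move forward on 8 March 2026.

```r
library(lubridate)

# The function name states the component order; separators can vary
dates <- c(ymd("2026-01-15"), mdy("01/15/2026"), dmy("15.01.2026"))

# An interval between two dates, measured in days
span <- ymd("2026-01-01") %--% ymd("2026-03-01")
n_days <- span / days(1)

# Across a daylight saving change, a period keeps the clock time,
# while a duration adds exactly 24 hours
noon <- ymd_hms("2026-03-07 12:00:00", tz = "America/New_York")
next_day_period   <- noon + days(1)
next_day_duration <- noon + ddays(1)
```

Here days(1) means "the same clock time tomorrow", while ddays(1) means "86,400 seconds later". The two differ by an hour on the day the clocks change.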

stringr

String manipulation in R becomes predictable with stringr’s str_-prefixed verb functions. Every function takes the string as its first argument, enabling clean pipe chains, and returns consistent types: str_detect() always returns logical, str_extract() always returns character. stringr is built on the stringi package, which uses the ICU library for Unicode-aware matching, so regular expressions work correctly with non-ASCII characters.

library(stringr)
str_detect("hello world", "hello")
str_replace_all("hello", "l", "r")

forcats

Factors in R control ordering in plots and tables, but base R factor functions are inconsistent. forcats provides a tidy interface: fct_reorder() sorts factor levels by another variable, fct_lump_n() keeps the most common levels and collapses the rest, and fct_infreq() orders by frequency. All return factors, so the calls compose cleanly.

library(forcats)
f <- factor(c("a", "b", "b", "c", "c", "c", "d"))
# order levels by frequency, then keep the 2 most common and lump the rest
fct_lump_n(fct_infreq(f), n = 2)

The rising stars

Several packages are gaining momentum in 2026:

polars

The Polars library, already popular in the Python data science community, now has an R interface. Unlike data.table, which evaluates each expression eagerly, polars offers lazy evaluation: computations are described, not executed immediately, letting the query optimizer reorder and fuse operations for maximum speed. This approach can outperform both dplyr and data.table on multi-gigabyte datasets, particularly when filtering before grouping. While still younger than the tidyverse ecosystem, polars is a strong candidate for data engineering pipelines where raw throughput matters.

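# polars has been distributed through R-universe rather than CRAN;
# check the package documentation for current installation instructions
install.packages("polars", repos = "https://rpolars.r-universe.dev")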
library(polars)

targets

The targets package addresses a problem every data scientist has faced: rerunning a long pipeline from scratch after changing one preprocessing step. targets tracks which parts of the pipeline depend on which inputs, detects what changed, and re-runs only the affected downstream steps. This saves hours on large projects and makes stale results much less likely, since the package stores a hash of every intermediate result and re-runs any step whose code or inputs have changed. For team projects or analyses that must survive regulatory review, targets replaces ad-hoc scripts with a formal, auditable pipeline.

install.packages("targets")
library(targets)
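
A pipeline is declared in a file named _targets.R at the project root. The sketch below is a minimal example of such a file; the target names raw, clean, and model are made up for illustration.

```r
# _targets.R
library(targets)

# Each tar_target() names a step and the expression that builds it.
# Dependencies are inferred from the symbols used: model depends on
# clean, which depends on raw.
list(
  tar_target(raw, mtcars),
  tar_target(clean, raw[raw$cyl == 4, ]),
  tar_target(model, lm(mpg ~ hp, data = clean))
)
```

Running tar_make() in the project builds all three targets, and tar_read(model) retrieves the stored result. After an edit to the clean step, the next tar_make() would rebuild clean and model but skip raw.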

vctrs

The vctrs package provides a consistent foundation for vector operations in R, handling type coercion, class inheritance, and missing values for custom S3 vector classes. Package developers building new vector types — think custom date classes, specialized factors, or geospatial types — use vctrs to define well-behaved vectors that work correctly with c(), [, and the rest of base R. As more tidyverse packages adopt vctrs internally, understanding its conventions is becoming essential for R package development.
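
As a rough sketch of what defining a vector type with vctrs looks like, the example below creates a hypothetical percent class. The class name demo_percent and the constructor new_percent() are invented for illustration.

```r
library(vctrs)

# Constructor for a hypothetical percent type stored as a double
new_percent <- function(x = double()) {
  stopifnot(is.double(x))
  new_vctr(x, class = "demo_percent")
}

# A format method controls how values are printed
format.demo_percent <- function(x, ...) {
  paste0(round(vec_data(x) * 100, 1), "%")
}

p <- new_percent(c(0.1, 0.25))

# Subsetting and combining keep the class without extra methods
second   <- p[2]
combined <- c(p, new_percent(0.5))
```

Without vctrs, each of those operations would need its own S3 method to avoid silently dropping the class.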

Why package selection matters in 2026

The R package ecosystem has over 20,000 packages on CRAN, with thousands more on Bioconductor and GitHub. This abundance is a strength and a challenge: for most tasks, there are multiple options, and choosing well saves hours of debugging and refactoring. The packages listed here have been selected based on stability, maintenance activity, community adoption, and relevance to practical data science work in 2026.

A few criteria guided this list. First, active maintenance: packages where the last commit is recent and issues are addressed. Second, CRAN availability: packages installable with install.packages() without manual steps. Third, real-world adoption: measured by CRAN download counts and usage in production codebases. Niche or experimental packages were excluded even when technically impressive.

Starting from the tidyverse

The tidyverse is the right starting point for most data science work in R. The core packages, dplyr, tidyr, ggplot2, readr, purrr, tibble, stringr, forcats, and lubridate, are maintained by the same team, follow consistent design principles, and are designed to work together. Installing the tidyverse meta-package with install.packages("tidyverse") gives you all of them with one command.

Beyond the core, broom (tidy model output), glue (string interpolation), janitor (data cleaning utilities), and skimr (data summaries) are common additions that most data scientists install early in a project.

The tidyverse design philosophy centers on tidy data: each variable is a column, each observation is a row, and each type of observational unit is a table. Adopting this convention means that functions across the ecosystem expect data in the same shape, which removes much of the ad-hoc reshaping that consumes so much time in base R workflows. Once your data is tidy, you can chain dplyr verbs, pipe into ggplot2, or feed it to tidymodels without worrying about format compatibility.
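
To illustrate what "tidy" means in practice, the sketch below reshapes a small made-up table in which years are spread across columns into one row per country and year.

```r
library(tidyr)

# Untidy: the year variable is hidden in the column names
wide <- data.frame(
  country = c("A", "B"),
  `2024` = c(10, 20),
  `2025` = c(12, 23),
  check.names = FALSE
)

# Tidy: one column per variable, one row per observation
long <- pivot_longer(
  wide,
  cols = -country,
  names_to = "year",
  values_to = "value"
)
long
```

In the long form, year can be mapped to a plot axis or used in group_by() like any other column.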

Performance at scale

For data larger than fits comfortably in memory, or for operations where dplyr is too slow, the performance-oriented packages provide dramatic speedups. data.table achieves speed through in-place modification and a highly optimized C backend. arrow enables analysis of data stored in Parquet format without loading it all into memory. duckdb brings SQL analytics to R with performance competitive with much larger systems. These three packages cover the majority of “data is too big for base R” scenarios without requiring a distributed computing setup.

The ML and modeling ecosystem

tidymodels has established itself as the standard modeling framework in R, providing consistent interfaces to a wide range of models through parsnip, cross-validation through rsample, feature engineering through recipes, and hyperparameter tuning through tune. For deep learning, torch (the R interface to libtorch) and keras3 (the R interface to Keras 3) are the primary options. For Bayesian modeling, Stan (via rstan or cmdstanr) and the brms formula-based interface remain the gold standard.
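
A compact sketch of how these pieces fit together: rsample splits the data, recipes defines preprocessing, parsnip defines the model, and a workflow bundles them. The split proportion and predictors are arbitrary choices for illustration.

```r
library(tidymodels)

set.seed(42)

# rsample: hold out a test set
split <- initial_split(mtcars, prop = 0.75)

# recipes: preprocessing learned from the training data only
rec <- recipe(mpg ~ hp + wt, data = training(split)) %>%
  step_normalize(all_numeric_predictors())

# workflows: bundle the preprocessing and the parsnip model
wf <- workflow() %>%
  add_recipe(rec) %>%
  add_model(linear_reg() %>% set_engine("lm"))

wf_fit <- fit(wf, data = training(split))

# yardstick: evaluate on the held-out rows
preds <- predict(wf_fit, new_data = testing(split)) %>%
  bind_cols(testing(split))

rmse(preds, truth = mpg, estimate = .pred)
```

Swapping linear_reg() for another parsnip specification changes the model without touching the split, recipe, or evaluation code.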

Data import and export ecosystem

The import ecosystem in R is comprehensive. readr handles delimited text files (CSV, TSV) with consistent behavior. readxl reads Excel files. haven reads SAS, Stata, and SPSS files, preserving variable labels. jsonlite handles JSON. arrow reads and writes Parquet and Arrow IPC files. DBI with database-specific backends (RPostgres, RSQLite, odbc) connects to SQL databases. httr2 fetches data from REST APIs.

For output, many of the same packages write the corresponding formats. writexl writes Excel without Java dependencies. openxlsx provides styled Excel output with multiple sheets. gt renders publication-quality HTML and LaTeX tables. Quarto produces HTML, PDF, Word, and presentation formats from .qmd documents. This breadth means R can sit at the center of most data pipelines, reading from most common sources and writing to most common output formats.

Ecosystem stability and versioning

R’s package ecosystem is notably stable. CRAN asks maintainers to check reverse dependencies before submitting a new version and to coordinate breaking changes with the affected packages, which pushes popular packages to maintain backward compatibility across years. Code written against tidyverse 1.3 in 2021 still largely works with tidyverse 2.x in 2026. This stability is a significant advantage for long-lived research code, clinical trial analysis, and regulated environments where auditors need evidence that the analysis software behaves the same way over time.

How to stay current

CRAN publishes new package versions daily, and Bioconductor releases on a twice-yearly cycle. Useful places to track packages are CRAN task views (for domain-specific collections), the r-universe registry (for fast-moving development packages), and Posit’s weekly digest. For any unfamiliar package, check the NEWS file for recent maintenance activity and the GitHub issues for open bugs. A package with active maintainers and recent releases is safer to depend on than one with no commits in two years, regardless of download count.

Packages worth knowing

Beyond the tidyverse, the following packages are worth knowing: data.table for high-performance data manipulation; arrow for columnar data and Parquet files; duckdb for in-process analytical SQL; polars for lazy, out-of-core data processing; tidymodels for machine learning workflows; brms for Bayesian modeling; shiny and golem for web applications; targets for reproducible pipelines; renv for package management; and quarto for literate programming. These packages address different parts of the data science workflow and are well-maintained with active communities.

Evaluating package quality

With over 20,000 CRAN packages, choosing among alternatives requires criteria beyond just functionality. A package that solves your problem but is unmaintained, poorly documented, or incompatible with your other packages creates more work than it saves. Evaluating packages before adopting them is worth the time, especially for packages that will become core infrastructure in a project.

CRAN download statistics, GitHub stars, and Stack Overflow activity indicate community adoption. A widely used package is more likely to have answered questions online, more likely to be actively maintained, and more likely to integrate with other commonly used packages. These are proxies for quality, not guarantees, but they filter out the long tail of abandoned or obscure packages.

The tidyverse and its extensions

The tidyverse is a curated collection of packages with consistent design principles. Packages that extend the tidyverse, following its conventions for function naming, argument order, and tidy data compatibility, integrate naturally into tidyverse workflows. Checking whether a package supports piping with the first argument as data, returns tidy data frames, and uses tidyselect for column selection identifies packages designed for tidyverse-style code.

The tidymodels ecosystem extends the tidyverse to machine learning with the same design principles. Adding a model type to a tidymodels workflow requires only specifying the parsnip engine; everything else, data splitting, cross-validation, tuning, performance evaluation, stays the same. A package with a parsnip engine is immediately accessible in the full tidymodels framework.

Stability and maintenance

CRAN requires that packages pass checks against the current R version, so a package on CRAN is at minimum not broken by the current R version. But CRAN does not require active development: a package can be on CRAN for years without updates. Checking the last commit date on GitHub and the frequency of CRAN releases indicates whether a package is actively maintained.

GitHub-only packages (those not on CRAN) have no quality gate. Some are excellent pre-CRAN packages from active developers. Others are abandoned experiments. Evaluating a GitHub-only package requires looking at open issues, recent commits, and the developer’s responsiveness to bug reports before depending on it in a project.
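
Part of this check can be scripted. The sketch below uses only base R to report the version and CRAN publication date of packages that are already installed; the helper name package_summary() is made up for illustration, and it assumes every package passed in is installed.

```r
# Summarize version and publication date for installed packages.
# Base packages have no Date/Publication field, so they report NA.
package_summary <- function(pkgs) {
  rows <- lapply(pkgs, function(pkg) {
    d <- utils::packageDescription(pkg)
    published <- d[["Date/Publication"]]
    data.frame(
      package   = pkg,
      version   = d[["Version"]],
      published = if (is.null(published)) NA_character_ else published
    )
  })
  do.call(rbind, rows)
}

installed_info <- package_summary(c("stats", "utils"))
installed_info
```

A publication date several years old is a prompt to look at the repository and issue tracker before relying on the package, not proof that it is abandoned.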
