R Machine Learning Packages in 2026: The Complete Landscape
The R machine learning ecosystem has matured significantly. In 2026, data scientists have access to a rich set of packages covering everything from traditional statistics to cutting-edge deep learning. This guide surveys the most important machine learning packages in R and when to use each one.
The Modern R ML Stack
The R machine learning landscape has consolidated around several key frameworks:
- tidymodels — The modern, tidyverse-aligned framework for modeling
- caret — Still widely used for classical methods
- h2o — Enterprise-grade AutoML
- torch — Deep learning (R interface to PyTorch)
- xgboost and lightgbm — Gradient boosting implementations
- rstan and brms — Bayesian machine learning
Tidymodels: The New Standard
The tidymodels framework has become the go-to choice for most machine learning tasks in R. It provides a unified interface that feels like the tidyverse:
library(tidymodels)
# Define a model
lm_model <- linear_reg() %>%
  set_engine("lm")
# Fit the model
lm_fit <- lm_model %>%
  fit(mpg ~ wt + hp, data = mtcars)
The key packages in the tidymodels ecosystem include:
- parsnip — Unified interface for model definitions
- recipes — Data preprocessing
- workflows — Pipelining
- tune — Hyperparameter optimization
- yardstick — Model evaluation
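These pieces compose naturally. A minimal sketch (using mtcars purely for illustration) that bundles a recipes preprocessing step and a parsnip model into a single workflows object:

```r
library(tidymodels)

# Preprocessing recipe: normalize all numeric predictors
car_recipe <- recipe(mpg ~ ., data = mtcars) %>%
  step_normalize(all_numeric_predictors())

# Model definition via parsnip
car_model <- linear_reg() %>%
  set_engine("lm")

# Bundle recipe and model into one workflow, then fit
car_workflow <- workflow() %>%
  add_recipe(car_recipe) %>%
  add_model(car_model)

car_fit <- fit(car_workflow, data = mtcars)
```

The workflow object carries the preprocessing and the model together, so the same pipeline can later be passed to tune for hyperparameter search or to fit_resamples for cross-validation.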
Classification Example
library(tidymodels)
library(tidyverse)
# Load data
data("iris")
# Split data (set a seed so the split is reproducible)
set.seed(123)
iris_split <- iris %>%
  initial_split(prop = 0.8, strata = Species)
iris_train <- training(iris_split)
iris_test <- testing(iris_split)
# Define model (the default engine, ranger, must be installed)
rf_spec <- rand_forest(trees = 100) %>%
  set_mode("classification")
# Fit
rf_fit <- rf_spec %>%
  fit(Species ~ ., data = iris_train)
# Predict
predictions <- rf_fit %>%
  predict(iris_test) %>%
  bind_cols(iris_test)
# Evaluate
predictions %>%
  accuracy(truth = Species, estimate = .pred_class)
Gradient Boosting: xgboost and lightgbm
For tabular data, gradient boosting methods consistently deliver top performance. The xgboost and lightgbm packages provide fast implementations:
library(xgboost)
# Prepare data (assumes the response y is the first column of train/test)
dtrain <- xgb.DMatrix(data = as.matrix(train[, -1]), label = train$y)
dtest <- xgb.DMatrix(data = as.matrix(test[, -1]), label = test$y)
# Train
params <- list(
  objective = "binary:logistic",
  eval_metric = "auc",
  max_depth = 6,
  learning_rate = 0.1
)
model <- xgb.train(
  params = params,
  data = dtrain,
  nrounds = 100,
  watchlist = list(train = dtrain, test = dtest)
)
lightgbm offers similar functionality with often faster training times:
library(lightgbm)
train_data <- lgb.Dataset(
  data = as.matrix(train_features),
  label = train_labels
)
params <- list(
  objective = "binary",
  metric = "auc",
  learning_rate = 0.1,
  num_leaves = 31
)
model <- lgb.train(
  params = params,
  data = train_data,
  nrounds = 100
)
Deep Learning with torch
For neural networks, the torch package provides native PyTorch functionality in R:
library(torch)
# Define a simple neural network
net <- nn_module(
  "Net",
  initialize = function() {
    self$fc1 <- nn_linear(784, 256)
    self$fc2 <- nn_linear(256, 10)
    self$dropout <- nn_dropout(0.2)
  },
  forward = function(x) {
    x %>%
      torch_flatten(start_dim = 2) %>%
      self$fc1() %>%
      nnf_relu() %>%
      self$dropout() %>%
      self$fc2()
  }
)
# Train
model <- net()
optimizer <- optim_adam(model$parameters, lr = 0.01)
for (epoch in 1:10) {
  model$train()
  optimizer$zero_grad()
  # Placeholder: replace with your own data loader that yields
  # a list with $data (input tensor) and $target (label tensor)
  batch <- dataset_mnist_batch(batch_size = 32)
  output <- model(batch$data)
  loss <- nnf_cross_entropy(output, batch$target)
  loss$backward()
  optimizer$step()
}
H2O AutoML
For quick prototyping and automated model selection, H2O provides powerful AutoML capabilities:
library(h2o)
h2o.init()
# Prepare data
train_h2o <- as.h2o(train_data)
test_h2o <- as.h2o(test_data)
# Run AutoML
aml <- h2o.automl(
  x = features,  # character vector of predictor column names
  y = target,    # name of the response column
  training_frame = train_h2o,
  max_runtime_secs = 300,
  leaderboard_frame = test_h2o
)
# Get leaderboard
aml@leaderboard
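Once the run finishes, the best model is available as aml@leader. A short sketch of scoring the held-out frame with it:

```r
# Use the leading model to predict on the test frame
leader <- aml@leader
preds <- h2o.predict(leader, test_h2o)

# Evaluate the leader on held-out data
perf <- h2o.performance(leader, newdata = test_h2o)
h2o.auc(perf)
```

Remember to call h2o.shutdown() when you are done, since H2O runs as a separate JVM process.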
Bayesian Methods: rstan and brms
For Bayesian inference, rstan provides the R interface to Stan, while brms builds a high-level formula interface on top of it:
library(rstan)
library(brms)
# Fit a Bayesian regression
fit <- brm(
  mpg ~ wt + hp + cyl,
  data = mtcars,
  family = gaussian(),
  prior = prior(normal(0, 10), class = b),
  iter = 2000,
  chains = 4
)
summary(fit)
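Beyond summary(), brms exposes the full posterior. A sketch of pulling coefficient summaries and posterior predictions, continuing the fit above (new_cars is an illustrative data point):

```r
# Posterior means, errors, and credible intervals for all parameters
posterior_summary(fit)

# Posterior predictive distribution for a hypothetical car
new_cars <- data.frame(wt = 3, hp = 150, cyl = 6)
predict(fit, newdata = new_cars)
```

Because predictions come as draws from the posterior, the output includes uncertainty intervals for free, which is one of the main practical advantages of the Bayesian approach.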
Choosing the Right Package
Here’s a quick decision guide:
- General ML — tidymodels
- Tabular data, maximum accuracy — xgboost, lightgbm
- Deep learning — torch
- AutoML quick prototype — h2o
- Bayesian inference — brms
- Time series — forecast, fable
- Model interpretability — DALEX, iml
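For the interpretability entry, a minimal DALEX sketch: wrap a fitted model in an explainer, then compute permutation-based variable importance. This uses the randomForest package on iris purely for illustration:

```r
library(DALEX)
library(randomForest)

# Fit a model to explain
rf <- randomForest(Species ~ ., data = iris)

# Wrap it in an explainer (data = predictors only, y = the target)
explainer <- explain(
  model = rf,
  data = iris[, -5],
  y = iris$Species,
  label = "random forest"
)

# Permutation variable importance
importance <- model_parts(explainer)
plot(importance)
```

The same explainer object also feeds model_profile() for partial-dependence plots and predict_parts() for local, per-observation explanations.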
Performance Comparison
For the classic Titanic classification problem, here’s a rough guide to the test accuracy you can expect:
- Logistic Regression — 78-80%
- Random Forest — 80-83%
- XGBoost — 83-86%
- LightGBM — 83-86%
- H2O AutoML — 84-87%
- Neural Network (torch) — 80-85%
Your exact numbers will vary with feature engineering, hyperparameter tuning, and cross-validation strategy, so treat these ranges as ballpark figures rather than benchmarks.
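To make such comparisons fair, estimate accuracy with resampling rather than a single train/test split. A sketch using tidymodels (iris stands in for your dataset):

```r
library(tidymodels)

# Five-fold cross-validation, stratified by the outcome
set.seed(42)
folds <- vfold_cv(iris, v = 5, strata = Species)

rf_spec <- rand_forest(trees = 100) %>%
  set_mode("classification")

# Fit the model on each fold and collect accuracy across folds
cv_results <- fit_resamples(
  rf_spec,
  Species ~ .,
  resamples = folds,
  metrics = metric_set(accuracy)
)
collect_metrics(cv_results)
```

The mean and standard error reported by collect_metrics() give a far more honest picture of model performance than a single split.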
Installation Tips
Most of these packages are available from CRAN (h2o releases are also distributed directly from H2O.ai’s own repository):
install.packages(c(
  "tidymodels",
  "xgboost",
  "lightgbm",
  "h2o",
  "brms",
  "torch"
))
After installing torch, download its backend libraries (required on all platforms, including macOS with Apple Silicon):
torch::install_torch()
Conclusion
The R machine learning ecosystem in 2026 offers powerful tools for every use case. Start with tidymodels for most tasks, reach for gradient boosting (xgboost, lightgbm) when you need maximum performance on tabular data, and use torch for deep learning. H2O provides excellent AutoML for quick experiments, while brms makes Bayesian methods accessible.
The tidyverse-aligned packages have made machine learning in R more approachable than ever, with consistent syntax and excellent documentation.
See Also
- R for Machine Learning in 2026 — A practical guide to the ML workflow in R
- Best R Packages 2026 — The most essential R packages for data science