Hyperparameter Tuning in R

· 7 min read · Updated March 27, 2026 · intermediate
r machine-learning hyperparameter-tuning caret tidymodels

Hyperparameter tuning is the process of finding the best settings for a machine learning model before training begins. Unlike model parameters, which the algorithm learns automatically from data, hyperparameters are external configurations you set manually. Getting them right often determines whether your model barely works or performs significantly better than a baseline.

Parameters vs Hyperparameters

A quick distinction keeps things clear. Parameters are what the model learns during training: regression coefficients, decision tree split points, or neural network weights. Hyperparameters control how that learning happens. How many trees should a random forest contain? How deep should a gradient boosting model grow before stopping? These are hyperparameter choices.

If you train a random forest with ntree = 50 and it learns 50 trees, those tree structures are parameters. The 50 you chose is a hyperparameter.
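To make the distinction concrete, here is a minimal base-R sketch; lm() stands in for "a model that learns parameters," and the k value is illustrative:

```r
# Parameters are learned from data: lm() estimates coefficients during fitting
fit <- lm(Sepal.Length ~ Sepal.Width, data = iris)
coef(fit)  # intercept and slope -- the learned parameters

# Hyperparameters are chosen beforehand: e.g. k for k-nearest neighbors
k <- 5  # set by you before training; never estimated from the data
```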

Examples across common algorithms:

Algorithm        Hyperparameters
Random Forest    mtry (variables per split), ntree (number of trees)
XGBoost          eta (learning rate), max_depth, nrounds
SVM              C (cost parameter), kernel type
KNN              k (number of neighbors)

Grid Search

Grid search is the most straightforward tuning approach. You define a discrete set of values for each hyperparameter, and the method evaluates every possible combination.

With caret, you pass a tuning grid using expand.grid():

library(caret)

tune_grid <- expand.grid(mtry = c(1, 2, 3, 4))

model <- train(
  Species ~ .,
  data = iris,
  method = "rf",
  tuneGrid = tune_grid,
  trControl = trainControl(method = "cv", number = 5)
)

print(model$bestTune)

The trControl argument handles resampling. Here, 5-fold cross-validation evaluates each grid point. caret then selects the combination with the best average performance.

Grid search is exhaustive, which sounds appealing. The problem is combinatorial explosion. Tuning 4 hyperparameters with 5 values each means 5^4 = 625 combinations. Add a 5th hyperparameter and you jump to 3125. For slow models, this becomes prohibitively expensive.
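The growth is easy to verify with expand.grid() itself (the hyperparameter names here are placeholders):

```r
vals <- 1:5  # 5 candidate values per hyperparameter

# 4 hyperparameters with 5 values each
grid4 <- expand.grid(a = vals, b = vals, c = vals, d = vals)
nrow(grid4)  # 625 combinations

# Adding a 5th hyperparameter multiplies the grid by another factor of 5
grid5 <- expand.grid(a = vals, b = vals, c = vals, d = vals, e = vals)
nrow(grid5)  # 3125 combinations
```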

Grid search works well when you have 1-2 hyperparameters and want guaranteed coverage. It’s simple to reason about and trivially parallelizable.

Random Search

Random search samples hyperparameter combinations from defined ranges instead of enumerating a full grid. You specify how many combinations to try:

library(caret)

tune_length <- 30  # number of random combinations

model <- train(
  Species ~ .,
  data = iris,
  method = "rf",
  tuneLength = tune_length,
  trControl = trainControl(method = "cv", number = 5, search = "random")
)

print(model$bestTune)

Research by Bergstra and Bengio (2012) showed that random search often outperforms grid search in high-dimensional spaces. The intuition: if only a few hyperparameters actually matter, random search concentrates evaluations on the dimensions that count rather than wasting them exploring every value of irrelevant parameters.

Random search is also easier to work with when hyperparameters have continuous ranges. You can sample from distributions rather than picking discrete values, giving finer resolution without combinatorial blowup.
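A minimal base-R sketch of that idea, sampling 30 configurations across mixed continuous and integer ranges (the xgboost-style parameter names are purely illustrative):

```r
set.seed(42)
n <- 30

# Draw 30 random configurations instead of enumerating a grid
random_configs <- data.frame(
  eta = runif(n, min = 0.01, max = 0.3),        # continuous range
  max_depth = sample(3:10, n, replace = TRUE),  # integer range
  subsample = runif(n, min = 0.5, max = 1.0)
)

head(random_configs)
```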

The trade-off is reproducibility. Without a fixed seed, you won't get the same results on reruns. If reproducibility matters for your project, call set.seed() with a fixed value before each run.
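Seeding makes the random draws repeatable:

```r
set.seed(123)
first_run <- runif(3)

set.seed(123)
second_run <- runif(3)

# Same seed, same draws
identical(first_run, second_run)  # TRUE
```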

Bayesian Optimization

Grid search and random search treat the objective function as a black box. Bayesian optimization builds a probabilistic model of it, then uses that model to decide which configurations to try next.

The rBayesianOptimization package implements this in R. You define an objective function that trains a model at a given configuration and returns a score to maximize:

library(rBayesianOptimization)
library(xgboost)

dtrain <- xgb.DMatrix(data = as.matrix(iris[, -5]),
                      label = as.numeric(iris$Species) - 1)

obj_func <- function(max_depth, min_child_weight, eta) {
  # Cross-validate an xgboost model at this configuration
  cv <- xgb.cv(
    params = list(
      objective = "multi:softmax",
      num_class = 3,
      max_depth = max_depth,
      min_child_weight = min_child_weight,
      eta = eta,
      eval_metric = "mlogloss"
    ),
    data = dtrain,
    nrounds = 50,
    nfold = 3,
    verbose = FALSE
  )

  # BayesianOptimization maximizes Score, so negate the validation loss
  list(Score = -min(cv$evaluation_log$test_mlogloss_mean), Pred = 0)
}

result <- BayesianOptimization(
  obj_func,
  bounds = list(
    max_depth = c(3L, 10L),
    min_child_weight = c(1L, 10L),
    eta = c(0.01, 0.3)
  ),
  init_points = 5,
  n_iter = 20
)

print(result$Best_Par)

The ParBayesianOptimization package also fits a Gaussian process surrogate, but adds built-in support for running evaluations in parallel and saving progress during long runs:

library(ParBayesianOptimization)
library(caret)

obj_fn <- function(mtry, min_node_size) {
  fit <- train(
    Species ~ .,
    data = iris,
    method = "ranger",
    tuneGrid = data.frame(mtry = mtry,
                          splitrule = "gini",
                          min.node.size = min_node_size),
    trControl = trainControl(method = "cv", number = 3)
  )

  # bayesOpt() maximizes Score
  list(Score = max(fit$results$Accuracy))
}

results <- bayesOpt(
  FUN = obj_fn,
  bounds = list(mtry = c(1L, 4L), min_node_size = c(1L, 10L)),
  initPoints = 5,
  iters.n = 10
)

getBestPars(results)

The key advantage of Bayesian optimization is sample efficiency. For expensive objective functions (slow models, large datasets), it typically finds better configurations in far fewer evaluations than grid or random search.

Cross-Validation for Hyperparameter Tuning

Tuning hyperparameters without proper validation leads to overfitting to your validation set. Cross-validation gives you a more reliable performance estimate.

Simple hold-out splitting is easy but unreliable:

library(caret)

train_index <- createDataPartition(iris$Species, p = 0.7, list = FALSE)
train_data <- iris[train_index, ]
test_data <- iris[-train_index, ]

A single split can be misleading due to random variation in how data gets distributed.
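You can see that variation by repeating the split and watching a simple test-set statistic move around (base R, purely illustrative; a performance estimate fluctuates for the same reason):

```r
set.seed(1)

# Repeat a 70/30 split 20 times and record the class balance of each test set
setosa_share <- replicate(20, {
  idx <- sample(nrow(iris), size = 0.7 * nrow(iris))
  test <- iris[-idx, ]
  mean(test$Species == "setosa")
})

range(setosa_share)  # the spread across reruns
```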

K-fold cross-validation is the standard. The data is split into k folds, and the model trains on k-1 folds while validating on the remaining fold. This repeats k times, giving you k performance estimates:

library(caret)

trControl <- trainControl(method = "cv", number = 5)

model <- train(
  Species ~ .,
  data = iris,
  method = "rf",
  trControl = trControl
)
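Under the hood, the mechanics look like this manual base-R sketch:

```r
k <- 5
set.seed(42)

# Assign each row to one of k roughly equal folds
folds <- sample(rep(1:k, length.out = nrow(iris)))

for (i in 1:k) {
  train_fold <- iris[folds != i, ]  # k-1 folds for training
  valid_fold <- iris[folds == i, ]  # held-out fold for validation
  # fit the model on train_fold and score it on valid_fold here
}

table(folds)  # 30 rows per fold for iris
```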

Repeated K-fold runs the process multiple times with different random splits, giving more stable estimates:

trControl <- trainControl(
  method = "repeatedcv",
  number = 5,
  repeats = 3
)

Early Stopping for Iterative Models

Gradient boosting models and neural networks train iteratively. Without some form of stopping, they keep fitting until they overfit the training data. Early stopping monitors validation performance and halts training when it stops improving.

XGBoost in R supports early stopping natively:

library(xgboost)

dtrain <- xgb.DMatrix(data = as.matrix(iris[, -5]), label = as.numeric(iris$Species) - 1)

params <- list(
  objective = "multi:softmax",
  num_class = 3,
  max_depth = 6,
  eta = 0.1,
  eval_metric = "mlogloss"
)

model <- xgb.train(
  params = params,
  data = dtrain,
  nrounds = 500,
  evals = list(train = dtrain, eval = dtrain),
  early_stopping_rounds = 20,
  verbose = FALSE
)

cat("Best iteration:", model$best_iteration, "\n")

Note that the example above uses the same data for both train and eval for simplicity. In practice, you should pass a separate validation set to eval so early stopping actually prevents overfitting.
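For example, a sketch with a held-out 20% validation set (iris is small, so treat this as an illustration of the wiring rather than a realistic benchmark):

```r
library(xgboost)

set.seed(42)
idx <- sample(nrow(iris), size = 0.8 * nrow(iris))

dtrain <- xgb.DMatrix(data = as.matrix(iris[idx, -5]),
                      label = as.numeric(iris$Species[idx]) - 1)
dvalid <- xgb.DMatrix(data = as.matrix(iris[-idx, -5]),
                      label = as.numeric(iris$Species[-idx]) - 1)

params <- list(objective = "multi:softmax", num_class = 3,
               max_depth = 6, eta = 0.1, eval_metric = "mlogloss")

model <- xgb.train(
  params = params,
  data = dtrain,
  nrounds = 500,
  evals = list(train = dtrain, eval = dvalid),  # separate validation set
  early_stopping_rounds = 20,
  verbose = FALSE
)

cat("Best iteration:", model$best_iteration, "\n")
```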

caret's "xgbTree" method does not wire up a validation set for xgboost's early stopping, so passing early_stopping_rounds through train() typically fails. A practical alternative is to treat nrounds itself as a tuned hyperparameter and let cross-validation choose the stopping point:

library(caret)

tune_grid <- expand.grid(
  nrounds = c(100, 250, 500),  # candidate stopping points
  max_depth = 6,
  eta = 0.1,
  gamma = 0,
  colsample_bytree = 1,
  min_child_weight = 1,
  subsample = 1
)

model <- train(
  Species ~ .,
  data = iris,
  method = "xgbTree",
  tuneGrid = tune_grid,
  trControl = trainControl(method = "cv", number = 5)
)

A Complete Tuning Workflow with tidymodels

The tidymodels framework provides a modular alternative to caret. Here’s an end-to-end example tuning a gradient boosting model on the Hitters dataset:

library(tidymodels)
library(ISLR)

data("Hitters", package = "ISLR")
hitters <- Hitters %>%
  filter(!is.na(Salary)) %>%
  mutate(Salary = log(Salary))

set.seed(42)
split <- initial_split(hitters, prop = 0.8)
train_data <- training(split)
test_data <- testing(split)

folds <- vfold_cv(train_data, v = 5)

gbm_spec <- boost_tree(
  trees = 500,
  learn_rate = tune(),
  tree_depth = tune(),
  min_n = tune(),
  mtry = tune()
) %>%
  set_engine("xgboost") %>%
  set_mode("regression")

gbm_params <- parameters(list(
  learn_rate(range = c(0.01, 0.3), trans = NULL),
  tree_depth(range = c(2L, 8L)),
  min_n(range = c(5L, 30L)),
  mtry(range = c(2L, ncol(hitters) - 1L))
))

set.seed(42)
gbm_grid <- grid_max_entropy(gbm_params, size = 30)

gbm_tuned <- gbm_spec %>%
  tune_grid(
    Salary ~ .,
    resamples = folds,
    grid = gbm_grid,
    metrics = metric_set(rmse, rsq, mae)
  )

show_best(gbm_tuned, metric = "rmse")

best_params <- select_best(gbm_tuned, metric = "rmse")
final_model <- gbm_spec %>%
  finalize_model(best_params) %>%
  fit(Salary ~ ., data = train_data)

test_pred <- predict(final_model, test_data)

The grid_max_entropy() function creates a space-filling design, which tends to cover the parameter space more efficiently than uniform random sampling.

For Bayesian optimization with tune, use tune_bayes() instead of tune_grid(). The tidymodels approach composes cleanly: swap grid_regular(), grid_random(), or grid_max_entropy() for grid or random search, and swap tune_bayes() for Bayesian optimization.
