
Keras R: Build Deep Learning Models with TensorFlow in R

The Keras R package provides a user-friendly interface to TensorFlow for building and training deep learning models directly from R. You can construct neural networks with either the Sequential or the Functional API without writing Python code, and the native R pipe operator (|>, available in R 4.1 and later) makes layer definitions feel natural in an R workflow. This guide covers installation, model construction, training workflows, evaluation, and saving and deploying models.

Installation

Install the keras package from CRAN. This pulls in the tensorflow R package as a dependency, but not the TensorFlow Python library itself, which install_tensorflow() sets up in a separate step:

install.packages("keras")
library(keras)

# Install TensorFlow (creates a Python environment)
install_tensorflow()

The standard installation downloads the CPU-only build of TensorFlow, which runs on any machine but trains large models slowly. For GPU acceleration, install the GPU-enabled build instead. This requires an NVIDIA GPU with CUDA drivers installed on your system, and it can speed up matrix-heavy operations substantially:

install_tensorflow(version = "gpu")

The installation process downloads and configures a Python environment with TensorFlow and its dependencies. This may take several minutes on first run as it fetches the Python runtime, TensorFlow wheels, and supporting libraries. Once the environment is ready, you can verify the installation by creating a TensorFlow constant and inspecting it:

library(keras)
tensorflow::tf$constant("Hello, Keras!")  # Returns: tf.Tensor(b'Hello, Keras!', shape=(), dtype=string)

Building models

Keras in R supports two primary APIs for model construction: the Sequential API for simple, linear stacks of layers, and the Functional API for complex architectures with multiple inputs or outputs.

Sequential API

The Sequential API provides a straightforward way to build models as a linear stack of layers. This is ideal for most use cases:

library(keras)

model <- keras_model_sequential() |>
  layer_dense(units = 256, activation = "relu", input_shape = c(784)) |>
  layer_dropout(0.2) |>
  layer_dense(units = 128, activation = "relu") |>
  layer_dropout(0.2) |>
  layer_dense(units = 10, activation = "softmax")

model |> summary()
# Model: "sequential"
# _________________________________________________________________
# Layer (type)                       Output Shape              Param #
# =================================================================
# dense (Dense)                      (None, 256)               200960
# dropout (Dropout)                  (None, 256)               0
# dense_1 (Dense)                    (None, 128)               32896
# dropout_1 (Dropout)                (None, 128)               0
# dense_2 (Dense)                    (None, 10)                1290
# =================================================================
# Total params: 235,146
# Trainable params: 235,146
# Non-trainable params: 0

This creates a feedforward network with 784 inputs (flattened MNIST images), two hidden layers with ReLU activation and dropout for regularization, and 10 outputs for digit classification. The Sequential API is the right choice when each layer feeds exactly one downstream layer, which covers most standard architectures including feedforward networks, simple CNNs, and stacked RNNs.
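The parameter counts in the summary can be checked by hand: a dense layer with n_in inputs and n_out units holds n_in * n_out weights plus n_out biases. The base R sketch below reproduces the numbers. dense_params() is a hypothetical helper written for this illustration, not a keras function.

```r
# Hypothetical helper (not part of keras): parameters in one dense layer
dense_params <- function(n_in, n_out) {
  n_in * n_out + n_out  # weight matrix plus one bias per unit
}

layer_sizes <- c(784, 256, 128, 10)  # input size followed by each dense layer
params <- mapply(dense_params, head(layer_sizes, -1), tail(layer_sizes, -1))

params       # per-layer counts; dropout layers add no parameters
sum(params)  # total trainable parameters
```

The three values match the Param # column, and their sum matches the reported total of 235,146.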

Functional API

The Functional API provides more flexibility for complex architectures. Instead of stacking layers linearly, you define input tensors and chain layer calls explicitly, which lets you build models with shared layers, multiple inputs, or multiple outputs. This API also gives you fine-grained control over the tensor shapes flowing through each branch of your model:

# Define inputs
inputs <- layer_input(shape = c(784))
outputs <- inputs |>
  layer_dense(256, activation = "relu") |>
  layer_dropout(0.2) |>
  layer_dense(128, activation = "relu") |>
  layer_dropout(0.2) |>
  layer_dense(10, activation = "softmax")

# Create model
model <- keras_model(inputs = inputs, outputs = outputs)

The Functional API is most useful when a model must process different input types through separate branches and then merge them. The more advanced example below takes two inputs, image data and tabular metadata: a convolutional branch extracts image features, a dense layer processes the metadata, and layer_concatenate() joins the two branches into a single tensor that feeds the final classification head:

# Image input branch
image_input <- layer_input(shape = c(224, 224, 3))
image_features <- image_input |>
  layer_conv_2d(32, c(3, 3), activation = "relu") |>
  layer_max_pooling_2d(c(2, 2)) |>
  layer_flatten()

# Metadata input branch
metadata_input <- layer_input(shape = c(10))
metadata_features <- metadata_input |>
  layer_dense(32, activation = "relu")

# Concatenate and output
combined <- layer_concatenate(list(image_features, metadata_features)) |>
  layer_dense(64, activation = "relu") |>
  layer_dense(1, activation = "sigmoid")

model <- keras_model(
  inputs = list(image_input, metadata_input),
  outputs = combined
)
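To see what tensor sizes flow through the two branches, the shapes can be worked out with plain arithmetic. This is a sketch that assumes the Keras defaults for the layers above (unpadded "valid" convolution with stride 1, and pooling with a stride equal to the pool size). out_size() is a hypothetical helper, not a keras function.

```r
# Hypothetical helper: output width of an unpadded convolution or pooling window
out_size <- function(size, window, stride = 1) {
  floor((size - window) / stride) + 1
}

conv_side <- out_size(224, window = 3)                    # 3x3 conv, stride 1
pool_side <- out_size(conv_side, window = 2, stride = 2)  # 2x2 max pooling

image_units    <- pool_side * pool_side * 32  # length after layer_flatten()
combined_units <- image_units + 32            # after concatenating metadata features
dense_weights  <- combined_units * 64 + 64    # parameters in the 64-unit dense layer

c(conv_side = conv_side, pool_side = pool_side,
  image_units = image_units, dense_weights = dense_weights)
```

Under these assumptions the flattened image branch contributes 394,272 values against 32 from the metadata branch, so the 64-unit dense layer alone holds about 25 million parameters. Adding more convolution and pooling stages before flattening would shrink that considerably.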

Common layer types

Keras provides a wide range of layer types beyond the dense and convolutional layers shown above. Choosing the right layer for your data type is essential: convolutional layers for spatial patterns in images, recurrent layers for sequential dependencies in time series, and embedding layers for high-cardinality categorical variables. The table below lists the most commonly used layers and what each one does:

Layer                                            Description
layer_dense(units, activation)                   Fully connected layer
layer_conv_2d(filters, kernel_size, activation)  2D convolution for images
layer_conv_1d(filters, kernel_size, activation)  1D convolution for sequences
layer_lstm(units)                                Long Short-Term Memory layer
layer_gru(units)                                 Gated Recurrent Unit layer
layer_embedding(input_dim, output_dim)           Word embedding lookup
layer_dropout(rate)                              Dropout regularization
layer_batch_normalization()                      Batch normalization
layer_max_pooling_2d(pool_size)                  Max pooling for downsampling
layer_average_pooling_2d(pool_size)              Average pooling
layer_flatten()                                  Flatten for dense layers
layer_dense(units, activation)                   Output layer with softmax/sigmoid

Compiling models

Before training, compile the model by specifying the optimizer, loss function, and metrics to track. Compiling finalizes the model’s training configuration: the optimizer controls how weights are updated, the loss function measures prediction error, and the metrics list determines what gets reported during training. You can always recompile with different settings before calling fit():

model |> compile(
  optimizer = optimizer_adam(learning_rate = 0.001),
  loss = "categorical_crossentropy",
  metrics = c("accuracy", "mae")
)

Optimizers

Keras provides several optimizers that implement different strategies for updating model weights during training. Adam is typically a good starting point because it combines momentum and adaptive learning rates, converging quickly without much tuning. AdamW adds weight decay regularization, which often improves generalization. SGD with momentum is simpler but can outperform adaptive methods when carefully tuned with a learning rate schedule. Here is how to specify each one:

# Adam (adaptive learning rate)
optimizer_adam(learning_rate = 0.001)

# AdamW (Adam with weight decay)
optimizer_adamw(learning_rate = 0.001, weight_decay = 0.01)

# Stochastic Gradient Descent with momentum
optimizer_sgd(learning_rate = 0.01, momentum = 0.9)

# RMSprop
optimizer_rmsprop(learning_rate = 0.001)
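To make the momentum idea concrete, here is a minimal base R sketch of the classic SGD-with-momentum update on a one-dimensional toy problem. It illustrates the update rule only and is not how Keras implements its optimizers internally. The objective, the starting point, and the step count are assumptions chosen for the example.

```r
# Minimise f(w) = (w - 3)^2 with plain R arithmetic; the gradient is 2 * (w - 3)
grad_fn <- function(w) 2 * (w - 3)

learning_rate <- 0.01
momentum <- 0.9
w <- 0         # starting weight
velocity <- 0  # running accumulation of past updates

for (step in 1:500) {
  g <- grad_fn(w)
  velocity <- momentum * velocity - learning_rate * g  # blend old direction with new gradient
  w <- w + velocity                                     # apply the update
}

w  # close to the minimiser, 3
```

The velocity term carries information from earlier steps, which is what lets momentum move faster along consistent gradient directions than plain gradient descent with the same learning rate.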

Loss functions

Choose the appropriate loss based on your task. The loss function quantifies how far your model’s predictions are from the true values, and the optimizer’s job is to minimise this quantity. Classification tasks use cross-entropy losses that measure the divergence between predicted probabilities and true labels. Regression tasks use distance-based losses that penalise absolute or squared errors. Here is a quick reference for matching loss functions to problem types:

# Classification
loss = "categorical_crossentropy"      # Multi-class
loss = "binary_crossentropy"           # Binary classification

# Regression
loss = "mse"                           # Mean Squared Error
loss = "mae"                           # Mean Absolute Error
loss = "huber"                         # Robust to outliers
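The difference between these losses is easier to see with numbers. The base R sketch below computes categorical cross-entropy for one sample and compares MSE, MAE, and Huber loss on made-up regression data containing one outlier. huber() is a hypothetical helper using the standard Huber definition with a threshold of 1, and all values are invented for illustration.

```r
# Categorical cross-entropy for one sample: -sum(y_true * log(y_pred))
y_true <- c(0, 0, 1)        # one-hot label for the third class
y_pred <- c(0.1, 0.2, 0.7)  # predicted probabilities
cce <- -sum(y_true * log(y_pred))  # equals -log(0.7)

# Hypothetical helper: Huber loss with threshold delta, averaged over samples
huber <- function(y, y_hat, delta = 1) {
  err <- abs(y - y_hat)
  mean(ifelse(err <= delta, 0.5 * err^2, delta * (err - 0.5 * delta)))
}

y     <- c(1.0, 2.0, 3.0, 4.0)
y_hat <- c(1.1, 1.9, 3.2, 14.0)  # the last prediction is an outlier

mse_value   <- mean((y - y_hat)^2)
mae_value   <- mean(abs(y - y_hat))
huber_value <- huber(y, y_hat)

c(cce = cce, mse = mse_value, mae = mae_value, huber = huber_value)
```

The single outlier pushes MSE to about 25, while Huber loss stays near 2.4 because errors larger than the threshold are penalised linearly rather than quadratically.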

Training models

Train models using the fit() function. This handles the entire training loop: it shuffles data, feeds batches to the model, computes gradients, updates weights, and records loss and metrics for every epoch. The function returns a history object containing per-epoch values, which you can plot to diagnose overfitting or underfitting. Preprocessing your data into the right shape is critical before calling fit():

# Prepare data (example with MNIST-like data)
x_train <- array_reshape(x_train, c(nrow(x_train), 784))
x_test <- array_reshape(x_test, c(nrow(x_test), 784))

# One-hot encode labels
y_train <- to_categorical(y_train, num_classes = 10)
y_test <- to_categorical(y_test, num_classes = 10)

# Train the model
history <- model |> fit(
  x_train, y_train,
  epochs = 30,
  batch_size = 128,
  validation_split = 0.2,
  callbacks = list(
    callback_early_stopping(patience = 5, restore_best_weights = TRUE),
    callback_reduce_lr_on_plateau(patience = 3, factor = 0.5)
  )
)

# Output example (60,000 MNIST samples with 20% held out: 48,000 / 128 = 375 steps per epoch):
# Epoch 1/30
# 375/375 [==============================] - 5s - loss: 0.45 - accuracy: 0.85 - val_loss: 0.32 - val_accuracy: 0.89
# ...
# Epoch 30/30
# 375/375 [==============================] - 4s - loss: 0.12 - accuracy: 0.96 - val_loss: 0.25 - val_accuracy: 0.93
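The to_categorical() call in the data preparation step turns 0-based integer labels into one-hot rows. A rough base R equivalent, shown here only to illustrate the resulting shape, indexes rows of an identity matrix. The labels below are made up.

```r
# Base R sketch of one-hot encoding for 0-based integer labels
labels <- c(0, 2, 1, 2)
num_classes <- 3

# Row i of the identity matrix is the one-hot vector for class i - 1
one_hot <- diag(num_classes)[labels + 1, ]

one_hot
rowSums(one_hot)  # every row sums to 1
```

The + 1 is needed because the labels start at 0 while R indexes from 1. The result has one row per sample and one column per class, which is the shape that categorical_crossentropy expects for its targets.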

Training callbacks

Callbacks provide hooks to customize the training process at key moments: the start and end of each epoch, or the start and end of each batch. They let you implement early stopping, learning rate scheduling, model checkpointing, and custom stopping conditions without modifying the training loop itself. The callbacks below cover the most common training-automation patterns:

callbacks <- list(
  # Stop early if validation loss stops improving
  callback_early_stopping(
    monitor = "val_loss",
    patience = 5,
    restore_best_weights = TRUE
  ),
  
  # Reduce learning rate when training stagnates
  callback_reduce_lr_on_plateau(
    monitor = "val_loss",
    factor = 0.5,
    patience = 3,
    min_lr = 1e-6
  ),
  
  # Save model checkpoints
  callback_model_checkpoint(
    "best_model.keras",
    monitor = "val_accuracy",
    save_best_only = TRUE
  ),
  
  # Stop training at target accuracy
  callback_lambda(
    on_epoch_end = function(epoch, logs) {
      if (logs$val_accuracy > 0.98) {
        cat("\nReached 98% validation accuracy. Stopping...")
        # Setting stop_training on the model being fitted ends training
        # after the current epoch
        model$stop_training <- TRUE
      }
    }
  )
)
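The patience setting is easiest to understand by tracing it on a sequence of validation losses. The base R sketch below mimics the counting logic of early stopping on made-up numbers. It is a simplified illustration, not the Keras implementation.

```r
# Simulated validation losses, one per epoch (made-up numbers for illustration)
val_loss <- c(0.50, 0.40, 0.35, 0.36, 0.37, 0.34, 0.35, 0.36, 0.37, 0.38)
patience <- 3

best_loss <- Inf
best_epoch <- NA
wait <- 0
stopped_epoch <- NA

for (epoch in seq_along(val_loss)) {
  if (val_loss[epoch] < best_loss) {
    best_loss <- val_loss[epoch]  # new best: remember it and reset the counter
    best_epoch <- epoch
    wait <- 0
  } else {
    wait <- wait + 1              # no improvement this epoch
    if (wait >= patience) {
      stopped_epoch <- epoch
      break
    }
  }
}

c(best_epoch = best_epoch, stopped_epoch = stopped_epoch)
```

In this trace the best loss occurs at epoch 6 and training stops at epoch 9, after three epochs without improvement. With restore_best_weights = TRUE, the model would be left with the weights from the best epoch rather than the last one.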

Custom training loop

For maximum control, you can implement a custom training loop using tf$GradientTape. This replaces the high-level fit() function with explicit gradient computation and weight updates, which is necessary when your training logic cannot be expressed through standard callbacks alone. Use a custom loop when you need per-batch logging, gradient clipping, or non-standard update rules that the built-in optimizers do not support:

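# Custom training loop
library(tensorflow)  # provides tf and %as%

optimizer <- optimizer_adam(0.001)
loss_fn <- loss_categorical_crossentropy()

epochs <- 5        # example value: passes over the training data
batch_size <- 128  # example value: samples per gradient update

# One optimization step: forward pass, loss, gradients, weight update
train_step <- function(x, y) {
  with(tf$GradientTape() %as% tape, {
    predictions <- model(x, training = TRUE)
    loss <- loss_fn(y, predictions)
  })
  
  gradients <- tape$gradient(loss, model$trainable_variables)
  optimizer$apply_gradients(zip_lists(gradients, model$trainable_variables))
  
  loss
}

# Run custom training loop
for (epoch in 1:epochs) {
  cat("Epoch", epoch, "\n")
  
  # Train over batches (drop = FALSE keeps a one-row batch as a matrix)
  for (i in seq(1, nrow(x_train), batch_size)) {
    batch_end <- min(i + batch_size - 1, nrow(x_train))
    x_batch <- x_train[i:batch_end, , drop = FALSE]
    y_batch <- y_train[i:batch_end, , drop = FALSE]
    
    loss <- train_step(x_batch, y_batch)
  }
  
  # Reports the loss of the final batch in this epoch
  cat("Loss:", as.numeric(loss), "\n")
}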
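Gradient clipping is one of the reasons given above for writing a custom loop. The base R sketch below shows the arithmetic behind clipping by global norm on plain numeric vectors. clip_by_global_norm() here is a hypothetical helper written for illustration. Inside a real train_step() the gradients are tensors, and TensorFlow's own tf$clip_by_global_norm() would be the natural choice.

```r
# Hypothetical helper: rescale a list of gradient vectors so that their
# combined (global) L2 norm does not exceed clip_norm
clip_by_global_norm <- function(gradients, clip_norm) {
  global_norm <- sqrt(sum(vapply(gradients, function(g) sum(g^2), numeric(1))))
  scale <- clip_norm / max(global_norm, clip_norm)  # 1 when already within the limit
  lapply(gradients, function(g) g * scale)
}

grads <- list(c(3, 4), c(0, 0, 0))  # global norm is 5
clipped <- clip_by_global_norm(grads, clip_norm = 1)

clipped[[1]]                  # direction preserved, length reduced
sqrt(sum(unlist(clipped)^2))  # global norm after clipping
```

Because every gradient is multiplied by the same factor, the update direction is unchanged and only its size is limited. The clipping step would sit between computing the gradients and applying them.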

Evaluation and prediction

After training, evaluate model performance on test data and generate predictions. The evaluate() function runs a forward pass over the test set and computes the loss plus all configured metrics, giving you an estimate of performance on unseen data. The predict() function returns raw probability outputs, one row per sample and one column per class, and you convert these to class labels with which.max() on each row. Because which.max() returns a 1-based column index while the digit labels start at 0, subtract 1 from the result. Always evaluate on data the model never saw during training:

# Evaluate on test set
results <- model |> evaluate(x_test, y_test, verbose = 0)
cat("Test Loss:", results[1], "\n")
# Test Loss: 0.23
cat("Test Accuracy:", results[2], "\n")
# Test Accuracy: 0.92

# Generate predictions
predictions <- model |> predict(x_test)
head(predictions)
#            [,1]       [,2]       [,3]       [,4]       [,5]      [,6] ...
# [1,] 0.0012345  0.0009876  0.1234567  0.8523456  0.0123456  0.0098765 ...
# [2,] 0.8765432  0.0234567  0.0123456  0.0345678  0.0323456  0.0212345 ...

# Get predicted classes
predicted_classes <- apply(predictions, 1, which.max) - 1
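The conversion from probabilities to class labels can be tried without a trained model. The sketch below uses a small made-up probability matrix in place of real predict() output and then compares the predicted classes against made-up true labels.

```r
# Made-up probability matrix: 3 samples (rows) x 4 classes (columns)
probs <- matrix(
  c(0.10, 0.70, 0.15, 0.05,
    0.80, 0.10, 0.05, 0.05,
    0.05, 0.05, 0.10, 0.80),
  nrow = 3, byrow = TRUE
)
true_labels <- c(1, 0, 2)  # 0-based class labels

# which.max() gives a 1-based column index, so subtract 1 for 0-based labels
predicted <- apply(probs, 1, which.max) - 1

accuracy <- mean(predicted == true_labels)
table(actual = true_labels, predicted = predicted)  # confusion matrix
```

Two of the three made-up samples are classified correctly, and the table shows which class the third sample was mistaken for.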

Model metrics

Track multiple metrics during training and evaluation. Beyond accuracy, Keras supports precision, recall, AUC, and custom metric functions. Specifying multiple metrics helps you diagnose specific failure modes: a model might have high accuracy but low recall if it is ignoring a minority class, or high AUC but poor precision if it produces too many false positives. Add metrics to the compile call, and they appear in both evaluate() output and training logs:

model |> compile(
  optimizer = optimizer_adam(0.001),
  loss = "categorical_crossentropy",
  # list() rather than c(), since the entries mix strings and metric objects
  metrics = list(
    "accuracy",
    metric_auc(name = "auc"),
    metric_precision(name = "precision"),
    metric_recall(name = "recall")
  )
)

# Results include all specified metrics
results <- model |> evaluate(x_test, y_test)
# $loss
# [1] 0.23
# $accuracy
# [1] 0.92
# $auc
# [1] 0.98
# $precision
# [1] 0.91
# $recall
# [1] 0.90
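The point about accuracy hiding a neglected minority class can be shown with a few lines of base R. The labels and predictions below are made up for a binary problem with two positives among ten samples.

```r
# Made-up binary labels: 8 negatives and 2 positives (an imbalanced problem)
y_true <- c(0, 0, 0, 0, 0, 0, 0, 0, 1, 1)
y_pred <- c(0, 0, 0, 0, 0, 0, 0, 0, 1, 0)  # the model misses one positive

tp <- sum(y_pred == 1 & y_true == 1)  # true positives
fp <- sum(y_pred == 1 & y_true == 0)  # false positives
fn <- sum(y_pred == 0 & y_true == 1)  # false negatives

accuracy  <- mean(y_pred == y_true)
precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)

c(accuracy = accuracy, precision = precision, recall = recall)
```

Accuracy is 0.9 and precision is 1, yet recall is only 0.5 because half of the positives were missed. Tracking recall alongside accuracy makes that failure visible.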

Saving and loading models

Keras provides multiple formats for saving models, each with different trade-offs. The SavedModel format preserves the full model including architecture, weights, and optimizer state, making it suitable for resuming training or serving. The HDF5 format is a single file that is easier to transfer. Saving only the weights produces the smallest file, but you must recreate the identical model architecture before loading them. Here is how to use each format:

# Save entire model (architecture + weights + optimizer state)
model |> save_model_tf("model_tf/")        # SavedModel format (a directory)
model |> save_model_hdf5("model.h5")       # HDF5 format (a single file)

# Save only weights (smallest file; the architecture must be rebuilt in code)
model |> save_model_weights_hdf5("model_weights.h5")

# Load model
loaded_model <- load_model_hdf5("model.h5")

# Load weights into model with matching architecture
model |> load_model_weights_hdf5("model_weights.h5")

Export to TensorFlow Lite

For deployment on mobile or edge devices, convert to TensorFlow Lite format. TensorFlow Lite is a lightweight runtime optimized for ARM processors and low-power devices, and it applies optimisations like operator fusion and quantization to reduce model size and latency. The converter reads a SavedModel directory and produces a .tflite file that can be loaded from Android, iOS, or embedded Linux:

converter <- tf$lite$TFLiteConverter$from_saved_model("model_tf/")
converter$optimizations <- list(tf$lite$Optimize$DEFAULT)
tflite_model <- converter$convert()

writeBin(tflite_model, "model.tflite")
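As a rough guide to what quantization can save, the sketch below estimates weight storage for the 235,146-parameter Sequential model from earlier in this guide. It assumes weights move from 32-bit floats to 8-bit integers and ignores file-format overhead, so the real .tflite size will differ.

```r
n_params <- 235146  # total parameters of the Sequential model in this guide

bytes_float32 <- n_params * 4  # 32-bit floats use 4 bytes per weight
bytes_int8    <- n_params * 1  # 8-bit integers use 1 byte per weight

# Approximate weight storage in kilobytes
round(c(float32_kb = bytes_float32, int8_kb = bytes_int8) / 1024, 1)
```

Under these assumptions the weights shrink to about a quarter of their original size, which is the main reason quantized models suit memory-constrained devices.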

Summary

Keras in R provides a clean interface to TensorFlow for building deep learning models:

  • Sequential API: simple, linear stack of layers for most use cases
  • Functional API: build complex architectures with multiple inputs, outputs, and shared layers
  • Training: comprehensive fit() with callbacks for early stopping, learning rate scheduling, and checkpointing
  • Evaluation: multiple metrics, custom training loops, and prediction generation
  • Deployment: save in various formats, export to TensorFlow Lite for edge devices

The R interface closely mirrors the Python Keras API, making it straightforward to adapt Python tutorials or code. The main differences are R’s pipe operator (|>) for chaining layers and R’s 1-based indexing considerations when working with array shapes.

For production deployment, consider using the TensorFlow Serving system for scalable model serving, or export to TensorFlow Lite for mobile and embedded applications.

When choosing between Keras and other deep learning frameworks in R, consider the trade-offs. The torch package provides a PyTorch backend with a different API philosophy that some R users prefer. For teams working primarily in R, Keras avoids the overhead of switching to Python for model development, though the Python community has far more tutorials and pre-trained models. Gradient boosting libraries like XGBoost and LightGBM remain the better choice for most tabular data problems, where they outperform neural networks with far less tuning effort.

Transfer learning from pre-trained models is well-supported in R Keras. The keras3 package can load weights from Python-trained models, and the application modules provide pre-built architectures like VGG16, ResNet, and EfficientNet with ImageNet weights. This is valuable for image classification tasks where training from scratch would require massive datasets and compute resources.
