Keras R: Build Deep Learning Models with TensorFlow in R
The Keras R package provides a user-friendly interface to TensorFlow for building and training deep learning models directly from R. You can construct neural networks using both the Sequential and Functional APIs without writing Python code, and R's native pipe operator (|>, available from R 4.1) makes layer definitions feel natural in an R workflow. This guide covers installation, model construction, training workflows, evaluation, and saving models for deployment.
Installation
Install the keras package from CRAN, then install TensorFlow itself. Installing the R package alone does not install TensorFlow; the install_tensorflow() call below sets it up in a dedicated Python environment:
install.packages("keras")
library(keras)
# Install TensorFlow (creates a Python environment)
tensorflow::install_tensorflow()
The default installation targets the CPU, which works everywhere but trains slowly on large models. GPU acceleration requires an NVIDIA GPU with compatible drivers and CUDA libraries, and it can speed up matrix-heavy operations substantially. Depending on the TensorFlow version, GPU support is either requested explicitly, as below, or already included in the default package:
tensorflow::install_tensorflow(version = "gpu")
The installation process downloads and configures a Python environment with TensorFlow and its dependencies. This may take several minutes on first run as it fetches the Python runtime, TensorFlow wheels, and supporting libraries. Once the environment is ready, you can verify the installation by creating a TensorFlow constant and inspecting it:
library(keras)
tensorflow::tf$constant("Hello, Keras!") # Returns: tf.Tensor(b'Hello, Keras!', shape=(), dtype=string)
Building models
Keras in R supports two primary APIs for model construction: the Sequential API for simple, linear stacks of layers, and the Functional API for complex architectures with multiple inputs or outputs.
Sequential API
The Sequential API provides a straightforward way to build models as a linear stack of layers. This is ideal for most use cases:
library(keras)
model <- keras_model_sequential() |>
  layer_dense(units = 256, activation = "relu", input_shape = c(784)) |>
  layer_dropout(0.2) |>
  layer_dense(units = 128, activation = "relu") |>
  layer_dropout(0.2) |>
  layer_dense(units = 10, activation = "softmax")
model |> summary()
# Model: "sequential"
# ________________________________________________________________
# Layer (type) Output Shape Param #
# =================================================================
# dense (Dense) (None, 256) 200960
# dropout (Dropout) (None, 256) 0
# dense_1 (Dense) (None, 128) 32896
# dropout_1 (Dropout) (None, 128) 0
# dense_2 (Dense) (None, 10) 1290
# =================================================================
# Total params: 235,146
# Trainable params: 235,146
# Non-trainable params: 0
This creates a feedforward network with 784 inputs (flattened MNIST images), two hidden layers with ReLU activation and dropout for regularization, and 10 outputs for digit classification. The Sequential API is the right choice when each layer feeds exactly one downstream layer, which covers most standard architectures including feedforward networks, simple CNNs, and stacked RNNs.
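The parameter counts in the summary can be checked by hand. The sketch below uses only base R; the helper name dense_params is introduced here for illustration and is not part of Keras. It assumes each dense layer has one weight per input-output pair plus one bias per unit.

```r
# Parameter count of a dense layer: one weight per input-output pair plus one bias per unit
dense_params <- function(n_in, n_out) n_in * n_out + n_out

# Input width followed by the number of units in each dense layer
layer_sizes <- c(784, 256, 128, 10)
params <- mapply(dense_params, head(layer_sizes, -1), tail(layer_sizes, -1))

print(params)       # per-layer counts; dropout layers add no parameters
print(sum(params))  # total trainable parameters
```

The three values should match the Param # column for dense, dense_1, and dense_2, and their sum should match the reported total.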
Functional API
The Functional API provides more flexibility for complex architectures. Instead of stacking layers linearly, you define input tensors and chain layer calls explicitly, which lets you build models with shared layers, multiple inputs, or multiple outputs. This API also gives you fine-grained control over the tensor shapes flowing through each branch of your model:
# Define inputs
inputs <- layer_input(shape = c(784))
outputs <- inputs |>
  layer_dense(256, activation = "relu") |>
  layer_dropout(0.2) |>
  layer_dense(128, activation = "relu") |>
  layer_dropout(0.2) |>
  layer_dense(10, activation = "softmax")
# Create model
model <- keras_model(inputs = inputs, outputs = outputs)
A more advanced example is a model with multiple inputs, here image data plus metadata. The Functional API shines when your model needs to process different input types through separate branches and then merge them. For instance, you might combine image features extracted by a convolutional branch with tabular metadata processed through dense layers. The layer_concatenate() function joins the branches into a single tensor that feeds the final classification head:
# Image input branch
image_input <- layer_input(shape = c(224, 224, 3))
image_features <- image_input |>
  layer_conv_2d(32, c(3, 3), activation = "relu") |>
  layer_max_pooling_2d(c(2, 2)) |>
  layer_flatten()
# Metadata input branch
metadata_input <- layer_input(shape = c(10))
metadata_features <- metadata_input |>
  layer_dense(32, activation = "relu")
# Concatenate and output
combined <- layer_concatenate(list(image_features, metadata_features)) |>
  layer_dense(64, activation = "relu") |>
  layer_dense(1, activation = "sigmoid")
model <- keras_model(
  inputs = list(image_input, metadata_input),
  outputs = combined
)
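It helps to trace the tensor widths through the two branches before training a model like this. The following base R sketch does the arithmetic, assuming the convolution uses no padding and a stride of 1, and that pooling uses a stride equal to the pool size. The helper names conv_out and pool_out are illustrative, not Keras functions.

```r
# Output size along one spatial dimension for a convolution with no padding and stride 1
conv_out <- function(size, kernel) size - kernel + 1
# Output size for non-overlapping pooling (stride equal to the pool size)
pool_out <- function(size, pool) size %/% pool

side <- pool_out(conv_out(224, 3), 2)      # 224 -> 222 -> 111
image_width <- side * side * 32            # flattened image features (32 filters)
combined_width <- image_width + 32         # plus the 32 metadata features
dense_params <- combined_width * 64 + 64   # weights and biases of the 64-unit dense layer

cat("Flattened image features:", image_width, "\n")
cat("Concatenated width:", combined_width, "\n")
cat("Parameters in the 64-unit dense layer:", dense_params, "\n")
```

Under these assumptions almost all of the model's parameters sit in the first dense layer after the merge, so adding more convolution and pooling stages before flattening is a common way to keep such a model manageable.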
Common layer types
Keras provides a wide range of layer types beyond the dense and convolutional layers shown above. Choosing the right layer for your data type is essential: convolutional layers for spatial patterns in images, recurrent layers for sequential dependencies in time series, and embedding layers for high-cardinality categorical variables. The table below lists the most commonly used layers and what each one does:
| Layer | Description |
|---|---|
| layer_dense(units, activation) | Fully connected layer |
| layer_conv_2d(filters, kernel_size, activation) | 2D convolution for images |
| layer_conv_1d(filters, kernel_size, activation) | 1D convolution for sequences |
| layer_lstm(units) | Long Short-Term Memory layer |
| layer_gru(units) | Gated Recurrent Unit layer |
| layer_embedding(input_dim, output_dim) | Word embedding lookup |
| layer_dropout(rate) | Dropout regularization |
| layer_batch_normalization() | Batch normalization |
| layer_max_pooling_2d(pool_size) | Max pooling for downsampling |
| layer_average_pooling_2d(pool_size) | Average pooling |
| layer_flatten() | Flatten for dense layers |
| layer_dense(units, activation) | Output layer with softmax/sigmoid |
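To make the embedding row of the table concrete, here is a minimal base R sketch of what an embedding lookup amounts to: selecting rows of a matrix by category id. The matrix values are random placeholders standing in for weights that a real layer would learn, and the variable names are illustrative.

```r
set.seed(1)
input_dim <- 5   # number of distinct category ids (0 to 4)
output_dim <- 3  # length of each embedding vector

# Hypothetical embedding matrix: one row per category id (learned during training in a real layer)
embedding <- matrix(rnorm(input_dim * output_dim), nrow = input_dim, ncol = output_dim)

ids <- c(0, 3, 3, 1)                           # zero-based ids, as Keras expects
vectors <- embedding[ids + 1, , drop = FALSE]  # R is 1-based, so shift by one

dim(vectors)  # one row per id, one column per embedding dimension
```

Repeated ids map to identical rows, which is why an embedding scales to high-cardinality categorical variables better than one-hot encoding followed by a dense layer.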
Compiling models
Before training, compile the model by specifying the optimizer, loss function, and metrics to track. Compiling finalizes the model’s training configuration: the optimizer controls how weights are updated, the loss function measures prediction error, and the metrics list determines what gets reported during training. You can always recompile with different settings before calling fit():
model |> compile(
  optimizer = optimizer_adam(learning_rate = 0.001),
  loss = "categorical_crossentropy",
  metrics = c("accuracy", "mae")
)
Optimizers
Keras provides several optimizers that implement different strategies for updating model weights during training. Adam is typically a good starting point because it combines momentum and adaptive learning rates, converging quickly without much tuning. AdamW adds weight decay regularization, which often improves generalization. SGD with momentum is simpler but can outperform adaptive methods when carefully tuned with a learning rate schedule. Here is how to specify each one:
# Adam (adaptive learning rate)
optimizer_adam(learning_rate = 0.001)
# AdamW (Adam with weight decay)
optimizer_adamw(learning_rate = 0.001, weight_decay = 0.01)
# Stochastic Gradient Descent with momentum
optimizer_sgd(learning_rate = 0.01, momentum = 0.9)
# RMSprop
optimizer_rmsprop(learning_rate = 0.001)
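To see what momentum adds, the following base R sketch applies the classic momentum update (velocity = momentum * velocity - learning_rate * gradient) to a one-dimensional quadratic. It is a toy illustration under that assumed update rule, not the Keras implementation, and the function names are made up for this example.

```r
# Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
grad <- function(w) 2 * (w - 3)

sgd_momentum <- function(w, steps, learning_rate, momentum) {
  velocity <- 0
  for (i in seq_len(steps)) {
    # Accumulate a decaying sum of past gradients, then move along it
    velocity <- momentum * velocity - learning_rate * grad(w)
    w <- w + velocity
  }
  w
}

w_plain <- sgd_momentum(0, steps = 200, learning_rate = 0.01, momentum = 0)
w_momentum <- sgd_momentum(0, steps = 200, learning_rate = 0.01, momentum = 0.9)
cat("Plain SGD:", w_plain, " With momentum:", w_momentum, "\n")
```

With the same learning rate and step budget, the momentum run ends closer to the minimum at 3 on this toy problem. On real loss surfaces the benefit depends on the learning rate and schedule.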
Loss functions
Choose the appropriate loss based on your task. The loss function quantifies how far your model’s predictions are from the true values, and the optimizer’s job is to minimise this quantity. Classification tasks use cross-entropy losses that measure the divergence between predicted probabilities and true labels. Regression tasks use distance-based losses that penalise absolute or squared errors. Here is a quick reference for matching loss functions to problem types:
# Classification
loss = "categorical_crossentropy" # Multi-class, one-hot encoded labels
loss = "binary_crossentropy" # Binary classification
# Regression
loss = "mse" # Mean Squared Error
loss = "mae" # Mean Absolute Error
loss = "huber" # Robust to outliers
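The base R sketch below shows what these losses compute on small hand-made inputs. The function definitions are simplified illustrations written for this example (mean reduction, a fixed delta of 1 for Huber) and are not the Keras implementations.

```r
# Mean categorical cross-entropy; y_true is one-hot, y_pred holds predicted probabilities
categorical_crossentropy <- function(y_true, y_pred, eps = 1e-7) {
  y_pred <- pmin(pmax(y_pred, eps), 1 - eps)  # clip to avoid log(0)
  mean(-rowSums(y_true * log(y_pred)))
}
mse <- function(y_true, y_pred) mean((y_true - y_pred)^2)
mae <- function(y_true, y_pred) mean(abs(y_true - y_pred))
# Huber loss: quadratic for small errors, linear beyond delta
huber <- function(y_true, y_pred, delta = 1) {
  err <- abs(y_true - y_pred)
  mean(ifelse(err <= delta, 0.5 * err^2, delta * (err - 0.5 * delta)))
}

y_true <- matrix(c(1, 0, 0,
                   0, 1, 0), nrow = 2, byrow = TRUE)
y_pred <- matrix(c(0.7, 0.2, 0.1,
                   0.1, 0.8, 0.1), nrow = 2, byrow = TRUE)
categorical_crossentropy(y_true, y_pred)

# One large error dominates MSE far more than MAE or Huber
target <- c(1, 2, 3, 4)
estimate <- c(1.1, 1.9, 3.2, 14)
c(mse = mse(target, estimate), mae = mae(target, estimate), huber = huber(target, estimate))
```

The last estimate is off by 10, and squaring that error makes MSE roughly ten times larger than MAE or Huber here. That gap is the sense in which Huber is robust to outliers.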
Training models
Train models using the fit() function. This handles the entire training loop: it shuffles data, feeds batches to the model, computes gradients, updates weights, and records loss and metrics for every epoch. The function returns a history object containing per-epoch values, which you can plot to diagnose overfitting or underfitting. Preprocessing your data into the right shape is critical before calling fit():
# Prepare data (example with MNIST-like data)
x_train <- array_reshape(x_train, c(nrow(x_train), 784))
x_test <- array_reshape(x_test, c(nrow(x_test), 784))
# One-hot encode labels
y_train <- to_categorical(y_train, num_classes = 10)
y_test <- to_categorical(y_test, num_classes = 10)
# Train the model
history <- model |> fit(
  x_train, y_train,
  epochs = 30,
  batch_size = 128,
  validation_split = 0.2,
  callbacks = list(
    callback_early_stopping(patience = 5, restore_best_weights = TRUE),
    callback_reduce_lr_on_plateau(patience = 3, factor = 0.5)
  )
)
# Output example:
# Epoch 1/30
# 563/563 [==============================] - 5s - loss: 0.45 - accuracy: 0.85 - val_loss: 0.32 - val_accuracy: 0.89
# ...
# Epoch 30/30
# 563/563 [==============================] - 4s - loss: 0.12 - accuracy: 0.96 - val_loss: 0.25 - val_accuracy: 0.93
Training callbacks
Callbacks provide hooks to customize the training process at key moments: the start and end of each epoch, or the start and end of each batch. They let you implement early stopping, learning rate scheduling, model checkpointing, and custom stopping conditions without modifying the training loop itself. The callbacks below cover the most common training-automation patterns:
callbacks <- list(
  # Stop early if validation loss stops improving
  callback_early_stopping(
    monitor = "val_loss",
    patience = 5,
    restore_best_weights = TRUE
  ),
  # Reduce learning rate when training stagnates
  callback_reduce_lr_on_plateau(
    monitor = "val_loss",
    factor = 0.5,
    patience = 3,
    min_lr = 1e-6
  ),
  # Save model checkpoints
  callback_model_checkpoint(
    "best_model.keras",
    monitor = "val_accuracy",
    save_best_only = TRUE
  ),
  # Stop training at target accuracy
  callback_lambda(
    on_epoch_end = function(epoch, logs) {
      if (logs$val_accuracy > 0.98) {
        cat("\nReached 98% validation accuracy. Stopping...\n")
        # Setting stop_training on the model ends training after this epoch
        # (assumes `model` is the model being fitted)
        model$stop_training <- TRUE
      }
    }
  )
)
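The patience setting is easier to reason about with a small simulation. The base R sketch below replays a made-up sequence of validation losses through the usual early-stopping logic: reset a counter on improvement, otherwise increment it and stop once it reaches patience. The function early_stopping_epoch is hypothetical and written for this illustration; it is not part of Keras.

```r
# Simulate early stopping on a recorded sequence of validation losses
early_stopping_epoch <- function(val_loss, patience, min_delta = 0) {
  best <- Inf
  best_epoch <- NA_integer_
  wait <- 0
  for (epoch in seq_along(val_loss)) {
    if (val_loss[epoch] < best - min_delta) {
      best <- val_loss[epoch]   # improvement: remember it and reset the counter
      best_epoch <- epoch
      wait <- 0
    } else {
      wait <- wait + 1          # no improvement this epoch
      if (wait >= patience) {
        return(list(stopped_epoch = epoch, best_epoch = best_epoch))
      }
    }
  }
  list(stopped_epoch = length(val_loss), best_epoch = best_epoch)
}

# Made-up validation losses: the best value occurs at epoch 3
val_loss <- c(0.50, 0.40, 0.35, 0.36, 0.37, 0.36, 0.38, 0.39, 0.40)
result <- early_stopping_epoch(val_loss, patience = 5)
str(result)
```

Training in this simulation continues for five epochs past the best one before stopping. That is why restore_best_weights = TRUE matters: without it, the model keeps the weights from the final epoch.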
Custom training loop
For maximum control, you can implement a custom training loop using tf$GradientTape. This replaces the high-level fit() function with explicit gradient computation and weight updates, which is necessary when your training logic cannot be expressed through standard callbacks alone. Use a custom loop when you need per-batch logging, gradient clipping, or non-standard update rules that the built-in optimizers do not support:
# Custom training loop
library(tensorflow)  # provides tf and %as%

optimizer <- optimizer_adam(0.001)
loss_fn <- loss_categorical_crossentropy()
epochs <- 5        # illustrative values
batch_size <- 128

train_step <- function(x, y) {
  # Record the forward pass so gradients can be computed afterwards
  with(tf$GradientTape() %as% tape, {
    predictions <- model(x, training = TRUE)
    loss <- loss_fn(y, predictions)
  })
  gradients <- tape$gradient(loss, model$trainable_variables)
  optimizer$apply_gradients(zip_lists(gradients, model$trainable_variables))
  loss
}

# Run custom training loop
for (epoch in 1:epochs) {
  cat("Epoch", epoch, "\n")
  # Train over batches (data is not reshuffled between epochs here)
  for (i in seq(1, nrow(x_train), batch_size)) {
    batch_end <- min(i + batch_size - 1, nrow(x_train))
    x_batch <- x_train[i:batch_end, , drop = FALSE]
    y_batch <- y_train[i:batch_end, , drop = FALSE]
    loss <- train_step(x_batch, y_batch)
  }
  # Loss of the final batch in this epoch
  cat("Loss:", as.numeric(loss), "\n")
}
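Gradient clipping is one of the steps a custom loop makes explicit. The base R sketch below shows the arithmetic of clipping by global norm on plain R arrays; the function defined here is an illustration written for this article. In a real TensorFlow loop you would apply a comparable operation to the gradient tensors before apply_gradients(), for instance with tf$clip_by_global_norm(), whose exact behavior should be checked in the TensorFlow documentation.

```r
# Rescale a list of gradient arrays so their joint L2 norm does not exceed clip_norm
clip_by_global_norm <- function(gradients, clip_norm) {
  global_norm <- sqrt(sum(vapply(gradients, function(g) sum(g^2), numeric(1))))
  scale <- clip_norm / max(global_norm, clip_norm)  # equals 1 when the norm is already small enough
  list(gradients = lapply(gradients, function(g) g * scale), global_norm = global_norm)
}

# Hypothetical gradients for a weight matrix and a bias vector
grads <- list(matrix(c(3, 0, 0, 0), nrow = 2), c(0, 4))
clipped <- clip_by_global_norm(grads, clip_norm = 1)
clipped$global_norm
clipped$gradients
```

Because every array is scaled by the same factor, the direction of the overall update is preserved and only its length is capped.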
Evaluation and prediction
After training, evaluate model performance on test data and generate predictions. The evaluate() function runs a forward pass over the test set and computes the loss plus all configured metrics, giving you an estimate of performance on unseen data. The predict() function returns raw probability outputs. To convert them to class labels, apply which.max() to each row and subtract 1, because which.max() returns a 1-based index while the digit labels start at 0. Always evaluate on data the model never saw during training:
# Evaluate on test set
results <- model |> evaluate(x_test, y_test, verbose = 0)
cat("Test Loss:", results[["loss"]], "\n")
# Test Loss: 0.23
cat("Test Accuracy:", results[["accuracy"]], "\n")
# Test Accuracy: 0.92
# Generate predictions
predictions <- model |> predict(x_test)
head(predictions)
# [,1] [,2] [,3] [,4] [,5] [,6] ...
# [1,] 0.0012345 0.0009876 0.1234567 0.8523456 0.0123456 0.0098765 ...
# [2,] 0.8765432 0.0234567 0.0123456 0.0345678 0.0323456 0.0212345 ...
# Get predicted classes
predicted_classes <- apply(predictions, 1, which.max) - 1
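The conversion from probabilities to labels can be tried on a small hand-made matrix without a trained model. The values below are invented for illustration and stand in for the output of predict().

```r
# Hypothetical predicted probabilities for 4 samples and 3 classes (rows sum to 1)
predictions <- matrix(c(0.1, 0.7, 0.2,
                        0.8, 0.1, 0.1,
                        0.2, 0.3, 0.5,
                        0.6, 0.3, 0.1), nrow = 4, byrow = TRUE)
true_classes <- c(1, 0, 2, 1)  # zero-based labels, as in MNIST

# which.max() returns a 1-based column index, so subtract 1 to recover the zero-based label
predicted_classes <- apply(predictions, 1, which.max) - 1

accuracy <- mean(predicted_classes == true_classes)
confusion <- table(actual = true_classes, predicted = predicted_classes)
print(accuracy)
print(confusion)
```

If the test labels have already been one-hot encoded, as in the training example earlier, the same apply() call on the label matrix recovers the integer classes for this comparison.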
Model metrics
Track multiple metrics during training and evaluation. Beyond accuracy, Keras supports precision, recall, AUC, and custom metric functions. Specifying multiple metrics helps you diagnose specific failure modes: a model might have high accuracy but low recall if it is ignoring a minority class, or high AUC but poor precision if it produces too many false positives. Add metrics to the compile call, and they appear in both evaluate() output and training logs:
model |> compile(
  optimizer = optimizer_adam(0.001),
  loss = "categorical_crossentropy",
  # Use list() rather than c() because the entries mix strings and metric objects
  metrics = list(
    "accuracy",
    metric_auc(name = "auc"),
    metric_precision(name = "precision"),
    metric_recall(name = "recall")
  )
)
# Results include all specified metrics
results <- model |> evaluate(x_test, y_test)
# $loss
# [1] 0.23
# $accuracy
# [1] 0.92
# $auc
# [1] 0.98
# $precision
# [1] 0.91
# $recall
# [1] 0.90
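The gap between accuracy and recall on imbalanced data is easy to reproduce by hand. The base R sketch below uses invented labels for a binary problem in which the model almost always predicts the majority class.

```r
# Hypothetical imbalanced binary problem: 95 negatives, 5 positives
actual <- c(rep(0, 95), rep(1, 5))
# A model that predicts the positive class only once (correctly)
predicted <- c(rep(0, 95), 1, rep(0, 4))

tp <- sum(predicted == 1 & actual == 1)  # true positives
fp <- sum(predicted == 1 & actual == 0)  # false positives
fn <- sum(predicted == 0 & actual == 1)  # false negatives

accuracy <- mean(predicted == actual)
precision <- tp / (tp + fp)
recall <- tp / (tp + fn)
c(accuracy = accuracy, precision = precision, recall = recall)
```

Accuracy looks strong because the negatives dominate, while recall shows that most positives were missed. Tracking both metrics during training surfaces this failure mode early.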
Saving and loading models
Keras provides multiple formats for saving models, each with different trade-offs. The SavedModel format preserves the full model including architecture, weights, and optimizer state, making it suitable for resuming training or serving. The HDF5 format is a single file that is easier to transfer. Saving only the weights produces the smallest file, but you must recreate the identical model architecture before loading them. Here is how to use each format:
# Save entire model (architecture + weights + optimizer state)
model |> save_model_tf("model_tf/") # SavedModel format (a directory)
model |> save_model_hdf5("model.h5") # HDF5 format (a single file)
# Save only weights (smallest file; the architecture must be recreated before loading)
model |> save_model_weights_hdf5("model_weights.h5")
# Load model
loaded_model <- load_model_hdf5("model.h5")
# Load weights into model with matching architecture
model |> load_model_weights_hdf5("model_weights.h5")
Export to TensorFlow Lite
For deployment on mobile or edge devices, convert to TensorFlow Lite format. TensorFlow Lite is a lightweight runtime optimized for ARM processors and low-power devices, and it applies optimisations like operator fusion and quantization to reduce model size and latency. The converter reads a SavedModel directory and produces a .tflite file that can be loaded from Android, iOS, or embedded Linux:
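library(tensorflow)  # provides tf
converter <- tf$lite$TFLiteConverter$from_saved_model("model_tf/")
converter$optimizations <- list(tf$lite$Optimize$DEFAULT)
tflite_model <- converter$convert()
# Assumes convert() returns the serialized model to R as a raw vector;
# this can depend on the reticulate version
writeBin(tflite_model, "model.tflite")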
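To give a feel for why quantization shrinks a model, here is a conceptual base R sketch of affine 8-bit quantization: each floating-point weight is mapped to one of 256 integer levels and mapped back on use. This is a simplified illustration with made-up function names and random weights; the scheme TensorFlow Lite actually applies differs in its details.

```r
# Affine quantization of floating-point weights to 8-bit integer levels and back
quantize <- function(x) {
  scale <- (max(x) - min(x)) / 255  # width of one quantization step
  zero_point <- min(x)              # value represented by level 0
  q <- round((x - zero_point) / scale)
  list(q = as.integer(q), scale = scale, zero_point = zero_point)
}
dequantize <- function(z) z$q * z$scale + z$zero_point

set.seed(42)
weights <- rnorm(1000, sd = 0.1)  # stand-in for a layer's weights
z <- quantize(weights)
max_error <- max(abs(dequantize(z) - weights))
cat("Largest round-trip error:", max_error, " Half a quantization step:", z$scale / 2, "\n")
```

Storing one byte per weight instead of a 32-bit float cuts storage to roughly a quarter, at the cost of a rounding error bounded by half a quantization step in this sketch.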
Summary
Keras in R provides a clean interface to TensorFlow for building deep learning models:
- Sequential API: simple, linear stack of layers for most use cases
- Functional API: build complex architectures with multiple inputs, outputs, and shared layers
- Training: comprehensive fit() with callbacks for early stopping, learning rate scheduling, and checkpointing
- Evaluation: multiple metrics, custom training loops, and prediction generation
- Deployment: save in various formats, export to TensorFlow Lite for edge devices
The R interface closely mirrors the Python Keras API, making it straightforward to adapt Python tutorials or code. The main differences are R’s pipe operator (|>) for chaining layers and R’s 1-based indexing considerations when working with array shapes.
For production deployment, consider using the TensorFlow Serving system for scalable model serving, or export to TensorFlow Lite for mobile and embedded applications.
When choosing between Keras and other deep learning frameworks in R, consider the trade-offs. The torch package provides a PyTorch backend with a different API philosophy that some R users prefer. For teams working primarily in R, Keras avoids the overhead of switching to Python for model development, though the Python community has far more tutorials and pre-trained models. Gradient boosting libraries like XGBoost and LightGBM remain the better choice for most tabular data problems, where they outperform neural networks with far less tuning effort.
Transfer learning from pre-trained models is well-supported in R Keras. The keras3 package can load weights from Python-trained models, and the application modules provide pre-built architectures like VGG16, ResNet, and EfficientNet with ImageNet weights. This is valuable for image classification tasks where training from scratch would require massive datasets and compute resources.
See also
- R Torch Guide — alternative deep learning framework with a PyTorch backend
- R Machine Learning Introduction — broader ML workflow in R
- R ML Packages 2026 — overview of the R machine learning package ecosystem
- Building R Packages — packaging your trained models for distribution
- R Databases with DBI — storing and retrieving training data from databases