Keras in R
Keras provides a user-friendly interface to TensorFlow for building and training deep learning models in R. This guide covers installation, model construction with the Sequential and Functional APIs, training workflows, evaluation, and saving models for deployment.
Installation
Install the keras package from CRAN, then call install_tensorflow() to set up the TensorFlow backend:
install.packages("keras")
library(keras)
# Install TensorFlow (creates a Python environment)
install_tensorflow()
For GPU support, install the GPU version of TensorFlow:
install_tensorflow(version = "gpu")
The installation process downloads and configures a Python environment with TensorFlow. This may take several minutes on first run. You can verify the installation:
library(keras)
tensorflow::tf$constant("Hello, Keras!") # Returns: tf.Tensor(b'Hello, Keras!', shape=(), dtype=string)
Building Models
Keras in R supports two primary APIs for model construction: the Sequential API for simple, linear stacks of layers, and the Functional API for complex architectures with multiple inputs or outputs.
Sequential API
The Sequential API provides a straightforward way to build models as a linear stack of layers. This is ideal for most use cases:
library(keras)
model <- keras_model_sequential() |>
  layer_dense(units = 256, activation = "relu", input_shape = c(784)) |>
  layer_dropout(0.2) |>
  layer_dense(units = 128, activation = "relu") |>
  layer_dropout(0.2) |>
  layer_dense(units = 10, activation = "softmax")
model |> summary()
# Model: "sequential"
# _________________________________________________________________
#  Layer (type)                Output Shape              Param #
# =================================================================
#  dense (Dense)               (None, 256)               200960
#  dropout (Dropout)           (None, 256)               0
#  dense_1 (Dense)             (None, 128)               32896
#  dropout_1 (Dropout)         (None, 128)               0
#  dense_2 (Dense)             (None, 10)                1290
# =================================================================
# Total params: 235,146
# Trainable params: 235,146
# Non-trainable params: 0
This creates a feedforward network with 784 inputs (flattened MNIST images), two hidden layers with ReLU activation and dropout for regularization, and 10 outputs for digit classification.
Functional API
The Functional API provides more flexibility for complex architectures. It allows you to build models with shared layers, multiple inputs, or multiple outputs:
# Define inputs
inputs <- layer_input(shape = c(784))
outputs <- inputs |>
  layer_dense(256, activation = "relu") |>
  layer_dropout(0.2) |>
  layer_dense(128, activation = "relu") |>
  layer_dropout(0.2) |>
  layer_dense(10, activation = "softmax")
# Create model
model <- keras_model(inputs = inputs, outputs = outputs)
A more advanced example: a model with two inputs, combining image data with metadata:
# Image input branch
image_input <- layer_input(shape = c(224, 224, 3))
image_features <- image_input |>
  layer_conv_2d(32, c(3, 3), activation = "relu") |>
  layer_max_pooling_2d(c(2, 2)) |>
  layer_flatten()
# Metadata input branch
metadata_input <- layer_input(shape = c(10))
metadata_features <- metadata_input |>
  layer_dense(32, activation = "relu")
# Concatenate and output
combined <- layer_concatenate(list(image_features, metadata_features)) |>
  layer_dense(64, activation = "relu") |>
  layer_dense(1, activation = "sigmoid")
model <- keras_model(
  inputs = list(image_input, metadata_input),
  outputs = combined
)
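Training a multi-input model works the same way as the single-input case, except the inputs are supplied as a list in the order they were passed to keras_model(). A minimal sketch, assuming x_images, x_metadata, and y_labels are prepared arrays (placeholder names, not defined above):

```r
# Hypothetical training call for the two-input model above.
# x_images: array of shape (n, 224, 224, 3); x_metadata: (n, 10);
# y_labels: binary targets for the sigmoid output.
history <- model |> fit(
  list(x_images, x_metadata),  # same order as inputs in keras_model()
  y_labels,
  epochs = 10,
  batch_size = 32,
  validation_split = 0.2
)
```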
Common Layer Types
| Layer | Description |
|---|---|
| `layer_dense(units, activation)` | Fully connected layer |
| `layer_conv_2d(filters, kernel_size, activation)` | 2D convolution for images |
| `layer_conv_1d(filters, kernel_size, activation)` | 1D convolution for sequences |
| `layer_lstm(units)` | Long Short-Term Memory layer |
| `layer_gru(units)` | Gated Recurrent Unit layer |
| `layer_embedding(input_dim, output_dim)` | Word embedding lookup |
| `layer_dropout(rate)` | Dropout regularization |
| `layer_batch_normalization()` | Batch normalization |
| `layer_max_pooling_2d(pool_size)` | Max pooling for downsampling |
| `layer_average_pooling_2d(pool_size)` | Average pooling |
| `layer_flatten()` | Flatten multi-dimensional output for dense layers |
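To illustrate how these layers compose, here is a small hypothetical image classifier that chains convolution, batch normalization, pooling, dropout, and dense layers from the table:

```r
library(keras)

# A compact CNN for 28x28 grayscale images (e.g. MNIST-style input)
model <- keras_model_sequential() |>
  layer_conv_2d(32, c(3, 3), activation = "relu", input_shape = c(28, 28, 1)) |>
  layer_batch_normalization() |>
  layer_max_pooling_2d(c(2, 2)) |>
  layer_flatten() |>
  layer_dense(64, activation = "relu") |>
  layer_dropout(0.3) |>
  layer_dense(10, activation = "softmax")
```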
Compiling Models
Before training, compile the model by specifying the optimizer, loss function, and metrics to track:
model |> compile(
  optimizer = optimizer_adam(learning_rate = 0.001),
  loss = "categorical_crossentropy",
  metrics = "accuracy"
)
Optimizers
Keras provides several optimizers. Adam is typically a good starting point:
# Adam (adaptive learning rate)
optimizer_adam(learning_rate = 0.001)
# AdamW (Adam with weight decay)
optimizer_adamw(learning_rate = 0.001, weight_decay = 0.01)
# Stochastic Gradient Descent with momentum
optimizer_sgd(learning_rate = 0.01, momentum = 0.9)
# RMSprop
optimizer_rmsprop(learning_rate = 0.001)
Loss Functions
Choose the appropriate loss based on your task:
# Classification
loss = "categorical_crossentropy" # Multi-class
loss = "binary_crossentropy" # Binary classification
# Regression
loss = "mse" # Mean Squared Error
loss = "mae" # Mean Absolute Error
loss = "huber" # Robust to outliers
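One common pitfall: "categorical_crossentropy" expects one-hot encoded targets. If your labels are integer class indices, the sparse variant avoids the to_categorical() step:

```r
# Hypothetical compile call for integer labels (0-9) instead of one-hot
model |> compile(
  optimizer = optimizer_adam(0.001),
  loss = "sparse_categorical_crossentropy",
  metrics = "accuracy"
)
```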
Training Models
Train models using the fit() function. This handles the entire training loop:
# Prepare data (example with MNIST-like data)
x_train <- array_reshape(x_train, c(nrow(x_train), 784))
x_test <- array_reshape(x_test, c(nrow(x_test), 784))
# One-hot encode labels
y_train <- to_categorical(y_train, num_classes = 10)
y_test <- to_categorical(y_test, num_classes = 10)
# Train the model
history <- model |> fit(
  x_train, y_train,
  epochs = 30,
  batch_size = 128,
  validation_split = 0.2,
  callbacks = list(
    callback_early_stopping(patience = 5, restore_best_weights = TRUE),
    callback_reduce_lr_on_plateau(patience = 3, factor = 0.5)
  )
)
# Output example:
# Epoch 1/30
# 375/375 [==============================] - 5s - loss: 0.45 - accuracy: 0.85 - val_loss: 0.32 - val_accuracy: 0.89
# ...
# Epoch 30/30
# 375/375 [==============================] - 4s - loss: 0.12 - accuracy: 0.96 - val_loss: 0.25 - val_accuracy: 0.93
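The history object returned by fit() records per-epoch metrics and can be inspected directly:

```r
# The training history has plot() and as.data.frame() methods
plot(history)                  # learning curves for loss and accuracy

# Per-epoch values as a data frame, for custom plots or logging
history_df <- as.data.frame(history)
head(history_df)
```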
Training Callbacks
Callbacks provide hooks to customize the training process:
callbacks <- list(
  # Stop early if validation loss stops improving
  callback_early_stopping(
    monitor = "val_loss",
    patience = 5,
    restore_best_weights = TRUE
  ),
  # Reduce learning rate when training stagnates
  callback_reduce_lr_on_plateau(
    monitor = "val_loss",
    factor = 0.5,
    patience = 3,
    min_lr = 1e-6
  ),
  # Save model checkpoints
  callback_model_checkpoint(
    "best_model.h5",
    monitor = "val_accuracy",
    save_best_only = TRUE
  ),
  # Stop training once a target accuracy is reached
  callback_lambda(
    on_epoch_end = function(epoch, logs) {
      if (logs$val_accuracy > 0.98) {
        cat("\nReached 98% validation accuracy. Stopping...\n")
        model$stop_training <- TRUE  # fit() checks this flag after each epoch
      }
    }
  )
)
Custom Training Loop
For maximum control, you can implement a custom training loop using tf$GradientTape:
# Custom training loop
optimizer <- optimizer_adam(0.001)
loss_fn <- loss_categorical_crossentropy()
train_step <- function(x, y) {
  with(tf$GradientTape() %as% tape, {
    predictions <- model(x, training = TRUE)
    loss <- loss_fn(y, predictions)
  })
  gradients <- tape$gradient(loss, model$trainable_variables)
  optimizer$apply_gradients(zip_lists(gradients, model$trainable_variables))
  loss
}
# Run custom training loop
for (epoch in 1:epochs) {
  cat("Epoch", epoch, "\n")
  # Train over batches
  for (i in seq(1, nrow(x_train), batch_size)) {
    batch_end <- min(i + batch_size - 1, nrow(x_train))
    x_batch <- x_train[i:batch_end, ]
    y_batch <- y_train[i:batch_end, ]
    loss <- train_step(x_batch, y_batch)
  }
  cat("Loss (final batch):", as.numeric(loss), "\n")
}
Evaluation and Prediction
After training, evaluate model performance on test data and generate predictions:
# Evaluate on test set
results <- model |> evaluate(x_test, y_test, verbose = 0)
cat("Test Loss:", results$loss, "\n")
# Test Loss: 0.23
cat("Test Accuracy:", results$accuracy, "\n")
# Test Accuracy: 0.92
# Generate predictions
predictions <- model |> predict(x_test)
head(predictions)
# [,1] [,2] [,3] [,4] [,5] [,6] ...
# [1,] 0.0012345 0.0009876 0.1234567 0.8523456 0.0123456 0.0098765 ...
# [2,] 0.8765432 0.0234567 0.0123456 0.0345678 0.0323456 0.0212345 ...
# Get predicted classes
predicted_classes <- apply(predictions, 1, which.max) - 1
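The subtraction of 1 matters because which.max() returns 1-based positions while the class labels are 0-based. A self-contained sketch using a small synthetic probability matrix (standing in for predict() output) that also builds a confusion matrix:

```r
# Synthetic 3 x 3 probability matrix standing in for predict() output
probs <- rbind(
  c(0.1, 0.7, 0.2),
  c(0.8, 0.1, 0.1),
  c(0.2, 0.2, 0.6)
)
true_labels <- c(1, 0, 2)  # 0-based class labels

# which.max() is 1-based, so subtract 1 to recover 0-based labels
pred_labels <- apply(probs, 1, which.max) - 1

# Confusion matrix and overall accuracy
conf <- table(predicted = pred_labels, actual = true_labels)
accuracy <- mean(pred_labels == true_labels)
```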
Model Metrics
Track multiple metrics during training and evaluation:
model |> compile(
  optimizer = optimizer_adam(0.001),
  loss = "categorical_crossentropy",
  metrics = list(
    "accuracy",
    metric_auc(name = "auc"),
    metric_precision(name = "precision"),
    metric_recall(name = "recall")
  )
)
# Results include all specified metrics
results <- model |> evaluate(x_test, y_test)
# $loss
# [1] 0.23
# $accuracy
# [1] 0.92
# $auc
# [1] 0.98
# $precision
# [1] 0.91
# $recall
# [1] 0.90
Saving and Loading Models
Keras provides multiple formats for saving models:
# Save entire model (architecture + weights + optimizer state)
model |> save_model_tf("model_tf/")   # TensorFlow SavedModel format
model |> save_model_hdf5("model.h5")  # HDF5 format
# Save only weights (requires a model with matching architecture to reload)
model |> save_weights("model_weights.h5")
# Load model
loaded_model <- load_model_hdf5("model.h5")
# Load weights into model with matching architecture
model |> load_weights("model_weights.h5")
Export to TensorFlow Lite
For deployment on mobile or edge devices, convert to TensorFlow Lite format:
converter <- tf$lite$TFLiteConverter$from_saved_model("model_tf/")
converter$optimizations <- list(tf$lite$Optimize$DEFAULT)
tflite_model <- converter$convert()
writeBin(tflite_model, "model.tflite")
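The converted model can be exercised with the TFLite interpreter, accessed through the same tf module. A sketch, assuming input_tensor is an array already shaped and typed to match the model's input (placeholder, not defined here):

```r
library(tensorflow)

# Load the converted model and allocate its tensors
interpreter <- tf$lite$Interpreter(model_path = "model.tflite")
interpreter$allocate_tensors()

input_details  <- interpreter$get_input_details()
output_details <- interpreter$get_output_details()

# Feed one input, run inference, and read back the output
interpreter$set_tensor(input_details[[1]]$index, input_tensor)
interpreter$invoke()
result <- interpreter$get_tensor(output_details[[1]]$index)
```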
Summary
Keras in R provides a clean interface to TensorFlow for building deep learning models:
- Sequential API — Simple, linear stack of layers for most use cases
- Functional API — Build complex architectures with multiple inputs, outputs, and shared layers
- Training — Comprehensive fit() with callbacks for early stopping, learning rate scheduling, and checkpointing
- Evaluation — Multiple metrics, custom training loops, and prediction generation
- Deployment — Save in various formats, export to TensorFlow Lite for edge devices
The R interface closely mirrors the Python Keras API, making it straightforward to adapt Python tutorials or code. The main differences are R’s pipe operator (|>) for chaining layers and R’s 1-based indexing considerations when working with array shapes.
For production deployment, consider using the TensorFlow Serving system for scalable model serving, or export to TensorFlow Lite for mobile and embedded applications.