rguides

Time Series Forecasting with fable

Time series forecasting is a fundamental skill for anyone working with temporal data. The fable package provides a tidy interface for forecasting, built on the principles of the tidyverse. This guide walks you through the complete workflow from data preparation to model evaluation.

Why fable?

The fable package is part of the tidyverts ecosystem, which includes tsibble for time series data structures and fabletools for modelling. What makes fable stand out is its consistent API that feels familiar if you’ve used dplyr before. You pipe data through model specifications and forecasting functions without jumping between different syntaxes.

Installation and setup

First, install the required packages from CRAN:

install.packages("fable")
install.packages("fabletools")
install.packages("tsibble")
install.packages("lubridate")

Load the libraries:

library(fable)
library(fabletools)
library(tsibble)
library(lubridate)
library(dplyr)
library(ggplot2)

Preparing time series data

The fable ecosystem works with tsibble objects - tidy time series that extend tibbles with an index variable. Here’s how to convert your data:

# Create a tsibble from a data frame
my_tsibble <- my_data %>%
  as_tsibble(index = date_col)

# Or create from scratch for examples
sample_data <- tibble(
  date = seq(as.Date("2020-01-01"), by = "month", length.out = 36),
  value = cumsum(rnorm(36, mean = 2, sd = 1))
) %>% 
  as_tsibble(index = date)

The key requirement is having a properly defined index column that represents time. Tsibble handles various frequency types automatically, but you can specify it explicitly if needed.

Fitting forecasting models

Fable provides several model families. Here are the most common approaches:

Exponential smoothing (ETS)

Exponential smoothing methods are popular for their simplicity and effectiveness. The ETS model automatically selects error, trend, and seasonality components:

ets_model <- sample_data %>%
  model(ets = ETS(value ~ trend("A") + season("M")))

ets_model

The formula specifies additive trend and multiplicative seasonality. Other options include “Ad” for additive damped trend, “M” for multiplicative error, and “N” for none.

ARIMA

Auto-Regressive Integrated Moving Average models capture autocorrelation patterns. Fable’s ARIMA function automatically selects the best parameters:

arima_model <- sample_data %>%
  model(arima = ARIMA(value))

arima_model

You can also specify ARIMA orders manually if you have domain knowledge about your data.

Prophet (via fable.prophet)

For data with strong seasonal patterns and holiday effects, Facebook’s Prophet model is excellent:

# Install if needed: install.packages("fable.prophet")
library(fable.prophet)

prophet_model <- sample_data %>%
  model(prophet = prophet(value ~ season("yearly")))

Generating forecasts

Once you have a fitted model, generate forecasts with the forecast() function:

# Forecast 12 periods ahead
forecast_ets <- ets_model %>%
  forecast(h = "12 months")

forecast_arima <- arima_model %>%
  forecast(h = 12)

# View the forecasts
forecast_ets %>%
  autoplot(sample_data)

The h argument accepts various specifications including “12 months”, 12 (meaning 12 periods based on your data frequency), or “2 years”.

Model evaluation

Cross-validation is essential for reliable forecasts. Use stretch_tsibble() to create expanding training windows:

# Time series cross-validation
cv_results <- sample_data %>%
  stretch_tsibble(.init = 12, .step = 1) %>%
  model(ets = ETS(value ~ trend("A"))) %>%
  forecast(h = 1)

# Calculate accuracy metrics
accuracy <- cv_results %>%
  accuracy(sample_data, by = ".id")

accuracy %>% 
  summarise(
    mae = mean(MAE),
    rmse = mean(RMSE),
    mape = mean(MAPE)
  )

This creates 24 different training sets starting with 12 observations, fitting a model to each, and forecasting one step ahead.

Working with multiple models

Compare models easily by fitting multiple specifications at once:

# Fit multiple models at once
comparison <- sample_data %>%
  model(
    ets = ETS(value ~ trend("A")),
    arima = ARIMA(value),
    naive = NAIVE(value),
    drift = RW(value ~ drift())
  )

# Compare forecasts
comparison %>%
  forecast(h = "12 months") %>%
  autoplot(sample_data)

This lets you visually compare how different approaches perform on your specific data.

Practical example: retail sales

Putting it all together with a realistic example:

# Example with realistic data structure
retail_data <- tsibble(
  month = seq(as.Date("2018-01-01"), by = "month", length.out = 36),
  sales = cumsum(rnorm(36, 100, 20)) + 500
)

# Fit model and forecast
retail_model <- retail_data %>%
  model(ets = ETS(sales ~ trend("Ad") + season("M")))

retail_forecast <- retail_model %>%
  forecast(h = 12)

# Visualize with prediction intervals
retail_forecast %>%
  autoplot(retail_data) +
  labs(title = "Sales Forecast", y = "Sales ($)")

The autoplot function automatically shows prediction intervals, which are crucial for understanding forecast uncertainty.

ARIMA models

arima() in base R and auto.arima() in forecast fit autoregressive integrated moving average models. auto.arima() tests multiple (p,d,q) combinations and selects the best by AIC. The order parameters: p is the autoregressive lag, d is the number of differences needed for stationarity, q is the moving average lag. Seasonal ARIMA adds (P,D,Q)m parameters for the seasonal cycle.

Exponential smoothing

Exponential smoothing models weight recent observations more heavily than older ones. ets() in the forecast package automatically selects the error, trend, and seasonal components. Holt-Winters triple exponential smoothing handles series with trend and seasonality. For simple series without trend or seasonality, simple exponential smoothing (SES) with a single smoothing parameter is sufficient. alpha controls the smoothing level, values near 1 give heavy weight to recent observations; values near 0 smooth heavily.

Time series data structures

R provides multiple time series classes. ts(data, start = c(2020, 1), frequency = 12) creates a monthly time series starting January 2020. frequency = 4 for quarterly; frequency = 52 for weekly. The ts class supports specialized plot methods and seasonal decomposition.

zoo::zoo(data, order.by = dates) handles irregular time series with arbitrary date indices. xts::xts(data, order.by = dates) extends zoo with additional financial time series functionality. Both work with POSIXct, Date, and numeric time indices.

The tsibble package provides a tidy time series data frame. tsibble(data, key = id, index = date) creates a tsibble where each combination of key columns uniquely identifies a time series, and index is the time column. Multiple time series (by product, region, etc.) are stored in a single long-format data frame with the key identifying each series.

Decomposition

Seasonal decomposition separates a time series into trend, seasonality, and remainder. decompose(ts_obj) uses additive or multiplicative decomposition. stl(ts_obj, s.window = "periodic") uses the STL (Seasonal and Trend decomposition using Loess) method, which handles varying seasonality and is reliable to outliers.

plot(decompose(ts_obj)) visualizes the decomposition. Examining the remainder (residual) component helps identify anomalies and assess how well the decomposition captured the signal.

For multiple time series, feasts::STL(ts_model, reliable = TRUE) from the fabletools ecosystem applies STL decomposition to all series in a tsibble simultaneously.

Forecasting methods

forecast::auto.arima(ts_obj) automatically selects ARIMA order using information criteria. It handles differencing for stationarity, seasonal components, and model selection. forecast::forecast(model, h = 12) produces 12-period ahead forecasts with prediction intervals.

forecast::ets(ts_obj) fits exponential smoothing state space models, a family of models based on weighted averages of past observations. ETS handles trends and seasonality without requiring stationarity.

prophet::prophet(df, yearly.seasonality = TRUE) from Meta’s Prophet library handles daily data with strong multi-seasonality (yearly, weekly, holiday effects). It requires a data frame with ds (date) and y (value) columns. Prophet is more accessible for non-statisticians because its parameters correspond to intuitive concepts (trend flexibility, seasonality strength).

Tidymodels time series

The modeltime package brings tidymodels to time series. Define models with arima_reg(), prophet_reg(), prophet_boost(), exp_smoothing(), and combine them into an ensemble. modeltime_calibrate(), modeltime_accuracy(), and modeltime_forecast() provide a consistent workflow.

timetk::time_series_split(df, date_var = date, assess = "3 months") creates train/test splits for time series, respecting temporal order. rsample::rolling_origin() creates multiple rolling train/test windows for cross-validation.

timetk::plot_time_series(df, date, value, .smooth = TRUE) visualizes time series with an optional LOESS smoother. timetk::plot_acf_diagnostics(df, date, value) produces ACF and PACF plots for identifying ARIMA order.

Evaluating forecast accuracy

forecast::accuracy(forecast_obj, test_data) computes RMSE, MAE, MAPE, and MASE on a held-out test set. The Mean Absolute Scaled Error (MASE) normalizes by the naive forecast performance, making it comparable across series with different scales.

For multiple models, modeltime::modeltime_accuracy(calibrated_models) returns metrics for all models in a formatted table. dplyr::slice_min(mae, n = 1) selects the best model by MAE.

Prediction interval coverage: a 95% prediction interval should contain the actual value 95% of the time. If your intervals are too narrow (under-coverage), the model underestimates uncertainty. cover_rate <- mean(actuals >= lower & actuals <= upper) computes empirical coverage.

Handling missing values in time series

Time series data frequently has gaps from missing observations, sensor failures, or data collection irregularities. The imputeTS package provides methods tailored for time series: na_interpolation() fills gaps with linear or spline interpolation, na_kalman() uses a state-space model, and na_seadec() handles seasonal decomposition before imputation. The right method depends on the gap mechanism — short random gaps suit interpolation; long structured gaps may need seasonal decomposition. Always visualize the imputed series against the original to verify the fill values are plausible.

Best practices

When working with fable, keep these tips in mind:

  1. Check your data frequency - Use is_weekly(), is_monthly() to verify your tsibble is properly indexed
  2. Start simple - Try naive forecasts first as a baseline before complex models
  3. Validate properly - Use time series cross-validation, not random splits
  4. Check residuals - Always examine model residuals for patterns that your model missed

See also