Building a REST Client in R

In the previous tutorials of this series, you learned how to make HTTP requests with httr2 and work with APIs. This tutorial goes deeper: you will build a production-ready REST client, a reusable abstraction that handles authentication, retries, pagination, and errors gracefully.

By the end, you’ll have a client that’s reliable enough for real-world data pipelines.

What you’ll learn

This tutorial covers the techniques you need to build a REST client in R: request templates, error handling, retries with backoff, pagination, token refresh, and timeouts. By the end, you will know how to apply the core httr2 functions in real data analysis workflows.

Why build a REST client?

When you’re just getting started with an API, direct function calls work fine:

library(httr2)

resp <- request("https://api.example.com/data") |>
  req_perform()

But as your usage grows, you’ll encounter challenges:

  • Authentication tokens expire and need refreshing
  • Rate limits require exponential backoff
  • Large datasets come in pages
  • Network failures happen, and your code should handle them
  • Different endpoints need different configurations

A well-structured REST client encapsulates all this complexity behind a clean interface. Instead of repeating authentication logic and error handling in every function call, you build it once and reuse it.

Designing your client

Let’s build a client for a hypothetical JSON API. The patterns apply to any REST API.

Step 1: create a client function

The core of your client is a function that initializes a request with defaults:

library(httr2)

create_client <- function(base_url, api_key = NULL) {
  req <- request(base_url)
  
  if (!is.null(api_key)) {
    req <- req |> req_headers("Authorization" = paste("Bearer", api_key))
  }
  
  req
}

This creates a request template you can modify for each call. You start with a base configuration and add specifics as needed.
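
As a rough illustration of the template idea, the sketch below derives two endpoint-specific requests from one shared base request. The host and the v1/users path are placeholders rather than a real API, and nothing is sent over the network; the code only inspects the URLs that would be used.

```r
library(httr2)

# Shared template: base URL plus a header every call needs
base <- request("https://api.example.com") |>
  req_headers(Accept = "application/json")

# Derive endpoint-specific requests; `base` itself is left unchanged
list_users <- base |>
  req_url_path_append("v1", "users") |>
  req_url_query(page = 2)

one_user <- base |>
  req_url_path_append("v1", "users", "42")

# Inspect the URLs each request would hit
list_users$url
one_user$url
base$url
```

Because each req_* call returns a new request object, the template can be reused safely across many endpoint functions.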

Step 2: add error handling

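httr2 lets you control which responses become R errors with req_error():

safe_request <- function(req) {
  req |>
    # Raise an R error for any 4xx or 5xx response
    req_error(is_error = \(resp) resp_is_error(resp)) |>
    req_perform()
}

The is_error predicate receives the response and returns TRUE when it should be turned into an R error with a meaningful message. httr2 already does this for every 4xx and 5xx response by default, so the predicate above only makes that behaviour explicit. Supply a different predicate when you want to handle some statuses yourself, for example one that returns FALSE for 404 so that a missing resource comes back as an ordinary response.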
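
To see how an HTTP error surfaces in R, the following sketch simulates a 404 with a mocked response and catches it by its condition class. It assumes httr2 1.0.0 or later (for with_mocked_responses()), and the URL is a placeholder; no real request is made.

```r
library(httr2)

req <- request("https://api.example.com/v1/users/999")

# Simulate a 404 without touching the network (the mock replaces the HTTP call)
result <- with_mocked_responses(
  \(req) response(status_code = 404),
  tryCatch(
    req_perform(req),
    # httr2 signals status-specific condition classes such as httr2_http_404
    httr2_http_404 = function(cnd) NULL
  )
)

is.null(result)
```

Returning NULL for a missing resource is only one possible convention; the point is that the handler runs for 404 and leaves other errors untouched.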

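You can also customize the error message for specific status codes:

handle_not_found <- function(req) {
  req |>
    req_error(body = \(resp) {
      # Add a clearer message for 404s; other errors keep the default text
      if (resp_status(resp) == 404) "Resource not found" else character()
    }) |>
    req_perform()
}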

Step 3: implement automatic retries

Network failures happen. Use req_retry() for resilience:

robust_request <- function(req, max_retries = 3) {
  req |>
    req_retry(
      max_tries = max_retries,
      # Exponential backoff: 0.5s after the first failure, then 1s, 2s, ...
      backoff = \(attempt) 0.5 * 2^(attempt - 1),
      is_transient = \(resp) resp_status(resp) >= 500
    ) |>
    req_perform()
}

The backoff function receives the number of failed attempts so far, so the wait starts at 0.5 seconds and doubles each time: 0.5s, 1s, 2s, 4s. With max_tries = 3 there are at most two waits. The is_transient function tells httr2 which responses should trigger a retry; here, that is any 5xx server error.
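
A quick way to check what a backoff formula really produces is to evaluate it for the first few attempts. The sketch below uses plain R and assumes the attempt counter starts at 1; the function names are illustrative only.

```r
# Doubling backoff: 0.5s after the first failed attempt, then 1s, 2s, 4s, ...
doubling_backoff <- function(attempt, base = 0.5) {
  base * 2^(attempt - 1)
}

# exp()-based backoff grows by a factor of e (about 2.72) per attempt, not 2
exp_backoff <- function(attempt, base = 0.5) {
  base * exp(attempt)
}

doubling_backoff(1:4)
round(exp_backoff(1:4), 2)
```

A formula of the form 0.5 * exp(attempt) starts above one second and grows much faster than a doubling schedule, so it is worth printing the schedule before relying on it.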

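You can also retry on rate limiting (429) with a longer wait. The backoff function only sees the attempt number, so a status-dependent delay belongs in the after callback, which receives the response:

rate_limited_request <- function(req) {
  req |>
    req_retry(
      max_tries = 5,
      is_transient = \(resp) resp_status(resp) == 429 || resp_status(resp) >= 500,
      # Wait 60s on a 429; NA falls back to the backoff function
      after = \(resp) if (resp_status(resp) == 429) 60 else NA,
      backoff = \(attempt) 0.5 * 2^(attempt - 1)
    ) |>
    req_perform()
}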
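
Many APIs state how long to wait in a Retry-After header. The sketch below shows one way a custom after callback could read it, with a fallback when the header is missing. The helper name wait_after and the 60-second default are assumptions for illustration, and the code assumes the header holds a number of seconds rather than an HTTP date. The responses are built by hand, so nothing is sent.

```r
library(httr2)

# Hypothetical helper: seconds to wait for a 429, or NA to defer to `backoff`
wait_after <- function(resp, default_wait = 60) {
  if (resp_status(resp) != 429) {
    return(NA)
  }
  retry_after <- resp_header(resp, "Retry-After")
  if (is.null(retry_after)) default_wait else as.numeric(retry_after)
}

# Offline check with hand-built responses
throttled <- response(status_code = 429, headers = list(`Retry-After` = "30"))
server_error <- response(status_code = 503)

wait_after(throttled)
wait_after(server_error)

# Plugging the helper into a request (placeholder URL, not performed here)
req <- request("https://api.example.com") |>
  req_retry(
    max_tries = 5,
    is_transient = \(resp) resp_status(resp) %in% c(429, 503),
    after = wait_after
  )
```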

Step 4: handle pagination

Many APIs return paginated results. Here’s a pattern for collecting all pages:

fetch_all_pages <- function(client, endpoint) {
  all_results <- list()
  page <- 1
  has_more <- TRUE
  
  while (has_more) {
    resp <- client |>
      req_url_path(endpoint) |>
      req_url_query(page = page, per_page = 100) |>
      robust_request()
    
    data <- resp_body_json(resp)
    all_results <- c(all_results, data$items)
    
    has_more <- !is.null(data$next_page)
    page <- page + 1
  }
  
  all_results
}

Different APIs use different pagination schemes. Common patterns include:

  • Offset-based: ?page=2&per_page=50
  • Cursor-based: ?cursor=abc123
  • Link headers: Check Link header for next relation

Adapt the pattern to match your API’s response format.
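
For comparison with the page-number loop, here is a minimal sketch of cursor-based pagination. It runs against an in-memory fake API, so fetch_page, next_cursor, and items are assumed names rather than any real API's fields; in practice fetch_page would perform the HTTP request.

```r
# Fake cursor-based API: each page carries its items and the next cursor
# (NULL on the last page). Stands in for a real HTTP call.
fake_pages <- list(
  start = list(items = list("a", "b"), next_cursor = "c2"),
  c2    = list(items = list("c"),      next_cursor = "c3"),
  c3    = list(items = list("d", "e"), next_cursor = NULL)
)
fetch_page <- function(cursor = NULL) {
  fake_pages[[if (is.null(cursor)) "start" else cursor]]
}

fetch_all_cursor <- function(fetch_page, max_pages = 100) {
  results <- list()
  cursor <- NULL
  for (i in seq_len(max_pages)) {   # hard cap guards against endless loops
    page <- fetch_page(cursor)
    results <- c(results, page$items)
    cursor <- page$next_cursor
    if (is.null(cursor)) break      # no cursor means this was the last page
  }
  results
}

all_items <- fetch_all_cursor(fetch_page)
unlist(all_items)
```

Passing the page fetcher in as an argument keeps the loop testable without a network connection.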

Step 5: token refreshing

OAuth tokens expire. Build automatic refresh into your client:

create_oauth_client <- function(base_url, client_id, client_secret) {
  # Request a token with the client-credentials grant
  fetch_token <- function() {
    request(base_url) |>
      req_url_path("oauth/token") |>
      req_method("POST") |>
      req_body_form(
        grant_type = "client_credentials",
        client_id = client_id,
        client_secret = client_secret
      ) |>
      req_perform() |>
      resp_body_json()
  }

  # Initial token fetch
  token_resp <- fetch_token()
  token <- token_resp$access_token
  expires_at <- Sys.time() + token_resp$expires_in

  # Return a function that handles automatic refresh
  function(endpoint, ...) {
    if (Sys.time() > expires_at) {
      # Token expired: refresh it
      token_resp <- fetch_token()
      token <<- token_resp$access_token
      expires_at <<- Sys.time() + token_resp$expires_in
    }

    request(base_url) |>
      req_url_path(endpoint) |>
      req_url_query(...) |>            # extra arguments become query parameters
      req_auth_bearer_token(token) |>  # sets Authorization: Bearer <token>
      robust_request()
  }
}

The <<- operator updates the token and expiry time in the parent environment. Each call checks if the token is still valid before making a request.
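
The refresh logic can be exercised without a real OAuth server by injecting the token fetcher and the clock. The sketch below is one possible shape, not the only one: make_token_cache, fetch_token, now, and the 30-second safety margin are all assumptions made for the example.

```r
# Hypothetical token cache: `fetch_token` returns list(access_token, expires_in)
# and `now` returns the current time in seconds; both are injected for testing.
make_token_cache <- function(fetch_token, now = \() as.numeric(Sys.time()),
                             margin = 30) {
  token <- NULL
  expires_at <- -Inf

  function() {
    # Refresh slightly before the real expiry to avoid racing the deadline
    if (is.null(token) || now() >= expires_at - margin) {
      fresh <- fetch_token()
      token <<- fresh$access_token
      expires_at <<- now() + fresh$expires_in
    }
    token
  }
}

# Offline demonstration with a fake clock and a counting token endpoint
clock <- 0
calls <- 0
fake_fetch <- function() {
  calls <<- calls + 1
  list(access_token = paste0("token-", calls), expires_in = 3600)
}
get_token <- make_token_cache(fake_fetch, now = \() clock)

first <- get_token()    # fetches token-1
clock <- 100
second <- get_token()   # still valid, reuses token-1
clock <- 3600
third <- get_token()    # inside the expiry margin, fetches token-2
```

Refreshing a little early is a common precaution, since a token that is valid when checked can expire before the request reaches the server.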

Adding timeouts

Production code should set timeouts to avoid hanging requests:

timed_request <- function(req, timeout = 30) {
  req |>
    req_timeout(timeout) |>
    req_perform()
}

Combine this with your retry logic for a reliable pipeline:

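production_request <- function(req) {
  req |>
    req_timeout(30) |>
    # Doubling backoff: 0.5s, then 1s between attempts
    req_retry(max_tries = 3, backoff = \(attempt) 0.5 * 2^(attempt - 1)) |>
    # Raise an R error for any 4xx or 5xx response that survives the retries
    req_error(is_error = \(resp) resp_is_error(resp)) |>
    req_perform()
}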

Putting it all together

Here’s a complete example combining all patterns:

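library(httr2)

# Request template for paginated endpoints (Step 1)
client <- create_client("https://api.example.com", Sys.getenv("API_KEY"))

# OAuth caller with automatic token refresh (Step 5)
api_call <- create_oauth_client(
  "https://api.example.com",
  Sys.getenv("CLIENT_ID"),
  Sys.getenv("CLIENT_SECRET")
)

# Fetch paginated data with automatic retries.
# fetch_all_pages() expects a request object, so it takes the template
# rather than the api_call function.
fetch_users <- function() {
  fetch_all_pages(client, "v1/users")
}

# Fetch a single resource
get_user <- function(user_id) {
  api_call(paste0("v1/users/", user_id)) |>
    resp_body_json()
}

# Get data
users <- fetch_users()
user <- get_user(12345)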

Common pitfalls

  • Forgetting to handle 404: missing resources shouldn’t crash your pipeline
  • No retry on 429: rate limits are transient; retry after the suggested delay
  • Hardcoding URLs: use environment variables for base URLs to support staging and production
  • Ignoring response encoding: some APIs return gzipped responses; httr2 handles this automatically

Constructing requests

httr2::request("https://api.example.com") creates a request object. req_url_path_append(req, "users", user_id) builds the URL path. req_url_query(req, page = 1, per_page = 100) adds query parameters. req_headers(req, "Accept" = "application/json") adds headers. req_body_json(req, list(key = "value")) adds a JSON body. Apart from req_perform() and its variants, the req_* functions return modified request objects; they do not send anything.

Error handling

resp_status(resp) returns the HTTP status code. resp_is_error(resp) checks for 4xx/5xx. req_error(req, body = function(resp) resp_body_json(resp)$message) customizes what error message appears when a request fails. req_retry(req, max_tries = 3) retries on transient errors, which by default are 429 Too Many Requests and 503 Service Unavailable. Combine: req |> req_retry(max_tries = 3) |> req_error(body = parse_error_fn) |> req_perform().

Rate limiting

APIs enforce rate limits to prevent abuse. req_throttle(req, rate = 60/60) limits to 60 requests per minute. For APIs that return rate limit headers (X-RateLimit-Remaining, Retry-After), req_retry() respects Retry-After automatically. For APIs without standard headers, add explicit Sys.sleep() calls between requests or implement a token bucket algorithm.
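
A token bucket can be sketched in a few lines of base R. The version below is a simplified illustration, not a drop-in replacement for req_throttle(): the names make_token_bucket and take are assumptions, and the clock is injectable so the behaviour can be checked without real waiting.

```r
# Minimal token bucket: holds up to `capacity` tokens, refilled at `rate` per second.
# `now` is injectable so the logic can be exercised without real waiting.
make_token_bucket <- function(capacity, rate, now = \() as.numeric(Sys.time())) {
  tokens <- capacity
  last <- now()

  # Returns the seconds the caller should wait before sending (0 if a token is free)
  function() {
    current <- now()
    tokens <<- min(capacity, tokens + (current - last) * rate)
    last <<- current
    if (tokens >= 1) {
      tokens <<- tokens - 1
      return(0)
    }
    (1 - tokens) / rate
  }
}

clock <- 0
take <- make_token_bucket(capacity = 2, rate = 1, now = \() clock)

take()        # 0: first token
take()        # 0: second token
take()        # bucket empty: wait 1 second
clock <- 1
take()        # 0: one token has been refilled
```

In a real loop the caller would sleep for the returned number of seconds and then call take() again before sending the request.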

REST API design patterns

REST APIs use standard HTTP methods: GET (retrieve), POST (create), PUT/PATCH (update), DELETE (remove). Resources are identified by URLs. Responses use HTTP status codes to communicate outcomes.

A well-designed R API client wraps these HTTP calls into domain-specific functions. get_user(id) calls GET /users/{id}. create_order(data) calls POST /orders. This encapsulation isolates HTTP details from business logic, making tests easier to write and the API easier to use.

httr2 is the modern package for HTTP clients. request(base_url) |> req_url_path_append("users") |> req_url_query(page = 1) |> req_auth_bearer_token(token) |> req_perform() constructs and executes a request. Break the chain down into named functions for reuse.

Client architecture

A clean client uses a constructor to set shared configuration (base URL, authentication) and methods for each endpoint:

new_api_client <- function(base_url, api_key) {
  env <- new.env()
  # Shared configuration: base URL and API key header, set once
  env$base_request <- httr2::request(base_url) |>
    httr2::req_headers("X-API-Key" = api_key)

  # One method per endpoint, each built on the shared base request
  env$get_items <- function(page = 1) {
    env$base_request |>
      httr2::req_url_path_append("items") |>
      httr2::req_url_query(page = page) |>
      httr2::req_perform() |>
      httr2::resp_body_json()
  }
  env
}

This pattern, an environment that holds a shared base request plus one closure per endpoint, scales to many endpoints without repetition. Authentication, base URL, and common headers are set once.

Pagination

Most APIs paginate large collections. Three common patterns: offset pagination (?page=1&per_page=100), cursor pagination (?cursor=abc123&limit=100), and link-header pagination (the response includes a Link: <url>; rel="next" header).

For offset pagination:

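fetch_all <- function(client) {
  page <- 1
  all_results <- list()
  repeat {
    response <- client$get_items(page = page)
    if (length(response$data) == 0) break   # an empty page marks the end
    all_results <- c(all_results, response$data)
    page <- page + 1
  }
  dplyr::bind_rows(all_results)   # one row per item; assumes flat records
}

httr2::req_perform_iterative() handles pagination automatically when you provide a next_req function that builds the next request from the previous response, or returns NULL when there are no more pages. Helpers such as iterate_with_offset(), iterate_with_cursor(), and iterate_with_link_url() cover the three patterns above.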
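
As a hedged sketch of the built-in iteration helper, the example below pages through a fake three-page API using mocked responses. It assumes httr2 1.0.0 or later, where the function is named req_perform_iterative(). The URL, the page parameter, and the data field are placeholders for whatever the real API uses.

```r
library(httr2)

# Fake API: pages 1 and 2 each hold one item, page 3 is empty
fake_api <- function(req) {
  page <- url_parse(req$url)$query$page
  page <- if (is.null(page)) 1L else as.integer(page)
  items <- if (page <= 2) list(list(id = page)) else list()
  response_json(body = list(data = items))
}

req <- request("https://api.example.com/items") |>
  req_url_query(page = 1)

resps <- with_mocked_responses(
  fake_api,
  req_perform_iterative(
    req,
    # Increment `page` until a response comes back with no items
    next_req = iterate_with_offset(
      "page",
      resp_complete = \(resp) length(resp_body_json(resp)$data) == 0
    ),
    max_reqs = 10,
    progress = FALSE
  )
)

# Flatten the per-page item lists into one list
items <- unlist(lapply(resps, \(r) resp_body_json(r)$data), recursive = FALSE)
ids <- sapply(items, \(x) x$id)
```

The max_reqs cap is worth keeping even when the stop condition looks reliable, since a misbehaving API could otherwise keep the loop running.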

Authentication patterns

API Key: add as header (X-API-Key: key) or query parameter (?api_key=key). Store in environment variable, never in code: Sys.getenv("API_KEY").

Bearer tokens: req_auth_bearer_token(token) sets Authorization: Bearer token. For OAuth 2.0 flows: req_oauth_auth_code() opens a browser for authorization and handles token storage and refresh.

HTTP Basic: req_auth_basic(username, password) for APIs using basic authentication. Password should come from Sys.getenv() or keyring::key_get(), not hardcoded.

Handling rate limits and errors

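req_retry(req, max_tries = 3, is_transient = \(resp) resp_status(resp) %in% c(429, 503)) retries only responses that are likely to succeed on a later attempt; using resp_is_error as the predicate would also retry permanent failures such as 404. req_throttle(req, rate = 10 / 60) limits to 10 requests per minute. Combine both for a reliable client.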

resp_check_status(resp) throws an error for 4xx/5xx responses. Handle specific status codes with class-specific catchers: tryCatch(req_perform(req), httr2_http_429 = function(e) { wait_and_retry() }).

For APIs that return errors in the body (not HTTP status): body <- resp_body_json(resp); if (!is.null(body$error)) stop(body$error$message).
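
The body-level check can be wrapped in a small helper. The sketch below assumes a hypothetical convention where the API answers with status 200 but reports failures as an error object with a message field; parse_or_stop is an illustrative name, and the responses are built by hand with response_json() (httr2 1.0.0 or later) so nothing is sent.

```r
library(httr2)

# Hypothetical convention: failures arrive as {"error": {"message": "..."}}
# in the JSON body even though the HTTP status is 200.
parse_or_stop <- function(resp) {
  body <- resp_body_json(resp)
  if (!is.null(body$error)) {
    stop("API error: ", body$error$message, call. = FALSE)
  }
  body
}

ok  <- response_json(body = list(data = list(id = 1)))
bad <- response_json(body = list(error = list(message = "quota exceeded")))

parse_or_stop(ok)$data$id
msg <- tryCatch(parse_or_stop(bad), error = conditionMessage)
msg
```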

Best practices

  1. Store credentials in environment variables, not in your code. Use Sys.getenv() to retrieve them at runtime.

  2. Log failures with timestamps for debugging. Wrap requests in tryCatch() to capture details:

try_fetch <- function(req) {
  tryCatch(
    robust_request(req),
    error = function(e) {
      # Prefix the log line with the time of the failure
      message(
        format(Sys.time(), "%Y-%m-%d %H:%M:%S"),
        " Request failed: ", conditionMessage(e)
      )
      NULL
    }
  )
}

  3. Set timeouts with req_timeout() to avoid hanging requests on slow or unresponsive APIs.

  4. Test with mocked responses using the httptest2 package. Mock the API responses during testing to avoid hitting rate limits and ensure consistent test results.

  5. Version your client as the API changes. Keep client code in a separate package or module to manage breaking changes cleanly.

Rate limiting and retries

When a request returns HTTP 429 (Too Many Requests), the response typically includes a Retry-After header specifying how many seconds to wait. httr2’s req_retry() honors that header automatically; when it is absent, the backoff function decides the wait, so req_retry(req, max_tries = 3, backoff = \(attempt) 2^attempt) makes up to three attempts with exponential backoff. For APIs without a usable Retry-After header, you can pace requests manually with Sys.sleep(). Structure batch loops to check the time elapsed since the last request and pause only as long as needed, rather than sleeping a fixed amount after every request.
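
The elapsed-time approach can be sketched as follows. The names pause_needed, run_batch, and send are assumptions for the example, and send stands in for whatever function actually performs the request.

```r
# Seconds still to wait so that consecutive requests are at least
# `min_interval` seconds apart; 0 if enough time has already passed.
pause_needed <- function(last_sent, now, min_interval) {
  max(0, min_interval - (now - last_sent))
}

# Hypothetical batch loop: `send` stands in for the real request function
run_batch <- function(ids, send, min_interval = 1) {
  last_sent <- -Inf
  lapply(ids, function(id) {
    wait <- pause_needed(last_sent, as.numeric(Sys.time()), min_interval)
    if (wait > 0) Sys.sleep(wait)
    last_sent <<- as.numeric(Sys.time())
    send(id)
  })
}

pause_needed(last_sent = 10, now = 10.25, min_interval = 1)  # 0.75 still to wait
pause_needed(last_sent = 10, now = 12,    min_interval = 1)  # 0: enough time has passed
results <- run_batch(1:3, send = \(id) id * 2, min_interval = 0)
```

Compared with a fixed Sys.sleep() after every call, this only pauses when the previous request finished faster than the minimum interval.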
