Working with JSON in R
JSON (JavaScript Object Notation) has become the standard format for data interchange across web APIs, configuration files, and data pipelines. R provides reliable tools for working with JSON through the jsonlite package, which is installed along with the tidyverse meta-package but is a standalone package that can be used on its own.
Installing jsonlite
The jsonlite package is available on CRAN and can be installed with:
install.packages("jsonlite")
library(jsonlite)
Reading JSON into R
The primary function for reading JSON is fromJSON(). It accepts URLs, file paths, or character strings:
# From a URL (JSONPlaceholder API example)
users <- fromJSON("https://jsonplaceholder.typicode.com/users")
# From a local file
data <- fromJSON("data.json")
# From a character string
json_str <- '{"name": "Alice", "age": 30}'
person <- fromJSON(json_str)
The function simplifies by default: JSON arrays of scalars become atomic R vectors, arrays of objects with matching keys become data frames, and JSON objects become named lists.
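A quick way to see these simplification rules is to parse a few small strings and inspect the results. The JSON snippets below are made up for illustration.

```r
library(jsonlite)

# An array of scalars simplifies to an atomic vector
v <- fromJSON("[1, 2, 3]")

# An array of objects with the same keys simplifies to a data frame
people <- fromJSON('[{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]')

# A single object becomes a named list; its scalar array becomes a vector
obj <- fromJSON('{"name": "Alice", "tags": ["a", "b"]}')

str(v)
str(people)
str(obj)
```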
Writing R objects to JSON
Use toJSON() to convert R objects to JSON format:
# Simple example
df <- data.frame(
name = c("Alice", "Bob"),
score = c(95, 87)
)
json_output <- toJSON(df, pretty = TRUE)
cat(json_output)
The pretty = TRUE argument indents the output for readability. Use auto_unbox = TRUE to write length-one vectors as JSON scalars instead of single-element arrays.
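The effect of auto_unbox is easiest to see on a small list. This sketch uses a hypothetical payload; jsonlite::unbox() marks one value as a scalar without changing the setting for the whole object.

```r
library(jsonlite)

payload <- list(id = 42, name = "Alice", tags = c("r", "json"))

# Default: every vector, even of length one, becomes a JSON array
toJSON(payload)
# {"id":[42],"name":["Alice"],"tags":["r","json"]}

# auto_unbox = TRUE writes length-one vectors as scalars; longer vectors stay arrays
toJSON(payload, auto_unbox = TRUE)
# {"id":42,"name":"Alice","tags":["r","json"]}

# unbox() makes a single field scalar while the rest keeps the default behaviour
toJSON(list(id = unbox(42), tags = "r"))
# {"id":42,"tags":["r"]}
```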
Writing to files
# Write JSON to a file
write_json(df, "output.json", pretty = TRUE)
Working with nested JSON
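Real-world JSON often has nested structures. When an array of records contains nested objects, fromJSON() returns a data frame whose nested columns are themselves data frames; flatten() expands them into ordinary columns:
# Example: an array of records with a nested "address" object
nested <- fromJSON('[
  {"name": "Alice", "address": {"city": "Boston", "zip": "02101"}},
  {"name": "Bob", "address": {"city": "Denver", "zip": "80202"}}
]')
# Flatten to get address.city and address.zip as columns
# (the jsonlite:: prefix avoids a clash with purrr::flatten())
flat <- jsonlite::flatten(nested)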
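For irregular or deeply nested structures, turn off simplification and work with lists directly:
# Access nested elements (simplifyVector = FALSE keeps one list per record)
data <- fromJSON("complex_api.json", simplifyVector = FALSE)
city <- data$results[[1]]$address$city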
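The difference between the two access styles shows up on a small inline document. The JSON below is a hypothetical response standing in for complex_api.json: with default simplification, results becomes a data frame (so [[1]] would pick its first column), while simplifyVector = FALSE keeps one list per record.

```r
library(jsonlite)

# Hypothetical API response with a "results" array of records
json <- '{"results": [
  {"name": "Alice", "address": {"city": "Boston"}},
  {"name": "Bob", "address": {"city": "Denver"}}
]}'

# Default simplification: results is a data frame, address a nested data frame
simple <- fromJSON(json)
simple$results$address$city[1]

# No simplification: results is a list with one element per record
nested_list <- fromJSON(json, simplifyVector = FALSE)
nested_list$results[[1]]$address$city
```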
Handling dates and times
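JSON has no native date type, so date-times must be written as strings or numbers. toJSON() controls how POSIXct values are encoded through its POSIXt argument, which accepts "string" (the default), "ISO8601", "epoch", and "mongo":
# Write timestamps as ISO 8601 strings
df <- data.frame(
  event = c("start", "end"),
  timestamp = as.POSIXct(c("2024-01-15 09:00", "2024-01-15 17:00"), tz = "UTC")
)
toJSON(df, POSIXt = "ISO8601")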
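Reading the values back does not restore POSIXct automatically: fromJSON() returns character strings. A minimal round-trip sketch, assuming the ISO 8601 strings have the "YYYY-MM-DDTHH:MM:SS" shape without a zone suffix (if you also set UTC = TRUE, check the output, since the shape may change):

```r
library(jsonlite)

df <- data.frame(
  event = c("start", "end"),
  timestamp = as.POSIXct(c("2024-01-15 09:00", "2024-01-15 17:00"), tz = "UTC")
)

# Serialize the timestamps as ISO 8601 strings
json <- toJSON(df, POSIXt = "ISO8601")

# fromJSON() gives character; convert back with an explicit format and time zone
back <- fromJSON(json)
back$timestamp <- as.POSIXct(back$timestamp, format = "%Y-%m-%dT%H:%M:%S", tz = "UTC")
```

Passing tz explicitly on both sides avoids silently shifting times into the session's local time zone.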
Error handling
When working with external APIs, handle potential errors. The safely function from purrr wraps any function to return a list with result and error components:
library(purrr)
safe_read <- safely(fromJSON, otherwise = NULL)
result <- safe_read("https://api.example.com/data")
if (is.null(result$error)) {
data <- result$result
} else {
message("Failed to fetch: ", result$error$message)
}
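If you prefer not to depend on purrr, base R's tryCatch() gives the same behaviour. This is a swapped-in alternative, not part of purrr; read_json_or_null() is a hypothetical helper name, and the inputs are inline strings so the sketch runs offline.

```r
library(jsonlite)

# Hypothetical helper: parse JSON, returning NULL (with a message) on failure
read_json_or_null <- function(txt) {
  tryCatch(
    fromJSON(txt),
    error = function(e) {
      message("Failed to parse: ", conditionMessage(e))
      NULL
    }
  )
}

ok <- read_json_or_null('{"status": "ok"}')
bad <- read_json_or_null('{"status": ')  # truncated JSON triggers the error handler
```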
Practical example: weather API
Here is a complete example fetching weather data from an API:
library(jsonlite)
# Fetch weather data (example API)
url <- "https://api.open-meteo.com/v1/forecast?latitude=52.52&longitude=13.41&current_weather=true"
weather <- fromJSON(url)
# Extract current temperature
current_temp <- weather$current_weather$temperature
message("Current temperature: ", current_temp, " degrees C")
Comparing jsonlite to alternatives
Different packages serve different purposes when working with JSON in R:
| Package | Use Case |
|---|---|
| jsonlite | General JSON handling, API consumption |
| httr2 | HTTP requests with JSON support |
| tidyjson | Tidyverse-style JSON manipulation |
| yyjsonr, RcppSimdJson | High-performance parsing of large JSON |
Performance tips
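For large JSON files, consider these optimizations. If the data is stored as NDJSON (one JSON record per line), stream it in pages instead of loading the whole file into memory:
# Stream NDJSON records from a file
con <- file("large_data.json", "r")
stream_in(con, handler = function(df) {
  # df holds one page of records (pagesize = 500 by default)
  print(nrow(df))
})
close(con)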
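A self-contained version of the streaming pattern writes a small NDJSON file first, then keeps only running totals in the handler so no page outlives its own call. The data and the tiny pagesize are illustrative assumptions.

```r
library(jsonlite)

# Write a small NDJSON file so the example needs no external data
path <- tempfile(fileext = ".ndjson")
stream_out(data.frame(id = 1:5, value = c(2.5, 3, 4.5, 1, 0)), file(path), verbose = FALSE)

# Process two records at a time, accumulating totals outside the handler
total_rows <- 0
total_value <- 0
stream_in(file(path), handler = function(page) {
  total_rows <<- total_rows + nrow(page)
  total_value <<- total_value + sum(page$value)
}, pagesize = 2, verbose = FALSE)

cat(total_rows, "rows, sum of value =", total_value, "\n")
```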
Prettify only when needed: indentation adds whitespace and makes the output larger.
# Compact serialization (pretty = FALSE is the default)
compact_json <- toJSON(df, pretty = FALSE)
Common pitfalls
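Watch for these common issues when working with JSON in R. First, factors are written as their labels by default (factor = "string"), but they are read back as plain character vectors, so the levels are lost in a round trip. Converting to character before writing makes this explicit: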
df$category <- as.character(df$category)
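Second, missing values are handled per class: in data frame rows they are dropped from each record by default, while in vectors they may be written as null or as the string "NA". Some APIs reject one form or the other, so set na explicitly to "null" or "string":
toJSON(df, na = "null")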
Third, other column types follow their own rules (Date and POSIXct columns become strings, list columns become nested arrays), so check a sample of the output before sending it to an API.
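The first two pitfalls can be checked directly on tiny data frames. This sketch uses made-up columns; the NA checks only look for the dropped field ({}) and for an explicit null, since other defaults vary by class.

```r
library(jsonlite)

# Factors: labels by default, integer codes with factor = "integer"
f <- data.frame(category = factor(c("low", "high", "low"), levels = c("low", "high")))
toJSON(f)                      # [{"category":"low"},{"category":"high"},{"category":"low"}]
toJSON(f, factor = "integer")  # [{"category":1},{"category":2},{"category":1}]

# Reading back yields a character column; restore the level order by hand
back <- fromJSON(toJSON(f))
back$category <- factor(back$category, levels = c("low", "high"))

# Missing values: dropped from data frame rows by default, or written as null
m <- data.frame(x = c(1, NA), y = c("a", NA))
out_default <- toJSON(m)
out_null <- toJSON(m, na = "null")
```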
Nested and hierarchical data
JSON APIs often return deeply nested objects. jsonlite::fromJSON() with flatten = TRUE partially flattens nested structures into dot-separated column names. For complex nesting, tidyr::unnest_wider() and tidyr::unnest_longer() progressively flatten list columns into tabular form. Each unnest_*() call flattens one level; repeat until all nesting is resolved.
Streaming large JSON files
For JSON files too large to load at once, jsonlite::stream_in() reads NDJSON (newline-delimited JSON) records incrementally. Each line must be a separate JSON object. stream_in(file("data.ndjson"), handler = function(df) { ... }) processes batches without loading everything into memory. NDJSON is common in log files and API responses that return many records.
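Without a handler, stream_in() binds all pages into a single data frame, which is convenient for modest NDJSON inputs. The log lines below are made up, and textConnection() stands in for a file connection.

```r
library(jsonlite)

# Three NDJSON records, one JSON object per line
lines <- c(
  '{"level": "info", "msg": "started"}',
  '{"level": "warn", "msg": "slow query"}',
  '{"level": "info", "msg": "finished"}'
)

# No handler: the pages are combined into one data frame
logs <- stream_in(textConnection(lines), verbose = FALSE)
table(logs$level)
```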
Writing JSON
jsonlite::toJSON(df, auto_unbox = TRUE) serializes a data frame as a JSON array of objects. auto_unbox = TRUE converts length-1 vectors to scalars rather than single-element arrays. pretty = TRUE adds whitespace for human readability. Use write_json() from jsonlite to write directly to a file. For round-tripping R objects (including lists with mixed types), toJSON() with auto_unbox = FALSE keeps length-1 vectors as arrays, so the list structure reads back unchanged, although R classes such as factor and Date are not preserved.
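When the classes matter too, jsonlite also provides serializeJSON() and unserializeJSON(), which record R types and attributes alongside the values. A sketch comparing the two approaches on a small made-up list:

```r
library(jsonlite)

x <- list(
  n = 1L,
  when = as.Date("2024-01-15"),
  level = factor("high", levels = c("low", "high"))
)

# toJSON()/fromJSON(): values survive, but Date and factor come back as character
plain <- fromJSON(toJSON(x))

# serializeJSON(): types and attributes are encoded, so the object comes back intact
exact <- unserializeJSON(serializeJSON(x))
```

The serialized form is verbose and R-specific, so it suits storage or R-to-R exchange rather than public APIs.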
jsonlite vs rjson
jsonlite is the standard R JSON library, built around fromJSON() for parsing, toJSON() for serialization, and stream_in()/stream_out() for line-delimited JSON. rjson is faster for simple cases but less feature-complete. jsonlite performs automatic simplification (JSON arrays to R vectors or data frames, JSON objects to named lists), while rjson maps JSON more literally. The yyjsonr and RcppSimdJson packages are faster alternatives for performance-critical applications.
Reading and writing JSON
jsonlite::fromJSON(text) parses JSON strings or file paths to R objects. By default it simplifies arrays of objects into data frames (simplifyDataFrame = TRUE) and uniform arrays into vectors. This works well for well-structured API responses. Set simplifyDataFrame = FALSE to get nested lists instead.
jsonlite::toJSON(x, pretty = TRUE) converts R objects to JSON. auto_unbox = TRUE serializes length-1 vectors as scalars rather than arrays, important for API compatibility where {"n": 1} is expected rather than {"n": [1]}. na = "null" converts NA values to JSON null.
jsonlite::read_json(path) and jsonlite::write_json(x, path) work with files directly. jsonlite::stream_in(con) and stream_out(x, con) handle newline-delimited JSON (NDJSON), one JSON object per line — which is common for log files and large exports that cannot fit in memory as a single array.
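One detail worth knowing: read_json() does not simplify by default (simplifyVector = FALSE), so a file written from a data frame comes back as a list of records unless you ask for simplification. A file round trip through a temporary path:

```r
library(jsonlite)

df <- data.frame(name = c("Alice", "Bob"), score = c(95, 87))
path <- tempfile(fileext = ".json")
write_json(df, path)

# Default: one list per record
records <- read_json(path)
length(records)

# simplifyVector = TRUE restores a data frame
df2 <- read_json(path, simplifyVector = TRUE)
```

Whole numbers may come back as integer rather than double, so compare values rather than using identical() on the columns.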
Handling nested JSON
API responses often return deeply nested JSON. jsonlite::fromJSON() with simplification may create awkward nested data frames where some columns are themselves data frames. Inspect the structure with str(result, max.level = 3) before proceeding.
tidyr::unnest_wider() expands a list-column where each element is a named list into separate columns. tidyr::unnest_longer() expands a list-column where each element is a vector or list into multiple rows. Chain these to flatten nested structures level by level.
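A sketch of chaining these calls, assuming tidyr and tibble are installed. The records are parsed without simplification and wrapped in a one-column tibble; each unnest call then resolves one level of nesting. The JSON is made up for illustration.

```r
library(jsonlite)
library(tibble)
library(tidyr)

json <- '[
  {"id": 1, "user": {"name": "Alice", "city": "Boston"}, "tags": ["a", "b"]},
  {"id": 2, "user": {"name": "Bob", "city": "Denver"}, "tags": ["c"]}
]'

# One list element per record, kept as a list-column
records <- fromJSON(json, simplifyVector = FALSE)

flat <- tibble(record = records) |>
  unnest_wider(record) |>   # columns: id, user, tags
  unnest_wider(user) |>     # user -> name, city
  unnest_longer(tags)       # one row per tag

flat
```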
For the common case of a response where a data array is nested inside a wrapper object:
response <- jsonlite::fromJSON(json_text)
df <- response$data # or response[["items"]], etc.
Use purrr::pluck() for safe deep extraction: purrr::pluck(response, "data", "items") returns NULL if any level is missing, rather than erroring.
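A short sketch of pluck() on a parsed response, written here as a plain R list with hypothetical fields. Numbers in the path index list positions, and .default replaces the NULL returned for missing paths.

```r
library(purrr)

# A parsed response, written out as a plain R list for illustration
response <- list(data = list(items = list(list(id = 1), list(id = 2))))

pluck(response, "data", "items", 1, "id")            # 1
pluck(response, "meta", "next_page")                 # NULL: "meta" is missing
pluck(response, "meta", "next_page", .default = NA)  # NA instead of NULL
```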
Validating JSON schema
jsonvalidate::json_validate(json_text, schema) checks whether a JSON document matches a JSON Schema definition. This is valuable for API development: validate request bodies before processing them, and validate your response objects before returning them.
JSON Schema can express required fields, field types, enum values, array constraints, and nested object shapes. A schema for a user object might require id (integer), name (string), and email (string matching an email format regex).
json_validate() returns TRUE or FALSE; with verbose = TRUE, a failed result carries an "errors" attribute describing each violation. In production API code, return HTTP 422 with the validation errors when input fails schema validation.
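A minimal validation sketch, assuming jsonvalidate (and its V8 dependency) is installed. The schema is a simplified, hypothetical version of the user object described above, without the email format check.

```r
library(jsonvalidate)

# Hypothetical schema: id, name, and email are required, with basic types
schema <- '{
  "type": "object",
  "required": ["id", "name", "email"],
  "properties": {
    "id":    {"type": "integer"},
    "name":  {"type": "string"},
    "email": {"type": "string"}
  }
}'

ok <- json_validate('{"id": 1, "name": "Alice", "email": "alice@example.com"}', schema)

# Wrong type for id and missing email; verbose = TRUE attaches the details
bad <- json_validate('{"id": "1", "name": "Alice"}', schema, verbose = TRUE)
attr(bad, "errors")
```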
Performance for large JSON files
jsonlite::fromJSON() loads the entire file into memory as a string before parsing. For large files (hundreds of MB), this can exhaust memory. stream_in(con) processes NDJSON line by line without loading the full file.
For NDJSON inside a zip archive, unz() opens a connection to the archived file: stream_in(unz("archive.zip", "data.json")).
yyjsonr::read_json_str() and yyjsonr::read_json_file() are typically much faster than jsonlite on large JSON inputs because they wrap the yyjson C library. RcppSimdJson (fparse() for strings, fload() for files) builds on the simdjson C++ library and is another high-performance option for well-structured JSON.
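For processing multiple JSON files in parallel, furrr::future_map(files, jsonlite::read_json) distributes the parsing across workers once a parallel plan is set, for example with future::plan(future::multisession); without a plan, it runs sequentially.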
JSON in APIs
When building plumber APIs, jsonlite::toJSON() serializes response objects. plumber calls this automatically when a function returns a list or data frame. #* @serializer json list(auto_unbox = TRUE) sets auto-unboxing at the route level.
For consuming REST APIs, httr2::resp_body_json() parses the response body as JSON: resp <- httr2::req_perform(req); data <- httr2::resp_body_json(resp, simplifyVector = TRUE). By default resp_body_json() returns nested lists; simplifyVector = TRUE is passed on to jsonlite and applies the same simplification that fromJSON() uses by default, turning arrays of records into data frames.
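To see the effect of simplifyVector without a network call, httr2::response() can build a canned response in memory. The body below is hypothetical; in real code resp would come from req_perform().

```r
library(httr2)

# A canned JSON response; check_type requires a JSON content type
resp <- response(
  status_code = 200,
  headers = "Content-Type: application/json",
  body = charToRaw('[{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]')
)

as_lists <- resp_body_json(resp)                      # list of two named lists
as_df <- resp_body_json(resp, simplifyVector = TRUE)  # data frame with columns id, name
```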
JSON as the universal data exchange format
JSON is the dominant format for data exchange on the web. REST APIs return JSON responses. Configuration files use JSON. Log data is often newline-delimited JSON. Working in R with any web API or modern data pipeline means working with JSON. The jsonlite package provides the primary tools for reading and writing JSON in R, with well-defined mappings between R’s data structures and JSON’s types.
The mapping from JSON to R requires understanding the correspondence: JSON objects become R named lists, JSON arrays become R unnamed lists (or atomic vectors for homogeneous arrays, and data frames for arrays of records), JSON strings become character vectors, JSON numbers become numeric, JSON booleans become logical, and JSON null becomes NULL (or NA inside a simplified vector). Nested JSON becomes nested R lists, which can then be processed with purrr’s list manipulation tools.
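The null and boolean rules are the ones most likely to surprise. A few one-line checks on made-up inputs:

```r
library(jsonlite)

fromJSON("[true, false, null]")  # logical vector: TRUE FALSE NA
fromJSON("[1, null, 3]")         # numeric vector with NA in the middle

# An object field set to null stays NULL in the resulting list
obj <- fromJSON('{"a": null, "b": [1, 2]}')
is.null(obj$a)
```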
Writing and validating JSON
toJSON converts R objects to JSON strings. The auto_unbox argument controls whether length-one vectors are represented as JSON arrays or as scalar values. Setting auto_unbox = TRUE is usually the right choice for creating JSON intended for API consumption, where a scalar field should be a JSON scalar, not a single-element array. The pretty argument adds indentation for human-readable output; omit it for compact output in production.
JSON Schema validation checks that a JSON document conforms to a schema defining expected structure, types, and required fields. The jsonvalidate package implements JSON Schema validation in R. Validating API responses against a schema at the boundary of your R code catches API changes that would cause downstream errors, failing loudly at the input boundary rather than producing wrong results silently.
See also
- Reading Excel Files with readxl and writexl — Importing spreadsheet data into R
- Fast Data Manipulation with data.table — High-performance data handling in R
- Working with Parquet Files using Arrow — Columnar data formats for R