Importing and Exporting Data in R
Working with data means getting it in and out of R. Whether you’re loading a dataset from a colleague or saving your analysis results, R provides multiple ways to handle file I/O. This tutorial covers the most common approaches, from base R functions to the tidyverse tools that make data import faster and more flexible.
Reading CSV Files
The CSV (Comma-Separated Values) format is the workhorse of data sharing. It’s plain text, universal, and supported by every data tool. R handles CSVs natively through read.csv(), but the tidyverse offers a faster alternative through the readr package.
Base R: read.csv()
Base R’s read.csv() is part of the standard installation. It works well for simple cases:
# Basic CSV import
my_data <- read.csv("data/survey.csv")
# Check the structure
head(my_data)
str(my_data)
The function automatically treats the first row as headers and infers column types. You can customize behavior with parameters like sep (separator), header, and na.strings:
# Handle missing values and different separators
my_data <- read.csv(
"data/survey.csv",
na.strings = c("NA", "", "missing"),
stringsAsFactors = FALSE
)
readr: read_csv()
The readr package (part of tidyverse) offers read_csv() with better defaults and faster performance for large files:
library(readr)
# Fast CSV import with explicit types
my_data <- read_csv(
"data/survey.csv",
col_types = cols(
id = col_integer(),
name = col_character(),
score = col_double(),
group = col_factor()
)
)
Key advantages of readr:
- Doesn’t convert strings to factors by default
- Shows a progress bar for large files
- More predictable behavior with missing values
- Faster on files over a few megabytes
Reading Excel Files
Excel files (.xlsx and .xls) require additional packages. The readxl package is the tidyverse solution:
library(readxl)
# Read an Excel sheet by name or number
excel_data <- read_excel("data/results.xlsx", sheet = "Q1 Results")
excel_data <- read_excel("data/results.xlsx", sheet = 2)
# List sheets in a workbook
excel_sheets("data/results.xlsx")
For more complex Excel files with multiple sheets or specific ranges:
# Read a specific range
excel_data <- read_excel(
"data/results.xlsx",
sheet = "Q1 Results",
range = "A1:D100"
)
# Handle both .xls and .xlsx
excel_data <- read_excel("data/old_format.xls")
Exporting Data
Saving data follows similar patterns. Base R provides write.csv(), while readr offers write_csv().
Writing CSV Files
# Base R
write.csv(
my_data,
"output/cleaned_data.csv",
row.names = FALSE
)
# readr (faster for large data)
library(readr)
write_csv(my_data, "output/cleaned_data.csv")
The row.names = FALSE parameter is essential in base R—it prevents R from writing row numbers as an extra column.
Exporting to Excel
library(writexl)
# Save as Excel
write_xlsx(my_data, "output/cleaned_data.xlsx")
# Multiple sheets in one workbook
write_xlsx(
list(
"Q1 Results" = q1_data,
"Q2 Results" = q2_data
),
"output/quarterly_results.xlsx"
)
Other Common Formats
JSON
library(jsonlite)
# Read JSON
json_data <- fromJSON("data/api_response.json")
# Write JSON
toJSON(my_data, pretty = TRUE, file = "output/data.json")
Delimited Files (TSV, etc.)
# Tab-separated values
read_tsv("data/file.tsv")
write_tsv(my_data, "output/file.tsv")
# Any delimiter
read_delim("data/file.txt", delim = "|")
Best Practices
-
Use
stringsAsFactors = FALSEin base R functions to avoid automatic factor conversion—it causes surprises more often than it helps. -
Check your working directory with
getwd()and use relative paths. Move files or set paths explicitly when needed. -
Handle encodings when working with international text. Use
encoding = "UTF-8"orencoding = "latin1"when special characters misbehave. -
Consider column types upfront with readr’s
col_types. It prevents type surprises later and catches data problems early. -
Name your files descriptively:
analysis_2026_03.RDatabeatsdata1.rdata.
Summary
R’s data import ecosystem covers every common format. For most tasks, read.csv() gets you started quickly, while readr functions offer speed and control for production workflows. The key is choosing the right tool: base R for quick exploration, readr for reproducibility and large files, and specialized packages (readxl, jsonlite) for domain-specific formats.
With these tools, you can pull in data from anywhere and save your results in any format your collaborators need.