Fast File Import with readr
When working with data in R, reading flat files like CSVs is one of the most common tasks. The readr package provides a fast and friendly way to import data into R, handling the nitty-gritty details so you can focus on analysis.
Why use readr?
The base R functions like read.csv() have served us well for decades, but they come with some baggage. They’re relatively slow for large files, make assumptions about your data that may not hold, and their defaults don’t always align with modern data workflows.
readr addresses these issues by being:
- Faster: reads and parses files using multiple threads (readr 2.0 and later)
- Smarter: guesses column types from the data, and read_delim() can guess the delimiter when none is given
- Tidier: returns tibbles, a data frame subclass with more compact printing
- More controllable: explicit column type specification when you need it
Installing and loading readr
If you haven’t already, install readr from CRAN:
install.packages("readr")
library(readr)
Reading CSV files
The most common use case is reading comma-separated values. Use read_csv() for this:
# Read a CSV file
df <- read_csv("my_data.csv")
# Print the tibble to inspect
df
When you print a readr tibble, you get a compact summary showing the column names, types, and the first few rows:
# A tibble: 1,000 × 4
  name    age department  salary
  <chr> <dbl> <chr>        <dbl>
1 Alice    32 Engineering  75000
2 Bob      28 Sales        62000
3 Carol    45 Marketing    89000
This output format makes it easy to see at a glance what data you’re working with — much cleaner than the default data.frame printing in base R.
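To try read_csv() without a file on disk, a minimal sketch follows. It assumes readr 2.0 or later, where wrapping a string in I() marks it as literal data. The rows mirror the illustrative output above and are made-up sample values.

```r
library(readr)

# I() tells readr the string is literal CSV text rather than a file path
# (assumption: readr >= 2.0). The rows are invented sample data.
csv_text <- I("name,age,department,salary
Alice,32,Engineering,75000
Bob,28,Sales,62000
Carol,45,Marketing,89000
")

df <- read_csv(csv_text, show_col_types = FALSE)

# Inspect the column specification readr guessed for each column
spec(df)
```

Calling spec() on the result is a quick way to confirm which type readr chose for each column before relying on it.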
Reading TSV and other delimited files
For tab-separated values, use read_tsv():
# Read a tab-separated file
df_tsv <- read_tsv("data.tsv")
For files with other delimiters, read_delim() handles any character as a separator:
# Read a semicolon-delimited file
df_semi <- read_delim("data.txt", delim = ";")
# Read a pipe-delimited file
df_pipe <- read_delim("data.txt", delim = "|")
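The sketch below writes a small semicolon-delimited file to a temporary path and reads it back, so the delim argument can be tried end to end. The file contents and column names are hypothetical.

```r
library(readr)

# Create a throwaway semicolon-delimited file (hypothetical contents)
path <- tempfile(fileext = ".txt")
writeLines(c("id;city", "1;Oslo", "2;Lima"), path)

# Tell read_delim() which character separates the fields
df_semi <- read_delim(path, delim = ";", show_col_types = FALSE)
df_semi
```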
Specifying column types
One of readr’s superpowers is automatic type detection, but sometimes you need explicit control. The col_types argument lets you specify exactly what each column should be:
df <- read_csv("data.csv",
  col_types = cols(
    id = col_integer(),
    name = col_character(),
    value = col_double(),
    flag = col_logical()
  ))
The available column types are:
- col_integer(): Integer numbers
- col_double(): Floating point numbers
- col_character(): Strings
- col_logical(): TRUE/FALSE values
- col_factor(): Categorical factors
- col_date(), col_datetime(), col_time(): Date/time types
- col_skip(): Don’t import this column
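As a rough illustration of how several of these types combine, the sketch below parses a small inline dataset. The column names and values are hypothetical, and I() for literal data assumes readr 2.0 or later.

```r
library(readr)

# Hypothetical inline data (I() marks literal text; assumes readr >= 2.0)
raw <- I("id,group,value,flag,joined,notes
1,a,1.5,TRUE,2024-01-15,x
2,b,2.5,FALSE,2024-02-20,y
")

typed <- read_csv(raw, col_types = cols(
  id     = col_integer(),   # whole numbers stored as integer
  group  = col_factor(),    # levels are taken from the data
  value  = col_double(),
  flag   = col_logical(),
  joined = col_date(),      # default format expects ISO 8601 dates
  notes  = col_skip()       # dropped from the result entirely
))

# One class per remaining column
sapply(typed, class)
```

Because every column is named in cols(), nothing is left to guessing and the skipped column never appears in the tibble.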
Handling missing values
By default, readr treats only empty strings and the string "NA" as missing; any other marker has to be listed explicitly:
# Treated as NA by default:
# - Empty strings: ""
# - "NA" (the string)
# Not treated as NA unless you add them to na:
# - "N/A", "n/a", ".", "#NUM!", etc.
df <- read_csv("data.csv", na = c("", "NA", "N/A"))
The na argument lets you customize what strings should be treated as missing values.
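The effect of na is easiest to see by reading the same text twice. This is a small sketch with a hypothetical score column; I() for literal data assumes readr 2.0 or later.

```r
library(readr)

# Hypothetical column with one "N/A" marker
raw <- "score\n10\nN/A\n30\n"

# Default na: "N/A" is ordinary text, so the column is guessed as character
default_na <- read_csv(I(raw), show_col_types = FALSE)

# Custom na: "N/A" becomes NA and the column can be parsed as a number
custom_na <- read_csv(I(raw), na = c("", "NA", "N/A"), show_col_types = FALSE)

class(default_na$score)
class(custom_na$score)
```

An unexpected character column is often the first sign that a file uses a missing-value marker readr was not told about.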
Reading files from URLs
readr can read directly from URLs, which is handy for accessing online datasets:
# Read a CSV directly from the web
df <- read_csv("https://example.com/data.csv")
# Read from GitHub raw URLs
df <- read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-01-11.csv")
Performance tips
When working with large files, these tips will speed things up:
Skip rows and limit columns
# Skip the first 10 lines (for example, comment lines above the header)
df <- read_csv("data.csv", skip = 10)
# Only read first 1000 rows
df <- read_csv("data.csv", n_max = 1000)
# Select specific columns
df <- read_csv("data.csv", col_select = c(id, name, value))
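These arguments can be combined in a single call. The sketch below uses a hypothetical temporary file with two comment lines above the header to show how skip, n_max, and col_select interact.

```r
library(readr)

# Hypothetical file: two comment lines, then a header and three data rows
path <- tempfile(fileext = ".csv")
writeLines(c(
  "# comment line 1",
  "# comment line 2",
  "id,name,value",
  "1,a,10",
  "2,b,20",
  "3,c,30"
), path)

subset_df <- read_csv(
  path,
  skip = 2,                    # skip the two comment lines before the header
  n_max = 2,                   # read at most two data rows
  col_select = c(id, value),   # keep only these columns
  show_col_types = FALSE
)
subset_df
```

Note that skip counts lines before the header, so the header row itself must not be included in the count.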
Specify column types upfront
Specifying types avoids the overhead of type inference:
# Faster: types known upfront
df <- read_csv("large_file.csv",
  col_types = cols(.default = col_double()))
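One way to obtain a specification without typing it by hand is to let spec_csv() guess it once and then pass the result to col_types on later reads. The sketch below assumes the guessed types are the ones you want; the file and its columns are hypothetical.

```r
library(readr)

# Hypothetical numeric file written to a temporary path
path <- tempfile(fileext = ".csv")
write_csv(data.frame(id = 1:5, value = c(1.5, 2, 3.25, 4, 5)), path)

# Guess the column specification without importing the data
guessed <- spec_csv(path)
guessed

# Reuse the specification so later reads do not have to guess again
full <- read_csv(path, col_types = guessed)
```

Printing the guessed specification also gives you a cols(...) call you can paste into a script and adjust by hand.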
Use show_col_types = FALSE
Suppress the column type printing for cleaner output in scripts:
df <- read_csv("data.csv", show_col_types = FALSE)
Comparing with base R
Here’s a quick comparison between readr and base R functions:
| Feature | readr | Base R |
|---|---|---|
| Returns | Tibble | Data frame |
| Speed | Faster | Slower |
| Type detection | Automatic | Automatic |
| Strings as factors | No | No since R 4.0.0 (yes before) |
| Row names | None | Optional |
The main practical difference you’ll notice is that readr returns a tibble and never creates row names. It also never converts strings to factors automatically, which base R did by default before version 4.0.0 (stringsAsFactors = TRUE).
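To see the difference in return types directly, the sketch below reads the same hypothetical file with both functions and compares the classes of the results.

```r
library(readr)

# Hypothetical two-column file
path <- tempfile(fileext = ".csv")
writeLines(c("name,score", "Alice,1.5", "Bob,2.5"), path)

base_df  <- read.csv(path)                           # base R
readr_df <- read_csv(path, show_col_types = FALSE)   # readr

# Both are data frames, but only the readr result is also a tibble
class(base_df)
class(readr_df)
```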
Summary
The readr package is your go-to for importing flat files in R. Whether you’re loading customer data, survey results, or any tabular dataset, readr makes the process painless and predictable.
Key takeaways:
- Use read_csv() for CSVs, read_tsv() for TSVs, and read_delim() for custom delimiters
- Specify col_types when you need explicit control over column interpretation
- Use na to handle missing value representations in your data
- Take advantage of skip, n_max, and col_select for large files
- Enjoy the tibble output and sensible defaults
With readr in your toolkit, you’ll spend less time fighting with file imports and more time doing actual analysis. The next time you need to bring data into R, reach for readr first — your future self will thank you.