Importing Data with readr
Introduction
Getting data into R is the first step of almost every analysis. The readr package, part of the tidyverse, makes this straightforward. It reads flat files like CSV and TSV, but it behaves differently from base R functions in ways that will save you from silent bugs.
The biggest difference is how readr handles strings. Base R’s read.csv() historically converted character columns to factors by default (in R versions before 4.0, where stringsAsFactors = TRUE). readr never does this — strings stay as character vectors unless you explicitly ask for factors. readr also returns a tibble, not a data.frame, and it reports parsing problems instead of silently coercing bad values.
Reading CSV Files with read_csv()
The read_csv() function is your main tool for comma-separated values.
library(readr)
df <- read_csv("data/surveys.csv")
read_csv() assumes the first row contains column names. If your file has no header, pass col_names = FALSE:
df <- read_csv("data/no_header.csv", col_names = FALSE)
You can also provide your own column names as a character vector:
df <- read_csv("data/no_header.csv",
col_names = c("id", "species", "weight", "year"))
Controlling What Gets Imported
The skip argument tells readr to skip lines at the top of the file. This is useful when files have metadata rows before the actual data:
# Skip the first 3 lines before reading
df <- read_csv("data/surveys.csv", skip = 3)
Use n_max to read only a subset of rows — helpful for previewing a large file without loading all of it:
# Read only the first 100 rows
df <- read_csv("data/large_survey.csv", n_max = 100)
Combining skip and n_max gives you fine-grained control over which rows to import.
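Putting the two together — a self-contained sketch that builds a small demo file with three metadata lines, then imports only a slice of it:

```r
library(readr)

# Build a demo file: three metadata lines, a header row, then five data rows
path <- tempfile(fileext = ".csv")
writeLines(c(
  "# survey export",
  "# generated by instrument",
  "# do not edit",
  "id,species,weight",
  "1,DM,40", "2,DO,52", "3,PP,17", "4,DM,44", "5,SH,61"
), path)

# skip = 3 drops the metadata; the next line becomes the header;
# n_max = 2 reads only the first two data rows
df <- read_csv(path, skip = 3, n_max = 2)
nrow(df)   # 2
```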
Specifying Column Types
By default, readr guesses each column’s type from the first 1000 rows (the guess_max argument controls how many). This works well, but sometimes you need to be explicit. The col_types argument takes a call to cols() with named type functions.
df <- read_csv("data/surveys.csv",
col_types = cols(
record_id = col_integer(),
weight = col_double(),
species_id = col_character()
)
)
readr provides a type function for every common data type:
| Function | Result |
|---|---|
| col_character() | character vector |
| col_double() | numeric double |
| col_integer() | integer |
| col_factor() | factor with defined levels |
| col_date() | Date (calendar date only) |
| col_datetime() | POSIXct (date and time) |
| col_logical() | logical (TRUE/FALSE/NA) |
| col_skip() | ignore this column |
| col_guess() | let readr guess (default) |
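col_types also accepts a compact string shorthand, one letter per column in order (i = integer, d = double, c = character, l = logical, D = date, T = datetime, f = factor, _ = skip, ? = guess). A quick sketch:

```r
library(readr)

path <- tempfile(fileext = ".csv")
writeLines(c("record_id,species_id,weight",
             "1,DM,40.5",
             "2,DO,52.0"), path)

# "icd" = integer, character, double — one letter per column, in order
df <- read_csv(path, col_types = "icd")
class(df$record_id)   # "integer"
```

The string form is terser but less self-documenting; for files with many columns, the named cols() form is usually easier to maintain.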
If you only need a few columns, use cols_only() instead of listing every column you want to skip:
df <- read_csv("data/surveys.csv",
col_types = cols_only(
species_id = col_character(),
weight = col_double()
)
)
This is more efficient than reading everything and then selecting columns with dplyr.
Specifying Factor Levels
One thing that trips up people coming from base R: col_factor() works best when you specify the levels upfront. (In current readr the levels argument defaults to NULL, which takes the levels in order of appearance, but listing them explicitly makes your import reproducible and surfaces unexpected values as parsing problems.)
df <- read_csv("data/surveys.csv",
col_types = cols(
plot_id = col_factor(levels = c("1", "2", "3", "4", "5")),
species_id = col_factor(levels = c("DM", "DO", "PP", "SH"))
)
)
If you want factors but do not know the levels in advance, import the column as character first and convert it with factor() afterwards.
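That two-step pattern looks like this (a sketch using factor(), which takes its levels from the data):

```r
library(readr)

path <- tempfile(fileext = ".csv")
writeLines(c("species_id,weight", "DM,40", "DO,52", "DM,44"), path)

# Import as character first...
df <- read_csv(path, col_types = cols(
  species_id = col_character(),
  weight = col_double()
))

# ...then convert, letting the data define the levels (sorted alphabetically)
df$species_id <- factor(df$species_id)
levels(df$species_id)   # "DM" "DO"
```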
Reading Other Delimited Files
Not every data file uses a comma as the separator. readr gives you two dedicated functions and one general-purpose tool.
For tab-separated values, use read_tsv():
df <- read_tsv("data/results.tsv")
It has the same interface as read_csv() — you get col_names, col_types, skip, n_max, and everything else.
For anything else — semicolons, pipes, tabs in custom positions — use read_delim() with an explicit delim argument:
# Semicolon-delimited (common in European data)
df <- read_delim("data/european.csv", delim = ";")
# Pipe-delimited
df <- read_delim("data/records.txt", delim = "|")
Handling Locale Differences
Different locales represent dates, decimals, and thousands separators differently. The locale() function bundles these settings together.
European spreadsheets often use a comma as the decimal separator and semicolon as the delimiter:
df <- read_delim("data/european.csv",
delim = ";",
locale = locale(decimal_mark = ",")
)
For this common combination, readr also provides read_csv2(), which defaults to a semicolon delimiter and comma decimal mark.
If your file uses a different encoding, specify it with the encoding argument:
df <- read_csv("data/latin1.csv",
locale = locale(encoding = "ISO-8859-1"))
To find the right encoding, try locale(encoding = "UTF-8") first (the default), then fall back to common legacy encodings like "ISO-8859-1" or "Windows-1252".
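Rather than guessing by hand, you can ask readr itself: guess_encoding() reads a sample of the file and ranks likely encodings with a confidence score. A sketch, writing a Latin-1 byte on purpose so there is something to detect:

```r
library(readr)

# Write a file containing the Latin-1 byte 0xE9 ("é" in ISO-8859-1)
path <- tempfile(fileext = ".csv")
writeLines(c("name", "Jos\xe9"), path, useBytes = TRUE)

# Returns a tibble of candidate encodings ordered by confidence
enc <- guess_encoding(path)
enc
```

Detection is statistical, so on very short files the ranking can be unreliable; treat it as a starting point, not a verdict.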
Fixed-Width Format Files
Some data files have no delimiters at all — columns are defined by their character positions. readr handles these with read_fwf().
You describe the layout using either fwf_widths() (give each column a width) or fwf_positions() (give each column a start and end position):
# Define by column widths: ID is chars 1-5, Name is 6-25, Score is 26-30
fwf_spec <- fwf_widths(
widths = c(5, 20, 5),
col_names = c("id", "name", "score")
)
df <- read_fwf("data/fixed.txt", fwf_spec)
# Same layout using start and end positions
fwf_spec <- fwf_positions(
start = c(1, 6, 26),
end = c(5, 25, 30),
col_names = c("id", "name", "score")
)
df <- read_fwf("data/fixed.txt", fwf_spec)
Both approaches produce the same result. Use whichever matches how your file format is documented.
Debugging Imports
When import goes wrong, readr gives you tools to find out what happened.
Call spec() on an imported tibble to see the column specification readr used. To preview the specification without importing the full data, call spec_csv() (or spec_tsv() / spec_delim()) on the file path:
spec_csv("data/surveys.csv")
# cols(
#   record_id = col_double(),
#   month = col_double(),
#   day = col_double(),
#   year = col_double(),
#   plot_id = col_double(),
#   species_id = col_character(),
#   ...
# )
After importing, call problems() to see rows that readr could not parse:
df <- read_csv("data/messy.csv")
problems(df)
Each problem records the row number, the column, what readr expected, and the actual value it found. Compare base R, where a failed coercion typically becomes NA with at most a warning.
Reading Lines Directly
Sometimes you do not want a table at all — you want the raw lines. read_lines() returns a character vector where each element is one line from the file:
lines <- read_lines("data/raw.txt", n_max = 10)
Use n_max = -1 (the default) to read all lines. read_lines_raw() does the same but returns a list of raw vectors instead of character strings, which is useful for binary-safe processing.
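Raw lines are handy for pre-cleaning a file before parsing it as a table. A sketch that drops comment lines and then hands the cleaned text to read_csv() via I() (readr 2.0+ reads literal data wrapped in I()):

```r
library(readr)

path <- tempfile(fileext = ".txt")
writeLines(c("id,weight", "1,40", "# stray comment", "2,52"), path)

lines <- read_lines(path)

# Drop comment lines, then parse what remains as CSV
clean <- lines[!grepl("^#", lines)]
df <- read_csv(I(paste(clean, collapse = "\n")))
nrow(df)   # 2
```

For a simple case like this, read_csv()'s own comment argument does the same job; read_lines() earns its keep when the cleanup logic is more involved.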
Whitespace and Progress Bars
By default, read_csv() strips leading and trailing whitespace from values (trim_ws = TRUE). If you need to preserve padding, turn this off:
df <- read_csv("data/messy.csv", trim_ws = FALSE)
For large files, readr shows a progress bar by default in interactive sessions. Suppress it with progress = FALSE if you are running non-interactively:
df <- read_csv("data/large.csv", progress = FALSE)
See Also
- Data Frames and Tibbles — understand what readr actually returns
- Introduction to the Tidyverse — readr in the context of the tidyverse ecosystem
- Importing and Exporting Data — broader I/O options beyond flat text files