read_csv
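read_csv(file, col_names = TRUE, col_types = NULL, col_select = NULL, id = NULL, locale = default_locale(), na = c("", "NA"), quote = "\"", comment = "", trim_ws = TRUE, skip = 0, n_max = Inf, guess_max = min(1000, n_max), name_repair = "unique", num_threads = readr_threads(), progress = show_progress(), show_col_types = should_show_types(), skip_empty_rows = TRUE, lazy = should_read_lazy())
Description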
read_csv() reads a comma-separated values (CSV) file and returns the data as a tibble. It parses column types automatically using the first guess_max rows (default 1000), and never converts character columns to factors.
read_csv() is a special case of read_delim() with the delimiter fixed to ",". Install readr on its own or as part of the tidyverse:
install.packages("readr") # readr only
install.packages("tidyverse") # full tidyverse
Arguments
file
Path to a CSV file, URL, connection, or raw vector. Supports automatic decompression for .gz, .bz2, .xz, and .zip suffixes. Remote URLs are downloaded before parsing.
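As a minimal sketch of the automatic decompression described above, the snippet below writes a small gzip-compressed file and reads it back. The temporary path and the toy data frame are made up for illustration, and the sketch relies on write_csv() compressing its output when the path ends in .gz.

```r
library(readr)

# Hypothetical temporary file; the .gz suffix triggers compression on write
# and decompression on read.
gz_path <- tempfile(fileext = ".csv.gz")
write_csv(data.frame(x = 1:3, y = c("a", "b", "c")), gz_path)

# No extra argument is needed: the suffix alone selects decompression.
roundtrip <- read_csv(gz_path, show_col_types = FALSE)
roundtrip
```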
To read literal inline data, wrap the string with I():
read_csv(I("x,y\n1,2\n3,4"))
# # A tibble: 2 × 2
# x y
# <dbl> <dbl>
# 1 1 2
# 2 3 4
Pass multiple paths as a character vector to read and row-bind several files at once:
read_csv(c("file1.csv", "file2.csv"))
col_names
Either TRUE (default), FALSE, or a character vector.
- TRUE: the first row supplies the column names.
- FALSE: names are generated automatically as X1, X2, ....
- Character vector: these values are used as the column names, and the first row is read as data.
read_csv(I("a,b\n1,2"), col_names = FALSE)
# # A tibble: 2 × 2
# X1 X2
# <dbl> <dbl>
# 1 1 2
col_types
Column type specification. NULL (default) infers types from the first guess_max rows. Pass a cols() specification or a string shorthand to override.
String shorthand:
| Letter | Type |
|---|---|
| l | col_logical() |
| i | col_integer() |
| d | col_double() |
| n | col_number() |
| c | col_character() |
| f | col_factor() (levels are taken from the data unless supplied through cols()) |
| D | col_date() |
| T | col_datetime() |
| t | col_time() |
| ? | col_guess() |
| _ or - | col_skip() |
col_factor() and col_skip() are never inferred — you must specify them explicitly. col_guess() is the fallback: it tells readr to infer the type when you’ve specified other columns but want the rest auto-detected.
# String shorthand: double, character, skip (z is dropped)
read_csv(I("x,y,z\n1,a,TRUE\n2,b,FALSE"), col_types = "dc_")
# # A tibble: 2 × 2
# x y
# <dbl> <chr>
# 1 1 a
# 2 2 b
# cols() specification with explicit types
read_csv(
I("x,y\n1,a\n2,b"),
col_types = cols(y = col_factor(levels = c("a", "b")))
)
# Override some columns, guess the rest
read_csv(
I("x,y,z\n1,a,TRUE\n2,b,FALSE"),
col_types = cols(x = col_double(), .default = col_guess())
)
col_select
Select which columns to read using tidyselect syntax. Supports names, numeric indexes, starts_with(), last_col(), and more.
df <- read_csv(
I("chicken,eggs_laid,weight\nFoghorn,0,2.1\nLittle,3,1.8"),
col_select = c(chicken, eggs_laid)
)
df
# # A tibble: 2 × 2
# chicken eggs_laid
# <chr> <dbl>
# 1 Foghorn 0
# 2 Little 3
Rename during selection with c(new_name = old_name, ...):
read_csv(
I("x,y\n1,a\n2,b"),
col_select = c(new_x = x, y)
)
# # A tibble: 2 × 2
# new_x y
# <dbl> <chr>
# 1 1 a
# 2 2 b
id
Supply a string to add a column recording the source file path of each record. Particularly useful when reading multiple files at once:
combined <- read_csv(c("file1.csv", "file2.csv"), id = "source")
# # A tibble: 4 × 3
# source x y
# <chr> <dbl> <dbl>
# 1 file1.csv 1 2
# 2 file1.csv 3 4
# 3 file2.csv 5 6
# 4 file2.csv 7 8
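The following is a self-contained sketch of the same idea, assuming two small files with identical columns. The directory, file names, and values are hypothetical and are created on the fly so the example can run anywhere.

```r
library(readr)

# Hypothetical files created in a temporary directory for the demonstration.
demo_dir <- tempfile("csv_demo")
dir.create(demo_dir)
paths <- file.path(demo_dir, c("file1.csv", "file2.csv"))
writeLines(c("x,y", "1,2", "3,4"), paths[1])
writeLines(c("x,y", "5,6", "7,8"), paths[2])

# The id column records the path each row came from.
combined <- read_csv(paths, id = "source", show_col_types = FALSE)

# basename() strips the temporary directory so the source is easier to read.
combined$source <- basename(combined$source)
combined
```

Because the id column stores the path exactly as it was passed in, shortening it afterwards (as with basename() here) is a design choice rather than a requirement.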
locale
Controls date format, time format, decimal mark, grouping mark, time zone, and encoding. Use locale() to customize. The default default_locale() is US-centric.
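# Read a CSV with European decimal notation. The value must be quoted,
# because an unquoted comma is still the field delimiter.
read_csv(I('x\n"1,5"'), col_types = "d", locale = locale(decimal_mark = ","))
# # A tibble: 1 × 1
# x
# <dbl>
# 1 1.5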
# Read a file with non-UTF-8 encoding
read_csv("data.csv", locale = locale(encoding = "latin1"))
na
Character vector of strings to interpret as missing values. Default is c("", "NA"). Set character() for no missing value conversion.
read_csv(I("x\n1\nNA\n"), na = c("", "NA")) # x is <dbl>: 1, NA
read_csv(I("x\n1\nNA\n"), na = character()) # x is <chr>: "1", "NA"
read_csv(I("x\n1\nN/A\n"), na = c("", "NA", "N/A")) # x is <dbl>: 1, NA
trim_ws
Logical, defaults to TRUE. Strips leading and trailing whitespace from each field before parsing. Note that read_delim() defaults to FALSE — watch for this difference when switching between functions.
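A short sketch of the effect, using made-up inline data with padded fields. It reads the same text twice, once with the default and once with trim_ws = FALSE, so the two results can be compared directly.

```r
library(readr)

# Hypothetical data: the name field is padded with spaces on both sides.
padded <- "name,score\n  ann  ,1\n  bob  ,2"

trimmed   <- read_csv(I(padded), show_col_types = FALSE)                  # trim_ws = TRUE
untrimmed <- read_csv(I(padded), trim_ws = FALSE, show_col_types = FALSE)

trimmed$name    # "ann" "bob"
untrimmed$name  # "  ann  " "  bob  "
```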
skip
Number of lines to skip before reading data. If comment is supplied, commented lines are ignored after skipping. Default is 0.
read_csv(I("header\nx\n1\n2"), skip = 1)
# # A tibble: 2 × 1
# x
# <dbl>
# 1 1
# 2 2
n_max
Maximum number of data rows to read. Inf (default) reads all rows. Useful for previewing large files:
read_csv(I("x\n1\n2\n3\n4\n5"), n_max = 2)
# # A tibble: 2 × 1
# x
# <dbl>
# 1 1
# 2 2
Note: guess_max is capped at n_max, so type inference uses at most the rows actually read.
guess_max
Maximum rows used for type inference. Default is min(1000, n_max). Increase if early rows are unrepresentative of the full column:
# Suppose the early rows look numeric but a later row (e.g. row 1001) contains text
read_csv(I("x\n1\n2\n"), guess_max = 1001)
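A slightly larger sketch of the same situation, assuming a hypothetical file in which 1,500 numeric rows are followed by one text value. It uses guess_max = Inf so that every row takes part in type inference; this is the safest setting but can be slow on large files, where an explicit col_types is usually the better choice.

```r
library(readr)

# Hypothetical file: 1,500 numeric rows followed by a single text value.
path <- tempfile(fileext = ".csv")
writeLines(c("x", as.character(1:1500), "unknown"), path)

# With every row used for guessing, x is read as character and the
# text value is preserved instead of becoming a parsing problem.
df <- read_csv(path, guess_max = Inf, show_col_types = FALSE)
class(df$x)
tail(df$x, 1)
```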
name_repair
How to handle duplicate or invalid column names. Options:
- "minimal": keep names as-is (may contain duplicates).
- "unique" (default): make names unique by appending ...1, ...2, etc.
- "check_unique": no repair; error if any duplicates exist.
- "unique_quiet": like "unique", but repair silently.
- "universal": make names unique and syntactically valid.
- Custom function: applied to the vector of names and expected to return the repaired names, e.g. function(nms) make.names(nms, unique = TRUE).
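To make two of these options concrete, here is a small sketch with made-up headers: one with a duplicated name and one containing a space. It only inspects the resulting column names.

```r
library(readr)

# Default "unique": duplicated names get positional suffixes.
unique_names <- names(
  read_csv(I("a,a\n1,2"), show_col_types = FALSE)
)
unique_names     # "a...1" "a...2"

# "universal": names are also made syntactically valid.
universal_names <- names(
  read_csv(I("my col,a\n1,2"), name_repair = "universal", show_col_types = FALSE)
)
universal_names  # "my.col" "a"
```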
quote, comment
- quote: single character used to quote strings, default "\"". Set quote = "" to disable quoting.
- comment: string that marks comments; any text after it on a line is ignored. Default "" disables comment handling.
read_csv(I('x\n1\n# comment\n2'), comment = "#")
# # A tibble: 2 × 1
# x
# <dbl>
# 1 1
# 2 2
skip_empty_rows
Logical, defaults to TRUE. When TRUE, blank rows are skipped entirely. When FALSE, blank rows are returned as NA across all columns.
num_threads, progress
- num_threads: number of threads for parallel parsing. Default readr_threads(). Set to 1 for files known to contain newlines inside quoted fields.
- progress: whether to display a progress bar. Default show_progress(), which is FALSE in non-interactive sessions (e.g., knitting).
show_col_types
- NULL (default): print column types only when they are inferred (i.e., when col_types is not supplied).
- TRUE: always print column types.
- FALSE: never print column types.
read_csv(I("x\n1"), col_types = NULL, show_col_types = FALSE) # silent inference
read_csv(I("x\n1"), col_types = "i", show_col_types = TRUE) # shows types even though specified
lazy
Logical, default should_read_lazy(), which is FALSE unless the readr.read_lazy option is set. When TRUE, values are read lazily (on access) via vroom. Lazy reading keeps the file in use, so reading a file and then writing back to the same file can cause problems.
Value
Returns a tibble with one column per CSV field and one row per record. Character columns are never auto-converted to factors. Row names are never set.
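These properties can be checked directly. The sketch below uses made-up inline data and only base R inspection functions.

```r
library(readr)

df <- read_csv(I("name,score\nann,1.5\nbob,2"), show_col_types = FALSE)

inherits(df, "tbl_df")   # TRUE: the result is a tibble
is.character(df$name)    # TRUE: strings stay character, not factor
is.double(df$score)      # TRUE: numbers are guessed as double
```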
If there are parsing problems, read_csv() emits a warning and stores the details with the result. Retrieve them with problems(df), or turn any problem into an error with stop_for_problems(df):
df <- read_csv(I("x\n1\nabc"), col_types = "d") # force double so "abc" cannot be parsed
# Warning: parsing issues were found; call problems() for details
problems(df)
# A tibble with one row per problem and the columns row, col, expected, actual, file
stop_for_problems(df)
# Error: reports the number of parsing failures
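In a script or batch job it is often more useful to check for problems programmatically than to read the warning. The sketch below is one way to do that under stated assumptions: the column is forced to double with col_types = "d" so that a text value cannot be parsed, and the variable names are illustrative.

```r
library(readr)

# Forcing x to double makes "abc" unparseable: it becomes NA and is
# recorded as a problem. The warning is suppressed to keep output quiet.
df <- suppressWarnings(
  read_csv(I("x\n1\nabc"), col_types = "d")
)

issues <- problems(df)
n_issues <- nrow(issues)

# Turn any problem into a hard failure and capture the outcome.
result <- tryCatch(
  {
    stop_for_problems(df)
    "clean"
  },
  error = function(e) "failed"
)
result
```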
Basic Usage
Read a file from disk:
df <- read_csv("data.csv")
Read a CSV from a URL:
df <- read_csv("https://example.com/data.csv")
Read multiple files, tagged with source:
combined <- read_csv(c("train.csv", "test.csv"), id = "split")
Column Type Specification
Always specify col_factor() and col_skip() explicitly — they are never inferred. Use col_guess() as the fallback when you want readr to infer the type for specific columns:
read_csv(
I("id,category,score\n1,A,3.2\n2,B,4.1"),
col_types = cols(
id = col_integer(),
category = col_factor(levels = c("A", "B", "C")),
.default = col_guess()
)
)
Handling Missing Values
Empty strings and "NA" are NA by default. Add custom values:
read_csv(I("x\n1\nN/A\nnull"), na = c("", "NA", "N/A", "null"))
Skipping and Limiting Rows
Combine skip and n_max to read a specific range:
# Skip 10 lines of preamble, use the next line as the header, then read 5 data rows
read_csv("data.csv", skip = 10, n_max = 5)
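The same combination can be tried on inline data. In this sketch the two preamble lines and the values are made up; the point is only to show which lines end up as header and data.

```r
library(readr)

# Hypothetical export: two lines of preamble, then the header and four rows.
raw <- "exported by some tool\nunits in kg\nx\n1\n2\n3\n4"

# skip = 2 drops the preamble, "x" becomes the header,
# and n_max = 2 keeps only the first two data rows.
window <- read_csv(I(raw), skip = 2, n_max = 2, show_col_types = FALSE)
window$x  # 1 2
```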
Compared to Base R
| Feature | read_csv() | read.csv() |
|---|---|---|
| Return type | tibble | data.frame |
| Strings to factors | never | FALSE by default since R 4.0.0 (TRUE before) |
| Row names | never | optional |
| Type inference | automatic, including dates and date-times | automatic, but no dates |
| Speed | generally faster on large files | generally slower |
| Dependencies | readr | none (base R) |
read_csv() is generally faster, returns a tibble, and never converts strings to factors. read.csv() needs no extra packages, but it does not detect dates and, before R 4.0.0, converted strings to factors by default.
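A quick side-by-side on made-up inline data shows two of these differences, the return class and the type chosen for whole numbers. This is a sketch for comparison only, not a benchmark.

```r
library(readr)

csv_text <- "id,name\n1,ann\n2,bob"

base_df  <- read.csv(text = csv_text)
readr_df <- read_csv(I(csv_text), show_col_types = FALSE)

class(base_df)      # "data.frame"
class(readr_df)     # tibble classes, ending in "data.frame"
class(base_df$id)   # "integer"
class(readr_df$id)  # "numeric": readr guesses double for whole numbers
```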
For unusual CSV formats, such as files that escape quotes with backslashes instead of doubled quotes or that use a delimiter other than a comma, use read_delim(), which exposes arguments (delim, escape_backslash, escape_double) that read_csv() does not.
Common Problems
Type inference wrong for late-appearing values: Increase guess_max:
read_csv(I("x\n1\n"), guess_max = 2000)
“NA” in my data is being converted to a missing value: Remove "NA" from na, for example na = "" to treat only empty strings as missing, or use na = character() to disable missing-value conversion entirely:
read_csv(I("x\nNA"), na = character()) # keeps "NA" as character
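Blank rows produce all-NA rows: This happens only when skip_empty_rows = FALSE; with the default (TRUE), blank lines are dropped at read time. If you remove them afterwards instead, note that drop_na() discards every row containing any NA, not just the blank ones.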
Quote handling with embedded delimiters: If a field contains a comma inside quotes, ensure the quote character is " (default) and the comma is inside the quoted region.
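A minimal illustration with made-up names: each quoted field contains a comma, and the default quote character keeps it inside a single column.

```r
library(readr)

# The comma inside the quotes is data, not a delimiter.
people <- read_csv(
  I('name,city\n"Doe, Jane",Paris\n"Roe, Rick",Oslo'),
  show_col_types = FALSE
)
people$name  # "Doe, Jane" "Roe, Rick"
```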
See Also
- read_delim(): general delimited-file reader; read_csv() is its comma-delimited special case.
- read_csv2(): CSV with ; as the delimiter (European format).
- write_csv(): write a tibble to a CSV file.
- str_sub() (stringr): string extraction with a tidyverse interface, following similar design principles.
- fct_reorder() (forcats): reorder factor levels, another tidyverse idiom that pairs well with read_csv() for data preparation.