Reading and Writing CSV Files in R

· 5 min read · Updated March 10, 2026 · beginner
csv data-import readr read.csv write.csv

CSV files are the bread and butter of data work. Almost every dataset you encounter will come as a CSV at some point, and you’ll need to export your cleaned data the same way. R gives you two main paths: base R functions that have been around forever, and the readr package from the tidyverse that handles edge cases better. This guide covers both.

Reading CSV Files in Base R

Base R includes read.csv() and read.csv2() for reading comma-separated and semicolon-separated files. These functions return a data frame, which is what you want most of the time.

# Basic reading - first row becomes column names
df <- read.csv("data/survey.csv")

# If your file has no header row
df <- read.csv("data/survey.csv", header = FALSE)

# Specify column classes upfront - saves memory and parsing time
df <- read.csv("data/survey.csv",
               colClasses = c("character", "numeric", "integer", "character"))

# Handle missing values - these strings become NA
df <- read.csv("data/survey.csv", na.strings = c("NA", "", "N/A", "-9999"))

The header argument matters more than you might think. Some datasets ship without headers, especially exports from legacy systems. Others have the first row as metadata rather than column names. Check your file before you import.
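A quick way to check is to peek at the raw lines before committing to an import. This is a minimal sketch, reusing the data/survey.csv path from the examples above:

# Inspect the first few raw lines to see whether a header row exists
# and whether the top of the file is metadata rather than data
readLines("data/survey.csv", n = 5)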

colClasses is worth using even when you don’t have to. It forces R to read each column as the type you specify, which prevents surprises like ID codes losing their leading zeros when read as numbers or, in R versions before 4.0, strings being silently converted to factors. It also makes reading large files noticeably faster, because R skips the type-guessing pass.

Reading CSV Files with readr

The readr package (part of tidyverse) offers a modern alternative. It returns tibbles instead of data frames, handles encoding better, and gives you progress bars for large files.

library(readr)

# Basic reading - much faster than base R for large files
df <- read_csv("data/survey.csv")

# Explicit column types - recommended for reproducibility
df <- read_csv("data/survey.csv",
               col_types = cols(
                 id = col_integer(),
                 name = col_character(),
                 score = col_double(),
                 date = col_date(format = "%Y-%m-%d")
               ))

# Skip lines at the top - useful when there's metadata before the data
df <- read_csv("data/survey.csv", skip = 3)

# Read only specific columns - saves memory on wide files
df <- read_csv("data/survey.csv",
               col_select = c(id, name, score))

# Handle missing values cleanly
df <- read_csv("data/survey.csv", na = c("NA", "", "N/A", "-9999"))

The first argument is always the file path. readr can also read directly from compressed files without you needing to unzip them first.
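For example, readr transparently decompresses based on the file extension (the .gz file name here is illustrative):

# .gz, .bz2, .xz, and .zip files are decompressed on the fly
df <- read_csv("data/survey.csv.gz")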

Tibbles behave slightly differently from data frames. They don’t do partial matching on column names and they print more cleanly. Most of the time this is exactly what you want.
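The partial-matching difference is easy to demonstrate (the column name here is illustrative):

df <- data.frame(score = c(1, 2, 3))
df$sc    # data frame: $ partial-matches and returns the score column

tbl <- tibble::as_tibble(df)
tbl$sc   # tibble: returns NULL with a warning instead of guessing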

Writing CSV Files

Exporting data works the same way - you have base R and readr options.

# Base R - write.csv includes row names by default, usually unwanted
write.csv(df, "output/survey_cleaned.csv", row.names = FALSE)

# Specify NA representation
write.csv(df, "output/survey_cleaned.csv", 
          row.names = FALSE, 
          na = "-9999")

# readr version - cleaner output by default
library(readr)
write_csv(df, "output/survey_cleaned.csv")

# Control the NA representation and quoting behavior
write_csv(df, "output/survey_cleaned.csv", 
          na = "-9999",
          quote = "needed")

Watch out for the row.names argument in base R. It adds an unnamed first column with row numbers, which breaks downstream processing. Always set it to FALSE unless you specifically need those row names.

The readr functions are more explicit about quoting behavior: write_csv() only quotes fields when necessary, whereas write.csv() quotes every character field by default (you can turn this off with quote = FALSE).

Handling Edge Cases

Real CSV files are messier than textbook examples. Here are the common problems and how to solve them.

# Files with different delimiters - readr handles these
read_csv2("data/european_sales.csv")   # semicolon delimited
read_tsv("data/tab_separated.txt")     # tab delimited
read_delim("data/mixed_delimiter.txt", delim = "|")

# Handle quoted fields with commas inside
# "Smith, John",50000 should parse as two fields, not three
df <- read_csv("data/complex.csv", quote = "\"")

# If the file uses backslash escapes instead of doubled quotes,
# use read_delim, which exposes the escape options
df <- read_delim("data/complex.csv", delim = ",",
                 escape_backslash = TRUE,
                 escape_double = FALSE)

# Multiple lines in a single cell - readr handles this automatically
df <- read_csv("data/has_newlines.csv")

# Character encoding problems - specify encoding explicitly
df <- read_csv("data/german_data.csv", 
               locale = locale(encoding = "ISO-8859-1"))

# Fixed-width files - columns defined by character widths, not delimiters
df <- read_fwf("data/fixed_width.txt",
               fwf_cols(id = 1, name = 10, value = 5),
               col_types = "icn")

readr uses locale() to control encoding, date formats, and decimal markers. If you’re reading data from different countries, you’ll run into different conventions for dates and decimals.
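For example, a file with European-style numbers and day-first dates (the file name is illustrative) can be read like this:

# Semicolon-delimited file with 1.234,56 numbers and dd/mm/yyyy dates
df <- read_delim("data/french_sales.csv", delim = ";",
                 locale = locale(decimal_mark = ",",
                                 grouping_mark = ".",
                                 date_format = "%d/%m/%Y"))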

Performance Tips for Large Files

Reading million-row CSV files can bring R to its knees if you do it wrong. These techniques help.

# Read only the first N rows - useful for quick inspection
# (avoid calling the result "sample", which masks base::sample)
preview <- read_csv("data/million_rows.csv", n_max = 1000)

# Increase guess_max when column types are misdetected
df <- read_csv("data/messy.csv", 
               guess_max = 10000)

# Progress bar for long reads (on by default in interactive sessions)
df <- read_csv("data/huge.csv", 
               progress = TRUE)

# For truly massive files, consider data.table::fread
library(data.table)
df <- fread("data/million_rows.csv", 
            nrows = 100000,
            skip = 1000000)

The guess_max parameter controls how many rows readr looks at to guess column types. If your file has unusual data early on mixed with normal data later, increase this value.

The data.table package’s fread() function is significantly faster than both base R and readr for large files. It’s worth learning if you regularly work with datasets over 100MB. It returns a data.table, which is a different beast from data frames or tibbles.
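If the rest of your pipeline expects a tibble or plain data frame, convert after reading (the file name is illustrative):

library(data.table)

dt <- fread("data/million_rows.csv")
df <- tibble::as_tibble(dt)   # or as.data.frame(dt) for a base data frame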

When to Use Each Approach

Base R’s read.csv() works fine for small files and quick one-off analyses. It’s always available, which matters when you’re sharing code with people who don’t use tidyverse.

Use readr for most everyday work. The tibble output, explicit type specification, and better defaults make your code more reliable. The progress bars are genuinely useful for files over 10MB.

Switch to data.table::fread when speed matters and you’re comfortable with data.table syntax. It’s the fastest option by a significant margin for files over 50MB.

For writing, the choice matters less. write.csv() is fine for small exports. write_csv() is cleaner and faster for everything else.