rguides

Reading Excel Files with readxl and writexl

If you work with data in R, at some point you will need to read Excel files. Excel remains the default format for sharing data in many industries, and the readxl package makes it straightforward to import that data without external dependencies.

Install readxl and writexl

Both packages come from the tidyverse ecosystem and install with a single command:

install.packages("readxl")
install.packages("writexl")

No Java, no system dependencies. Just load the packages and start working:

library(readxl)
library(writexl)

Read Excel files

The main function you will use is read_excel(). It automatically detects the file format (.xls or .xlsx) and handles both legacy and modern Excel files.

# Basic usage - reads first sheet by default
df <- read_excel("data.xlsx")

Reading different sheets

Excel workbooks often contain multiple sheets. You can specify which sheet to read by name or position:

# By sheet name
df <- read_excel("data.xlsx", sheet = "Sales")

# By sheet position (1 = first sheet)
df <- read_excel("data.xlsx", sheet = 1)

Reading cell ranges

Sometimes you only need a portion of a worksheet. Use the range argument with Excel notation:

# Read cells A1:D10
df <- read_excel("data.xlsx", range = "A1:D10")

# Read from a named range
df <- read_excel("data.xlsx", range = "Sheet1!B2:F20")

Controlling column types

By default, readxl guesses column types from the data. You can override this with the col_types argument:

# Specify all columns as character
df <- read_excel("data.xlsx", col_types = "text")

# Specify per-column types
df <- read_excel("data.xlsx", 
                 col_types = c("text", "numeric", "date", "skip"))

Available types: “logical”, “numeric”, “date”, “text”, and “skip”.

Handling headers

The first row is treated as column names by default. Turn this off to get raw data:

# No header row
df <- read_excel("data.xlsx", col_names = FALSE)

# Use custom names
df <- read_excel("data.xlsx", 
                 col_names = c("ID", "Name", "Value", "Notes"))

Write Excel files

The writexl package provides write_xlsx() for creating Excel files:

# Write a data frame to Excel
write_xlsx(df, "output.xlsx")

Writing multiple sheets

Create a named list of data frames to write multiple sheets:

# Multiple sheets in one file
sheets <- list(
  "Sales" = sales_df,
  "Inventory" = inventory_df,
  "Summary" = summary_df
)

write_xlsx(sheets, "report.xlsx")

Working with named regions

Named regions make your code more readable. Reference them directly:

# Read a named region
df <- read_excel("data.xlsx", range = "MyNamedRange")

Listing sheet contents

Before reading, inspect what is available in a workbook:

# Get sheet names
excel_sheets("data.xlsx")

# Returns: [1] "Sheet1" "Sheet2" "Q1 Data"

Best practices and common pitfalls

Watch for empty cells: Excel cells can be truly empty, blank strings, or NA. readxl represents these as NA by default.

Date handling: Dates in Excel are stored as numbers (days since 1899-12-30). readxl converts these automatically, but verify the results for your specific locale.

Avoid formula cells: read_excel() reads the calculated values, not the formulas. If you need formulas, use the openxlsx package instead.

Large files: For files with thousands of rows, consider reading only what you need using the range argument to reduce memory usage.

Character encoding: If you see strange characters, the file may use a different encoding. Check the source Excel file encoding settings.

Handling multiple sheets

readxl::excel_sheets("file.xlsx") returns a character vector of sheet names. To read all sheets into a named list, use purrr::map(sheets, ~ read_excel("file.xlsx", sheet = .x)) |> setNames(sheets). bind_rows(sheet_list, .id = "sheet") combines them into one data frame with a column identifying the source sheet.

Data cleaning after import

Excel files often contain merged cells, metadata rows above the actual data, and mixed data types in a single column. skip and n_max in read_excel() control which rows to read. Pass col_names = FALSE and use janitor::row_to_names() to promote a non-first row to the header. janitor::clean_names() converts Excel column names to snake_case. openxlsx::read.xlsx() is an alternative with more options for formatting-aware reading.

Writing formatted Excel files

openxlsx2 (the successor to openxlsx) creates formatted Excel files from R: set column widths, apply number formats, add conditional formatting, and create multiple sheets. For production reporting pipelines that deliver Excel files to non-R users, openxlsx2 gives more control than writexl, which writes plain data without formatting options.

Range and cell selection

read_excel("file.xlsx", range = "B2:G20") reads an exact cell range — useful when the table is embedded in a larger sheet with metadata around it. cell_cols("B:G") and cell_rows(2:20) are helper functions for range specification. For sheets with multiple distinct tables, read each range separately and combine. excel_format("file.xlsx") returns "xls" or "xlsx" — the format affects which underlying reader is used.

Reading Excel files with readxl

readxl::read_excel("file.xlsx") reads the first sheet of an Excel file into a tibble. sheet = "Sheet2" or sheet = 2 selects by name or index. range = "A1:D100" reads a specific cell range. skip = 3 skips the first 3 rows. n_max = 1000 limits to 1000 rows.

readxl::excel_sheets("file.xlsx") returns the sheet names as a character vector. To read all sheets: purrr::map(excel_sheets("file.xlsx"), ~ read_excel("file.xlsx", sheet = .x)) returns a list of tibbles. purrr::map_dfr() binds them into one data frame, useful when all sheets have the same structure.

Column types are guessed from the first few rows. Override with col_types: col_types = c("text", "numeric", "date", "skip") explicitly sets types for each column in order, and "skip" drops a column. Mixed-type columns (cells containing numbers and text) cause issues — col_types = "text" reads all columns as character for manual cleaning.

Named ranges and data validation

readxl::read_excel("file.xlsx", range = "myrange") reads a named range. Named ranges are defined in Excel (Formulas > Name Manager). This is more reliable than cell addresses because named ranges are stable even if rows or columns are added.

Excel files sometimes contain multiple data tables on the same sheet with blank rows separating them. Read the full range first, then filter or split in R. dplyr::filter(!is.na(key_column)) removes blank rows.

Reading with openxlsx2

openxlsx2 reads and writes Excel files with more control than readxl. wb_load("file.xlsx") loads a workbook. wb_read(wb, sheet = 1) reads a sheet. Unlike readxl, openxlsx2 also writes Excel files, making it a one-package solution for reading and writing.

openxlsx2::wb_add_worksheet(), wb_add_data(), wb_add_chart_sheet() build workbooks programmatically. wb_set_col_widths(), wb_set_row_heights(), wb_add_cell_style() format cells. This is the package for generating formatted Excel reports, not just reading data.

The writexl package

writexl::write_xlsx(df, "output.xlsx") writes a data frame to an Excel file. write_xlsx(list(Sheet1 = df1, Sheet2 = df2), "output.xlsx") writes multiple sheets. This package has no Java or Perl dependencies (unlike xlsx), making it the easiest option for writing simple Excel files.

Column types are preserved: numeric columns become Excel numbers, character columns become text, Date and POSIXct columns become Excel dates (displayed with date formatting). The date cells display correctly in Excel without additional formatting.

Large Excel files

For Excel files with hundreds of thousands of rows, readxl reads the entire file into memory at once. When memory is limited, read in chunks if the data can be split by sheet. For files that are fundamentally too large for Excel (Excel’s limit is about 1 million rows), convert to CSV or Parquet first with a tool like LibreOffice command-line or Python’s openpyxl.

tidyxl::xlsx_cells("file.xlsx") reads an Excel file at the cell level — one row per cell with position, type, and value. This handles irregular Excel layouts (merged cells, multiple tables on one sheet, metadata in headers) that confound rectangular reading. After reading cells, reconstruct the structure with custom logic.

unpivotr works alongside tidyxl to extract data from irregularly structured Excel files. It provides functions to identify header rows and data blocks, making it possible to read financial statements, survey exports, and other non-rectangular spreadsheets.

Sheet selection and ranges

Excel workbooks often contain multiple sheets. readxl::excel_sheets() returns the sheet names, which you can use to read all sheets with lapply() or purrr::map(). Specifying a sheet by name is more reliable than by index because sheet order can change when the workbook is modified.

Reading only part of a sheet is useful when the data is surrounded by titles, footnotes, or other non-data content. The range argument accepts an Excel cell range like “B3:F50” or a cell_rows() / cell_cols() specification. Using skip to skip header rows and n_max to limit rows read are simpler alternatives when the data starts from the top of a sheet but has a predictable structure.

Summary

The readxl package handles most Excel import scenarios without configuration. The writexl package covers the essential export needs. Together, they form a lightweight, dependency-free solution for working with Excel files in R.

See also