Using SQLite from R

· 4 min read · Updated March 11, 2026 · beginner
r databases sqlite dbi

SQLite is a lightweight, serverless database that works perfectly for data analysis projects. R provides excellent tools for working with SQLite databases through the DBI package and its backend implementations. Unlike PostgreSQL or MySQL, SQLite requires no server setup and stores entire databases in single files, making it ideal for local development and small to medium-sized datasets.

Why SQLite for Data Analysis?

SQLite offers several advantages for R programmers. The zero-configuration setup means you can start working immediately without database administration. The file-based nature makes sharing datasets trivial. SQLite handles concurrent reads efficiently, and the database engine runs entirely within your R process, eliminating network latency.

For data analysis workflows, SQLite serves as an excellent intermediary format. You can import large CSV files once, index them for fast querying, and then perform SQL-based analysis directly from R without loading the entire dataset into memory.
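A minimal sketch of that workflow, with a small generated CSV standing in for a large export (file and column names here are hypothetical):

```r
library(DBI)
library(RSQLite)

# Hypothetical CSV standing in for a large export
write.csv(data.frame(user_id = c(1, 42, 42), value = c(10, 20, 30)),
          "events.csv", row.names = FALSE)

con <- dbConnect(SQLite(), "analysis.db")

# One-time import into SQLite
dbWriteTable(con, "events", read.csv("events.csv"), overwrite = TRUE)

# Index once, then query repeatedly without reloading the CSV
dbExecute(con, "CREATE INDEX IF NOT EXISTS idx_user ON events(user_id)")
res <- dbGetQuery(con,
  "SELECT SUM(value) AS total FROM events WHERE user_id = 42")
res$total  # 50

dbDisconnect(con)
```

For files too large for read.csv, the import step can be done in chunks, but the query side stays the same.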

Installation

Install the required packages from CRAN:

install.packages("DBI")
install.packages("RSQLite")

The DBI package provides a unified interface for database connections. This abstraction allows you to switch between different database backends (SQLite, PostgreSQL, MySQL) without changing your R code. The RSQLite package implements the SQLite driver that DBI uses to communicate with SQLite databases.
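To illustrate that abstraction, the only line that changes between backends is the dbConnect() call; the PostgreSQL connection details below are placeholders:

```r
library(DBI)

# SQLite: driver plus a file path
con <- dbConnect(RSQLite::SQLite(), "my_data.db")

# PostgreSQL: same interface, different driver (requires RPostgres)
# con <- dbConnect(RPostgres::Postgres(),
#                  dbname = "mydb", host = "localhost",
#                  user = "analyst", password = "...")

# Everything from here on is backend-agnostic DBI code
tables <- dbListTables(con)
dbDisconnect(con)
```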

Creating a Database

Create a new SQLite database file or connect to an existing one:

library(DBI)
library(RSQLite)

# Create an in-memory database
con <- dbConnect(SQLite(), ":memory:")

# Or create a file-based database
con <- dbConnect(SQLite(), "my_data.db")

The first approach creates a temporary in-memory database, useful for testing: it is fast but volatile, and everything in it is lost when the connection closes. The second creates a persistent file on disk, so your data survives between R sessions.

Working with Tables

Creating Tables

Create a table from an R data frame:

# Sample data
sales <- data.frame(
  id = 1:5,
  product = c("Widget A", "Widget B", "Widget C", "Widget A", "Widget B"),
  quantity = c(10, 25, 15, 8, 30),
  price = c(19.99, 29.99, 14.99, 19.99, 29.99),
  date = as.Date(c("2026-01-15", "2026-01-16", "2026-01-17", "2026-01-18", "2026-01-19"))
)

# Write to SQLite
dbWriteTable(con, "sales", sales)

The dbWriteTable function automatically creates the table schema based on your data frame structure, mapping R column types to SQLite storage classes. Note that SQLite has no native date type, so RSQLite stores the Date column as a number; convert it back with as.Date() after reading.
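You can check the schema dbWriteTable generated with SQLite's PRAGMA table_info; a self-contained sketch using an in-memory database:

```r
library(DBI)
library(RSQLite)

con <- dbConnect(SQLite(), ":memory:")
dbWriteTable(con, "sales", data.frame(
  id = 1:2, product = c("Widget A", "Widget B"),
  quantity = c(10L, 25L), price = c(19.99, 29.99)
))

# SQLite storage classes chosen for each R column:
# integer -> INTEGER, character -> TEXT, double -> REAL
schema <- dbGetQuery(con, "PRAGMA table_info(sales)")
schema[, c("name", "type")]

dbDisconnect(con)
```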

Reading Tables

Retrieve data from SQLite into R:

# Read entire table
sales_data <- dbReadTable(con, "sales")

# Execute custom queries
result <- dbGetQuery(con, "SELECT product, SUM(quantity) as total_qty 
                      FROM sales 
                      GROUP BY product")

The dbGetQuery function combines query execution and result fetching in one step. For more control, use dbSendQuery followed by dbFetch.
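A minimal sketch of the two-step approach, fetching results in batches, which is useful when a result set is too large to hold in memory at once:

```r
library(DBI)
library(RSQLite)

con <- dbConnect(SQLite(), ":memory:")
dbWriteTable(con, "sales", data.frame(id = 1:10, quantity = 1:10))

res <- dbSendQuery(con, "SELECT * FROM sales")
total_rows <- 0
repeat {
  chunk <- dbFetch(res, n = 4)   # up to 4 rows per batch
  if (nrow(chunk) == 0) break
  total_rows <- total_rows + nrow(chunk)
}
dbClearResult(res)               # always release the result set
total_rows  # 10

dbDisconnect(con)
```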

Querying with Parameters

Use parameterized queries to prevent SQL injection:

# Parameterized query
stmt <- dbSendQuery(con, "SELECT * FROM sales WHERE quantity > ?")
dbBind(stmt, list(20))
result <- dbFetch(stmt)
dbClearResult(stmt)

This approach separates the query structure from user data, improving security. Parameterized queries are essential when incorporating user input into your queries.
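For one-off queries, dbGetQuery accepts the same placeholders directly through its params argument, which is often more convenient than the dbSendQuery/dbBind/dbFetch sequence:

```r
library(DBI)
library(RSQLite)

con <- dbConnect(SQLite(), ":memory:")
dbWriteTable(con, "sales", data.frame(
  product = c("Widget A", "Widget B", "Widget C"),
  quantity = c(10, 25, 15)
))

# The placeholder is bound safely; user input never touches the SQL string
big_orders <- dbGetQuery(
  con,
  "SELECT product, quantity FROM sales WHERE quantity > ?",
  params = list(20)
)
big_orders$product  # "Widget B"

dbDisconnect(con)
```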

Using dbplyr with SQLite

The dbplyr package translates dplyr syntax to SQL automatically, letting you work with SQLite using familiar tidyverse functions:

library(dplyr)
library(dbplyr)

# Create a lazy table reference
sales_tbl <- tbl(con, "sales")

# Use dplyr verbs

sales_summary <- sales_tbl %>%
  group_by(product) %>%
  summarise(
    total_quantity = sum(quantity),
    total_revenue = sum(quantity * price),
    .groups = "drop"
  ) %>%
  arrange(desc(total_revenue))

The pipeline is translated to SQL but not executed until you request the results. This lazy evaluation means the transformations run inside the database, reducing data transfer:

# Force execution and bring to R
sales_summary <- sales_summary %>% collect()
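To see the SQL that dbplyr generates before running it, call show_query() on the lazy table; a self-contained sketch with a small example table:

```r
library(DBI)
library(RSQLite)
library(dplyr)
library(dbplyr)

con <- dbConnect(SQLite(), ":memory:")
dbWriteTable(con, "sales", data.frame(product = c("A", "A", "B"),
                                      quantity = c(1, 2, 3)))

query <- tbl(con, "sales") %>%
  group_by(product) %>%
  summarise(total = sum(quantity, na.rm = TRUE))

# Inspect the generated SQL without executing it
show_query(query)
# Roughly: SELECT product, SUM(quantity) AS total FROM sales GROUP BY product

result <- collect(query)

dbDisconnect(con)
```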

Database Management

Checking Table Existence

# List all tables
dbListTables(con)

# Check if table exists
dbExistsTable(con, "sales")

Removing Tables

dbRemoveTable(con, "sales")

Closing Connections

Always close database connections when done:

dbDisconnect(con)

Inside a function, on.exit() ensures the connection closes even if an error occurs (note that on.exit() only takes effect within a function body, not at the top level of a script):

process_data <- function(path = "my_data.db") {
  con <- dbConnect(SQLite(), path)
  on.exit(dbDisconnect(con), add = TRUE)
  # Your database operations here
}

This pattern prevents connection leaks that can exhaust system resources.

Error Handling

Handle database errors gracefully:

tryCatch({
  dbExecute(con, "INSERT INTO sales VALUES (6, 'Widget D', 5, 9.99, '2026-01-20')")
}, error = function(e) {
  message("Error: ", e$message)
})

Proper error handling ensures your scripts continue running even when database operations fail.

Performance Tips

For large datasets, consider these optimizations:

  1. Create indexes on columns used frequently in WHERE clauses:

    dbExecute(con, "CREATE INDEX idx_product ON sales(product)")
  2. Use transactions for bulk operations:

    dbBegin(con)
    # Multiple operations
    dbCommit(con)
    # Or rollback on error:
    # dbRollback(con)
  3. Collect data only when needed with dbplyr to leverage database-side computation. Bringing data into R prematurely defeats the purpose of using a database.
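The transaction pattern above can also be expressed with DBI's dbWithTransaction(), which commits on success and rolls back automatically if an error occurs; a minimal sketch of a bulk insert:

```r
library(DBI)
library(RSQLite)

con <- dbConnect(SQLite(), ":memory:")
dbExecute(con, "CREATE TABLE sales (id INTEGER, quantity INTEGER)")

# All 100 inserts commit together, or not at all
dbWithTransaction(con, {
  for (i in 1:100) {
    dbExecute(con, "INSERT INTO sales VALUES (?, ?)",
              params = list(i, i * 2))
  }
})

n <- dbGetQuery(con, "SELECT COUNT(*) AS n FROM sales")$n
n  # 100

dbDisconnect(con)
```

Wrapping bulk inserts in a single transaction avoids one disk sync per statement, which is typically the dominant cost for row-by-row writes in SQLite.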

See Also