Connecting R to Databases with DBI

· 4 min read · Updated March 11, 2026 · intermediate
r database dbi data

The DBI package provides a unified interface for communicating with database management systems from R. Instead of writing database-specific code, you use DBI functions that work across MySQL, PostgreSQL, SQLite, Oracle, and other databases. This abstraction layer lets you switch between databases without rewriting your R code.

Installation and Setup

Install DBI from CRAN along with RSQLite for a demonstration database:

install.packages(c("DBI", "RSQLite"))

You also need database-specific drivers. RSQLite works out of the box for SQLite databases. For other databases, install the appropriate package:

# For PostgreSQL
install.packages("RPostgres")

# For MySQL or MariaDB (the newer RMariaDB package is also worth considering)
install.packages("RMySQL")

# For Oracle
install.packages("ROracle")

Creating a Database Connection

The first step is establishing a connection to your database. For SQLite, you create a file-based database:

library(DBI)

# Create an in-memory SQLite database
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Or connect to a file
con <- dbConnect(RSQLite::SQLite(), "my_database.db")

For other databases, use the appropriate driver:

# PostgreSQL
con <- dbConnect(RPostgres::Postgres(), 
                 host = "localhost",
                 dbname = "mydb",
                 user = "username",
                 password = "password")

# MySQL
con <- dbConnect(RMySQL::MySQL(),
                 host = "localhost",
                 dbname = "mydb",
                 user = "username",
                 password = "password")

Executing SQL Queries

Use dbExecute() for statements that modify data; it runs the statement and returns the number of rows affected:

# Create a table
dbExecute(con, "
  CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    name TEXT,
    department TEXT,
    salary INTEGER
  )
")

# Insert data
dbExecute(con, "INSERT INTO employees (name, department, salary) 
                VALUES ('Alice', 'Engineering', 75000)")

dbExecute(con, "INSERT INTO employees (name, department, salary) 
                VALUES ('Bob', 'Marketing', 65000)")

dbSendStatement() also runs statements that return no rows, but unlike dbExecute() it hands back a result object that you must clear yourself:

# Send the statement, check the affected rows, then clean up
res <- dbSendStatement(con, "DELETE FROM employees WHERE salary < 60000")
dbGetRowsAffected(res)
dbClearResult(res)

Retrieving Data

Use dbGetQuery() to execute a query and fetch results in one step:

# Fetch all employees in Engineering
engineers <- dbGetQuery(con, "
  SELECT name, salary 
  FROM employees 
  WHERE department = 'Engineering'
")

print(engineers)

For large results, fetch in chunks with dbSendQuery() and dbFetch():

# Send query without fetching
result <- dbSendQuery(con, "SELECT * FROM employees")

# Fetch in batches of 100 rows
while (!dbHasCompleted(result)) {
  batch <- dbFetch(result, n = 100)
  # Process batch
  print(nrow(batch))
}

# Always clear the result
dbClearResult(result)

Parameterized Queries

Never interpolate values directly into SQL strings. Use parameterized queries to prevent SQL injection:

# Safe parameterized query
department <- "Engineering"
salary_threshold <- 70000

results <- dbGetQuery(con, 
  "SELECT name, salary FROM employees 
   WHERE department = ? AND salary > ?",
  params = list(department, salary_threshold)
)

The ? placeholders are safely escaped by the database driver. This approach protects against malicious input and handles special characters automatically.
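Binding is not limited to scalars: most backends accept vectors, inserting one row per element. A minimal self-contained sketch using RSQLite (the names and salaries here are invented for illustration):

```r
library(DBI)

# Assumes RSQLite is installed; the table mirrors the article's schema
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbExecute(con, "CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")

# Each placeholder is bound to a vector; one row is inserted per element
rows <- dbExecute(con,
  "INSERT INTO employees (name, department, salary) VALUES (?, ?, ?)",
  params = list(c("Erin", "Frank"), c("Sales", "Engineering"), c(72000, 81000))
)

print(rows)  # 2 rows affected
dbDisconnect(con)
```

This is usually much faster than looping over single-row inserts, because the statement is prepared once and executed per row by the driver.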

Working with Multiple Results

Most DBI drivers execute a single statement per call, so a semicolon-separated SQL string will not produce multiple result sets. Run each query separately instead:

# Run each query with its own call
total <- dbGetQuery(con, "SELECT COUNT(*) AS total FROM employees")
avg_salary <- dbGetQuery(con, "SELECT AVG(salary) AS avg_salary FROM employees")

Database Metadata

Query the database schema programmatically:

# List all tables
dbListTables(con)

# List columns in a table
dbListFields(con, "employees")

# Check if a table exists
dbExistsTable(con, "employees")

These functions are essential for building dynamic applications that adapt to changing database schemas.
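For example, metadata checks make setup scripts idempotent: a script can create a table only when it is missing. A self-contained sketch using RSQLite:

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Create the table only if it is not already there
if (!dbExistsTable(con, "employees")) {
  dbExecute(con, "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT)")
}

tables <- dbListTables(con)
fields <- dbListFields(con, "employees")
print(tables)  # "employees"
print(fields)  # "id" "name"
dbDisconnect(con)
```

Running the script twice is now safe: the second run finds the table and skips the CREATE.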

Transaction Management

Wrap multiple operations in a transaction for atomicity:

dbBegin(con)

tryCatch({
  dbExecute(con, "INSERT INTO employees VALUES (3, 'Carol', 'Sales', 70000)")
  dbExecute(con, "INSERT INTO employees VALUES (4, 'Dave', 'Sales', 68000)")
  dbCommit(con)
}, error = function(e) {
  dbRollback(con)
  message("Transaction failed: ", conditionMessage(e))
})

If any statement fails, dbRollback() undoes all changes. This ensures data integrity when performing multiple related operations.
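DBI also provides a convenience wrapper, dbWithTransaction(), which commits if the enclosed code succeeds and rolls back if it throws an error, so you don't write the tryCatch yourself. A self-contained sketch with RSQLite:

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbExecute(con, "CREATE TABLE employees (
  id INTEGER PRIMARY KEY, name TEXT, department TEXT, salary INTEGER)")

# Commits on success, rolls back automatically on error
dbWithTransaction(con, {
  dbExecute(con, "INSERT INTO employees VALUES (3, 'Carol', 'Sales', 70000)")
  dbExecute(con, "INSERT INTO employees VALUES (4, 'Dave', 'Sales', 68000)")
})

counts <- dbGetQuery(con, "SELECT COUNT(*) AS n FROM employees")
print(counts$n)  # 2
dbDisconnect(con)
```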

Disconnecting

Always close connections when done:

dbDisconnect(con)

For temporary connections in functions, use on.exit() to ensure cleanup even if an error occurs:

connect_and_query <- function(sql) {
  con <- dbConnect(RSQLite::SQLite(), ":memory:")
  on.exit(dbDisconnect(con))

  dbGetQuery(con, sql)
}

Error Handling

DBI functions return errors when something goes wrong. Handle them gracefully:

result <- tryCatch(
  dbGetQuery(con, "SELECT * FROM nonexistent_table"),
  error = function(e) {
    message("Query failed: ", conditionMessage(e))
    NULL
  }
)

Closing connections is equally important here: each unclosed connection holds server resources, and enough of them can exhaust the database's connection limit.

Performance Tips

For frequently executed queries, use prepared statements:

# Create a prepared statement
query <- dbSendQuery(con, "SELECT * FROM employees WHERE department = ?")
dbBind(query, list("Engineering"))
results <- dbFetch(query)
dbClearResult(query)

This avoids parsing the query each time, improving performance for repeated queries.
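The real payoff comes from rebinding: calling dbBind() again with new parameters re-executes the prepared statement without re-parsing the SQL. A self-contained sketch with RSQLite (sample names and departments are invented for illustration):

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbExecute(con, "CREATE TABLE employees (name TEXT, department TEXT)")
dbExecute(con,
  "INSERT INTO employees (name, department) VALUES (?, ?)",
  params = list(c("Alice", "Bob"), c("Engineering", "Marketing")))

# Prepare once, then rebind per department; dbBind() resets the result set
query <- dbSendQuery(con, "SELECT name FROM employees WHERE department = ?")
hits <- list()
for (dept in c("Engineering", "Marketing")) {
  dbBind(query, list(dept))
  hits[[dept]] <- dbFetch(query)
}
dbClearResult(query)

print(hits$Engineering$name)  # "Alice"
dbDisconnect(con)
```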

See Also

  • RSQLite — SQLite driver for DBI
  • RPostgres — PostgreSQL driver for DBI
  • dplyr — Translate dplyr syntax to SQL with dbplyr