Connecting R to Databases with DBI
The DBI package provides a unified interface for communicating with database management systems from R. Instead of writing database-specific code, you use DBI functions that work across MySQL, PostgreSQL, SQLite, Oracle, and other databases. This abstraction layer lets you switch between databases without rewriting your R code.
Installation and Setup
Install DBI from CRAN along with RSQLite for a demonstration database:
install.packages(c("DBI", "RSQLite"))
You also need database-specific drivers. RSQLite works out of the box for SQLite databases. For other databases, install the appropriate package:
# For PostgreSQL
install.packages("RPostgres")
# For MySQL or MariaDB (RMariaDB supersedes the older RMySQL)
install.packages("RMariaDB")
# For Oracle
install.packages("ROracle")
Creating a Database Connection
The first step is establishing a connection to your database. SQLite can run entirely in memory or against a file on disk:
library(DBI)
# Create an in-memory SQLite database
con <- dbConnect(RSQLite::SQLite(), ":memory:")
# Or connect to a file
con <- dbConnect(RSQLite::SQLite(), "my_database.db")
For other databases, use the appropriate driver:
# PostgreSQL
con <- dbConnect(RPostgres::Postgres(),
                 host = "localhost",
                 dbname = "mydb",
                 user = "username",
                 password = "password")
# MySQL or MariaDB
con <- dbConnect(RMariaDB::MariaDB(),
                 host = "localhost",
                 dbname = "mydb",
                 user = "username",
                 password = "password")
Executing SQL Queries
Send statements that modify data or schema with dbExecute(), which runs the statement and returns the number of rows affected:
# Create a table
dbExecute(con, "
  CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    name TEXT,
    department TEXT,
    salary INTEGER
  )
")
# Insert data
dbExecute(con, "INSERT INTO employees (name, department, salary)
                VALUES ('Alice', 'Engineering', 75000)")
dbExecute(con, "INSERT INTO employees (name, department, salary)
                VALUES ('Bob', 'Marketing', 65000)")
Use dbSendStatement() when you need the result object, for example to check how many rows a statement affected. Always clear the result when you are done:

# Send the statement, inspect the result, then clear it
res <- dbSendStatement(con, "DELETE FROM employees WHERE salary < 60000")
dbGetRowsAffected(res)
dbClearResult(res)
Retrieving Data
Use dbGetQuery() to execute a query and fetch results in one step:
# Fetch all employees in Engineering
engineers <- dbGetQuery(con, "
  SELECT name, salary
  FROM employees
  WHERE department = 'Engineering'
")
print(engineers)
For large results, fetch in chunks with dbSendQuery() and dbFetch():
# Send query without fetching
result <- dbSendQuery(con, "SELECT * FROM employees")

# Fetch in batches of 100 rows
while (!dbHasCompleted(result)) {
  batch <- dbFetch(result, n = 100)
  # Process batch
  print(nrow(batch))
}

# Always clear the result
dbClearResult(result)
Parameterized Queries
Never interpolate values directly into SQL strings. Use parameterized queries to prevent SQL injection:
# Safe parameterized query
department <- "Engineering"
salary_threshold <- 70000
results <- dbGetQuery(con,
  "SELECT name, salary FROM employees
   WHERE department = ? AND salary > ?",
  params = list(department, salary_threshold)
)
Values bound to ? placeholders are sent to the driver separately from the SQL text, so they are never interpreted as SQL. This protects against malicious input and handles quoting and special characters automatically.
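Drivers can also bind a whole vector per placeholder, running the statement once per element. A minimal sketch in a fresh in-memory database (the table definition is repeated here so the snippet is self-contained):

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbExecute(con, "CREATE TABLE employees (
  id INTEGER PRIMARY KEY, name TEXT, department TEXT, salary INTEGER)")

# One INSERT, bound once per element of the vectors (two rows here)
dbExecute(con,
  "INSERT INTO employees (name, department, salary) VALUES (?, ?, ?)",
  params = list(c("Alice", "Bob"),
                c("Engineering", "Marketing"),
                c(75000, 65000))
)

row_count <- dbGetQuery(con, "SELECT COUNT(*) AS n FROM employees")$n
dbDisconnect(con)
```

This batch form sends the values through one round of binding rather than pasting them into the SQL string.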
Working with Multiple Results
Most DBI backends accept only one statement per call, so do not concatenate several queries into a single string. Run each query with its own call and handle its result in turn:

# Run each query separately
total <- dbGetQuery(con, "SELECT COUNT(*) AS total FROM employees")
avg_salary <- dbGetQuery(con, "SELECT AVG(salary) AS avg_salary FROM employees")
Database Metadata
Query the database schema programmatically:
# List all tables
dbListTables(con)
# List columns in a table
dbListFields(con, "employees")
# Check if a table exists
dbExistsTable(con, "employees")
These functions are essential for building dynamic applications that adapt to changing database schemas.
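As an illustration, a hypothetical count_rows() helper (not part of DBI) can combine dbListTables() with dbQuoteIdentifier() to build SQL safely for whatever tables exist:

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbExecute(con, "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT)")
dbExecute(con, "INSERT INTO employees (name) VALUES ('Alice'), ('Bob')")

# Hypothetical helper: count rows in every table the connection can see
count_rows <- function(con) {
  sapply(dbListTables(con), function(tbl) {
    sql <- paste0("SELECT COUNT(*) AS n FROM ", dbQuoteIdentifier(con, tbl))
    dbGetQuery(con, sql)$n
  })
}

counts <- count_rows(con)
dbDisconnect(con)
```

Quoting identifiers through the driver avoids breakage (and injection) when table names contain unusual characters.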
Transaction Management
Wrap multiple operations in a transaction for atomicity:
dbBegin(con)
tryCatch({
  dbExecute(con, "INSERT INTO employees VALUES (3, 'Carol', 'Sales', 70000)")
  dbExecute(con, "INSERT INTO employees VALUES (4, 'Dave', 'Sales', 68000)")
  dbCommit(con)
}, error = function(e) {
  dbRollback(con)
  message("Transaction failed: ", conditionMessage(e))
})
If any statement fails, dbRollback() undoes all changes. This ensures data integrity when performing multiple related operations.
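DBI also offers dbWithTransaction(), which wraps the same pattern: it commits if the block succeeds and rolls back if the block throws an error. A self-contained sketch:

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbExecute(con, "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")

# Commits as a unit: both rows or neither
dbWithTransaction(con, {
  dbExecute(con, "INSERT INTO employees VALUES (1, 'Carol', 70000)")
  dbExecute(con, "INSERT INTO employees VALUES (2, 'Dave', 68000)")
})

# A failing block is rolled back automatically
try(dbWithTransaction(con, {
  dbExecute(con, "INSERT INTO employees VALUES (3, 'Eve', 80000)")
  stop("simulated failure")  # forces a rollback
}), silent = TRUE)

n <- dbGetQuery(con, "SELECT COUNT(*) AS n FROM employees")$n  # Eve was rolled back
dbDisconnect(con)
```

This removes the manual dbBegin()/dbCommit()/dbRollback() bookkeeping shown above.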
Disconnecting
Always close connections when done:
dbDisconnect(con)
For temporary connections in functions, use on.exit() to ensure cleanup even if an error occurs:
connect_and_query <- function(sql) {
  con <- dbConnect(RSQLite::SQLite(), ":memory:")
  on.exit(dbDisconnect(con))
  # Your query logic here
}
Error Handling
DBI functions return errors when something goes wrong. Handle them gracefully:
result <- tryCatch(
  dbGetQuery(con, "SELECT * FROM nonexistent_table"),
  error = function(e) {
    message("Query failed: ", conditionMessage(e))
    NULL
  }
)
Closing connections is equally important: leaked connections tie up server resources and can exhaust the connection pool.
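You can also check whether a handle is still usable with dbIsValid(), for example before reusing a connection stored in a long-lived object:

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
valid_before <- dbIsValid(con)   # TRUE while the connection is open
dbDisconnect(con)
valid_after <- dbIsValid(con)    # FALSE once it has been closed
```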
Performance Tips
For frequently executed queries, prepare the statement once, then rebind new parameters for each run:

# Prepare once, then bind and fetch repeatedly
query <- dbSendQuery(con, "SELECT * FROM employees WHERE department = ?")
for (dept in c("Engineering", "Marketing")) {
  dbBind(query, list(dept))
  print(dbFetch(query))
}
dbClearResult(query)
This avoids parsing the query each time, improving performance for repeated queries.
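For bulk transfers, dbWriteTable() and dbReadTable() move whole data frames in a single call, which is typically far faster than looping over INSERT statements. A quick sketch:

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Write a data frame as a table in one call
df <- data.frame(name = c("Alice", "Bob"),
                 department = c("Engineering", "Marketing"),
                 salary = c(75000, 65000))
dbWriteTable(con, "employees", df)

# Read the whole table back as a data frame
employees <- dbReadTable(con, "employees")
dbDisconnect(con)
```

The driver handles type mapping and batching internally, so this is also the least error-prone way to load tabular data.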
See Also
RSQLite — SQLite driver for DBI
RPostgres — PostgreSQL driver for DBI
dplyr — Translate dplyr syntax to SQL with dbplyr