R Databases with DBI: Connect to SQLite, PostgreSQL, MySQL
The DBI package provides a unified interface for connecting R to database management systems. Instead of writing database-specific code, you use DBI functions that work across MySQL, PostgreSQL, SQLite, Oracle, and other databases. This abstraction layer lets you switch between databases without rewriting your R code.
Installation and setup
Install DBI from CRAN along with RSQLite for a demonstration database:
install.packages(c("DBI", "RSQLite"))
DBI itself handles the query dispatch and result marshalling, but each database engine requires its own driver package that translates DBI calls into the engine’s native wire protocol. RSQLite bundles the SQLite C library directly and has no system-level dependencies, making it the fastest way to get a working database connection for prototyping and testing. For deployment against a production database server, install the backend-specific package that matches your database engine:
# For PostgreSQL
install.packages("RPostgres")
# For MySQL or MariaDB
install.packages("RMySQL")
# For Oracle
install.packages("ROracle")
These driver packages register themselves with DBI when loaded, which means once installed and imported with library(), they become available through the generic DBI functions without any additional configuration. The driver function naming follows a consistent convention: the package name followed by the driver class, such as RSQLite::SQLite() or RPostgres::Postgres(). The library() call for the driver package is optional if you fully qualify the driver constructor with the package namespace as shown in the examples above.
Creating a database connection
The first step is establishing a connection to your database. For SQLite, you create a file-based database. The dbConnect() function is the universal entry point across all backends; it accepts a driver object as its first argument and backend-specific connection parameters after that. The ":memory:" special string creates a temporary database that exists only while the R session is running and is discarded when you disconnect, making it ideal for testing queries or building reproducible examples without leaving stray files on disk:
library(DBI)
# Create an in-memory SQLite database
con <- dbConnect(RSQLite::SQLite(), ":memory:")
# Or connect to a file
con <- dbConnect(RSQLite::SQLite(), "my_database.db")
When connecting to a client-server database rather than an embedded one like SQLite, you supply network-level parameters instead of a file path. These parameters tell DBI where the database server is running, which port to use, and which credentials to authenticate with. The function call structure remains identical across backends; only the driver object and the named arguments change, preserving the same API pattern whether you target a local SQLite file or a remote PostgreSQL cluster with read replicas:
# PostgreSQL
con <- dbConnect(RPostgres::Postgres(),
host = "localhost",
dbname = "mydb",
user = "username",
password = "password")
# MySQL
con <- dbConnect(RMySQL::MySQL(),
host = "localhost",
dbname = "mydb",
user = "username",
password = "password")
Notice that the only differences between the PostgreSQL and MySQL examples are the driver function calls: RPostgres::Postgres() versus RMySQL::MySQL(). Everything else, from the named parameters to the assignment syntax, follows the same pattern. In production settings, avoid hardcoding credentials directly in R scripts; retrieve them from environment variables using Sys.getenv("DB_HOST"), Sys.getenv("DB_PASSWORD"), or from a configuration file read by the config package.
Executing SQL queries
dbExecute() runs statements that modify the database (CREATE TABLE, INSERT, UPDATE, DELETE) and returns the number of rows affected rather than a result set. For long-running operations or statements where you do not need to wait for completion, dbSendStatement() submits the query asynchronously.
# Create a table and insert data with dbExecute
dbExecute(con, "
CREATE TABLE employees (
id INTEGER PRIMARY KEY,
name TEXT,
department TEXT,
salary INTEGER
)
")
dbExecute(con, "INSERT INTO employees (name, department, salary)
VALUES ('Alice', 'Engineering', 75000)")
# Asynchronous execution with dbSendStatement
dbSendStatement(con, "DELETE FROM employees WHERE salary < 60000")
Retrieving data
dbGetQuery() submits a SELECT statement and returns the full result set as an R data frame, which is convenient for small to medium results. For large queries where fetching everything at once would exhaust memory, use dbSendQuery() followed by dbFetch(n = 100) to retrieve results in chunks, then call dbClearResult() to release server-side resources.
# Single-step fetch, returns a data frame
engineers <- dbGetQuery(con, "
SELECT name, salary FROM employees
WHERE department = 'Engineering'
")
# Chunked fetch, processes large results without loading all at once
result <- dbSendQuery(con, "SELECT * FROM employees")
while (!dbHasCompleted(result)) {
batch <- dbFetch(result, n = 100)
print(nrow(batch))
}
dbClearResult(result)
The single-step dbGetQuery() approach returns a ready-to-use data frame and is appropriate when you know the result set will fit comfortably in memory, such as filtered lookups or aggregation queries that return a handful of rows. The chunked dbSendQuery() and dbFetch() pattern is essential when working with tables containing millions of rows, where loading the entire result at once would trigger an out-of-memory error. The n argument to dbFetch() controls the batch size; values between 500 and 5000 typically balance memory pressure against the overhead of repeated round-trips to the database. After processing all batches, calling dbClearResult() releases the server-side cursor and frees resources on both the R and database sides.
Parameterized queries
Never interpolate values directly into SQL strings. Use parameterized queries to prevent SQL injection. This defensive practice is not only about security; it also handles data-type coercion automatically, so you do not need to manually quote string values or format date literals for the specific database dialect you are targeting. The driver inspects the bound value’s R type and converts it to the appropriate SQL representation:
# Safe parameterized query
department <- "Engineering"
salary_threshold <- 70000
results <- dbGetQuery(con,
"SELECT name, salary FROM employees
WHERE department = ? AND salary > ?",
params = list(department, salary_threshold)
)
The ? placeholders are safely escaped by the database driver. This approach protects against malicious input and handles special characters automatically. The positional binding matches each ? to the corresponding element in the params list in order, so the first ? receives the value of department and the second receives salary_threshold. Some drivers use $1, $2 for named positional parameters, which allows reusing the same value at multiple positions without duplicating it in the params list. Whichever placeholder style you use, never splice user-controlled strings directly into the SQL text with paste() or sprintf().
Working with multiple results
Execute multiple statements and handle results sequentially. When a SQL string contains several semicolon-delimited statements, dbSendQuery() or dbGetQuery() will execute them and may return only the first result set. The exact behavior depends on the database driver and the SQL dialect, so it is safer to issue each statement as a separate call when you need to capture all intermediate results:
# Multiple queries
sql <- "
SELECT COUNT(*) as total FROM employees;
SELECT AVG(salary) as avg_salary FROM employees;
"
# Execute and get first result set
result1 <- dbGetQuery(con, sql)
# For multiple result sets, use dbSendStatement
For truly multi-statement scenarios where you need programmatic access to every result set, split the SQL into individual dbGetQuery() calls or use dbSendQuery() followed by dbFetch() in a loop. Some databases like PostgreSQL and SQL Server support stored procedures that return multiple result sets; in those cases, the driver-specific documentation will describe how to advance through them using dbNextResult() or equivalent functions. For the common pattern of counting rows and then computing an aggregate, running two separate queries is both clearer and more portable than relying on semicolons in a single string.
Database metadata
Query the database schema programmatically. These introspection functions let you write R code that adapts to the current state of the database rather than hardcoding table and column names. For example, you might iterate over dbListTables(con) to apply a transformation function to every table, or call dbExistsTable(con, name) before attempting a read to avoid a runtime error. Combined with dbplyr, you can also use dplyr verbs to inspect and manipulate schema metadata stored in INFORMATION_SCHEMA tables:
# List all tables
dbListTables(con)
# List columns in a table
dbListFields(con, "employees")
# Check if a table exists
dbExistsTable(con, "employees")
These functions are essential for building dynamic applications that adapt to changing database schemas. For deeper introspection, you can query system catalog tables directly with dbGetQuery(con, "SELECT * FROM sqlite_master") for SQLite or dbGetQuery(con, "SELECT table_name FROM information_schema.tables") for PostgreSQL. The return values from dbListTables() and dbListFields() are character vectors, making them straightforward to use in conditionals, loops, and lapply() calls. If your application needs to validate that a column exists before attempting to read it, "column_name" %in% dbListFields(con, "table") gives you a single logical test.
Transaction management
Wrap multiple operations in a transaction for atomicity. Database transactions follow the ACID contract: either all statements in the transaction succeed and are permanently written, or none of them are applied. This is critical when multiple INSERT, UPDATE, or DELETE statements form a logical unit of work. For example, transferring money between accounts requires debiting one row and crediting another; without a transaction, a crash between the two statements would leave the database in an inconsistent state. DBI provides dbBegin(), dbCommit(), and dbRollback() to manage transaction boundaries explicitly:
dbBegin(con)
tryCatch({
dbExecute(con, "INSERT INTO employees VALUES (3, 'Carol', 'Sales', 70000)")
dbExecute(con, "INSERT INTO employees VALUES (4, 'Dave', 'Sales', 68000)")
dbCommit(con)
}, error = function(e) {
dbRollback(con)
message("Transaction failed: ", e)
})
If any statement fails, dbRollback() undoes all changes. This ensures data integrity when performing multiple related operations. Most DBI drivers default to autocommit mode, meaning each statement is its own individual transaction. Calling dbBegin() switches to explicit transaction control, and you must pair it with either dbCommit() or dbRollback() before closing the connection. Leaving a transaction open blocks other operations on the same connection and may hold row-level locks that impact concurrent users, so the tryCatch pattern shown here is the recommended idiom for production code.
Disconnecting
Always close connections when done. Each open connection consumes memory and a file descriptor on the R side plus a database worker process or thread on the server side. In a script that runs once and exits, the operating system reclaims these resources when the R process terminates, but for long-running sessions, Shiny applications, or R scripts that open multiple connections in a loop, failing to disconnect causes a resource leak that eventually exhausts available connections:
dbDisconnect(con)
For temporary connections in functions, use on.exit() to ensure cleanup even if an error occurs. This pattern guarantees that the connection is closed regardless of whether the function body completes normally or is interrupted by an error, a warning, or a user-requested interruption:
connect_and_query <- function(sql) {
con <- dbConnect(RSQLite::SQLite(), ":memory:")
on.exit(dbDisconnect(con))
# Your query logic here
}
The on.exit() call must appear immediately after dbConnect() and before any code that could raise an error. The add = TRUE argument (the default is add = FALSE, which replaces any previously registered exit handler) should be specified if you are registering multiple cleanup actions within the same function. When using on.exit() inside a nested function, the cleanup runs in LIFO order so the innermost function’s handler fires first, which correctly matches the natural nesting of resource acquisition.
Error handling
DBI functions return errors when something goes wrong, such as a syntax error in the SQL, a reference to a nonexistent table, or a network timeout when the database server is unreachable. Catching these errors with tryCatch() prevents a single failed query from terminating a long-running script or an interactive session where you have other data you do not want to lose. The error object returned by DBI includes the database server’s own error message, which often provides enough context to diagnose the issue without additional logging:
result <- tryCatch(
dbGetQuery(con, "SELECT * FROM nonexistent_table"),
error = function(e) {
message("Query failed: ", e)
NULL
}
)
Always closing connections is critical. Unclosed connections can exhaust database resources and cause connection pool exhaustion. When a query fails inside tryCatch, the connection remains open and usable for subsequent queries provided the error was at the SQL level rather than a transport failure. A transport failure such as a broken TCP connection requires establishing a new connection with dbConnect() before further queries can succeed. For production scripts that run unattended, combine tryCatch with on.exit(dbDisconnect(con)) so that even error paths clean up properly.
Performance tips
For frequently executed queries, use prepared statements. A prepared statement sends the SQL to the database once for parsing and planning, then re-executes it multiple times with different parameter values. This avoids the overhead of reparsing and replanning the same query on every execution, which matters when the query runs hundreds or thousands of times with different bind values for each invocation:
# Create a prepared statement
query <- dbSendQuery(con, "SELECT * FROM employees WHERE department = ?")
dbBind(query, list("Engineering"))
results <- dbFetch(query)
dbClearResult(query)
This avoids parsing the query each time, improving performance for repeated queries. The pattern of dbSendQuery() plus dbBind() is distinct from dbGetQuery() with inline params: the former reuses the same prepared statement handle across multiple bind-and-fetch cycles, while the latter creates, executes, and tears down a new handle for each call. If you are querying the same table with varying filter values inside a loop, switch to dbSendQuery() and dbBind() to cut execution time by an order of magnitude. For bulk operations, dbWriteTable() and dbAppendTable() often outperform row-by-row INSERTs because they use the database’s native bulk-load protocol.
Connection pooling
For Shiny apps or APIs that handle concurrent requests, managing one connection per request creates connection overhead. The pool package provides a connection pool: pool <- dbPool(drv = RPostgres::Postgres(), ...) creates a pool that reuses connections. Replace dbConnect() with the pool object in dbGetQuery() calls. The pool manages checkout, checkin, and connection health automatically.
Schema and table management
dbExistsTable(con, "table_name") checks whether a table exists before creating or querying it. dbCreateTable(con, "new_table", field_types) creates an empty table with specified column types. dbWriteTable(con, "data", df, append = TRUE) appends rows to an existing table without overwriting. For schema management beyond basic DDL, use dbplyr::sql() to pass raw SQL strings: dbExecute(con, sql("CREATE INDEX idx_id ON data(id)")).
Working with multiple databases
DBI provides a unified interface across PostgreSQL (RPostgres), MySQL (RMariaDB), SQLite (RSQLite), and others. Code written with DBI functions like dbGetQuery(), dbWriteTable(), and dbExecute() works across database backends with minimal changes. The dialect differences (date functions, string operations) are the main portability concern; use dbplyr or raw SQL that targets a specific database’s SQL dialect. INFORMATION_SCHEMA queries for table and column metadata are standard SQL and work across most databases.
DBI architecture
DBI (Database Interface) is an abstract interface that separates the generic API (DBI package) from database-specific drivers. You call DBI functions; the driver translates them to the database’s native protocol. Switching databases means changing the driver and connection parameters, not the DBI code.
Common drivers: RSQLite::SQLite() (SQLite, zero-configuration), RPostgres::Postgres() (PostgreSQL), RMariaDB::MariaDB() (MySQL/MariaDB), odbc::odbc() (any ODBC datasource), duckdb::duckdb() (DuckDB, embedded analytics, see the DuckDB guide). The DBI::dbConnect() function accepts any driver.
DBI::dbDisconnect(con) closes the connection and releases resources. For long-running processes, call this when you’re done with the database. In Shiny apps, use pool::dbPool() instead of managing connections directly.
Core query operations
dbGetQuery(con, "SELECT * FROM table WHERE id = 1") executes a SELECT and returns a data frame. dbExecute(con, "INSERT INTO table VALUES (1, 'Alice')") executes DML (INSERT, UPDATE, DELETE) and returns the number of affected rows.
dbSendQuery(con, sql) sends a query without fetching results, useful for large result sets. dbFetch(rs, n = 1000) fetches the next 1000 rows. dbHasCompleted(rs) checks whether all rows have been fetched. dbClearResult(rs) frees the result set. This streaming approach processes large results without loading everything into memory.
dbWriteTable(con, "new_table", df) creates a table from a data frame, inferring column types. append = TRUE inserts into an existing table. overwrite = TRUE drops and recreates. dbAppendTable(con, "table", df) inserts rows without touching the table definition.
Safe parameter handling
dbGetQuery(con, "SELECT * FROM users WHERE name = ?", params = list(user_name)) for RSQLite, or "WHERE name = $1" with params = list(user_name) for RPostgres, these parameterized forms prevent SQL injection by separating code from data.
dbQuoteLiteral(con, value) and dbQuoteIdentifier(con, name) escape values and identifiers safely when you must construct SQL strings. Use these rather than paste() when dynamic SQL is necessary.
Schema introspection
dbListTables(con) returns all table names in the default schema. dbExistsTable(con, "tablename") checks existence. dbListFields(con, "tablename") returns column names. DBI::dbGetInfo(con) returns driver and connection metadata.
For deeper introspection: dbReadTable(con, "tablename") reads a complete table. DBI::sqlInterpolate(con, "SELECT * FROM ?table WHERE id = ?id", table = DBI::dbQuoteIdentifier(con, tbl), id = 5) safely constructs parameterized SQL strings.
Handling large data
For tables too large to fit in memory, DBI::dbSendQuery() streams rows in batches. Alternatively, use dbplyr to push computations to the database with dplyr verbs; collect() fetches only the final aggregated result.
Arrow and DBI::dbWriteTable() together transfer data between R and databases efficiently: write a Parquet file from R, then use COPY FROM (PostgreSQL) or database-specific bulk load commands for fast import. For PostgreSQL, RPostgres::postgresWriteTable() uses the COPY protocol, which is 10-50x faster than row-by-row INSERT for large tables.
DBI as the common interface
DBI defines a standard interface that all R database drivers implement. If you write code against DBI’s functions, dbConnect, dbSendQuery, dbFetch, dbWriteTable, the same code works with any database that has a DBI-compatible driver. Switching from SQLite to PostgreSQL means changing the driver package and connection arguments; the rest of the code stays the same. This portability is the main reason to use DBI rather than a database-specific package.
Each database engine has a separate driver package that implements the DBI interface: RSQLite for SQLite (see the RSQLite guide), RPostgres for PostgreSQL, RMariaDB for MySQL and MariaDB, odbc for any database with an ODBC driver. The driver package handles the low-level communication with the database; DBI provides the R functions that call into it. You load both packages but only call DBI functions in your code.
Connection management
Database connections are resources that must be closed explicitly. Calling dbConnect opens a connection; calling dbDisconnect closes it and releases any held resources. Connections left open consume database server resources and can block other operations depending on the database’s concurrency model. Use on.exit(dbDisconnect(con)) at the top of any function that opens a connection to guarantee cleanup even when the function exits due to an error.
Connection pooling is useful in applications that make many short database calls, like Shiny apps. The pool package manages a pool of connections that are borrowed for individual queries and returned to the pool afterward. Pool handles reconnecting to the database if a connection drops and limits the total number of simultaneous connections. For scripts that make a small number of queries and exit, pooling adds complexity without benefit.
Writing safe queries
Parameterized queries separate SQL structure from user-supplied values. Instead of building a query string with paste(), use dbSendQuery with a query string containing question marks as placeholders, then bind parameter values with dbBind. The database driver handles escaping the values correctly, making SQL injection impossible regardless of what values the user provides. This is not optional security hardening; it is the correct way to include any user-controlled data in a query.
dbGetQuery combines dbSendQuery, dbFetch, and dbClearResult into one call for simple queries that return a result. For queries that return large result sets, use the separate functions and fetch in batches with a row limit argument to dbFetch. This keeps memory usage bounded for queries that might return unpredictably large amounts of data.
See also
- RSQLite: SQLite driver for DBI
RPostgres: PostgreSQL driver for DBI- dplyr and dbplyr: Translate dplyr syntax to SQL with dbplyr