R and Python Interop with reticulate

The reticulate package embeds a Python session inside your R session. You can import Python modules, source .py scripts, hand data back and forth, and switch which Python interpreter reticulate binds to. Most people reach for it when they want a Python library (pandas, scikit-learn, a Hugging Face model) inside an R workflow without leaving the R Markdown document or the R package they are building.

Install from CRAN the usual way:

install.packages("reticulate")
library(reticulate)

The first time you call a function that needs Python (import(), py_config(), and so on), reticulate picks an interpreter to bind to. With no other configuration, it creates a virtualenv at ~/.virtualenvs/r-reticulate and installs NumPy into it. Check what it chose with py_config():

py_config()
# python:         /Users/you/.virtualenvs/r-reticulate/bin/python
# libpython:      /opt/homebrew/Cellar/python@3.12/.../libpython3.12.dylib
# pythonhome:     /Users/you/.virtualenvs/r-reticulate
# version:        3.12.4 (main, Jun  6 2024, ...)
# numpy:          /Users/you/.virtualenvs/r-reticulate/lib/.../numpy
# numpy_version:  1.26.4

Picking a Python interpreter

If the default venv is not the one you want, reticulate has three functions for pointing at a specific Python, plus two environment variables that take precedence over everything.

use_python("/usr/local/bin/python3.11")
use_virtualenv("~/projects/myapp/.venv")
use_condaenv("r-reticulate", conda = "auto")

All three take a required argument. required = TRUE (the default) makes the call a hard constraint; if the requested Python is missing, reticulate errors out. required = FALSE turns it into a soft hint, used only when no stronger signal is present.

To override the choice from outside R, set one of these environment variables before launching R:

  • RETICULATE_PYTHON: path to a specific interpreter. This is prescriptive; you cannot override it later in the same session.
  • RETICULATE_PYTHON_ENV: path to a virtualenv root.

The full discovery order is documented in the Python Version Configuration vignette. If your py_config() output complains that libpython is missing, your Python was built without --enable-shared; switch to a build that has it (conda-forge, system Python, or Homebrew’s python formula on Apple Silicon all qualify).

Calling Python from R

There are four ways to run Python code. They differ in granularity and in where the results live.

Import a module for repeated use:

os <- import("os")
os$listdir(".")
# [1] ".git" ".gitignore" "DESCRIPTION" "NAMESPACE" "R"

import() returns an R wrapper around the Python module. You access attributes and methods with $, the same way you would reach a field on an R6 object or an element of a list. Set convert = FALSE to keep Python references on the R side; you then convert explicitly with py_to_r() when you want a native R object.

Source a script when you have a .py file with helper functions:

# helper.py
def add(x, y):
    return x + y

def double(x):
    return x * 2

Here is the R side. Return values are converted to native R types on the way out. Note that add(3, 12) comes back as an R double (15), not an integer: R's 3 and 12 are doubles, so they cross as Python floats and the sum is a float. Call add(3L, 12L) if you want the integer 15L. To keep the values as Python references instead, call source_python("helper.py", convert = FALSE) and convert explicitly with py_to_r() at the boundary:

source_python("helper.py")
add(3, 12)
# [1] 15
double(6)
# [1] 12

source_python() injects the top-level definitions into the calling environment, so the functions appear as if you had defined them in R.

Run a string for one-off snippets:

py_run_string("x = 2 ** 10")
py$x
# [1] 1024

Objects created this way live in Python's main module and are reachable from R through the py object. In the other direction, Python code reaches R objects through the r object. Use py_run_file() to read a file and run it the same way.

Drop into the REPL for exploratory work:

repl_python()
#> >>> import math
#> >>> math.sqrt(2)
#> 1.4142135623730951
#> >>> exit

State in the REPL persists for the rest of the R session, and you can bounce back and forth between R and Python at the prompt.

Type conversion across the boundary

The most useful thing about reticulate's type conversion is that you usually do not have to think about it. Pass an R data.frame to a function expecting a pandas DataFrame and it arrives as one; when a pandas DataFrame comes back, it lands as an R data.frame.

The full table is worth memorising because the corners trip people up:

| R | Python | Notes |
| --- | --- | --- |
| Single-element vector | Scalar | 1 is a double in R; 1L is an int. APIs that demand a Python int need 1L. |
| Multi-element vector | list | c(1, 2, 3) becomes a Python list, not a tuple. |
| Heterogeneous unnamed list | tuple | |
| Named list | dict | list(a = 1L, b = 2.0) becomes {"a": 1, "b": 2.0}. |
| Matrix / array | NumPy ndarray | R to NumPy is zero-copy, Fortran-ordered. NumPy to R is a column-major copy. |
| data.frame | pandas DataFrame | Row names become the pandas index and round-trip. |
| factor | pandas categorical | |
| POSIXt | NumPy datetime64[ns] | |
| R function | Python callable | |
| NULL / TRUE / FALSE | None / True / False | |

For a true Python tuple, use reticulate::tuple(); for a true Python dict, use reticulate::dict(). The function np_array(x, order = "C") builds a row-major NumPy array when you need C-ordering.
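
When a call fails because Python received the wrong container, it helps to see what actually arrived. The helper below is a hypothetical describe() function, not part of reticulate. The assumption is that you would save it in a .py file, load it with source_python(), and call it from R on the value in question. It is shown here with plain Python inputs standing in for converted R values:

```python
def describe(value):
    """Return the name of the Python type that a value arrived as."""
    return type(value).__name__


# Plain-Python stand-ins for some of the conversions listed in the table.
print(describe(1.0))         # float (what R's 1 arrives as)
print(describe(1))           # int   (what R's 1L arrives as)
print(describe([1.0, 2.0]))  # list
print(describe((1, True)))   # tuple
print(describe({"a": 1}))    # dict
```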

A full pandas to ggplot2 round trip

The pattern you will actually use: read data with pandas, hand it to ggplot2, send an R matrix to scikit-learn.

library(reticulate)
library(ggplot2)

pd  <- import("pandas")
url <- "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv"
df  <- pd$read_csv(url)
head(df)
#   sepal_length  sepal_width  petal_length  petal_width species
# 1          5.1         3.5          1.4         0.2  setosa
# 2          4.9         3.0          1.4         0.2  setosa
# 3          4.7         3.2          1.3         0.2  setosa
# 4          4.6         3.1          1.5         0.2  setosa
# 5          5.0         3.6          1.4         0.2  setosa
# 6          5.4         3.9          1.7         0.4  setosa

ggplot(df, aes(x = sepal_length, y = sepal_width, colour = species)) +
  geom_point()

Because convert = TRUE is the default for import(), the result of pd$read_csv() is already an R data.frame. ggplot2 sees it as such; no conversion glue required.

For the other direction, send an R matrix to scikit-learn:

np <- import("numpy", convert = FALSE)
sk <- import("sklearn.linear_model", convert = FALSE)

X_r  <- matrix(c(-2, -1, 1, 2, -1, 1, -1, 1), ncol = 2, byrow = TRUE)
y    <- c(0L, 0L, 1L, 1L)
X_py <- np$array(X_r)   # np.array() copies its input; r_to_py(X_r) would share memory with X_r

m <- sk$LogisticRegression()
m$fit(X_py, y)
m$predict(X_py)

With convert = FALSE, both modules return Python references. That avoids repeated conversions while you stay on the Python side of the call. Wrap a result in py_to_r(), for example py_to_r(m$predict(X_py)), when you want it back as a native R object.

Common gotchas

A few things that bite everyone eventually.

Integer vs double. 1 in R is a double. If a Python API needs an int (a list index, a column selector, n_neighbors), pass 1L. Pass 1 and you may get a TypeError or, worse, silently wrong behaviour.
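
Seen from the Python side, the difference looks like the sketch below. It is plain Python, and pick() is a hypothetical stand-in for any API that indexes with its argument:

```python
def pick(items, index):
    """Hypothetical stand-in for a Python API that indexes with its argument."""
    return items[index]


letters = ["a", "b", "c"]

# An int index works: this is what R's 1L arrives as.
print(pick(letters, 1))  # b

# A float index fails: this is what R's plain 1 arrives as.
try:
    pick(letters, 1.0)
except TypeError as err:
    print("TypeError:", err)
```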

0-based indexing. Python lists, NumPy arrays, and pandas objects are all 0-indexed. R is 1-indexed. Watch it on slice endpoints and on positional arguments.
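
A short plain-Python illustration of both points, with the R equivalents noted in comments (the values list is made up for the example):

```python
values = [10, 20, 30, 40]

# Python's first element is at position 0; R's values[1] is Python's values[0].
first = values[0]

# Python slices exclude the end position, so [1:3] yields two elements.
# The R expression selecting the same two elements would be values[2:3].
middle = values[1:3]

print(first)   # 10
print(middle)  # [20, 30]
```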

Single-element vectors become scalars. If a Python function expects a list and you pass a length-one R vector, it arrives as a scalar. Wrap the value in list() to force a Python list: for a hypothetical function f, write f(list(42L)) rather than f(42L).

convert = FALSE is your friend for performance. Do not pay for a conversion you do not need. Use import(..., convert = FALSE) and call py_to_r() only at the boundary where you actually want R data.

Data frames always copy. R matrices and arrays go to NumPy without a copy; data frames and lists are copied. For very large data, move to Arrow (the arrow package can hand Arrow tables to pyarrow without copying the underlying buffers) or hand the data to Python at the file level rather than in memory.

Generators drain. If you call iterate() on a Python generator, the generator is consumed. Iterate once.
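
The underlying Python behaviour, sketched with a hypothetical countdown() generator:

```python
def countdown(n):
    """Yield n, n - 1, ..., 1."""
    while n > 0:
        yield n
        n -= 1


gen = countdown(3)
first_pass = list(gen)   # consumes every value
second_pass = list(gen)  # nothing left to yield

print(first_pass)   # [3, 2, 1]
print(second_pass)  # []
```

If the values are needed more than once, collect them into a list on the first pass or create a fresh generator.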

R Markdown Python engine is on by default. If you have a reticulate install, every .Rmd document gets a Python chunk engine. Variables persist across ```{python} chunks in one render; reach them from R via py$x, and reach R variables from Python via r.x.

Reticulate cannot bind to a static Python. The interpreter must be built --enable-shared. If py_config() errors with “shared library not found”, switch Python builds (conda-forge is the safe default on every platform).

For managing the R side of a multi-language project, see the renv guide — pin the reticulate version in renv.lock so the same setup works for your collaborators.

A modern alternative: py_require()

Since reticulate 1.41, you can skip the manual virtualenv dance and let reticulate resolve an ephemeral environment backed by uv:

py_require(
  packages = c("numpy>=1.26", "pandas", "scikit-learn"),
  python_version = "3.11"
)
np <- import("numpy")   # first call resolves and creates the venv

The manifest is a list on the R side; call py_require() with no arguments to inspect it, py_require(packages = "pandas", action = "set") to replace the package list, py_require(action = "remove", packages = "scikit-learn") to drop a package. If you prefer a project-level declaration, put the same py_require() call in the project's .Rprofile so collaborators pick up the same environment without any extra setup.

This is not a replacement for a proper venv for production work, but for analysis scripts and notebooks it removes the largest source of “works on my machine” pain.

Conclusion

Reticulate is the path of least resistance between R and Python. The default virtualenv is fine for one-off work; use_python() and use_virtualenv() cover projects with a pinned interpreter; py_require() is the modern answer when you do not want to manage an env by hand. Get the type conversion table under your fingers, watch the int/double and 0/1 indexing gotchas, and the rest is ordinary R code that happens to call a Python library.

See Also

  • R and Python Together — context on when to reach for reticulate versus switching languages
  • R Keras — a Python-backed deep learning interface that uses reticulate under the hood
  • R renv — pin R dependencies alongside your reticulate configuration