Calling Python from R with reticulate
What is reticulate?
If you work with R, you probably know that R has thousands of packages for statistics, data analysis, and visualization. But Python has its own rich ecosystem—particularly for machine learning (scikit-learn, TensorFlow) and scientific computing (NumPy, Pandas). What if you could use the best of both worlds?
That’s exactly what the reticulate package does. It creates a bridge between R and Python, letting you import Python libraries, call Python functions, and pass data back and forth—all from your R code.
Think of reticulate as a bilingual assistant. You speak to it in R, and it translates your requests into Python, gets the result, and translates it back into R for you.
Installing reticulate
The package lives on CRAN, so installation is straightforward:
# Install reticulate from CRAN
install.packages("reticulate")
After installation, load it like any other package:
library(reticulate)
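Before importing anything, you can check that reticulate found a Python installation and that a module you need is installed. This is a minimal sketch using py_available() and py_module_available(); numpy is only an example module. Note that py_available(initialize = TRUE) starts Python. If you want a specific environment (see Choosing your Python environment), select it before running this.

```r
library(reticulate)

# py_available(initialize = TRUE) starts Python if needed and returns TRUE on success
if (!py_available(initialize = TRUE)) {
  stop("reticulate could not find a usable Python installation")
}

# Check that a module is installed before calling import() on it
has_numpy <- py_module_available("numpy")
print(has_numpy)
```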
Importing Python modules
The most common way to use Python in R is to import a Python module. In Python, you would write import numpy. In R with reticulate, you use the import() function:
# Import Python's numpy library and give it a name we can use
np <- import("numpy")
# Now we can use numpy functions through np
arr <- np$array(c(1, 2, 3, 4, 5)) # Create a numpy array (converted back to an R array)
np$mean(arr) # Calculate the mean: returns 3
Notice the $ syntax. It stands in for Python's dot operator: np.array in Python becomes np$array in R, much as R code uses :: to reach a function inside a package.
You can import any Python library installed in the active Python environment this way, such as pandas, os, or sklearn:
# Import multiple Python libraries
pd <- import("pandas") # For dataframes
os <- import("os") # For filesystem operations
# For scikit-learn, import specific submodules
model_selection <- import("sklearn.model_selection")
datasets <- import("sklearn.datasets")
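Nested attributes and submodules chain with $, just as they chain with dots in Python. The sketch below uses the standard-library os module, so nothing extra needs to be installed. The file name scores.csv is only illustrative.

```r
library(reticulate)

os <- import("os")

# Attributes and submodules chain with $, mirroring Python's os.path.join(...)
path <- os$path$join("data", "scores.csv")

# Importing the submodule by its dotted name gives the same function
os_path <- import("os.path")
same_path <- os_path$join("data", "scores.csv")

identical(path, same_path)  # TRUE
```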
Running Python code directly
Sometimes you want to run a chunk of Python code rather than import a full module. reticulate provides two functions for this:
# Run Python code as a string
py_run_string("
x = 10
y = 20
result = x + y
")
# Access the result in R through the py object
py$result # Returns 30
The py object lets you read and write Python variables from R. Any variable you create in Python with py_run_string() becomes accessible through py$variable_name.
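Writing also works in the other direction. Assigning through py$ creates a Python variable that later Python code can use. This sketch also uses py_eval(), which evaluates a single Python expression. The variable names threshold and doubled are illustrative.

```r
library(reticulate)

# Assigning through py$ creates a variable in Python's main module
py$threshold <- 5

# Python code can now use it; R's 5 arrived as the Python float 5.0
py_run_string("doubled = threshold * 2")
py$doubled  # 10

# py_eval() evaluates a single Python expression and converts the result
py_eval("threshold + 1")  # 6
```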
You can also run Python from a file:
# Execute a Python script and make its functions available
# (my_functions.py and my_python_function() are placeholders for your own code)
source_python("my_functions.py")
# Call a function defined in that Python file
output <- my_python_function(42)
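Because my_functions.py is only a placeholder, here is a self-contained sketch that first writes a small Python file to a temporary path. The function name greet() is hypothetical.

```r
library(reticulate)

# Write a tiny Python file to a temporary path (a stand-in for my_functions.py)
py_file <- tempfile(fileext = ".py")
writeLines(c(
  "def greet(name):",
  "    return 'Hello, ' + name + '!'"
), py_file)

# source_python() runs the file and defines greet() in the calling R environment
source_python(py_file)
greet("R")  # "Hello, R!"
```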
Understanding type conversion
One of reticulate’s most useful features is automatic type conversion. When you pass data from R to Python, reticulate converts it to the equivalent Python type. When the result comes back, it converts to R again.
Here’s how the conversion works:
| R type | Becomes in Python |
|---|---|
| c(1, 2, 3) | List [1, 2, 3] |
| data.frame(...) | Pandas DataFrame |
| matrix(...) | NumPy array |
| list(a = 1, b = 2) | Dictionary {'a': 1, 'b': 2} |
| TRUE / FALSE | True / False |
| NULL | None |
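One way to see these conversions for yourself is to inspect the class of the Python object that r_to_py() returns. In this minimal sketch, the first class entry names the Python type. The row for a length-1 vector anticipates a pitfall discussed below.

```r
library(reticulate)

# r_to_py() returns Python objects, so class() reveals the Python type
class(r_to_py(c(1, 2, 3)))[1]          # "python.builtin.list"
class(r_to_py(5))[1]                   # "python.builtin.float" (length-1 vector)
class(r_to_py(list(a = 1, b = 2)))[1]  # "python.builtin.dict"
class(r_to_py(TRUE))[1]                # "python.builtin.bool"
class(r_to_py(NULL))[1]                # "python.builtin.NoneType"
```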
You can also convert explicitly when needed:
# Explicitly convert R data to Python
r_df <- data.frame(x = 1:3, y = c("a", "b", "c"))
py_df <- r_to_py(r_df)
# Explicitly convert the Python object back to R
r_object <- py_to_r(py_df)
Sometimes you might want to disable automatic conversion. Pass convert = FALSE to import():
# Import without automatic conversion
np <- import("numpy", convert = FALSE)
# Now np$array() returns a Python object, not R
arr <- np$array(c(1, 2, 3))
# Convert manually when ready
r_arr <- py_to_r(arr)
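Keeping objects in Python is handy when you chain several operations and only want the final result in R. This sketch assumes numpy is installed. Every intermediate value stays a Python object until py_to_r() is called.

```r
library(reticulate)

np <- import("numpy", convert = FALSE)

values <- np$arange(1, 11)  # NumPy array holding 1 to 10 (as floats), kept in Python
total <- values$sum()       # NumPy scalar, still a Python object

inherits(total, "python.builtin.object")  # TRUE
py_to_r(total)                             # 55
```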
Common pitfalls for beginners
When using reticulate, there are a few quirks that trip up newcomers. Knowing about them upfront saves hours of debugging.
Indexing starts at 0, not 1
Python uses 0-based indexing, while R uses 1-based. This matters when calling Python methods:
# In R, the first element is at position 1
vec <- c(10, 20, 30)
vec[1] # Returns 10
# In Python, the first element is at position 0
np <- import("numpy", convert = FALSE) # keep results as Python objects
py_vec <- np$array(c(10, 20, 30))
py_to_r(py_vec$item(0L)) # Returns 10 - note the L for integer
# With the default convert = TRUE, np$array() returns an R array,
# so ordinary 1-based R indexing applies
r_vec <- import("numpy")$array(c(10, 20, 30))
r_vec[1] # Returns 10
Use the L suffix for integers
R treats numeric literals such as 2 as doubles by default, while Python distinguishes integers from floats. When a Python function expects an integer, append L to the number:
# Create a NumPy array, keeping it as a Python object so its methods stay available
np <- import("numpy", convert = FALSE)
arr <- np$array(c(1, 2, 3, 4, 5, 6)) # 6 elements for a 2x3 reshape
# This fails: Python receives 2.0 and 3.0, and reshape() rejects float dimensions
# arr$reshape(2, 3)
# This works correctly
arr$reshape(2L, 3L) # Python sees 2 and 3 as integers
The L suffix tells R to create an integer rather than a numeric value.
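The L suffix only helps with literals. Values produced by arithmetic are doubles even when they are whole numbers, so convert them with as.integer() before passing them as sizes or counts. This small sketch checks the resulting Python types.

```r
library(reticulate)

half <- 6 / 3  # arithmetic yields a double, even when the value is whole

class(r_to_py(half))[1]              # "python.builtin.float"
class(r_to_py(as.integer(half)))[1]  # "python.builtin.int"
class(r_to_py(2L))[1]                # "python.builtin.int"
```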
Single-element vectors become scalars
In R, c(5) is a vector of length 1. reticulate converts length-1 vectors to Python scalars, so a Python function that expects a list receives a single number instead. If a function expects a list, wrap the value explicitly:
# A length-1 vector becomes a Python scalar, not a list
r_to_py(c(5)) # 5.0 in Python; some functions will reject this
# An unnamed R list becomes a Python list
r_to_py(list(5L)) # [5] in Python
Iterators can be used only once
A Python iterator can be traversed only once. After you consume its elements, it stays empty:
# Create a Python iterator over 0, 1, 2
builtins <- import_builtins()
py_iterator <- builtins$iter(builtins$range(3L))
# Consume every element
values <- iterate(py_iterator) # 0 1 2
# The iterator is now exhausted
second_pass <- iter_next(py_iterator) # Returns NULL - nothing left
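Python generators behave the same way. This sketch defines a hypothetical countdown() generator with py_run_string() and drains it with iter_next(), which returns NULL once nothing is left.

```r
library(reticulate)

py_run_string("
def countdown(n):
    while n > 0:
        yield n
        n -= 1
")

gen <- py$countdown(3L)
collected <- integer(0)
repeat {
  value <- iter_next(gen)  # NULL once the generator is exhausted
  if (is.null(value)) break
  collected <- c(collected, value)
}
print(collected)  # 3 2 1
iter_next(gen)    # NULL: a generator cannot be restarted
```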
Practical example: using pandas
Let’s put this together with a realistic example. Suppose you have an R data frame and want to use Pandas for some operation:
library(reticulate)
# Create an R data frame
r_df <- data.frame(
name = c("Alice", "Bob", "Carol"),
score = c(85, 92, 78)
)
# Import pandas
pd <- import("pandas")
# Convert to a pandas DataFrame (explicit conversion; r_to_py() keeps results in Python)
py_df <- r_to_py(r_df)
# Assign the result - sort_values returns a new DataFrame
py_df <- py_df$sort_values("score", ascending = FALSE)
print(py_df)
#     name  score
# 1    Bob   92.0
# 0  Alice   85.0
# 2  Carol   78.0
You can also filter with pandas methods such as query():
# Filter rows where score > 80 (query() takes a pandas expression string)
filtered <- py_df$query("score > 80")
print(filtered)
#     name  score
# 1    Bob   92.0
# 0  Alice   85.0
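To bring a pandas result back into R, call py_to_r(), which turns a pandas DataFrame into an R data frame. This sketch reuses the data from the example above and assumes pandas is installed in the active environment.

```r
library(reticulate)

r_df <- data.frame(name = c("Alice", "Bob", "Carol"), score = c(85, 92, 78))
py_df <- r_to_py(r_df)

# Sort on the Python side, then convert the result back to an R data frame
r_sorted <- py_to_r(py_df$sort_values("score", ascending = FALSE))
r_sorted$name  # "Bob" "Alice" "Carol"
```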
Choosing your Python environment
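If you don't specify one, reticulate searches for a Python installation on its own, and your PATH is one of the places it looks. You can point it at a specific Python or conda environment instead. Call these functions before the first import(), because reticulate binds to a single Python interpreter for the rest of the R session: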
# Use a specific Python interpreter
use_python("/usr/bin/python3")
# Use a conda environment
use_condaenv("my-environment")
# Use a virtual environment
use_virtualenv("my-venv")
This is useful when you need specific package versions or want to keep your R and Python dependencies separate.
Environment setup and management
The most reliable setup uses a project-specific Python virtual environment. reticulate::virtualenv_create("r-reticulate") creates the environment, reticulate::use_virtualenv("r-reticulate", required = TRUE) activates it, and reticulate::py_install(packages) installs Python packages into it.
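For project reproducibility, record the environment name in your project code and commit a requirements.txt file to version control. After cloning, team members install the pinned packages into the environment, for example with reticulate::virtualenv_install("r-reticulate", packages = readLines("requirements.txt")) when the file lists one package per line. The renv::use_python() integration stores Python environment metadata in renv.lock, though full lockfile support for Python dependencies is still evolving.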
Passing objects efficiently
For small objects (up to a few thousand rows), the automatic R-Python conversion via r_to_py() and py_to_r() is sufficient. For large data frames, consider Arrow as the transfer format: reticulate::r_to_py(arrow::as_arrow_table(df)) hands the table to Python through Arrow's C data interface, sharing Arrow's memory instead of serializing the data. This requires the arrow R package plus pyarrow in the Python environment. The Python side receives a pyarrow.Table that converts to pandas with arrow_table.to_pandas().
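Here is a minimal sketch of the Arrow route. It assumes the arrow R package and pyarrow are installed, plus pandas for the last step. The column names are illustrative.

```r
library(reticulate)
library(arrow)

df <- data.frame(id = 1:3, value = c(0.5, 1.5, 2.5))

# as_arrow_table() builds an Arrow Table in R; r_to_py() hands it to pyarrow
py_table <- r_to_py(as_arrow_table(df))

# On the Python side this is a pyarrow.Table; to_pandas() builds a DataFrame
py_pandas <- py_table$to_pandas()
print(py_pandas)
```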
For numerical data, note that np$array() copies its input by default (NumPy's copy = True). The resulting array is therefore independent of the R vector, and modifying it in Python does not change the R object. To duplicate an existing NumPy array, use np$copy(py_array).
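You can check directly how np$array() treats its input. In this sketch, filling the NumPy array with zeros leaves the R vector unchanged, because NumPy's array() copies its input by default. np$copy() then produces a further independent array. It assumes numpy is installed and uses convert = FALSE so the arrays stay in Python.

```r
library(reticulate)

np <- import("numpy", convert = FALSE)

r_vec <- c(1, 2, 3)
py_arr <- np$array(r_vec)  # NumPy builds a new array from the converted values
py_arr$fill(0)             # overwrite every element on the Python side
print(r_vec)               # still 1 2 3: the R vector is untouched

backup <- np$copy(py_arr)  # explicit, independent copy of the NumPy array
py_arr$fill(7)
print(py_to_r(backup))     # still all zeros
```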
Debugging reticulate code
reticulate::py_last_error() retrieves the last Python exception after a failed Python call. Python stack traces print to the R console but may be truncated. The object returned by py_last_error() includes the full Python traceback.
For interactive debugging, reticulate::repl_python() opens a Python shell within the R session. Type r.df to access any R object named df. Exit with exit or Ctrl-D to return to R. This bidirectional access makes it possible to inspect both R and Python objects at the same point in execution.
Passing data between R and Python
reticulate converts common types automatically. Multi-element R vectors become Python lists, R matrices and arrays become NumPy arrays, R data frames become pandas DataFrames, and named R lists become Python dicts. The conversion is eager: data is copied across the language boundary, not shared by reference, and for large objects this copy can be expensive. Use r_to_py(x), whose convert argument defaults to FALSE, to get a Python object whose method results stay in Python until you call py_to_r(). This avoids converting intermediate results back to R when you chain several Python operations.
Managing Python environments
reticulate works with virtualenvs, conda environments, and system Python. use_virtualenv("myenv") and use_condaenv("myenv") activate an environment. Call them before the first import, because reticulate cannot switch interpreters once Python has started. Place environment configuration at the top of a script or in .Rprofile. For reproducibility, pin Python package versions in requirements.txt and install them during setup with virtualenv_install("myenv", packages = readLines("requirements.txt")). This assumes the file lists one package per line, with no comments or blank lines.
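readLines() passes every line of requirements.txt straight to pip, including comments and blank lines. Below is a minimal sketch of a filtering helper. read_requirements() is a hypothetical name, "myenv" is a placeholder environment, and the helper does not handle inline comments or pip options such as -r.

```r
library(reticulate)

# Hypothetical helper: keep only package specifiers such as "numpy==1.26.4",
# dropping blank lines and full-line comments
read_requirements <- function(path) {
  lines <- trimws(readLines(path, warn = FALSE))
  lines[nzchar(lines) & !startsWith(lines, "#")]
}

# Install only when both the file and the "myenv" environment exist
if (file.exists("requirements.txt") && virtualenv_exists("myenv")) {
  virtualenv_install("myenv", packages = read_requirements("requirements.txt"))
}
```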
Debugging across languages
Python errors surface as R errors, with the Python exception type and message in the error text. reticulate::py_last_error() retrieves the most recent Python exception together with its traceback. This is the most direct way to see the full Python stack when the console output is truncated.
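Because Python exceptions arrive as ordinary R errors, tryCatch() can handle them. This is a minimal sketch. The exact wording of the error message varies between reticulate versions, so the check only looks for the exception name.

```r
library(reticulate)

# A Python exception becomes an R error, so tryCatch() can intercept it
message_text <- tryCatch(
  {
    py_run_string("result = 1 / 0")
    "no error"
  },
  error = function(e) conditionMessage(e)
)
grepl("ZeroDivisionError", message_text)

# Inspect the most recent Python exception and its traceback
py_last_error()
```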
When reticulate is the right choice
reticulate makes Python available inside an R session. The primary use cases are accessing Python packages with no R equivalent, calling trained Python ML models from R, and interoperating with Python codebases that your team maintains. For workflows that are primarily R with occasional Python calls for specific capabilities, reticulate provides a clean integration without requiring a separate Python pipeline.
For workflows that are primarily Python with R for specific analyses, the reverse integration, calling R from Python with rpy2, may be more appropriate. Choose the integration direction based on where the majority of the work happens. reticulate is easier to use when Python is the subordinate tool; rpy2 is easier when R is the subordinate tool.
Environment management
reticulate works with virtual environments and conda environments. Using a dedicated environment for R/Python integration prevents version conflicts between Python packages needed for R work and Python packages for other projects. Create the environment with reticulate::virtualenv_create() or conda, install packages into it, and call use_virtualenv() or use_condaenv() at the top of your R session to activate it.
The py_config() function shows which Python environment reticulate is using. If the wrong environment is active, packages will be missing or wrong versions will be loaded. Setting the RETICULATE_PYTHON environment variable before starting R specifies the default Python executable, which is useful in production environments where reticulate should always use a specific interpreter.
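Here is a sketch of a startup check that activates an environment only if it exists, then reports which interpreter reticulate is bound to. The environment name my-venv is hypothetical. Because use_virtualenv() must run before Python starts, put this at the very top of a session.

```r
library(reticulate)

# Hypothetical environment name; activate it only if it exists
env_name <- "my-venv"
if (virtualenv_exists(env_name)) {
  use_virtualenv(env_name, required = TRUE)
}

# py_config() starts Python if needed and reports the interpreter in use
config <- py_config()
print(config$python)             # path to the active Python executable
Sys.getenv("RETICULATE_PYTHON")  # "" when the variable is unset
```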
Transferring data between languages
Python objects are accessible in R as reticulate proxy objects. Numeric arrays automatically convert between NumPy arrays and R matrices. DataFrames convert between Pandas and R data frames. These conversions happen automatically when a Python call returns a value and conversion is enabled (convert = TRUE, the default for import()). For objects that do not have automatic conversions, such as custom Python classes, generators, and iterators, you work with the proxy object through method calls.
The source_python() function runs a Python script and makes all top-level objects available in R. This is useful for running Python code that defines functions and classes that you then call from R. Alternatively, use py_run_string() to execute Python code written as an R character string, or import() to import Python modules. For complex integrations, writing the Python functionality as a Python module and importing it with import() is the cleanest approach.
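Here is a minimal sketch of the module approach, using import_from_path() to import a module from a specific directory. The module name scoring and its mean_score() function are hypothetical, and the sketch writes them to a temporary directory.

```r
library(reticulate)

# Write a small hypothetical module to a temporary directory
mod_dir <- tempfile("pymods")
dir.create(mod_dir)
writeLines(c(
  "def mean_score(values):",
  "    return sum(values) / len(values)"
), file.path(mod_dir, "scoring.py"))

# Import it like any other module, then call its functions with $
scoring <- import_from_path("scoring", path = mod_dir)
scoring$mean_score(c(80, 90))  # 85
```

Here c(80, 90) arrives in Python as a list. A single score such as c(80) would arrive as a scalar, and sum() would fail: this is the length-1 pitfall described earlier.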
Summary
The reticulate package opens up Python’s ecosystem to R users. You can import any Python module with import(), run inline Python with py_run_string(), and let reticulate handle the data conversion automatically. Watch out for the 0-based indexing and integer type quirks, and you’ll have a powerful combination of both languages at your fingertips.
See also
- The install.packages() function for installing R packages from CRAN
- R’s library() function for loading R packages
- R’s data.frame() function for data exchange between R and Python