Using R and Python Together with reticulate
Python is now one of the most widely used languages in data science, but many teams have invested years building tools in R. Rather than rewriting everything, the reticulate package lets you use both languages in the same project. This guide covers how to set up reticulate, call Python from R, share data between languages, and avoid common pitfalls.
What is reticulate?
The reticulate package, developed by RStudio (now Posit), provides an interface to Python from R. It allows you to:
- Call Python functions directly from R
- Pass R objects to Python and vice versa
- Use Python packages within R scripts
- Execute Python code chunks in R Markdown documents
Think of it as a two-way street between the two languages.
Installation and Setup
First, install reticulate from CRAN:
install.packages("reticulate")
You also need Python installed on your system. Reticulate can use your existing Python installation, or you can specify a particular Python environment:
# Use system Python
reticulate::use_python("/usr/bin/python3")
# Or use a virtual environment
reticulate::use_virtualenv("myenv")
# Or use conda
reticulate::use_condaenv("r-reticulate")
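If you manage the project with renv, enable its Python integration so that snapshots also record your Python dependencies:
renv::init()
renv::use_python()
renv::snapshot()
With Python integration enabled, renv::snapshot() also writes the project's Python packages to requirements.txt (or environment.yml for conda environments), which helps keep the project reproducible across machines.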
Calling Python from R
Once configured, you can import Python modules just like R packages:
library(reticulate)
# Import Python json module
json <- import("json")
# Use it to parse a JSON string
json_data <- json$loads('{"name": "Alice", "score": 95}')
json_data$name
# [1] "Alice"
The dollar sign operator maps to Python dot notation. This pattern works for any Python module.
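For comparison, the same call written directly in Python is sketched below. It uses only the standard-library json module and shows that json$loads(...) in R corresponds to json.loads(...) in Python; the variable name json_data simply mirrors the R example.

```python
import json

# Same JSON string as in the R example
json_data = json.loads('{"name": "Alice", "score": 95}')

# json.loads returns a dict; reticulate hands a dict to R as a named list,
# which is why json_data$name works on the R side
print(type(json_data).__name__)  # dict
print(json_data["name"])         # Alice
```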
Using Python Packages
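Install Python packages into the environment reticulate uses (via pip, conda, or reticulate::py_install()), then import them in R. Passing convert = FALSE keeps results as Python objects; with the default convert = TRUE, return values are converted to R objects automatically:
# In terminal: pip install numpy pandas
library(reticulate)
np <- import("numpy", convert = FALSE)
pd <- import("pandas", convert = FALSE)
# Create a numpy array
arr <- np$array(c(1, 2, 3, 4, 5))
arr
# A NumPy float64 array holding 1.0 through 5.0 (R doubles map to float64)
# Create a pandas DataFrame from a named list, which arrives in Python as a dict
df <- pd$DataFrame(list(
name = c("Alice", "Bob", "Charlie"),
score = c(95, 87, 92)
))
df
Output:
      name  score
0    Alice   95.0
1      Bob   87.0
2  Charlie   92.0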
Passing Data Between R and Python
Reticulate converts types automatically in most cases. Multi-element R vectors become Python lists (length-one vectors become scalars), named lists become dicts, data frames become pandas DataFrames, and matrices become NumPy arrays.
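As a rough illustration of what Python receives, the sketch below writes out plain-Python equivalents of a few converted R values. The literal values and variable names are stand-ins chosen for this example, and the mapping assumes reticulate's default conversion rules (a length-one vector arrives as a scalar, a longer vector as a list, a named list as a dict).

```python
# Plain-Python stand-ins for values after conversion from R.
# Assumption: default conversion rules; nothing here calls reticulate itself.

r_scalar = 42.0                                   # R: 42 (length-one numeric vector)
r_vector = [1.0, 2.0, 3.0, 4.0, 5.0]              # R: c(1, 2, 3, 4, 5)
r_int_vector = [1, 2, 3]                          # R: 1:3 (integer vector)
r_named_list = {"name": "Alice", "score": 95.0}   # R: list(name = "Alice", score = 95)

# Python code then works on ordinary Python types
total = sum(r_vector)
print(total)                      # 15.0
print(type(r_scalar).__name__)    # float
print(sorted(r_named_list))       # ['name', 'score']
```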
R to Python
# R vector becomes Python list
r_vec <- c(1, 2, 3, 4, 5)
py_sum <- import("builtins")$sum(r_vec)
py_sum
# [1] 15
# R data frame becomes pandas DataFrame
r_df <- data.frame(
x = 1:5,
y = c("a", "b", "c", "d", "e")
)
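# Convert explicitly (requires pandas to be installed), then modify in Python
py_df <- r_to_py(r_df)
# Use the pandas assign() method to add a column;
# `$<-` on a Python object sets an attribute, not a column
py_df <- py_df$assign(new_col = py_df$x * 2L)
py_df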
Python to R
# Create in Python, use in R
# convert = FALSE keeps the result as a Python object until converted
np <- import("numpy", convert = FALSE)
py_array <- np$array(matrix(c(1, 2, 3, 4), nrow = 2))
# Convert to R
r_matrix <- py_to_r(py_array)
class(r_matrix)
# [1] "matrix" "array"   (R 4.0 or later)
For pandas DataFrames, py_to_r() returns an R data frame.
Running Python Scripts
You can execute entire Python scripts from R:
reticulate::source_python("script.py")
This loads the Python file and makes its functions available in R. If the Python script defines calculate(x), you call it as calculate(10) directly in R.
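A minimal sketch of what such a script.py could contain. Only the function name calculate(x) comes from the text above; its body (squaring the input) is an assumption made purely for illustration.

```python
# script.py (hypothetical contents)

def calculate(x):
    """Return the square of x. The body is illustrative only."""
    return x * x
```

Under this sketch, calling calculate(10) in R after reticulate::source_python("script.py") would return 100.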
For more control, use reticulate::py_run_file() or reticulate::py_run_string():
# Run Python code directly
reticulate::py_run_string("
import numpy as np
result = np.mean([1, 2, 3, 4, 5])
print(result)
")
# Access the result
reticulate::py$result
# [1] 3
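Conceptually, py_run_string() executes the code in Python's __main__ module, and py exposes that module's variables to R. The standard-library sketch below imitates this with exec() and a plain dict. It is an analogy for how the names are stored, not reticulate's implementation, and statistics.mean stands in for np.mean so the example needs no third-party packages.

```python
# The code string plays the role of the argument to py_run_string()
code = """
import statistics
result = statistics.mean([1, 2, 3, 4, 5])
"""

# A dict stands in for the global namespace of the __main__ module
main_namespace = {}
exec(code, main_namespace)

# Looking up a name afterwards is analogous to py$result in R
print(main_namespace["result"])  # 3
```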
Using Python in R Markdown
R Markdown documents support Python chunks natively:
```{r setup}
library(reticulate)
```
```{python}
import pandas as pd
df = pd.DataFrame({
"x": range(10),
"y": [i**2 for i in range(10)]
})
print(df.head())
```
```{r}
# Access Python objects from R
py$df
```
This workflow is powerful for reports that mix R and Python analysis.
Virtual Environments and Project Management
For reproducible projects, use virtual environments:
# Create a new virtual environment with specific packages
reticulate::virtualenv_create(envname = "my-project", packages = c("numpy", "pandas"))
# Specify which environment to use
reticulate::use_virtualenv("my-project")
You can also use conda environments:
reticulate::conda_create(envname = "r-reticulate", packages = c("numpy", "pandas", "scikit-learn"))
Always document your Python dependencies in a requirements.txt or environment.yml file alongside your R project.
Common Pitfalls
Python Path Issues
If reticulate cannot find Python, specify the path explicitly:
reticulate::use_python("/path/to/python")
reticulate::py_config()
The py_config() function shows your current Python configuration.
Object Type Mismatch
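Some objects do not convert cleanly. When this happens, import the module with convert = FALSE so results stay Python objects, and convert explicitly only where needed (module, some_function, and r_obj are placeholders):
# Keep as Python object
mod <- import("module", convert = FALSE)
py_obj <- mod$some_function()
# Convert explicitly when needed
py_input <- r_to_py(r_obj)
r_result <- py_to_r(py_obj)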
Version Conflicts
Reticulate works best with Python 3.6+. If you encounter issues, check your Python version:
reticulate::py_version()
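The same check can be made from the Python side, for example in a Python chunk. The sketch below uses only the standard library; the helper name meets_minimum is hypothetical, and the (3, 6) threshold is taken from the text above.

```python
import sys

# Minimum version taken from the text above (Python 3.6+)
MINIMUM = (3, 6)


def meets_minimum(version_info=sys.version_info, minimum=MINIMUM):
    """Return True if the given (major, minor, ...) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


# Version string of the running interpreter, then the result of the check
print(sys.version.split()[0])
print(meets_minimum())
```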
When to Use reticulate
Use reticulate when:
- Migrating from Python to R gradually
- You need a specific Python package not available in R
- Your team has existing Python code to maintain
- You want to use both languages in a single analysis
Consider native R alternatives when:
- The task has a good R equivalent (use dplyr instead of pandas)
- Performance is critical (converting data between R and Python adds overhead)
- Simplicity matters (mixing languages adds complexity)
Example: Combining R and Python
Here is a practical workflow that uses both languages:
library(reticulate)
np <- import("numpy")
# Generate sample data in R
data <- data.frame(
id = 1:100,
value = rnorm(100)
)
# Process in Python: convert explicitly, then add a standardized column
py_data <- r_to_py(data)
value_mean <- np$mean(py_data$value)
value_sd <- np$std(py_data$value)  # NumPy's default, ddof = 0
# assign() adds a column; `$<-` would only set an attribute on the Python object
py_data <- py_data$assign(normalized = (py_data$value - value_mean) / value_sd)
# Return to R for visualization
result <- py_to_r(py_data)
# Visualize in R
plot(result$value, result$normalized,
xlab = "Original",
ylab = "Normalized",
main = "Data Processed with Python, Visualized in R")
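One detail worth checking in a workflow like this: np.std() defaults to the population standard deviation (ddof=0), while R's sd() uses the sample version (n - 1 in the denominator). The standard-library sketch below shows the difference, with statistics.pstdev and statistics.stdev as stand-ins for the two conventions; the sample values are made up for illustration.

```python
import statistics

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # illustrative data

mean = statistics.mean(values)
pop_sd = statistics.pstdev(values)    # divides by n, like np.std(values)
sample_sd = statistics.stdev(values)  # divides by n - 1, like R's sd()

# Standardize with the population standard deviation, as np.std would
normalized = [(v - mean) / pop_sd for v in values]

print(mean)                 # 5.0
print(pop_sd)               # 2.0
print(sample_sd > pop_sd)   # True
```

If the normalized values need to match R's scale(), passing ddof=1 to np.std() is one way to align the two.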
See Also
- polars-in-r: Using Polars from R for high-performance data processing
- r-data-table: Fast data manipulation with data.table
- dplyr-data-wrangling: Data wrangling with dplyr
- r-renv: Reproducible Environments with renv
- r-targets: Reproducible Pipelines with targets