Using R and Python Together with reticulate
The reticulate package bridges R and Python within a single R session, letting you call Python functions, import Python modules, and share data structures between the two languages without leaving your R environment. Many teams have years of R code and statistical models that would be costly to rewrite in Python. Rather than choosing one language or rewriting everything, reticulate lets R and Python coexist in the same project, each handling the tasks it does best. This guide covers how to set up reticulate, call Python from R, share data between languages, and avoid common pitfalls.
What is reticulate?
The reticulate package, developed by RStudio (now Posit), provides an interface to Python from R. It allows you to:
- Call Python functions directly from R
- Pass R objects to Python and vice versa
- Use Python packages within R scripts
- Execute Python code chunks in R Markdown documents
Think of it as a two-way street between the two languages.
Installation and setup
First, install reticulate from CRAN:
install.packages("reticulate")
You also need Python installed on your system. Reticulate can use your existing Python installation, or you can point it at a specific virtual environment or conda environment. This flexibility lets you isolate Python dependencies per project, preventing package conflicts between different R analyses that use different Python libraries:
# Use system Python
reticulate::use_python("/usr/bin/python")
# Or use a virtual environment
reticulate::use_virtualenv("myenv")
# Or use conda
reticulate::use_condaenv("r-reticulate")
If you are working in a project, let renv track your Python dependencies alongside your R packages. Calling renv::use_python() tells renv which interpreter the project uses. renv::snapshot() then records the R packages and the Python version in renv.lock and writes the Python packages to a requirements.txt file (or environment.yml for conda environments), so another user who clones your repository can restore the setup with renv::restore():
renv::init()
renv::use_python()
renv::snapshot()
This makes the environment reproducible across machines: with renv.lock and the Python requirements file checked in together, a collaborator can rebuild the dual-language environment from scratch on a different computer.
Calling Python from R
Once configured, you can import Python modules just like R packages. import("json") loads the Python standard library’s JSON module, and the $ operator maps to Python dot notation for accessing functions and attributes:
library(reticulate)
# Import Python json module
json <- import("json")
# Use it to parse a JSON string
json_data <- json$loads('{"name": "Alice", "score": 95}')
json_data$name
# [1] "Alice"
This pattern works for any Python module: import it once, then reach its functions and attributes with $.
Using Python packages
Install Python packages as you normally would via pip or conda, then import them in R through reticulate’s import() function. The imported Python module object exposes all of the package’s classes and functions through the $ operator, so np$array() calls numpy’s array constructor from R:
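# In terminal: pip install numpy pandas
library(reticulate)
# convert = FALSE keeps results as Python objects instead of converting them back to R
np <- import("numpy", convert = FALSE)
pd <- import("pandas", convert = FALSE)
# Create a NumPy array (R doubles become a float64 array)
arr <- np$array(c(1, 2, 3, 4, 5))
arr
# Create a pandas DataFrame from a named list, which reticulate passes as a Python dict
df <- pd$DataFrame(list(
name = c("Alice", "Bob", "Charlie"),
score = c(95, 87, 92)
))
df
Because the modules were imported with convert = FALSE, arr and df stay Python objects: the DataFrame prints to the R console with pandas' tabular formatting, and you can chain Python method calls on it just as you would in a Python script. With the default convert = TRUE, reticulate would convert both results straight back to an R array and an R data frame.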
Passing data between R and Python
Reticulate handles type conversion automatically in most cases. R vectors become Python lists, data frames become pandas DataFrames, and matrices become numpy arrays.
R to Python
When R objects cross the language boundary, reticulate handles type conversion automatically. R vectors become Python lists and R data frames become pandas DataFrames — no manual marshalling required. This automatic conversion is why you can call Python functions with R objects directly: reticulate translates types behind the scenes before the call executes.
# R vector becomes Python list automatically
r_vec <- c(1, 2, 3, 4, 5)
py_sum <- import("builtins")$sum(r_vec) # [1] 15
# Convert an R data frame to a pandas DataFrame explicitly
r_df <- data.frame(x = 1:5, y = c("a", "b", "c", "d", "e"))
py_df <- r_to_py(r_df)
# Add a column on the Python side; py_set_item() maps to py_df["new_col"] = ...
py_set_item(py_df, "new_col", py_df$x$mul(2L))
py_df
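Python to R
Going the other direction, conversion is automatic by default: results returned from Python are converted to R types. If you imported a module with convert = FALSE, or created an object with r_to_py(), py_to_r() gives you explicit control over when conversion happens. Python lists become R vectors (or R lists when the elements are mixed), NumPy arrays become R arrays and matrices, and pandas DataFrames become R data frames. Until you call py_to_r(), such objects remain Python references inside R: you can call methods on them, but R functions will not see them as native R types.
np <- import("numpy", convert = FALSE)
py_array <- np$array(matrix(c(1,2,3,4), nrow = 2))
r_matrix <- py_to_r(py_array)
class(r_matrix) # [1] "matrix" "array" (R 4.0 and later)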
Running Python scripts
When you have standalone Python scripts that define utility functions or data processing pipelines, reticulate can run them directly from R without converting them to inline Python strings. The source_python() function executes a .py file and makes every function and variable it defines available as R objects:
reticulate::source_python("script.py")
This loads the Python file and makes its functions available in the R global environment with their original names. If the Python script defines a calculate(x) function, you call it directly in R as calculate(10) — no prefix or import statement needed. The function behaves like a native R function even though its implementation is written in Python.
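For illustration, a file like the following could serve as script.py. The function name calculate comes from the text above; its body (multiplying by a constant) and the SCALE variable are assumptions made for this sketch, not part of any real project.

```python
# script.py: a hypothetical utility module meant to be loaded with source_python()

SCALE = 2  # module-level variable; it also becomes visible in R after sourcing


def calculate(x):
    """Return x multiplied by SCALE (an arbitrary choice for this sketch)."""
    return x * SCALE
```

With this file in the working directory, source_python("script.py") followed by calculate(10) in R would return 20, and SCALE would be available as an R value as well.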
For more control, use reticulate::py_run_file() or reticulate::py_run_string(), which let you run Python code without polluting the R namespace:
# Run Python code directly
reticulate::py_run_string("
import numpy as np
result = np.mean([1, 2, 3, 4, 5])
print(result)
")
# Access the result
reticulate::py$result
# [1] 3
Using Python in R Markdown
R Markdown and Quarto documents can mix R and Python code chunks in the same report. When the document is rendered with knitr, Python chunks run through reticulate in a Python session embedded in the R process, so objects created in one language are accessible from the other. Start by loading reticulate in an R chunk:
library(reticulate)
Then switch to a Python chunk to do data work using Python libraries. The Python code runs in a Python session that reticulate embeds in the R process, and any objects you create (DataFrames, arrays, lists) remain available to R chunks through the py object. Here is how you create a pandas DataFrame in a Python chunk:
import pandas as pd
df = pd.DataFrame({
"x": range(10),
"y": [i**2 for i in range(10)]
})
print(df.head())
After the Python chunk executes, the df variable lives in the Python namespace but is also accessible from R. Switching back to an R chunk, you can retrieve any Python object through the py$ prefix — py$df gives you the pandas DataFrame as an R data frame with automatic type conversion:
# Access Python objects from R
py$df
This workflow is powerful for reports that mix R and Python analysis. Objects from R chunks are accessible in Python chunks through the r object (for example r.x), and Python objects are accessible in R chunks through py (for example py$df).
Virtual environments and project management
For reproducible projects, use virtual environments:
# Create a new virtual environment with specific packages
reticulate::virtualenv_create(envname = "my-project", packages = c("numpy", "pandas"))
# Specify which environment to use
reticulate::use_virtualenv("my-project")
You can also use conda environments, which offer the same isolation benefits as virtualenvs but with access to conda’s broader package ecosystem. Conda can also install non-Python dependencies (such as system libraries) that pip does not manage, which matters for packages that link against external compiled libraries:
reticulate::conda_create(envname = "r-reticulate", packages = c("numpy", "pandas", "scikit-learn"))
Always document your Python dependencies in a requirements.txt or environment.yml file alongside your R project. This ensures that collaborators can set up the same Python environment even if they use a different operating system or R version. Without this file, reproducing the analysis requires guessing which Python packages and versions were used.
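One way to produce the pinned lines for such a file is to ask Python which versions are installed. The sketch below uses the standard-library importlib.metadata module (Python 3.8 or newer). The helper name pinned_requirements and the package list are assumptions for illustration. Running pip freeze is the usual alternative when you want every installed package rather than a chosen few.

```python
from importlib import metadata


def pinned_requirements(package_names):
    """Return requirements.txt-style lines for the named packages."""
    lines = []
    for name in package_names:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Keep a visible marker rather than silently dropping the package
            lines.append(f"# {name}: not installed in this environment")
    return lines


# The package list is an assumption; use whatever your project actually imports
for line in pinned_requirements(["numpy", "pandas", "scikit-learn"]):
    print(line)
```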
Common pitfalls
Python path issues
If reticulate cannot find Python, the import() function will fail with an opaque error. Specify the path explicitly to point reticulate at the correct interpreter:
reticulate::use_python("/path/to/python")
reticulate::py_config()
The py_config() function prints the current Python configuration, including the interpreter path, the Python version, and the location and version of NumPy if it is found. Run it whenever you suspect the wrong Python environment is being used: it surfaces mismatches between the environment you intended to activate and the one reticulate actually found.
Object type mismatch
Not every R object converts cleanly to a Python equivalent and vice versa. S4 objects, R6 classes, and custom S3 methods do not have natural Python representations. When conversion fails or produces unexpected results, keep the object in its original language and work with it there:
# Keep the result as a Python object ("module" and some_function are placeholders)
py_obj <- import("module", convert = FALSE)$some_function()
# Pass explicitly when needed
result <- r_to_py(r_obj)
Version conflicts
Reticulate works best with Python 3.6 and newer. Older Python versions lack features that reticulate depends on for type conversion and virtual environment management. If you encounter cryptic import errors or segmentation faults, checking the active Python version is a good first diagnostic step:
reticulate::py_version()
When to use reticulate
Use reticulate when:
- You are migrating from Python to R gradually
- You need a specific Python package not available in R
- Your team has existing Python code to maintain
- You want to use both languages in a single analysis
Consider native R alternatives when:
- The task has a good R equivalent (use dplyr instead of pandas)
- Performance is critical (crossing the language boundary adds conversion overhead)
- Simplicity matters (mixing languages adds complexity)
Example: combining R and Python
Here is a practical workflow that uses both languages:
library(reticulate)
# NumPy supplies the mean and standard deviation
np <- import("numpy")
# Generate sample data
data <- data.frame(
id = 1:100,
value = rnorm(100)
)
# Process in Python: r_to_py() returns a reference to a pandas DataFrame
py_data <- r_to_py(data)
values <- py_get_item(py_data, "value")
normalized <- values$sub(np$mean(values))$div(np$std(values))
# py_set_item() assigns a column, like py_data["normalized"] = ...
py_set_item(py_data, "normalized", normalized)
# Return to R for visualization
result <- py_to_r(py_data)
# Visualize in R
plot(result$value, result$normalized,
xlab = "Original",
ylab = "Normalized",
main = "Data Processed with Python, Visualized in R")
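One detail worth checking in a pipeline like this is which standard deviation is being used: NumPy's np.std defaults to the population formula (dividing by n), whereas R's sd() divides by n - 1. The following Python sketch uses only the standard library and made-up numbers to show the two side by side. It is meant as a quick sanity check, not as part of the workflow above.

```python
import statistics

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up sample data

mean = statistics.fmean(values)
pop_sd = statistics.pstdev(values)    # divides by n, like np.std's default
sample_sd = statistics.stdev(values)  # divides by n - 1, like R's sd()

# Z-scores under each convention; they differ by a constant factor
z_pop = [(v - mean) / pop_sd for v in values]
z_sample = [(v - mean) / sample_sd for v in values]

print(mean, pop_sd, sample_sd)
```

If normalized values computed in Python need to match scale() or sd() in R, passing ddof=1 to np.std is one way to align the two conventions.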
Data type conversion between R and Python
reticulate automatically converts between R and Python data types for common types: R numeric vectors become Python lists or numpy arrays, R data frames become pandas DataFrames, and R named lists become Python dicts. The conversion happens transparently when you pass objects between the two languages.
Some conversions require attention. R NULL becomes Python None and vice versa. R’s NA (missing value) has no direct Python equivalent: a numeric NA becomes float('nan'), and a character NA can arrive as the string 'NA', so check how your reticulate version handles it. When passing data with R NA values to Python code that expects Python None, add explicit handling.
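On the Python side, one way to add that handling is to normalize missing values right after the data arrives. The sketch below assumes the columns arrive as a dict of plain lists and that character NA arrived as the literal string 'NA'. The helper name na_to_none is hypothetical.

```python
import math


def na_to_none(values, na_strings=("NA",)):
    """Replace float NaN and NA marker strings with None; leave other values alone."""
    cleaned = []
    for v in values:
        if isinstance(v, float) and math.isnan(v):
            cleaned.append(None)
        elif isinstance(v, str) and v in na_strings:
            cleaned.append(None)
        else:
            cleaned.append(v)
    return cleaned


# Made-up columns standing in for data received from R
columns = {
    "score": [95.0, float("nan"), 92.0],
    "name": ["Alice", "NA", "Charlie"],
}
cleaned = {name: na_to_none(col) for name, col in columns.items()}
print(cleaned)
```

Treating the string 'NA' as missing is a judgment call: a column of country codes, for instance, may contain a genuine "NA", in which case passing na_strings=() for that column would be safer.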
For large data frames, the conversion involves copying the data between R and Python memory spaces, which can be slow. For numerical arrays, passing by reference using numpy arrays with r_to_py(x, convert = FALSE) can avoid the copy when possible. The arrow package provides a more efficient path for large data frame transfers: Arrow’s memory format is shared between R and Python without copying.
Managing Python environments
reticulate works with virtualenvs, conda environments, and the system Python. The recommended approach for reproducibility is to use a project-specific virtualenv: virtualenv_create("my_project") creates the environment, and use_virtualenv("my_project", required = TRUE) activates it for the R session. Installing Python packages with py_install(c("pandas", "scikit-learn")) into the active environment keeps dependencies isolated.
For projects checked into version control, a requirements.txt file alongside renv.lock documents both the R and Python dependencies. renv::use_python() integrates Python environment management with renv’s infrastructure: renv records the Python version in renv.lock and writes the installed Python packages to requirements.txt when you snapshot.
Practical use cases
The clearest use case for reticulate is accessing Python machine learning libraries that have no R equivalent, such as LangChain, certain transformer models, or niche scikit-learn extensions. Do the data preparation in R using dplyr and tidyverse tools, pass the processed data frame to Python for the model, and return the predictions or embeddings back to R for visualization and reporting. This hybrid workflow lets each language do what it does best without rewriting entire pipelines.
Object conversion details
The R-Python object conversion table covers most common types automatically. R numeric vectors become numpy arrays or Python lists. R character vectors become Python lists of strings. R data frames become pandas DataFrames with matching column names and dtypes. R named lists become Python dicts. R logical vectors become Python lists of booleans.
For S4 objects and R6 classes, automatic conversion does not apply; these need to be explicitly converted to a common format (data frame, list, or primitive types) before passing to Python. The same applies to S3 objects with custom print methods: reticulate passes the underlying list structure, not the formatted representation.
When Python returns None, R receives NULL. When Python returns a pandas NA or numpy NaN, R receives the corresponding NA or NaN. These conversions are consistent but worth testing explicitly for your specific data pipeline, since subtle differences in missing value handling between the two ecosystems can produce unexpected results in downstream computations.
Debugging cross-language pipelines
Debugging reticulate pipelines is harder than single-language debugging because errors can originate in either R or Python. Python stack traces are printed to the R console but may be harder to parse if you are not familiar with Python error messages. py_last_error() retrieves the last Python error in R, which is useful when an error occurred but the output was captured.
For interactive debugging, reticulate::repl_python() opens a Python REPL within the R session, where you can inspect Python objects interactively. py$object_name in R accesses any Python object by name, and r.object_name in Python accesses any R object. This bidirectional access makes it possible to inspect the state of both environments at the same point in execution.
Getting started
If you work primarily in R, reticulate is the most practical starting point. Install the package, call use_virtualenv() or use_condaenv() to point at a Python environment, and then import("pandas") to load Python modules. For cross-language pipelines, Quarto documents can contain both R and Python chunks that share objects through the reticulate bridge. For production systems, keeping R and Python in separate services that communicate over REST or message queues is more maintainable than tight language mixing.
Data science workflows
R and Python excel in different phases of a data science project. R’s strengths are exploratory data analysis with ggplot2, statistical modeling with extensive CRAN packages, and reproducible reporting with R Markdown and Quarto. Python dominates model deployment, MLOps infrastructure, and deep learning. A hybrid workflow uses each language where it is strongest: R for the analysis phase, Python for productionizing.
Interoperability tools
reticulate is the primary bridge for R and Python in the same session. Quarto supports mixed-language documents where R and Python code chunks can share objects through the reticulate bridge. For separate services, the two languages communicate over REST APIs, message queues, or shared data stores (Parquet files, Arrow IPC). rpy2 is the Python-side equivalent of reticulate: it embeds an R interpreter in a Python process.
Package ecosystems comparison
R and Python have complementary package strengths. R’s CRAN provides statistical packages (survival analysis, mixed models, time series) that Python lacks native equivalents for. Python’s ecosystem covers deep learning (PyTorch, TensorFlow), NLP (Hugging Face), and deployment infrastructure. Both have data manipulation (dplyr/pandas), visualization (ggplot2/matplotlib+seaborn), and ML frameworks (tidymodels/scikit-learn). For organizations choosing between them, the talent pool and existing codebase matter more than abstract language features.
Calling Python from R with reticulate
The reticulate package bridges R and Python within a single session. After installing it, use_python("/usr/bin/python3") or use_virtualenv("myenv") sets the Python interpreter. py_run_string("import pandas as pd") executes Python code and populates the py object, so py$pd accesses the pandas module from R.
For passing data between languages: R data frames convert automatically to pandas DataFrames when passed to Python functions, and NumPy arrays become R matrices. R lists convert to Python dicts if named, or Python lists if unnamed. Conversion generally copies the data, so changes made in Python do not affect the original R object. Large data transfers have overhead; if you’re passing gigabytes of data repeatedly, write to a shared file (Parquet via arrow) instead.
import() loads a Python module and returns an R object wrapping it: np <- import("numpy"), then np$array(1:5) calls the NumPy array constructor. source_python("script.py") runs a Python file and makes its functions available in R. This is a convenient pattern for reusing Python utility functions without embedding Python code in R strings.
Calling R from Python with rpy2
rpy2 is the Python analog to reticulate. import rpy2.robjects as ro gives access to an R interpreter embedded in the Python process. ro.r('x <- 1:10') runs R code. ro.r.mean(ro.IntVector([1, 2, 3])) calls the R mean() function with Python data.
The pandas2ri submodule converts between pandas DataFrames and R data frames. Import it explicitly with from rpy2.robjects import pandas2ri; calling pandas2ri.activate() then enables automatic conversion globally, after which a pandas DataFrame passed to an R function is transparently converted. Newer rpy2 releases favor scoping the conversion to a block with a local converter instead of activating it globally, so check the documentation for the version you have installed.
In practice, rpy2 is most common in Jupyter notebooks where data scientists want R’s statistical modeling (mixed models, survival analysis, Bayesian packages) alongside Python’s machine learning ecosystem. A common pattern: train a model in Python (scikit-learn or torch), generate predictions, pass them to R for post-processing and visualization with ggplot2.
Interoperability via file formats
The simplest interoperability pattern avoids in-process bridges entirely. Python scripts write Parquet files with pyarrow; R reads them with the arrow package. This works across machines, across time (no shared session needed), and with any language that supports Parquet.
Arrow’s columnar format is especially efficient for this use case: call arrow::write_parquet(df, "output.parquet") in R, then pd.read_parquet("output.parquet") in Python. Compared to CSV, Parquet preserves column types (dates stay dates, factors become categorical), compresses well, and is typically much faster to read for large files.
For real-time data exchange, a simple approach is a shared database. Both R and Python can read and write to the same SQLite or PostgreSQL database. R uses DBI; Python uses SQLAlchemy or direct psycopg2. This works naturally with existing data infrastructure and does not require any special interoperability tooling.
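As a rough sketch of the Python half of that pattern, the standard-library sqlite3 module can write a table that R could later read through DBI. The file location, table name, and columns here are invented for illustration.

```python
import os
import sqlite3
import tempfile

# Hypothetical shared location; in practice this is a path both languages can reach
db_path = os.path.join(tempfile.mkdtemp(), "shared.sqlite")

conn = sqlite3.connect(db_path)
conn.execute(
    "CREATE TABLE IF NOT EXISTS predictions (id INTEGER PRIMARY KEY, score REAL)"
)
conn.executemany(
    "INSERT INTO predictions (id, score) VALUES (?, ?)",
    [(1, 0.91), (2, 0.47), (3, 0.78)],
)
conn.commit()

# Read the rows back the way any other client of the database would
rows = conn.execute("SELECT id, score FROM predictions ORDER BY id").fetchall()
conn.close()
print(rows)
```

On the R side, connecting with DBI::dbConnect(RSQLite::SQLite(), path) and calling DBI::dbReadTable(con, "predictions") would return the same rows as a data frame, assuming the RSQLite package is installed.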
Choosing when to use each language
R’s strengths are statistical modeling depth (mixed effects, survival analysis, Bayesian inference, spatial statistics), ggplot2’s expressive visualization grammar, and the tidyverse’s data manipulation vocabulary. For exploratory data analysis ending in a polished report (Quarto/R Markdown), R is a natural fit.
Python’s strengths are deep learning frameworks (PyTorch, JAX), production deployment infrastructure, a larger general-purpose library ecosystem, and stronger tooling for building APIs and services. For models that need to be served at scale behind a REST API, Python is easier to deploy.
In mixed teams, a practical division is: R for statistical analysis and visualization, Python for data engineering pipelines and model serving. The two interact through shared file formats, databases, or message queues rather than in-process bridges except in Jupyter notebooks where data scientists need both.
Shared data science workflows
Many organizations have mixed R and Python teams. The practical question is not which language is better, but how to collaborate without forcing everyone to learn the other language. The key is agreeing on data exchange formats rather than in-process bridges.
Parquet files are the lingua franca. An R analyst writes arrow::write_parquet(model_results, "results.parquet"). A Python engineer reads pd.read_parquet("results.parquet"). The exchange is lossless: types are preserved, including dates, categoricals, and numerics. No encoding issues, no type coercion surprises.
For model artifacts, the vetiver package standardizes model deployment across R and Python. A model trained in R can be deployed as a REST API with vetiver::vetiver_model() and plumber. A Python consumer calls the API with standard HTTP. The languages communicate over HTTP with JSON payloads, a boundary both ecosystems understand.
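From the Python consumer's point of view, the call is an ordinary JSON POST. The sketch below only builds the request with the standard library and does not send it. The URL, the /predict path, and the payload fields are assumptions for illustration, and the shape a deployed model expects depends on how it was trained.

```python
import json
import urllib.request


def build_predict_request(url, records):
    """Build a JSON POST request carrying a list of row dicts."""
    body = json.dumps(records).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical endpoint and feature names
request = build_predict_request(
    "http://localhost:8000/predict",
    [{"age": 42, "dose": 1.5}],
)
print(request.get_method(), request.full_url)

# Sending it would look like this once a server is running:
# with urllib.request.urlopen(request) as response:
#     predictions = json.load(response)
```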
Sharing code and notebooks
Quarto supports R and Python in the same document. A .qmd file can have R chunks for statistical analysis and Python chunks for machine learning. The two share data through reticulate’s object bridge: py$model_predictions in R accesses Python’s model_predictions, and r.clean_data in Python accesses R’s clean_data.
This multi-language notebook is valuable for writing comprehensive data science documentation where the most natural tool for each step differs. It is less suitable for production pipelines where each step should be independently testable and deployable.
Jupyter notebooks can run R kernels via the IRkernel package. This lets Python-centric teams run R code in their familiar notebook environment. Install with IRkernel::installspec() and select the R kernel in JupyterLab. The reticulate package is not needed in this setup — each cell runs in the selected kernel’s native environment.
Package equivalence reference
Understanding the Python equivalent of R packages (and vice versa) speeds up cross-language collaboration:
- Data manipulation: dplyr → pandas (or polars for performance)
- Reshaping: tidyr → pandas.melt() / pivot_table()
- Visualization: ggplot2 → plotnine (a Python port of ggplot2) or altair
- Statistical modeling: lm() / glm() → statsmodels
- Machine learning: tidymodels / caret → scikit-learn
- Dates and times: lubridate → pandas datetime
- String manipulation: stringr → Python’s built-in string methods + re
- Functional iteration: purrr → Python list comprehensions + itertools
The concepts transfer; the syntax differs. A Python developer who understands pandas operations will find dplyr intuitive after adjusting to the pipe syntax and the evaluation model.
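As a small illustration of the string-manipulation and functional-iteration pairings, here is how a few stringr and purrr idioms might look with Python's built-in tools. The R calls in the comments are rough counterparts rather than exact translations, and the data is made up.

```python
import re
from functools import reduce
from itertools import accumulate

words = ["  Alpha ", "beta", "GAMMA  ", "delta42"]

# stringr::str_trim() + str_to_lower()  ->  str.strip() and str.lower()
clean = [w.strip().lower() for w in words]

# purrr::keep(clean, ~ str_detect(.x, "\\d"))  ->  comprehension with a regex filter
with_digits = [w for w in clean if re.search(r"\d", w)]

# purrr::map_int(clean, nchar)  ->  comprehension over len()
lengths = [len(w) for w in clean]

# purrr::reduce(lengths, `+`)  ->  functools.reduce
total = reduce(lambda a, b: a + b, lengths)

# purrr::accumulate(lengths, `+`)  ->  itertools.accumulate
running = list(accumulate(lengths))

print(clean, with_digits, lengths, total, running)
```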
The practical reality of mixed teams
In most data organizations, both R and Python exist. Data scientists trained in academia often prefer R for statistics; software engineers and ML practitioners often prefer Python. Rather than mandating one language, the most successful teams define clear exchange points where the languages interact, and let practitioners use their preferred tool within those boundaries.
The exchange points that work reliably in practice are file-based: Parquet files exchanged through shared storage (S3, Azure Blob, GCS, or a network file system). One language writes; the other reads. The serialization format handles type preservation. No shared process, no in-memory bridge, no runtime dependency between the two language environments.
The main failure mode in mixed-language teams is when someone tries to make the two languages interact too tightly — calling R from Python in-process or vice versa — without accounting for the operational complexity this creates. Dependency conflicts, environment management, and debugging across language boundaries all increase dramatically. File-based exchange is operationally simpler.
The common ground: statistical methodology
Despite their syntactic differences, R and Python implement the same statistical methods. Ordinary least squares regression, logistic regression, random forests, gradient boosting, k-means clustering, principal component analysis — all available in both languages with similar interfaces. The mathematical outputs should agree to floating-point precision when the same algorithm is implemented.
This equivalence matters for validation. Running the same model in both languages and comparing outputs is a useful correctness check. When results differ, investigating why often reveals implementation differences (regularization defaults, handling of missing data, convergence criteria) that are worth understanding regardless of which language you ultimately use.
Cross-language validation is also useful when migrating an analysis from one language to another. Running the original and the port in parallel on the same data and verifying output agreement catches translation errors before the migration is declared complete.
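A comparison like the ones described above can be automated with a small tolerance check. The Python sketch below assumes both implementations have exported their coefficients as name-to-value mappings (for example through JSON). The coefficient values and the compare_estimates helper are made up for illustration.

```python
import math


def compare_estimates(a, b, rel_tol=1e-9, abs_tol=1e-12):
    """Return a list of human-readable mismatches between two coefficient dicts."""
    problems = []
    for name in sorted(set(a) | set(b)):
        if name not in a or name not in b:
            problems.append(f"{name}: present in only one result")
        elif not math.isclose(a[name], b[name], rel_tol=rel_tol, abs_tol=abs_tol):
            problems.append(f"{name}: {a[name]!r} vs {b[name]!r}")
    return problems


# Made-up coefficients standing in for output from an R fit and a Python fit
r_coefs = {"intercept": 1.25, "slope": 0.7300001}
py_coefs = {"intercept": 1.25, "slope": 0.73}

print(compare_estimates(r_coefs, py_coefs))                # strict tolerance flags slope
print(compare_estimates(r_coefs, py_coefs, rel_tol=1e-6))  # looser tolerance passes
```

Choosing the tolerance is part of the validation: iterative fitters that stop at a convergence threshold rarely agree to the last digit, so a looser relative tolerance is often more appropriate for them than for closed-form estimates.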
Career considerations
For data scientists, proficiency in both R and Python is increasingly valuable. R’s strengths in statistical computing, bioinformatics, epidemiology, and social science research mean that many specialized analyses are more naturally expressed in R. Python’s strengths in engineering, machine learning infrastructure, and production deployment mean that R-only practitioners may be limited in certain roles.
The deepest R skills — understanding the condition system, writing performant vectorized code, using non-standard evaluation to build tidyverse-style APIs — do not translate directly to Python but represent a level of language mastery that transfers to any language. Similarly, deep Python knowledge of async programming, metaclasses, and the data model builds transferable programming skills.
The practical advice for most practitioners: learn one language deeply first, then develop functional proficiency in the other. Deep expertise in one language is more valuable than shallow knowledge of both. The concepts transfer even when the syntax does not.
Why use both languages together
The question is not which language is better but which tool fits which task. R has the deepest ecosystem for statistical computing — survey-weighted regression, mixed models, spatial statistics, and clinical trial analysis all have mature R packages with no Python equivalent that matches on breadth and depth of methods. Python dominates in deep learning, production ML infrastructure, and systems that need to integrate with general software engineering tooling.
In practice, data teams that have both skill sets produce better work than teams limited to one language. An R user who can call a Python API wrapper, or a Python developer who can read an R colleague’s mixed-effects model output, can collaborate without translation friction. The boundary between the two languages is more permeable than advocates on either side acknowledge.
The two-environment reality
Running both languages in one workflow means managing two runtime environments. This introduces practical concerns: package installation procedures differ, dependency management tools differ (renv versus pip/conda/uv), and reproducibility practices differ. A workflow that calls R from Python or Python from R needs both environments to be correctly set up, version-pinned, and documented.
For shared analysis environments — servers, containers, CI pipelines — coordinate the R and Python setups explicitly. A Docker image that includes both runtimes, with package versions locked in both renv.lock and requirements.txt, is more reproducible than a developer laptop where both are manually installed. Test integration workflows in CI so failures surface early rather than when a colleague tries to reproduce the analysis on a fresh machine.
File exchange as the simplest integration
When real-time integration is unnecessary, file exchange is the simplest and most reliable approach. Write CSV, Parquet, or JSON from one language and read in the other. Parquet is the best format for this: it preserves column types including dates and categorical types, is compact, and both languages read it without type surprises. CSV is universally readable but loses type information: every value is stored as text, so the reader has to re-infer or re-declare column types.
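The type loss in CSV is easy to see with Python's standard csv module, which returns every field as text. The column names and values below are invented for the demonstration.

```python
import csv
import io
from datetime import date

rows = [
    {"id": 1, "signup": date(2024, 1, 15), "active": True},
    {"id": 2, "signup": date(2024, 2, 3), "active": False},
]

# Write to an in-memory buffer instead of a file to keep the sketch self-contained
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "signup", "active"])
writer.writeheader()
writer.writerows(rows)

buffer.seek(0)
round_tripped = list(csv.DictReader(buffer))

# Every value is now a str; the int, date, and bool types are gone
print(round_tripped[0])
print({key: type(value).__name__ for key, value in round_tripped[0].items()})
```

Higher-level readers such as pandas.read_csv or readr::read_csv guess types when loading, but the guess is made again on every read and can differ between the two languages.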
The Apache Arrow project provides a common in-memory format that both R and Python can use. For large datasets, writing an Arrow IPC file from R and reading it in Python is faster than writing and re-reading Parquet because there is no serialization overhead for compatible types. This matters when file exchange is in a hot loop or when dataset sizes push into the gigabyte range.
Calling R from Python
The rpy2 library provides a Python interface to an R session. After setup, you can push data from Python to R, call R functions, and retrieve results. The friction comes from type coercion: R data frames become rpy2 objects that you must explicitly convert to pandas DataFrames, and R’s NA values need special handling because Python has no equivalent for typed missing values across all types.
A practical pattern is to write thin R wrapper functions with simple inputs and outputs — numeric vectors, data frames with standard column types — specifically designed to be called from Python. Avoid passing R-specific objects like S4 class instances or environments across the boundary. The more you standardize the interface, the less time you spend debugging type conversion edge cases.
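The same discipline can be enforced on the Python side before anything is handed to R. The sketch below is a hypothetical helper (to_plain_columns is not part of rpy2 or any other library) that turns row dicts into column lists and refuses values that are not plain scalars.

```python
from datetime import date

# Types that map cleanly onto R atomic vectors or Date columns
PLAIN_TYPES = (bool, int, float, str, date, type(None))


def to_plain_columns(records):
    """Turn a list of row dicts into a dict of column lists, rejecting exotic values."""
    if not records:
        return {}
    expected_keys = set(records[0])
    columns = {key: [] for key in records[0]}
    for i, row in enumerate(records):
        if set(row) != expected_keys:
            raise ValueError(f"row {i} has different keys than row 0")
        for key, value in row.items():
            if not isinstance(value, PLAIN_TYPES):
                raise TypeError(
                    f"row {i}, column {key!r}: unsupported type {type(value).__name__}"
                )
            columns[key].append(value)
    return columns


# Made-up rows for demonstration
rows = [{"id": 1, "dose": 0.5}, {"id": 2, "dose": 1.0}]
print(to_plain_columns(rows))
```

Failing early with a message that names the row and column tends to be easier to act on than a conversion error raised somewhere inside the bridge.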
See also
- polars-in-r: Using Polars from R for high-performance data processing
- r-data-table: Fast data manipulation with data.table
- dplyr-data-wrangling: Data wrangling with dplyr
- r-renv: Reproducible Environments with renv
- r-targets: Reproducible Pipelines with targets