Getting Started with Python using R and reticulate

R, Python, reticulate

Published: January 15, 2023

Want to use Python’s libraries without leaving R? The reticulate package gives you the best of both worlds: R’s elegant data handling and visualization, combined with Python’s machine learning and scientific computing tools. This post shows you how to set up and use this bridge between the two languages.

Quick Setup in 4 Steps

1. Install reticulate

install.packages("reticulate")
library(reticulate)

2. Install Python via Miniconda

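The easiest approach is to let reticulate install Python for you. The path below is a Windows example; the path argument is optional, and omitting it installs Miniconda to reticulate’s default location:

install_miniconda(path = "c:/miniconda")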

3. Connect to Python

Reticulate creates a default environment called r-reticulate. Let’s connect to it:

# Check available environments
conda_list()

# Connect to the default environment
use_condaenv("r-reticulate")
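To confirm from the Python side which interpreter the session is bound to, a small check like the following may help. It is a minimal sketch using only the standard library; the function name interpreter_info is illustrative and not part of reticulate. It could be run in a Python chunk or passed as a string to py_run_string().

```python
import sys


def interpreter_info():
    """Return basic facts about the running Python interpreter."""
    return {
        "executable": sys.executable,
        "version": ".".join(str(part) for part in sys.version_info[:3]),
        "prefix": sys.prefix,
    }


# Print one fact per line, e.g. the path of the interpreter in use
for key, value in interpreter_info().items():
    print(f"{key}: {value}")
```

From R, py_config() reports similar information about the active Python.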

4. Install Python Packages

Now you can install any Python packages you need:

py_install(c("pandas", "scikit-learn", "matplotlib"))
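Note that the name you install is not always the name you import: scikit-learn is imported as sklearn. The following sketch, which assumes it runs inside the environment selected above, lists any modules that cannot be found. The helper name missing_modules is hypothetical.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names, not install names: scikit-learn is imported as "sklearn".
print(missing_modules(["pandas", "sklearn", "matplotlib"]))
```

An empty list means all three modules were found. From R, py_module_available("sklearn") performs a similar check for a single module.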

Three Ways to Use Python in R

1. Import Python Modules Directly

# Import pandas; its functions and classes are accessed with $
pd <- import("pandas")

# Create a pandas Series
pd$Series(c(1, 2, 3, 4, 5))

# Import numpy for numerical operations
np <- import("numpy")
np$mean(1:100)  # Calculate the mean using numpy
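Behind these calls, reticulate converts R values to Python types. This matters because some Python functions insist on an integer, while a plain number in R such as 5 is a double; writing 5L passes an integer instead. The sketch below shows the Python types involved, following reticulate's documented conversion rules. The helper python_type is hypothetical, and the literals stand in for values that would arrive from R.

```python
def python_type(value):
    """Return the name of the Python type of `value`."""
    return type(value).__name__


# Each literal stands in for what the R value in the comment is converted to.
print(python_type(5.0))        # R: 5              (double)
print(python_type(5))          # R: 5L             (integer)
print(python_type([1, 2, 3]))  # R: c(1L, 2L, 3L)  (multi-element vector)
print(python_type({"a": 1}))   # R: list(a = 1L)   (named list)
```

Loading a function like this with source_python() and calling it on your own R objects is a quick way to see what Python actually receives.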

2. Write Python Code in R Markdown

You can mix R and Python code in the same document by using Python code chunks, which start with a {python} chunk header instead of {r}:

# This is Python code!
import pandas as pd
import numpy as np

# Create a simple DataFrame
df = pd.DataFrame({
    'A': np.random.randn(5),
    'B': np.random.randn(5)
})

print(df.describe())
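Python chunks can also define ordinary functions for later reuse. The sketch below uses only the standard library; column_means is a hypothetical helper, and the sample dictionary is made-up data standing in for a named list passed from R (reticulate converts a named list to a Python dict).

```python
from statistics import mean


def column_means(columns):
    """Map each column name to the mean of its values."""
    return {name: mean(values) for name, values in columns.items()}


# Made-up data: each key is a column name, each value a list of numbers.
sample = {"A": [1.0, 2.0, 3.0], "B": [10.0, 20.0, 60.0]}
print(column_means(sample))
```

In an R Markdown document, objects created in Python chunks are available to later R chunks through the py object, for example py$df.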

3. Use Python Libraries in R Workflows

You can also call Python’s machine learning libraries directly on R data, here the built-in mtcars data set:

# Import scikit-learn's linear models
sk <- import("sklearn.linear_model")

# Build the predictor matrix once so it can be reused
X <- as.matrix(mtcars[, c("disp", "hp", "wt")])

# Create and fit a linear regression model
model <- sk$LinearRegression()
model$fit(X = X, y = mtcars$mpg)

# Get predictions and coefficients
predictions <- model$predict(X)
coefficients <- data.frame(
  Feature = c("Intercept", colnames(X)),
  Coefficient = c(model$intercept_, model$coef_)
)

coefficients
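When a Python helper is more convenient than inline calls, reticulate's source_python() can load functions from a .py file into the R session. Here is a minimal sketch, assuming a hypothetical file named helpers.py. The name label_coefficients is illustrative, and the numbers in the demo are made up rather than taken from the mtcars fit.

```python
def label_coefficients(intercept, coefs, feature_names):
    """Pair an intercept and slope coefficients with readable labels."""
    coefs = list(coefs)
    feature_names = list(feature_names)
    if len(coefs) != len(feature_names):
        raise ValueError("expected one coefficient per feature name")
    labels = ["Intercept"] + feature_names
    values = [float(intercept)] + [float(c) for c in coefs]
    return dict(zip(labels, values))


# Demo with made-up numbers (not the mtcars results).
print(label_coefficients(1.5, [0.2, -0.3], ["x1", "x2"]))
```

After source_python("helpers.py") in R, the function could be called as label_coefficients(model$intercept_, model$coef_, c("disp", "hp", "wt")), and the returned dict arrives in R as a named list.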

Real-World Applications

Here is one way to combine R and Python in a data science workflow. The file name and column names are placeholders:

Data Science Pipeline

# 1. Data cleaning with R's tidyverse
library(tidyverse)   # loads readr, dplyr, and ggplot2
library(reticulate)

clean_data <- read_csv("data.csv") %>%
  filter(!is.na(important_column)) %>%
  mutate(new_feature = feature1 / feature2)

# Placeholder predictor columns; replace with your own
features <- c("feature1", "feature2")
X <- as.matrix(clean_data[, features])

# 2. Machine learning with Python's scikit-learn
sk <- import("sklearn.ensemble")
# Use 100L: a plain 100 is passed to Python as a float, which scikit-learn may reject
model <- sk$RandomForestClassifier(n_estimators = 100L)
model$fit(X = X, y = clean_data$target)

# Probability of the second class (column 2 in R's 1-based indexing)
predictions <- model$predict_proba(X)[, 2]

# 3. Visualization with R's ggplot2
clean_data %>%
  mutate(prediction = predictions) %>%
  ggplot(aes(x = feature1, y = feature2, color = prediction)) +
  geom_point() +
  scale_color_viridis_c()
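One detail in the pipeline worth spelling out is indexing: R counts from 1 and Python from 0, so the second column that R selects with [, 2] sits at index 1 on the Python side. The sketch below illustrates this with a made-up probability table held in plain Python lists; second_column is a hypothetical helper.

```python
def second_column(rows):
    """Return the second entry of each row (index 1 in Python, column 2 in R)."""
    return [row[1] for row in rows]


# Made-up class probabilities: each row holds one probability per class.
proba = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]]
print(second_column(proba))
```

In scikit-learn, the columns of predict_proba follow the order of the fitted model's classes_ attribute, so it is worth checking model$classes_ before assuming which class the second column refers to.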

When to Use Each Language

Use R for:

  • Data manipulation with dplyr/data.table
  • Statistical modeling and hypothesis testing
  • Publication-quality visualization
  • Interactive reports and dashboards

Use Python for:

  • Deep learning with TensorFlow/PyTorch
  • Natural language processing
  • Computer vision
  • Advanced machine learning algorithms

With reticulate, you don’t have to choose: use the best tool for each part of your analysis!