If you are like me and have been using R for a long time but would like to explore and add some Python capabilities to your workflows, reticulate + RStudio is a great way to achieve just that.
Here’s a quick excerpt from reticulate’s website:
The reticulate package provides a comprehensive set of tools for interoperability between Python and R.
Essentially, reticulate allows R to talk to Python (via a live Python session running in the background) and works seamlessly within RStudio. It also provides functionality to manage multiple Python installations. In this post, we’ll explore how to set up a Python environment and configure it to work with RStudio on Windows. Let’s dive in!
The first step, of course, is to install the reticulate package.
Next, we will install Python via reticulate. On Windows, if installing Python via reticulate, it’s better to install Miniconda, since reticulate only uses conda to install and manage Python libraries. If you’d rather use a bare-bones installation of Python, it would be a lot easier to install Python directly using the Windows binary, go through the installation process, use pip from either the Command Prompt or PowerShell to install the libraries you need, and then point reticulate to that installation of Python using use_python(). See the reticulate documentation for more details.
This post will use Miniconda. By default, a virtual environment called r-reticulate will also be created as part of the installation process.
library(reticulate)
install_miniconda(path = "e:/miniconda", update = TRUE)
Pointing to the right python installation
Now that Miniconda is installed, we need to point reticulate to it. First, let’s check if the r-reticulate virtual environment is available.
conda_list(conda = "e:/miniconda/_conda.exe")
##           name                                      python
## 1 r-reticulate E:\\miniconda\\envs\\r-reticulate\\python.exe
If it is available, point to it using use_condaenv(), as in the call below. This binds that particular installation of Python to the current R session. Note that if you restart R, this will need to be set again. To make the choice permanent, set the RETICULATE_PYTHON environment variable to the path of the Python executable you want to use.
use_condaenv(condaenv = "r-reticulate", conda = "e:/miniconda/_conda.exe")
If the default virtual environment is not available, or you would like to create a new one, then use
conda_create(envname = "myenv", conda = "e:/miniconda/_conda.exe")
Now that reticulate knows where to find Python, we can install some Python libraries to work with.
py_install(packages = c("pandas", "scikit-learn"))
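To double-check the result from the Python side, a small script along these lines could be run in a Python script or Python chunk inside RStudio. This is only a sketch using the standard library: environment_report is a made-up helper name, and it assumes that scikit-learn is imported under the module name sklearn. With the conda environment bound, the reported prefix should point inside the environment folder.

```python
import importlib.util
import platform
import sys


def environment_report(packages):
    """Summarise the running interpreter and whether each package can be found."""
    return {
        # Root folder of the active Python installation / environment
        "prefix": sys.prefix,
        "version": platform.python_version(),
        # find_spec() returns None when a top-level module is not installed
        "packages": {
            name: importlib.util.find_spec(name) is not None for name in packages
        },
    }


# Hypothetical check for the two libraries installed above
report = environment_report(["pandas", "sklearn"])
print("Python", report["version"], "at", report["prefix"])
for name, found in report["packages"].items():
    print(f"{name}: {'found' if found else 'missing'}")
```

If a package shows up as missing, the session is probably bound to a different Python installation than the one the libraries were installed into.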
Using Python within RStudio
If all of the above steps worked without any errors, you should be able to do something like this in a new R session (console or R script).
pd <- import("pandas")
pd$array(c(1, 2, 3))
## <PandasArray>
## [1.0, 2.0, 3.0]
## Length: 3, dtype: float64
Yay! Python and R are now talking to each other 👏
Now that everything is set up, there are multiple ways to use Python along with R inside RStudio:
One can use RStudio as an IDE for Python. Simply open up a new Python script from File -> New File -> Python Script and start to write some Python code.
Another way to use Python inside RStudio is via R Notebooks. Simply launch a new R Notebook and start to write Python code inside a Python code chunk. This way, one can use Python and R within the same notebook. 😏
Possibly the most exciting way of using Python with R is to import Python functions into R. This is a great way to add python functionality to an existing R environment.
- We’ll need a python script to house python code including functions that need to be imported into R
- We can then import the above functions via reticulate’s source_python()
As an example, say we have a Python script called py_example.py with the following code, which allows us to fit a linear regression model using scikit-learn:
from sklearn import linear_model
linreg_python = linear_model.LinearRegression()
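We can import this model object into R by sourcing the script with source_python(), for example source_python("py_example.py") if the script sits in the working directory.
And now we can use it inside R much as we would use any other R object (note that the syntax is different, however: methods and attributes are reached with $).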
# Fit model
linreg_python$fit(X = mtcars[, -1], y = mtcars$mpg)
## LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

# Show coefficients
data.frame(var = c("Intercept", names(mtcars)[-1]),
           python_coef = c(linreg_python$intercept_, linreg_python$coef_))
##          var python_coef
## 1  Intercept 12.30337416
## 2        cyl -0.11144048
## 3       disp  0.01333524
## 4         hp -0.02148212
## 5       drat  0.78711097
## 6         wt -3.71530393
## 7       qsec  0.82104075
## 8         vs  0.31776281
## 9         am  2.52022689
## 10      gear  0.65541302
## 11      carb -0.19941925
Just for fun, let’s also compare the output from good old lm():
# Fit model and show coefficients from R
fit <- lm(mpg ~ ., data = mtcars)
data.frame(R_coef = coef(fit))
##                  R_coef
## (Intercept) 12.30337416
## cyl         -0.11144048
## disp         0.01333524
## hp          -0.02148212
## drat         0.78711097
## wt          -3.71530393
## qsec         0.82104075
## vs           0.31776281
## am           2.52022689
## gear         0.65541302
## carb        -0.19941925
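The example script exposes a ready-made model object, but the same approach works for plain Python functions too. As a rough sketch, a script could define something like the function below. Here fit_simple_ols is a made-up name, and it is a dependency-free least-squares fit for a single predictor, not what scikit-learn does internally.

```python
def fit_simple_ols(x, y):
    """Least-squares fit of y = intercept + slope * x for a single predictor."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and cross-deviations of x and y
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return {"intercept": mean_y - slope * mean_x, "slope": slope}
```

Once such a script is sourced into R, a call like fit_simple_ols(mtcars$wt, mtcars$mpg) should work, since reticulate converts R numeric vectors to Python lists and hands the returned dict back as a named list.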
Using the above set-up, one could combine python’s extensive ML capabilities and R’s intuitive data munging and excellent data visualisation capabilities into one single powerful workflow. I wonder if Py-ThoR would be an appropriate name for such a workflow? 👊
Thoughts? Comments? Helpful? Not helpful? Like to see anything else added in here? Let me know!