{reticulate}
Purpose: In this notebook, we’ll see that {reticulate} is more than just a package that allows you to work with Python inside of RStudio. This package allows users and teams to mix Python and R code in a single document, passing objects back and forth between environments. Even if you are planning to use only R for this course, walking through this short notebook will be helpful since it will allow you to collaborate with Python users.
As mentioned, the {reticulate} package lets a single document access multiple environments. This is great for teams with developers who prefer different languages or for individuals who like one language for some purposes and another language for other purposes.
Open up RStudio and open the R project which is managing your GitHub repository for this class.
Use File -> New File -> R Markdown... to open a new R Markdown notebook. Fill in the fields and create the document.
In the initial R code chunk, add the line reticulate::use_virtualenv("mat434") and run it.
Delete all of the boilerplate from line 12 down.
In the R console (the prompt is > rather than >>>), run install.packages("tidyverse") to install the {tidyverse} ecosystem of packages for R. If the console shows the Python prompt (>>>), then you can revert back to R by typing exit in the console and running it.
In that first R code chunk, add a line library(tidyverse) to load the {tidyverse} and run it. Add and run a line library(reticulate) as well, since we’ll be using {reticulate} functionality here.
Open a new code cell, but this time make it a Python chunk. You can either build the code chunk manually or use the +C icon at the top of the script editor.
Use Python code to read in the FAA airstrikes data as FAAdata. Don’t forget to import pandas as pd first.
Print out the .head() of the data frame using Python syntax.
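A minimal sketch of what that Python chunk might contain. The location of the course data file is not given here, so this example reads a tiny in-memory CSV instead. The rows and the airport codes AAA and BBB are invented for illustration; only the column names incident_month and airport_id are taken from the R code used later in this notebook.

```python
import io

import pandas as pd

# Toy stand-in for the course CSV: invented rows, NOT real FAA records.
toy_csv = io.StringIO(
    "incident_month,airport_id\n"
    "1,AAA\n"
    "1,AAA\n"
    "1,BBB\n"
    "2,AAA\n"
    "2,BBB\n"
    "2,BBB\n"
    "2,BBB\n"
)

# In the notebook, pass the real file path or URL to read_csv() instead.
FAAdata = pd.read_csv(toy_csv)

# head() returns the first five rows by default.
print(FAAdata.head())
```

With the real data, only the argument to pd.read_csv() should need to change.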
Look in the Environment tab in your top-right pane. You should see FAAdata there in addition to a couple of other items. Click the dropdown menu next to Python and switch to R. Your R environment is empty: only Python knows about FAAdata!
To convince ourselves that this is the case, open a new R code chunk and try running FAAdata %>% head(). You’ll receive an error saying that FAAdata is not found.
Having two completely separate environments is not all that useful, but {reticulate} lets us pass objects between our two environments. We can access items in the Python environment from R code chunks using py$OBJECT_NAME, and items in the R environment from Python code chunks using r.OBJECT_NAME.
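As a rough sketch of the r.OBJECT_NAME direction, the Python chunk below assumes that an earlier R chunk has created an object called min_count (a hypothetical name, used only for illustration). The r object is supplied by {reticulate} inside R Markdown Python chunks; in an ordinary Python session it does not exist, so the sketch falls back to None rather than failing.

```python
# Intended for a Python chunk in an R Markdown document that uses {reticulate}.
# Assumption: an earlier R chunk ran `min_count <- 10` (hypothetical object).
try:
    # Inside the notebook, `r` exposes objects from the R environment.
    min_count_from_r = r.min_count
except NameError:
    # Plain Python session: there is no `r` object to read from.
    min_count_from_r = None

print(min_count_from_r)
```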
Access the Python object FAAdata from an R code cell (as py$FAAdata) and store it into an R object called FAAdata_R. Print out the head() of the FAAdata_R data frame using FAAdata_R %>% head().
Next, we’ll group_by() and summarize() our data to obtain counts of incidents by month at each airport; the count() call used here is shorthand for summarize(n = n()) within groups. We’re interested in the most active months and airports, so we’ll arrange() the data frame in order of decreasing count.

FAAdata_month_airport <- FAAdata_R %>%
  group_by(incident_month) %>%
  count(airport_id) %>%   # one row per month-airport pair, count stored in n
  arrange(-n)             # largest counts first
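For collaborators working on the Python side, the same summary can be sketched in pandas. This is an illustrative equivalent, not part of the original notebook: the small data frame below is invented toy data with made-up airport codes, and the result name month_airport_counts is hypothetical.

```python
import pandas as pd

# Toy data for illustration only: invented rows, NOT real FAA records.
FAAdata = pd.DataFrame(
    {
        "incident_month": [1, 1, 1, 2, 2, 2, 2],
        "airport_id": ["AAA", "AAA", "BBB", "AAA", "BBB", "BBB", "BBB"],
    }
)

month_airport_counts = (
    FAAdata
    .groupby(["incident_month", "airport_id"])  # one group per month-airport pair
    .size()                                     # number of rows in each group
    .reset_index(name="n")                      # back to a data frame with a count column n
    .sort_values("n", ascending=False)          # largest counts first
    .reset_index(drop=True)
)

print(month_airport_counts)
```

Naming the count column n mirrors the column that count() creates on the R side, which keeps the two versions easy to compare.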
In this notebook, you were exposed to the power of {reticulate} to pass objects back and forth between your R and Python environments. This allows multilingual teams to collaborate on projects without having to settle on one language or another. The work we did here was basic, but this can be quite useful if you (or collaborators) prefer different languages for different tasks.