Purpose: In this notebook, you’ll either verify that you have a local Python 3.x installation or install one. You’ll also create an R Project linked to a GitHub repository and configure {reticulate} to use a virtual Python environment within your project. We’ll also build a short notebook to show how to utilize Python within an RMarkdown document.
Follow the steps below to set up Python for use from within RStudio. The first few steps with GitHub are not necessary for using Python in RStudio, but will be helpful moving forward for our course.
Navigate to GitHub and create a repository for our course.
Open your repository, click the green Code button, and copy the HTTPS link to your repository.
Open RStudio, click File -> New Project..., and create a new project from the repository link you copied.
In the console (lower-left), type and run install.packages("reticulate") to install the {reticulate} package, which lets you run Python from within RStudio.
In the lower left pane, click on the Terminal tab. Type and run which python. You should get a file path pointing to your Python executable. Make sure your Python version is 3.x and not 2.x.
If no Python 3 installation was found, run path_to_python <- reticulate::install_python(), which will install Python on your machine and save the path to the executable in the path_to_python variable.
Switch back to the Console tab and run reticulate::virtualenv_create("mat434", path_to_python). If you already had Python installed, provide the explicit path to your Python executable that you saw when you ran which python from the terminal. This virtual environment will be placed in your home directory (by default, under ~/.virtualenvs).
Now you have Python installed, RStudio knows where to find it, and you’ve got a virtual environment created to manage your Python packages.
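If you’d like to double-check the interpreter from Python itself, the terminal check above can be sketched as follows. This is a minimal sketch using only the standard library; the helper names describe_interpreter and find_on_path are our own (not part of {reticulate}), and shutil.which only approximates what which does in the Terminal.

```python
import shutil
import sys


def describe_interpreter():
    """Return the running interpreter's path and its (major, minor) version."""
    return sys.executable, sys.version_info[:2]


def find_on_path(name="python3"):
    """Return the first executable called `name` on PATH, or None if absent."""
    return shutil.which(name)


path, version = describe_interpreter()
print("Interpreter:", path)
print("Version: {}.{}".format(*version))
print("python3 on PATH:", find_on_path())

# The course needs Python 3.x, so flag a 2.x interpreter explicitly.
if version[0] < 3:
    print("This is Python 2.x; install Python 3 before continuing.")
```

Run inside a Python chunk, the reported interpreter path should point into the mat434 virtual environment once the notebook is configured to use it.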
Nearly all of our work for this course will occur within RMarkdown documents. Below, we’ll build a very short notebook and use some Python inside of it.
Use File -> New File -> RMarkdown... to open a new RMarkdown document. Fill in the fields and click the button to create the notebook.
Delete all of the boilerplate from line 11 down.
In the initial r setup chunk, add the line reticulate::use_virtualenv("mat434") and run it with ctrl+Enter or cmd+Return.
In that same setup chunk, add a line to install the numpy module in your virtual environment. The code for doing that is reticulate::virtualenv_install("mat434", "numpy"). You can also install pandas, plotnine, and scikit-learn now.
You can omit the reticulate:: prefix if you load the {reticulate} package with library(reticulate).
In a text cell (over a white background), write a short sentence about running code in Python.
Open a new code cell/chunk using the +C button at the top of the script editor and choose python as the language.
In your Python code cell, add the line import pandas as pd and run it.
In that same cell, let’s read some data into this notebook. Here’s a link to some data on bird strikes on airplanes from the FAA. Use pd.read_csv() to read this file into your notebook by passing the quoted URL as the only argument to the function. Store the result in a variable named FAAdata.
Add a new line, FAAdata.head(), and run it.
In a text cell (white space) below your code cell and output, describe what you are seeing.
Use the blue ball of yarn icon to knit your notebook to an HTML file. You’ll be prompted to save your notebook first; you can give it a name like MyFirstPythonNotebook.
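The read-and-inspect steps above can be sketched as follows. Because the sketch is meant to run offline, it reads a tiny in-memory CSV instead of the FAA file; the column names and rows are made up for illustration and are not taken from the real data. In your notebook you would pass the quoted URL to pd.read_csv() instead.

```python
import io

import pandas as pd

# Hypothetical stand-in for the FAA file: made-up columns and rows,
# used only so this sketch runs without a network connection.
sample_csv = io.StringIO(
    "airport,species,damage\n"
    "Airport A,Gull,Minor\n"
    "Airport B,Goose,Substantial\n"
    "Airport C,Sparrow,No damage\n"
)

# pd.read_csv() accepts a URL string or a file-like object;
# in the notebook, pass the quoted URL here instead of sample_csv.
FAAdata = pd.read_csv(sample_csv)

# head() returns the first five rows by default (all three rows here).
print(FAAdata.head())
```

Calling FAAdata.head() as the last line of a Python chunk displays the same preview without needing print().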
You now have a working installation of Python that can be accessed from within RStudio and used in RMarkdown notebooks. You’ve seen how to install Python packages and how to work within both R code cells and Python code cells.
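To confirm that the packages installed from the setup chunk are visible to Python, you could run a quick check like the one below in a Python chunk. This is a minimal sketch, not part of {reticulate}: the helper name missing_packages and the PACKAGES mapping are our own. Note that scikit-learn is installed under that name but imported as sklearn.

```python
import importlib.util

# Install name -> import name. The two differ for scikit-learn.
PACKAGES = {
    "numpy": "numpy",
    "pandas": "pandas",
    "plotnine": "plotnine",
    "scikit-learn": "sklearn",
}


def missing_packages(packages):
    """Return the install names whose import name cannot be found."""
    return [
        install_name
        for install_name, import_name in packages.items()
        if importlib.util.find_spec(import_name) is None
    ]


missing = missing_packages(PACKAGES)
if missing:
    print("Not installed in this environment:", ", ".join(missing))
else:
    print("All course packages are importable.")
```

Any package reported as missing can be installed from an R chunk with reticulate::virtualenv_install("mat434", ...), as in the setup step.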