Purpose: In this notebook, you’ll either verify that you have a local Python 3.x installation or install one. You’ll also create an R Project linked to a GitHub repository, and configure {reticulate} to use a virtual Python environment within your project. We’ll also build a short notebook to show how to utilize Python within an RMarkdown document.

Enabling Python with {reticulate}

Follow the steps below to set up Python for use from within RStudio. The first few steps with GitHub are not necessary for using Python in RStudio, but will be helpful moving forward for our course.

  1. Navigate to GitHub and create a repository for our course.

  2. Open your repository and click the green Code button, copy the HTTPS link to your repository.

  3. Open RStudio, click File -> New Project...

    • Choose from Version Control and use Git.
    • Paste the link you just copied into the Repository URL field.
    • Choose a convenient location on your machine for your local repository.
    • Create the project and let RStudio reload and switch to that project space.
  4. In the console (lower-left), type and run install.packages("reticulate") to install the {reticulate} package, which allows use of other languages within RStudio.

  5. In the lower left pane, click on the Terminal tab. Type and run which python. You should get a file path pointing to your Python executable. Make sure your Python version is 3.x and not 2.x.

    • If you have Python 3.x installed, then skip to step 6.
    • If you don’t have Python installed, switch back to the Console tab and type/run path_to_python <- reticulate::install_python(), which will install python on your machine and save the path to the executable in the path_to_python variable.
  6. Switch back to the Console tab and run reticulate::virtualenv_create("mat434", path_to_python). If you already had python installed, provide the explicit path to your Python executable that you saw when you ran which python from the terminal. This virtual environment will be placed in your home directory.

Now you have Python installed, RStudio knows where to find it, and you’ve got a virtual environment created to manage your Python packages.

Python in an RMarkdown Notebook

Nearly all of our work for this course will occur within RMarkdown documents. Below, we’ll build a very short notebook and use some Python inside of it.

  1. Use File -> New File -> RMarkdown... to open a new RMarkdown document. Fill in the fields and click the button to create the notebook.

  2. Delete all of the boilerplate from line 11 down.

  3. In the initial r setup chunk, add a line to reticulate::use_virtualenv("mat434"), and run it with ctrl+Enter or cmd+Return.

  4. In that same setup chunk, add a line to install the numpy module in your virtual environment. The code for doing that is reticulate::virtualenv_install("mat434", "numpy"). You can also install pandas, plotnine, and scikit-learn now.

    • Installation commands shouldn’t be included in a notebook, so let’s delete them now that we’re finished installing packages. Note that this is how you’ll install Python packages into your virtual environment for the future though.
    • Note: You can avoid the need to prefix these commands with reticulate:: if you load the {reticulate} package with library(reticulate).
  5. In a text cell (over a white background), write a short sentence about running code in Python.

  6. Open a new code cell/chunk using the +C button at the top of the script editor and choose python as the language.

  7. In your Python code cell, add a line to import pandas as pd. Run that line.

  8. In that same cell, let’s read some data into this notebook. Here’s a link to some data on birdstrikes on airplanes from the FAA. Use pd.read_csv() to read this file into your notebook by passing the quoted URL as the only argument to the function. Store your object in a variable FAAdata.

  9. Write a new line FAAdata.head() and run it.

  10. In a text cell (white space) below your code cell and output, describe what you are seeing.

  11. Use the blue ball of yarn icon to knit your notebook to an HTML file. You’ll be prompted to save your notebook first – you can give it a name like MyFirstPythonNotebook.

Summary

You’ve got a working installation of python that can be accessed from within RStudio and using RMarkdown notebooks. You’ve seen how to install python packages and work within R code cells and Python code cells.