Welcome to R; A Quick Overview

Dr. Gilbert

September 10, 2024

The Highlights

  • Interacting with R
  • The R dialects
  • Installing and loading packages
  • Reading in data
  • First interactions with data in R
  • The pipe operator (%>% or |>)
  • Manipulating and transforming data with R
  • All changes are temporary unless you make them permanent

Interacting with R

  • We’ll interact with R via RStudio in MAT300.

    • In the future, you can use R with other IDEs, like VScode or Positron.
  • We’ll be using Quarto Documents/Notebooks for everything we do.

    • Notebooks allow us to fully document and share analyses even with non-technical people.

Let’s Get Started

  1. Open RStudio

  2. Create a new project by navigating to File -> New Project

    • Choose to create the project in a new directory (folder)
    • Name this something like MAT300 – you’ll include all of your notebooks for our class in this project
    • Note: If you’d like to use GitHub and manage your project space using a repository, come see me in office hours or reach out on Slack and I’ll help you get set up.
  3. Now that you are in your new project space, create a new Quarto Document by navigating to File -> New File -> Quarto Document

    • You can edit the fields, or accept the defaults – it’s up to you

The R Dialects

There are three main dialects in R

  • Base-R

  • data.table (speed)

  • Tidy-R / tidyverse (readability and consistency)

    • We use this one. When you search for help – include “tidyverse” in your query.

Note: R dialects just refers to how we choose to write R code and which functions we prioritize – dialects can be (and often are) mixed.

Installing and Loading Packages

  • install.packages("PACKAGE_NAME") to install a package

    • You only do this once
    • Do it in the console
  • library(PACKAGE_NAME) to load the package

    • Do this near the top of your Quarto Document, in a setup chunk

\(\bigstar\) Install the {tidyverse} and load it in your Quarto Notebook

Reading in Data

ASIDE: We store objects in R with the arrow operator (<-)

x <- 2

Reading Data: To read data, we use a function of the form read_*().

data <- read_csv("PATH_TO_CSV_FILE")
  • Requires the {tidyverse} (or at least {readr}) to be loaded
  • Similar functions exist for reading other file formats
  • Some require other packages ({readxl} or {haven} are common)

\(\bigstar\) Read this airbnb dataset into your Quarto Notebook from this link

I’ll post the link in Slack

Note: This AirBnB Europe data was uploaded to Kaggle by Dipesh Khemani. The original dataset can be found here.

First Interactions with Data

  • head() to view first six rows
  • glimpse() to view dimensions and data types
  • skim() from {skimr} for much more detail

\(\bigstar\) Try these functions on your airbnb data

02:00

The Pipe Operator

  • Pipes (%>% or |>) make code more readable and allow chaining of functions together
  • Object to the left of the pipe becomes first argument to the function after the pipe
  • Read the pipe to mean “and then
penguins %>%
  head()

\(\bigstar\) Rewrite the functions you used to explore your data with pipes

03:00

Manipulating and Transforming Data

  • filter() to return only desired records
  • select() to return only desired columns
  • summarize() to compute summaries on a table
  • group_by() to create groups in a table
  • mutate() to create new columns or change existing ones

\(\bigstar\) How might we use these functions? Write down some questions that could be answered using the functions described above. Start with a couple very simple questions and then work up to questions whose answers might be more complex to find.

05:00

Manipulating and Transforming Data

  • filter() to return only desired records
  • select() to return only desired columns
  • summarize() to compute summaries on a table
  • group_by() to create groups in a table
  • mutate() to create new columns or change existing ones

\(\bigstar\) How might we use these functions? Write down some questions that could be answered using the functions described above. Start with a couple very simple questions and then work up to questions whose answers might be more complex to find.

\(\bigstar\) We’ll try answering some of those questions now!

10:00

All Changes are Temporary

penguins %>%
  filter(species == "Gentoo") %>%
  group_by(island) %>%
  summarize(
    avg_mass = mean(body_mass_g)
  )
  • Start with the penguins data frame, and then
  • filter to just the Gentoo species, and then
  • group by island, and then
  • calculate average penguin body mass for each group

Note: penguins data frame is not permanently altered here

Until We Make Them Permanent

penguins <- penguins %>%
  filter(species == "Gentoo") %>%
  group_by(island) %>%
  summarize(
    avg_mass = mean(body_mass_g)
  )

Now the change is permanent because we’ve stored the result

  • Notice the use of the arrow operator (<-)

  • Be careful overwriting existing objects – think about whether you:

    • might need the old object again
    • would be better off creating a new object (variable)
    • even need to store the result at all

Let’s Practice

Reminder: You have a fully complete (and documented) notebook using the mpg data on the course webpage – note this data is different than the airbnb data you worked with today

Use this time to continue playing with the airbnb pricing data

  1. Save your QMD file

  2. Use the blue render button to convert your markdown document into a beautiful HTML document and enjoy the fruits of your labor!

    • 🤔 Ponder an existence where you never need to open MS Word again! 🤔
  3. Write down and answer additional interesting questions that might use functionality discussed in this slide deck – start simple and then build up to questions that might be more complex

  4. Document your work by including text descriptions alongside the code chunks

Don’t worry if your document looks quite plain for now, we’ll have a full class meeting devoted to using markdown syntax in Quarto effectively

Next Time…


Data Visualization