Welcome to R: A Quick Overview

Dr. Gilbert

September 10, 2024

The Highlights

Interacting with R
The R dialects
Installing and loading packages
Reading in data
First interactions with data in R
The pipe operator (%>% or |>)
Manipulating and transforming data with R
All changes are temporary unless you make them permanent

Interacting with R

We’ll interact with R via RStudio in MAT300.
- In the future, you can use R with other IDEs, like VS Code or Positron.
We’ll be using Quarto Documents/Notebooks for everything we do.
- Notebooks allow us to fully document and share analyses even with non-technical people.

Let’s Get Started

Open RStudio
Create a new project by navigating to File -> New Project
- Choose to create the project in a new directory (folder)
- Name this something like MAT300 – you’ll include all of your notebooks for our class in this project
- Note: If you’d like to use GitHub and manage your project space using a repository, come see me in office hours or reach out on Slack and I’ll help you get set up.
Now that you are in your new project space, create a new Quarto Document by navigating to File -> New File -> Quarto Document
- You can edit the fields, or accept the defaults – it’s up to you

The R Dialects

There are three main dialects in R:

Base-R
data.table (speed)
Tidy-R / tidyverse (readability and consistency)
- We use this one. When you search for help, include “tidyverse” in your query.

Note: “dialect” just refers to how we choose to write R code and which functions we prioritize – dialects can be (and often are) mixed.
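To make the idea of dialects concrete, here is a minimal sketch asking the same question (“which cars have 4 cylinders, and what are their mpg and hp?”) in Base-R and in the tidyverse. It assumes {dplyr} (part of the tidyverse) is installed and uses R’s built-in mtcars data as a stand-in; data.table is left out because it needs its own package.

```r
library(dplyr)  # tidyverse package providing filter(), select(), and the %>% pipe

# Base-R: square-bracket indexing, data[rows, columns]
base_way <- mtcars[mtcars$cyl == 4, c("mpg", "hp")]

# Tidyverse: the same question written as a sequence of verbs
tidy_way <- mtcars %>%
  filter(cyl == 4) %>%
  select(mpg, hp)

base_way
tidy_way
```

Both objects hold the same 11 cars with the same values; only the style of the code differs.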

Installing and Loading Packages

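install.packages("PACKAGE_NAME") to install a package
- You only need to do this once per computer
- Do it in the console, not in your Quarto Document (otherwise the package would be reinstalled every time you render)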
library(PACKAGE_NAME) to load the package
- Do this near the top of your Quarto Document, in a setup chunk
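In practice, the two steps look like the sketch below. The install line is commented out because it belongs in the console and only needs to run once; the library() call is what goes in your setup chunk.

```r
# Run ONCE, in the console (not in your Quarto Document):
# install.packages("tidyverse")

# In a setup chunk near the top of your Quarto Document, every time:
library(tidyverse)
```

Note the quotation marks: install.packages() needs the package name as a string, while library() also accepts the bare name.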

\(\bigstar\) Install the {tidyverse} and load it in your Quarto Notebook

Reading in Data

ASIDE: We store objects in R with the arrow operator (<-)

x <- 2
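Once a value is stored, you can refer to it by name in later code. A small sketch (the names x and y are arbitrary):

```r
x <- 2        # store the value 2 in an object named x
x             # typing an object's name prints its value
y <- x * 3    # use x in a new computation and store the result as y
y             # [1] 6
```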

Reading Data: To read data, we use a function of the form read_*().

data <- read_csv("PATH_TO_CSV_FILE")

Requires the {tidyverse} (or at least {readr}) to be loaded
Similar functions exist for reading other file formats (e.g., read_tsv() for tab-separated files)
Some require other packages ({readxl} for Excel files and {haven} for SPSS, Stata, and SAS files are common)
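Here is a minimal, self-contained sketch of read_csv(). Since the real file path isn’t known here, it first writes a tiny, made-up CSV (hypothetical city and price values) to a temporary file, then reads it back in.

```r
library(readr)

# Hypothetical data, written to a temporary file so the example runs anywhere.
# In class, you would point read_csv() at the real Airbnb file instead.
path <- tempfile(fileext = ".csv")
writeLines(c("city,price", "Paris,120", "Rome,95"), path)

listings <- read_csv(path)   # returns a tibble (a tidyverse-style data frame)
listings

# Excel files need {readxl} instead, e.g.:
# readxl::read_excel("PATH_TO_XLSX_FILE")
```

read_csv() guesses each column’s type (here, character for city and numeric for price) and prints a short message describing those guesses.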

\(\bigstar\) Read this airbnb dataset into your Quarto Notebook from this link

I’ll post the link in Slack

Note: This Airbnb Europe data was uploaded to Kaggle by Dipesh Khemani. The original dataset can be found here.

First Interactions with Data

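head() to view the first six rows
glimpse() to view the number of rows and columns, each column’s data type, and a preview of its values
skim() from {skimr} for much more detail (summary statistics, missing values, and more)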
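A quick sketch of these functions, using R’s built-in mtcars data as a stand-in for the Airbnb data (assumes {dplyr} is installed; it is also loaded by library(tidyverse)):

```r
library(dplyr)  # provides glimpse()

head(mtcars)            # first six rows
head(mtcars, n = 10)    # or choose how many rows to show
glimpse(mtcars)         # rows, columns, column types, and a preview of values

# skim() needs {skimr}: run install.packages("skimr") once in the console, then
# skimr::skim(mtcars)
```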

\(\bigstar\) Try these functions on your airbnb data

02:00

The Pipe Operator

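Pipes (%>% or |>) make code more readable and allow functions to be chained together
- %>% comes from {magrittr} and is loaded with the tidyverse; |> is built into R (version 4.1 and later)
The object to the left of the pipe becomes the first argument to the function after the pipe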
Read the pipe to mean “and then”

penguins %>%
  head()
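To see that the pipe only changes how code is written, not what it does, here is a sketch of three equivalent calls plus a short chain. It uses mtcars as a stand-in for penguins and assumes {dplyr} is installed and R is version 4.1 or later (for |>).

```r
library(dplyr)  # re-exports the %>% pipe and provides filter()

head(mtcars)           # no pipe
mtcars %>% head()      # tidyverse (magrittr) pipe
mtcars |> head()       # native R pipe

# Chaining: take mtcars, and then keep 4-cylinder cars, and then show the first rows
mtcars %>%
  filter(cyl == 4) %>%
  head()

# The same chain without pipes has to be read inside-out
head(filter(mtcars, cyl == 4))
```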

\(\bigstar\) Rewrite the functions you used to explore your data with pipes

03:00

Manipulating and Transforming Data

filter() to return only desired records
select() to return only desired columns
summarize() to compute summaries on a table
group_by() to create groups in a table
mutate() to create new columns or change existing ones
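A minimal sketch of each verb, again using mtcars as a stand-in (assumes {dplyr} is installed). In mtcars, wt is recorded in thousands of pounds, which is why the mutate() example multiplies by 1000.

```r
library(dplyr)

# filter(): keep only the rows (records) that meet a condition
mtcars %>% filter(mpg > 25)

# select(): keep only the columns you name
mtcars %>% select(mpg, cyl, hp)

# mutate(): add a new column (weight in pounds)
mtcars %>% mutate(wt_lbs = wt * 1000)

# summarize(): collapse the whole table into summary values
mtcars %>% summarize(avg_mpg = mean(mpg), n_cars = n())

# group_by() + summarize(): one summary row per group (here, per cylinder count)
mtcars %>%
  group_by(cyl) %>%
  summarize(avg_mpg = mean(mpg))
```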

\(\bigstar\) How might we use these functions? Write down some questions that could be answered using the functions described above. Start with a couple of very simple questions and then work up to questions whose answers might be more complex to find.

05:00

\(\bigstar\) We’ll try answering some of those questions now!

10:00

All Changes are Temporary

penguins %>%
  filter(species == "Gentoo") %>%
  group_by(island) %>%
  summarize(
    avg_mass = mean(body_mass_g, na.rm = TRUE)  # skip missing masses; otherwise the mean is NA
  )

Start with the penguins data frame, and then
filter to just the Gentoo species, and then
group by island, and then
calculate average penguin body mass for each group

Note: the penguins data frame is not permanently altered here; the result is printed, but not stored
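One quick way to convince yourself of this is to check the size of the data before and after running a pipeline. A sketch using mtcars as a stand-in (assumes {dplyr} is installed):

```r
library(dplyr)

nrow(mtcars)                  # 32 rows to start

mtcars %>% filter(cyl == 4)   # the result is printed, but not stored anywhere

nrow(mtcars)                  # still 32: mtcars itself was not changed
```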

Until We Make Them Permanent

penguins <- penguins %>%
  filter(species == "Gentoo") %>%
  group_by(island) %>%
  summarize(
    avg_mass = mean(body_mass_g, na.rm = TRUE)  # skip missing masses; otherwise the mean is NA
  )

Now the change is permanent because we’ve stored the result: penguins now holds only the small summary table, not the original data

Notice the use of the arrow operator (<-)
Be careful overwriting existing objects – think about whether you:
- might need the old object again
- would be better off creating a new object (variable)
- even need to store the result at all
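Following the advice above, you can usually keep the original data and store the result under a new, descriptive name instead. A sketch with mtcars as a stand-in; the name four_cyl_summary is just an illustrative choice.

```r
library(dplyr)

# Store the result in a NEW object so the original data stays available
four_cyl_summary <- mtcars %>%
  filter(cyl == 4) %>%
  summarize(avg_mpg = mean(mpg), avg_hp = mean(hp))

four_cyl_summary   # the one-row summary table
nrow(mtcars)       # the original data is untouched: still 32 rows
```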

Let’s Practice

Reminder: You have a complete (and documented) notebook using the mpg data on the course webpage – note this data is different from the airbnb data you worked with today

Use this time to continue playing with the airbnb pricing data

Save your QMD file
Use the blue Render button to convert your Quarto document into a beautiful HTML document and enjoy the fruits of your labor!
- 🤔 Ponder an existence where you never need to open MS Word again! 🤔
Write down and answer additional interesting questions that might use functionality discussed in this slide deck – start simple and then build up to questions that might be more complex
Document your work by including text descriptions alongside the code chunks

Don’t worry if your document looks quite plain for now; we’ll have a full class meeting devoted to using markdown syntax in Quarto effectively

Next Time…

Data Visualization