A Crash Course in Everything

Git, Quarto, and R

Dr. Gilbert

January 12, 2025

Objectives

These slides address the following items.

How do I create a GitHub repo?
How do I connect my GitHub repo to an R Project on my local machine?
What is a Quarto Document and how do I use it?
How do I install and load packages in R? In particular, we’ll work with the {tidyverse}.
How do I read data into R from both local and remote sources?
How do I interact with, and manipulate, data using the tools and principles of the {tidyverse}?

Creating a GitHub Repo

Navigate to GitHub and log in
Click the green button labeled New
Name your repo (something descriptive, like MAT434 or ClassificationCourse)
Click the checkbox to initialize the repo with a README file
Click the green button to Create repository.

Cloning Your Repo to a Local R Project

Inside your GitHub repo, locate the green Code dropdown
Click it and copy the HTTPS address
Open RStudio
Use File -> New Project to begin creating a new R project
Select From Version Control, then Git, and paste in the URL
Hit the Browse button to change where the project will live
Navigate to Documents and create a new folder named GitHub
Hit Open/Okay and then click Create

Create a New Quarto Document

Use File -> New File -> Quarto Document to create a new Quarto notebook
Add a title – something like Crash Course in Git, Quarto, and R
Add yourself as the author
Click Create
Use File -> Save As... to save the file – give it a name like CrashCourse.qmd and leave it in the project directory
Click the blue arrow button to render the notebook to HTML

Back to Git…

Click on the Git tab in the top-right pane of RStudio
You’ve got some yellow flagged files listed there – these are files not currently in your remote repo
Click the Pull button (you’re already up to date, but it’s a good habit to get into)
Click the checkboxes next to all your new files
Click the Commit button, add a message in the message box, and click Commit
Click the Push button to send your files out to the remote repo

Head back to your web browser and refresh your repo – your files have arrived!

The Quarto Notebook Environment

Quarto documents consist of three main components
- a YAML header
- code chunks (grey background)
- text/markdown cells (white background)
We’ll generally copy/paste the same YAML header (I’ll send you a slightly more complex template in Slack now)
Code chunks must have executable R code in them
Text/markdown cells can have anything you like

Here’s a Quarto cheatsheet, but feel free to work with your favorite LLM to get Quarto to do exactly what you want

Working in R

For the rest of the slide deck, we’ll transition to working in R but then come back to Git/GitHub at the end of the class meeting

Students coming from MAT300: Feel free to grab the data and start investigating

MLB batted balls and park dimensions, or
FAA wildlife impacts with planes

Question for New Students: Do you want to partner up with an existing student to work and learn from them, or do you want to continue with the slides?

The R Dialects

There are three main dialects in R

Base-R
data.table (speed)
Tidy-R / tidyverse (readability and consistency)
- We use this one. When you search for help – include “tidyverse” in your query.

Note: The term “R dialects” just refers to how we choose to write R code and which functions we prioritize – dialects can be (and often are) mixed.

Installing and Loading Packages

install.packages("PACKAGE_NAME") to install a package
- You only do this once
- Do it in the console
library(PACKAGE_NAME) to load the package
- Do this near the top of your Quarto Document, in a setup chunk

\(\bigstar\) Install the {tidyverse} and load it in your Quarto Notebook

\(\bigstar\) While you are at it, install the {skimr}, {tidymodels}, {patchwork}, and {kableExtra} packages as well – {skimr} and {kableExtra} are the only ones you might want to load for now
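
A minimal sketch of the install-once, load-every-time pattern, using only base R to check which of the packages named above are still missing. The variable names (pkgs, installed, to_install) are made up for illustration, and the install and library calls are left as comments so nothing is installed without you choosing to run them.

```r
# Packages named on this slide
pkgs <- c("tidyverse", "skimr", "tidymodels", "patchwork", "kableExtra")

# requireNamespace() returns TRUE when a package is installed (it does not attach it)
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
to_install <- pkgs[!installed]
print(to_install)

# Run once, in the console, to install anything still missing:
# install.packages(to_install)

# Then, in a setup chunk near the top of the Quarto document:
# library(tidyverse)
# library(skimr)
# library(kableExtra)
```

If to_install prints character(0), everything is already installed and only the library() calls are needed.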

Reading in Data

ASIDE: We store objects in R with the arrow operator (<-)

x <- 2

Reading Data: To read data, we use a function of the form read_*().

data <- read_csv("PATH_TO_CSV_FILE")

Requires the {tidyverse} (or at least {readr}) to be loaded
Similar functions exist for reading other file formats
Some require other packages ({readxl} or {haven} are common)
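
Here is a small sketch of reading from a local path, assuming {readr} is installed. The file contents and the column names (player, exit_velocity) are invented so the example runs on its own; they are not the class data. The remote URL is a placeholder, so that line stays commented out.

```r
library(readr)

# Write a tiny CSV to a temporary file so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
writeLines(c("player,exit_velocity", "A,101.2", "B,95.4"), csv_path)

# Local source: pass a file path
batted_demo <- read_csv(csv_path, show_col_types = FALSE)
print(batted_demo)

# Remote source: read_csv() also accepts a URL (placeholder address, not a real link)
# remote_demo <- read_csv("https://example.com/PATH_TO_CSV_FILE.csv")
```

The only difference between local and remote reads is the string you pass in: a file path or a web address.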

\(\bigstar\) Read the MLB batted balls data and the park dimensions data into your Quarto Notebook

I’ll post the links in Slack

First Interactions with Data

head() to view first six rows
glimpse() to view dimensions and data types
skim() from {skimr} for much more detail
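
A quick sketch of these functions on a made-up three-row table, assuming {dplyr} is installed. The demo table and its columns are hypothetical stand-ins for your real data, and the skim() call is commented out in case {skimr} is not installed yet.

```r
library(dplyr)

# Hypothetical stand-in for a real data set
demo <- tibble(
  player = c("A", "B", "C"),
  exit_velocity = c(101.2, 95.4, NA)
)

# head() shows the first six rows by default; ask for fewer with a second argument
first_rows <- head(demo, 2)
print(first_rows)

# glimpse() prints the number of rows and columns, plus each column's type
glimpse(demo)

# skim() gives per-column summaries, including missing-value counts
# skimr::skim(demo)
```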

\(\bigstar\) Try these functions on your battedballs data


The Pipe Operator

Pipes (%>% or |>) make code more readable and allow chaining of functions together
Object to the left of the pipe becomes first argument to the function after the pipe
Read the pipe to mean “and then”

penguins %>%
  head()
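
To see what the pipe buys us, here is a sketch that writes the same calculation three ways: nested calls, the {magrittr} pipe (%>%), and the native pipe (|>, which needs R 4.1 or newer). The scores vector is made up for illustration.

```r
library(dplyr)  # provides %>%

scores <- c(4, 9, 16)

# Nested calls: read from the inside out
nested <- round(mean(sqrt(scores)), 1)

# Piped: take scores, and then sqrt, and then mean, and then round
piped <- scores %>% sqrt() %>% mean() %>% round(1)

# Same idea with the native pipe
native <- scores |> sqrt() |> mean() |> round(1)

print(c(nested, piped, native))
```

All three produce the same value; the piped versions just read left to right.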

\(\bigstar\) Rewrite the functions you used to explore your data with pipes

Manipulating and Transforming Data

filter() to return only desired records (by a conditional statement)
select() to return only desired columns (by name, separated by commas)
summarize() to compute summaries on a table
group_by() to create groups in a table
mutate() to create new columns or change existing ones
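
A small sketch of each verb in action, assuming {dplyr} is installed. The hits table and its column names (team, launch_speed, distance_ft) are invented for illustration and are not the columns of the MLB data.

```r
library(dplyr)

# Hypothetical batted-ball style data
hits <- tibble(
  team = c("BOS", "BOS", "NYY", "NYY"),
  launch_speed = c(100, 90, 105, 85),
  distance_ft = c(400, 300, 420, 250)
)

# filter(): keep only rows that satisfy a condition
hard_hit <- hits %>% filter(launch_speed >= 95)

# select(): keep only the named columns
two_cols <- hits %>% select(team, distance_ft)

# mutate(): add a new column computed from existing ones
with_m <- hits %>% mutate(distance_m = distance_ft * 0.3048)

# group_by() + summarize(): one summary row per group
by_team <- hits %>%
  group_by(team) %>%
  summarize(avg_distance = mean(distance_ft))

print(by_team)
```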

\(\bigstar\) How might we use these functions? Write down some questions that could be answered using the functions described above. Start with a couple very simple questions and then work up to questions whose answers might be more complex to find.

\(\bigstar\) We’ll try answering some of those questions now!

All Changes are Temporary

penguins %>%
  filter(species == "Gentoo") %>%
  group_by(island) %>%
  summarize(
    avg_mass = mean(body_mass_g, na.rm = TRUE)  # na.rm = TRUE skips missing masses
  )

Start with the penguins data frame, and then
filter to just the Gentoo species, and then
group by island, and then
calculate average penguin body mass for each group

Note: penguins data frame is not permanently altered here

Until We Make Them Permanent

penguins <- penguins %>%
  filter(species == "Gentoo") %>%
  group_by(island) %>%
  summarize(
    avg_mass = mean(body_mass_g, na.rm = TRUE)  # na.rm = TRUE skips missing masses
  )

Now the change is permanent because we’ve stored the result

Notice the use of the arrow operator (<-)
Be careful overwriting existing objects – think about whether you:
- might need the old object again
- would be better off creating a new object (variable)
- even need to store the result at all
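
One way to sidestep the overwriting question is to store the result under a new name. The sketch below uses a tiny made-up table (birds) rather than the real penguins data, and the name gentoo_summary is just an illustrative choice.

```r
library(dplyr)

# Hypothetical miniature of a penguins-style table
birds <- tibble(
  species = c("Gentoo", "Gentoo", "Adelie"),
  island = c("Biscoe", "Biscoe", "Dream"),
  body_mass_g = c(5000, NA, 3700)
)

# Not stored: the summary prints, and birds itself is untouched
birds %>%
  filter(species == "Gentoo") %>%
  summarize(avg_mass = mean(body_mass_g, na.rm = TRUE))

# Stored under a new name: the original table is still available afterwards
gentoo_summary <- birds %>%
  filter(species == "Gentoo") %>%
  group_by(island) %>%
  summarize(avg_mass = mean(body_mass_g, na.rm = TRUE))

print(gentoo_summary)
print(nrow(birds))  # still the full table
```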

Let’s Practice

Use this time to continue playing with the MLB data sets

Use the blue render button to convert your markdown document into a beautiful HTML document and enjoy the fruits of your labor!
- 🤔 Ponder an existence where you never need to open MS Word (or PowerPoint) again! 🤔
Write down and answer additional interesting questions that might use functionality discussed in this slide deck – start simple and then build up to questions that might be more complex
Document your work by including text descriptions alongside the code chunks

Don’t worry if your document looks quite plain for now. Our next class meeting is devoted to using markdown syntax in Quarto effectively.

Next Time…

Quarto, Inline R, and semi-automated reporting