Intro to R with Austin Housing Data

September 11, 2024

Review of Quarto

We work in Quarto documents, which allow a mixture of code and text in a single document

Quarto documents consist largely of the following components:

The YAML header governs global document properties and appears at the top of the document between two lines of three dashes (---)
Code chunks begin and end with triple backticks (see below), creating sections of the document over a grey background
Sections of the document over a white background are just text cells and you can type freely there – if you want to try some markdown to format your text, then you are free to do so!

An example of a code chunk

```{r}
#Code goes here...

```

YAML Overview

We’ll use very simple YAML headers in this course, but you are welcome to explore more complex document customization if you like.

Our YAML header will generally look like the following:

---
title: "Notebook Title"
author: "Your Name"
format: html
date: today
date-format: long
theme: flatly
---

You can actually just use the title, author, and format settings if you prefer.

Options for format: We’ll use html by default, but you can use docx to output a Word Document or pdf to output a PDF file
Options for theme: The available document themes are listed in the Quarto documentation on HTML theming

Basic Markdown Overview

Markdown formatting in this course is optional, but using markdown can make your documents look quite nice. The following are the most common pieces of markdown you might find use for.

Headings are built using the hashtag symbol (#) – more hashtags mean a lower-level (smaller font) heading
We can surround text by a single asterisk to make that text italics – for example *this is italics* formats as this is italics
We can surround text by double asterisks to make that text bold – for example **this is bold** formats as this is bold
We can surround text by backticks to format that text as “code” – for example `this is code` formats as this is code
Bulleted lists, like this one, can be made by starting each line with a “+” sign (you’ll need an empty line preceding the list)
Numbered lists are the same as bulleted lists, but you just start each line with a number

Review of R So Far

Within a Quarto Document, R code is run inside of a code chunk like the one below:

```{r}

```

By default, R can do basic calculations
- We load libraries to enable specialized functionality – for example, by running library(tidyverse) in a code chunk
We run a line of R code by placing the cursor on it, holding Ctrl (Cmd on a Mac), and pressing Enter or Return
We store items in variables using the arrow operator – for example, by running x <- 2 in a code chunk
- We can print out the contents of an object/variable just by calling its name – for example running x in a code chunk would print 2
We can read data from a CSV file using read_csv("file_path"), which is available once the tidyverse is loaded
- Don’t forget to store the result into a variable if you want to use the data later!
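
As a quick refresher, the pieces above might fit together as in the sketch below. The tiny CSV file is made up on the spot so that the example runs on its own; the file contents and the variable names x, tmp, and homes are illustrative assumptions, not part of the course data.

```r
# R works as a calculator out of the box
2 + 3 * 4   # 14

# Store a value with the arrow operator, then print it by calling its name
x <- 2
x

# Load the tidyverse to get read_csv()
library(tidyverse)

# Write a tiny, made-up CSV to a temporary file (a stand-in for a real data file)
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,numOfBedrooms", "1,3", "2,4"), tmp)

# Read the file and store the result so we can use the data later
homes <- read_csv(tmp)
homes
```

With the real data, the temporary file would simply be replaced by the path to the Austin housing CSV.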

Reminder of the Austin, TX Housing Data

A data dictionary appears below:

id is a row number (unique identifier)
description is a free-form text field, describing the property (unique identifier…for our course)
city, homeType, hasSpa, and priceRange are all categorical variables
latitude, longitude, lotSizeSqFt, avgSchoolRating, and MedianStudentsPerTeacher are all numerical variables
garageSpaces, yearBuilt, numOfPatioAndPorchFeatures, numOfBathrooms, and numOfBedrooms could be treated as either numerical or categorical variables
- We get to choose how to treat them
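
To see what that choice looks like in practice, here is a minimal sketch using a short, made-up vector of bedroom counts in place of the real numOfBedrooms column (the values are hypothetical).

```r
# Hypothetical bedroom counts standing in for the numOfBedrooms column
numOfBedrooms <- c(3, 4, 3, 2, 5, 3)

# Treated as numerical: an average is meaningful
mean(numOfBedrooms)

# Treated as categorical: count how many homes fall into each category
bedroom_counts <- table(factor(numOfBedrooms))
bedroom_counts
```

Neither treatment is wrong; which one is more useful depends on the question being asked.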

In-Class Activity, Part I

\(\bigstar\) Work with the people next to you to come up with some questions which would be interesting to investigate with our Austin housing data.

Start with some “easy” questions about single variables at a time
Move to some questions about combinations of variables

\(\bigstar\) What questions did we come up with?

\(\bigstar\) Take a few minutes to write those questions into your Day2to5_AustinHousingData.qmd file. Render your notebook to make sure everything looks the way you intended. Try some markdown formatting to improve the structure and readability of your notebook.

In-Class Activity, Part II

\(\bigstar\) Work with the people next to you to decide which of your questions are just about your sample data and which of your questions are about the entire population.

\(\bigstar\) Update your notebook to include two subsections – one with sample-level questions and the other with population-level questions. When finished, you should have two versions of every one of the questions you initially wrote down.

\(\bigstar\) What is the main difference in phrasing between descriptive (sample-level) questions and inferential (population-level) questions?

In-Class Activity, Part III

\(\bigstar\) If we are going to use our available data to answer inferential (population-level) questions, then what assumption(s) are we making?

\(\bigstar\) Can both types of question (descriptive and inferential) be answered simply by calculating summary statistics from our sample data? Why or why not?

\(\bigstar\) Without using R code just yet, describe what you would need to do in order to answer each of your descriptive questions. Add those descriptions to your notebook.

Closing

Render your notebook and make sure that the sections we’ve updated look as you intended them to.
- Make any updates you like and re-render the notebook.
We didn’t really use any R today, but we’ll pick up where we left off next time and actually use R to answer the descriptive questions we’ve posed.
Homework: Complete and submit the Topic 3 notebook at least 30 minutes before Monday’s class meeting. That notebook will give you many of the tools we’ll need for Monday.

Question: Moving forward, should we continue using slide decks like this one?