September 11, 2024
We work in Quarto documents, which allow a mixture of code and text in a single document.
Quarto documents consist largely of the following components: a YAML header (delimited by ---), markdown text, and code chunks.
We’ll use very simple YAML headers in this course, but you are welcome to explore more complex document customization if you like.
Our YAML header will generally look like the following:
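A minimal sketch of such a header, assuming placeholder title and author values and the cosmo theme (all of these values are illustrative, not the course's actual settings):

```yaml
---
title: "Austin Housing Data"   # placeholder title
author: "Your Name"            # placeholder author
date: today                    # Quarto fills in the render date
format:
  html:
    theme: cosmo               # assumed theme; any available theme name works
---
```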
You can actually just use the title, author, and format settings if you prefer.
format: We’ll use html by default, but you can use docx to output a Word document or pdf to output a PDF file.
theme: The available document themes are listed in the Quarto documentation.
Markdown formatting in this course is optional, but using markdown can make your documents look quite nice. The following are the most common pieces of markdown you might find use for.
this is code
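A short sketch of commonly used markdown syntax. The particular selection and the example text are illustrative assumptions:

```markdown
# A top-level heading
## A subsection heading

Plain text with *italics*, **bold**, and `inline code`.

- a bulleted item
- another bulleted item

1. a numbered item
2. another numbered item

[link text](https://quarto.org)
```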
Within a Quarto Document, R code is run inside of a code chunk like the one below:
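A minimal sketch of what might go in such a chunk. In the .qmd source, a chunk opens with a line of three backticks followed by {r} and closes with a line of three backticks; only the R code between those lines is shown here, and the particular calculations are illustrative.

```r
# Basic arithmetic works without loading any packages
2 + 3     # addition
10 / 4    # division
2^3       # exponentiation
```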
By default, R can do basic calculations.
Additional packages, such as the tidyverse, are loaded by running a command like library(tidyverse) in a code chunk.
We run R code by holding ctrl and hitting Enter or Return.
We store items in variables using the arrow operator. For example, running x <- 2 in a code chunk stores the value 2 in x.
Afterwards, running x in a code chunk would print 2.
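As a small sketch of assignment in practice (the variable y is a hypothetical addition for illustration):

```r
x <- 2        # store the value 2 in the variable x
x             # typing the name alone prints the stored value: [1] 2

y <- x * 10   # stored values can be reused in later calculations
y             # prints [1] 20
```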
We can read data from a csv file using read_csv("file_path").
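A self-contained sketch of read_csv() in use. The file here is a made-up, temporary stand-in so that the code runs on its own; in practice, the path to the course data file takes the place of tmp. The column names are borrowed from the data dictionary, but the rows are invented.

```r
library(readr)  # read_csv() comes from readr, which library(tidyverse) also loads

# Write a tiny made-up csv file to a temporary location (illustration only)
tmp <- tempfile(fileext = ".csv")
writeLines(
  c("id,city,numOfBedrooms",
    "1,austin,3",
    "2,pflugerville,4"),
  tmp
)

# Read the file into a data frame; show_col_types = FALSE silences the column-type message
homes <- read_csv(tmp, show_col_types = FALSE)
homes
```

Storing the result in a variable (here homes, a hypothetical name) keeps the data available to later code chunks.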
A data dictionary appears below:
id is a row number (unique identifier)
description is a free-form text field, describing the property (unique identifier…for our course)
city, homeType, hasSpa, and priceRange are all categorical variables
latitude, longitude, lotSizeSqFt, avgSchoolRating, and MedianStudentsPerTeacher are all numerical variables
garageSpaces, yearBuilt, numOfPatioAndPorchFeatures, numOfBathrooms, and numOfBedrooms could be treated as either numerical or categorical variables
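As a sketch of what treating a variable either way can mean in R, consider a handful of made-up numOfBedrooms values (not taken from the Austin data). As numbers they can be averaged, while as a factor they are counted by category.

```r
# Made-up bedroom counts for illustration only (not from the Austin data)
bedrooms <- c(3, 4, 3, 2, 4)

# Treated as numerical: arithmetic summaries such as the mean make sense
mean_bedrooms <- mean(bedrooms)

# Treated as categorical: convert to a factor and count homes per category
bedroom_counts <- table(factor(bedrooms))

print(mean_bedrooms)
print(bedroom_counts)
```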
\(\bigstar\) Work with the people next to you to come up with some questions which would be interesting to investigate with our Austin housing data.
\(\bigstar\) What questions did we come up with?
\(\bigstar\) Take a few minutes to write those questions into your Day2to5_AustinHousingData.qmd file. Render your notebook to make sure everything looks the way you intended. Try some markdown formatting to improve the structure and readability of your notebook.
\(\bigstar\) Work with the people next to you to decide which of your questions are just about your sample data and which of your questions are about the entire population.
\(\bigstar\) Update your notebook to include two subsections – one with sample-level questions and the other with population-level questions. When finished, you should have two versions of every one of the questions you initially wrote down.
\(\bigstar\) What is the main difference in phrasing between descriptive (sample-level) questions and inferential (population-level) questions?
\(\bigstar\) If we are going to use our available data to answer inferential (population-level) questions, then what assumption(s) are we making?
\(\bigstar\) Can both types of question (descriptive and inferential) be answered simply by calculating summary statistics from our sample data? Why or why not?
\(\bigstar\) Without using R code just yet, describe what you would need to do in order to answer each of your descriptive questions. Add those descriptions to your notebook.
Render your notebook and make sure that the sections we’ve updated look as you intended them to.
We didn’t really use any R today, but we’ll pick up where we left off next time and actually use R to answer the descriptive questions we’ve posed.
Homework: Complete and submit the Topic 3 notebook at least 30 minutes before Monday’s class meeting. That notebook will give you many of the tools we’ll need for Monday.
Question: Moving forward, should we continue using slide decks like this one?