Descriptive Statistics: Austin Zillow Data
Overview
Over the course of the next several class meetings, we’ll use what we learn about descriptive statistics to analyze data on properties from the greater Austin, TX region, listed for sale on Zillow. Each of the interactive notebooks you complete in preparation for our class meetings will help us with part of the analysis.
- Topic 1: An Introduction to Data and Sampling - This notebook will prepare us to look at our data set and to identify variables as numerical, categorical, unique identifiers, and more.
- Topic 2: An Introduction to R - This notebook will give us a foundation for using R to interact with our data.
- Topic 3: Descriptive Statistics for Numerical and Categorical Data - This notebook will provide us with the tools we’ll need to compute summary statistics on our numerical and categorical columns.
- Topic 4: Data Visualization and a Grammar of Graphics - This notebook gives us data visualization and data-based story-telling superpowers. We’ll use ggplot2 to display visual representations of our data, providing a fuller picture than summary statistics alone.
Even after completing these notebooks, you won’t be an expert in using R to analyze your data. Expect to encounter errors, and do your best not to be frustrated by them. If you encounter errors that you can’t troubleshoot, post them (along with an explanation of what you are trying to do) to Slack. You may also try using your favorite LLM, such as ChatGPT, to help you troubleshoot code. Just be aware that these LLMs were not built for programming, so they often return broken code. In my experience, ChatGPT is pretty excellent at finding missing or misplaced commas, though!
About the Austin Zillow Data Set
We’ll fill in a data dictionary here by using the knowledge we gain from the Topic 1 and Topic 2 notebooks.
Numerical Variables
Numerical variables are variables for which here goes a description of how to tell whether a variable is numeric. The following list contains our numerical variables and their descriptions.
- variable_one - here’s a description of our first numerical variable.
- variable_two - here’s a description of our second numerical variable.
Categorical Variables
Categorical variables are variables for which here goes a description of how to tell whether a variable is categorical. The following list contains our categorical variables and their descriptions.
- variable_one - here’s a description of our first categorical variable.
- variable_two - here’s a description of our second categorical variable.
Unique Identifiers
Unique identifiers are variables for which here goes a description of how to tell whether a variable is a unique identifier. The following list contains our unique identifiers and their descriptions.
- variable_one - here’s a description of our first unique identifier.
- variable_two - here’s a description of our second unique identifier.
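While filling in the data dictionary, it can help to check how R stores each column and whether any column has all-distinct values. The sketch below is only an illustration in base R: the `homes` data frame, its column names (`listing_id`, `city`, `lot_size_sqft`), and its values are made up for this example and are not taken from the Austin Zillow data.

```r
# Toy data frame with made-up values. The column names are hypothetical
# stand-ins, not necessarily the names used in the Austin Zillow data.
homes <- data.frame(
  listing_id = c(101, 102, 103, 104),
  city = c("austin", "austin", "pflugerville", "austin"),
  lot_size_sqft = c(6500, 8200, 6500, 10400)
)

# str() reports how each column is stored (num, chr, ...).
str(homes)

# A column can only act as a unique identifier if no value repeats.
# anyDuplicated() returns 0 when every value in the column is distinct.
is_unique <- sapply(homes, function(column) anyDuplicated(column) == 0)
is_unique
```

Notice that `listing_id` is stored as a number even though doing arithmetic on it would be meaningless, so the storage type alone does not settle whether a variable is numerical, categorical, or an identifier.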
An Initial Exploration Through Summary Statistics
Using the knowledge we gain from the Topic 3 notebook, we’ll compute some summary statistics so that we better understand our data.
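As a preview of what that might look like, here is a minimal base R sketch. The `homes` data frame, the `price` and `home_type` columns, and all of the values are invented for illustration; the Topic 3 notebook may use different functions (for example, from the tidyverse) to do the same job.

```r
# Made-up listings for illustration only; column names are hypothetical.
homes <- data.frame(
  price = c(250000, 310000, 420000, 385000, 1200000),
  home_type = c("Single Family", "Condo", "Single Family",
                "Single Family", "Townhouse")
)

# Numerical column: measures of center and spread.
price_mean <- mean(homes$price)
price_median <- median(homes$price)
price_sd <- sd(homes$price)
summary(homes$price)  # minimum, quartiles, mean, and maximum in one call

# Categorical column: counts and proportions for each level.
type_counts <- table(homes$home_type)
type_props <- prop.table(type_counts)
type_counts
type_props
```

In this toy example the single very expensive listing pulls the mean (513,000) well above the median (385,000), which is one reason to report both.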
Visualizing the Austin Zillow Data
We’ll use skills from the Topic 4 notebook here to gain real insights into the Zillow Data, and begin to “tell the story” of what features are most closely associated with property values in and around Austin, TX.
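A first plot might look something like the sketch below, which assumes the ggplot2 package is installed. The `homes` data frame, the `living_area_sqft` and `price` columns, and their values are hypothetical and exist only to show the layered structure of a ggplot2 call.

```r
library(ggplot2)

# Made-up listings for illustration only; column names are hypothetical.
homes <- data.frame(
  living_area_sqft = c(1200, 1850, 2400, 3100, 1600),
  price = c(250000, 310000, 420000, 610000, 295000)
)

# Data and aesthetic mappings first, then a geometry layer, then labels.
price_plot <- ggplot(homes, aes(x = living_area_sqft, y = price)) +
  geom_point() +
  labs(
    x = "Living area (sq ft)",
    y = "Listing price (USD)",
    title = "Toy example: price versus living area"
  )

price_plot
```

Swapping `geom_point()` for another geometry, or mapping a categorical column to color, changes the picture without changing the overall pattern of the code.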
Summary
We’ll summarize the work we’ve done here.
Appendix: Some Example Questions
I have some example questions that I can share with you, but I’d rather have you and your teammates generate questions that you find genuinely interesting.
Some questions are sufficiently answered with a summary statistic while others require a plot to gain a full understanding. It is often the case that summary statistics paired with plots provide the insights you need.
Univariate Questions
Univariate questions are questions about a single variable. Below are a couple of examples if you need help getting started.
- Is the city variable in the data set always Austin? What other locations are there? How frequently do they appear?
- What is the average lot size in square feet?
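The two questions above could be approached along these lines in base R. This is a sketch under assumptions: the `homes` data frame is made up, and the column names `city` and `lot_size_sqft` are guesses at how those variables might be named, so adjust them to match the real data.

```r
# Made-up listings for illustration only; column names are hypothetical.
homes <- data.frame(
  city = c("austin", "austin", "pflugerville", "austin", "del valle", "austin"),
  lot_size_sqft = c(6500, 8200, 7000, NA, 10400, 5900)
)

# Is the city always Austin?
all_austin <- all(homes$city == "austin")
all_austin

# Which locations appear, and how often? Most frequent first.
city_counts <- sort(table(homes$city), decreasing = TRUE)
city_counts

# Average lot size, skipping listings where the lot size is missing.
mean_lot_size <- mean(homes$lot_size_sqft, na.rm = TRUE)
mean_lot_size
```

Without `na.rm = TRUE`, a single missing lot size would make the mean come back as `NA`, so it is worth checking for missing values before trusting a summary.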
Multivariable Questions
Multivariable (or multivariate) questions involve two or more variables. The more variables involved, the more difficult the question may be to answer, and the more data we may need in order to answer it with sufficient confidence. Below are two examples to get you started if you need help.
- How does the average lot size change by location?
- Is there an association between average lot size in square feet and median number of students per teacher?
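Here is one hedged way to start on questions like these in base R. The `homes` data frame and its values are invented, and `city`, `lot_size_sqft`, and `students_per_teacher` are hypothetical column names. A correlation coefficient is only one possible measure of association (it captures linear relationships), so pairing it with a scatterplot is usually a good idea.

```r
# Made-up listings for illustration only; column names are hypothetical.
homes <- data.frame(
  city = c("austin", "austin", "pflugerville",
           "pflugerville", "del valle", "austin"),
  lot_size_sqft = c(6000, 8000, 7000, 9000, 10000, 7000),
  students_per_teacher = c(14, 15, 16, 17, 13, 15)
)

# Average lot size within each location: one row per city.
lot_by_city <- aggregate(lot_size_sqft ~ city, data = homes, FUN = mean)
lot_by_city

# Strength of the linear association between two numerical columns.
# Values near -1 or 1 suggest a strong linear relationship; near 0, a weak one.
lot_teacher_cor <- cor(homes$lot_size_sqft, homes$students_per_teacher)
lot_teacher_cor
```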