
MAT 300 - Applied Statistics II: Regression Analysis

Syllabus (Fall 2024)

Course Description: This is a second course in statistics that builds upon knowledge gained in an introductory statistics course that covers statistical inference. Students will learn to build statistical models and develop skills for implementing regression analysis in real-world problems from engineering, sociology, psychology, science and business. Topics include multiple regression models (including first-order, second-order and interaction models with quantitative and qualitative variables), regression pitfalls, and residual analysis. Additional topics will be covered if time permits. Students will gain experience not only in the mechanics of regression analysis (often by means of a statistical software package) but also in deciding on appropriate models, selecting inferential techniques to answer a particular question, interpreting results and diagnosing problems.

Students in this course will use R, in particular the {tidyverse} and {tidymodels} ecosystems, to build and analyze regression models. The course covers simple and multiple linear regression, curvi-linear regression with polynomial and interaction terms, regularization with Ridge Regression and the LASSO, and tree-based models/ensembles. Cross-validation is implemented as an important technique for stable and unbiased model performance estimates, for identifying appropriate levels of model flexibility, and for hyperparameter tuning.
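As a rough illustration of the workflow described above, here is a minimal sketch of cross-validating a multiple linear regression model with {tidymodels}. It assumes the {tidymodels} and {palmerpenguins} packages are installed. The choice of predictors, the 80/20 split, the five folds, and the seed are illustrative assumptions rather than settings taken from the course notebooks.

```r
# Minimal sketch: cross-validated linear regression with {tidymodels}.
# Assumes tidymodels and palmerpenguins are installed.
library(tidymodels)
library(palmerpenguins)

set.seed(300)  # arbitrary seed so the split and folds are reproducible

# Keep a few columns and drop rows with missing values
penguins_complete <- penguins %>%
  select(body_mass_g, flipper_length_mm, bill_length_mm, species) %>%
  drop_na()

# Hold out a test set, then build 5 cross-validation folds on the training data
penguin_split <- initial_split(penguins_complete, prop = 0.8)
penguin_train <- training(penguin_split)
penguin_folds <- vfold_cv(penguin_train, v = 5)

# Model specification: ordinary least squares via lm()
lr_spec <- linear_reg() %>%
  set_engine("lm")

# Recipe: predict body mass from all other columns,
# converting the categorical predictor to dummy variables
lr_rec <- recipe(body_mass_g ~ ., data = penguin_train) %>%
  step_dummy(all_nominal_predictors())

# Bundle the model and recipe into a workflow
lr_wf <- workflow() %>%
  add_model(lr_spec) %>%
  add_recipe(lr_rec)

# Fit on each fold and summarize the held-out performance (RMSE and R-squared)
cv_results <- fit_resamples(lr_wf, resamples = penguin_folds)
cv_metrics <- collect_metrics(cv_results)
cv_metrics
```

The `mean` column of `cv_metrics` averages each metric over the five held-out folds, and `std_err` gives a sense of how stable that estimate is.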

Course Timeline and Notebooks

Below is a tentative timeline for our course. It includes preparatory work that should be done prior to each class meeting, a detailed set of notes corresponding to each class meeting, and assignments following each class meeting. The prepared notebooks use the Palmer penguins and Ames housing datasets and are provided so that you have a detailed account of each topic we discuss. We’ll learn this content better by doing it than we will by simply reading and running pre-existing code, so we’ll plan to use different data in class. For now, I’m planning to start with this data set on rental properties in the San Francisco Bay Area posted to Craigslist, generously made open by Dr. Kate Pennington. We can switch to alternate data sets as student interest dictates. I’ve prepared the following student notes template (html, Quarto) that I hope you’ll use to follow along during our in-class discussions.

A Note on the Slide Decks: I built these slides to be displayed as a split-screen, alongside an open RStudio session. In this way, you can play along by building your own analysis with a different data set (or the same one, if you prefer). If you try displaying the slides across your full screen, the content will flow off the bottom of the page.

| Class Meeting | Before Class | During Class | Slides | After Class |
|---|---|---|---|---|
| 1 | Review Syllabus; Install R and RStudio | Introduction and What to Expect | Slide Version of Intro and Expectations | Finish Software Setup |
| 2 | Enroll in Competition; Read ISLR $\S$ 2.1 (Part I, Part II) | What is Statistical Learning?; Competition Discussion | Slide Version of Overview | What is an Analytics Report?; Competition Assignment 1; Analytics Report Shell (html, Quarto) |
| 3 | Read ISLR $\S$ 2.3 | Introduction to R: Enter the tidyverse (html, Quarto) | Companion Slides | |
| 4 | Read R4DS $\S$ 3.1 - 3.10 (Optional) | Data Viz and ggplot2 (html, Quarto) | Companion Slides | Competition Assignment 2 |
| 5 | | R Workshop Day: Quarto and R; Quarto Tips | | |
| 6 | | Data Wrangling Workshop (html, Quarto) | Companion Slides | Homework 1 (html, Quarto) |
| 7 | | Introduction to {tidymodels} (html, Quarto) | Companion Slides | |
| 8 | Intro Stats Review | Hypothesis Testing and Confidence / Prediction Intervals in Regression (html, Quarto) | Companion Slides | Homework 2 (html, Quarto) |
| 9 | Read ISLR $\S$ 3.1 (Part I, Part II) | Simple Linear Regression: Construction, Interpretation, and Model Assessment (html, Quarto) | Companion Slides | Competition Assignment 3 |
| 10 | Read ISLR $\S$ 3.2 (Part I, Part II) | Multiple Linear Regression: Construction, Interpretation, and Model Assessment (html, Quarto) | Companion Slides | |
| 11 | | Residual Analysis and Model Quality | Companion Slides | |
| 12 | Read ISLR $\S$ 3.3 (Part I, Part II) | Categorical Predictors and Interpretations; Feature Engineering with step_other() and step_dummy() (html, Quarto) | Companion Slides | Competition Assignment 4 |
| 13 | | Model Building, Assessment, and Interpretation Workshop | | |
| 14 | | Higher-Order Terms: Curvi-Linear Regression and Polynomial Terms with step_poly() (html, Quarto) | Companion Slides | Competition Assignment 5 |
| 15 | | Higher-Order Terms: Interaction with step_interact() (html, Quarto) | Companion Slides | |
| 16 | | Inference and Interpretation with {marginaleffects} | Companion Slides | |
| 17 | | Halloween Modeling Competition (In Class, 75 minutes) | | |
| 18 | Read ISLR $\S$ 2.2 | Bias/Variance Trade-Off and Model Performance Concerns (html, Quarto) | Companion Slides | |
| 19 | Read ISLR $\S$ 5.1 (Part I, Part II, Part III) | Performance Concerns Continued: Different Test, Different Expectations; Cross-Validation and Unbiased Model Performance (html, Quarto) | Companion Slides | Homework 3 |
| 20 | | Cross-Validation Workshop | | |
| 21 | Read ISLR $\S$ 6.1, 6.2 (Part IV, Part V, Part VI, Part VII) | Variable Selection Methods: Stepwise Regression, Ridge Regression, and the LASSO (html, Quarto) | Companion Slides | |
| 22 | | Other Regressors (html, Quarto) | Companion Slides | Competition Assignment 6 |
| 23 | | Hyperparameters and Tuning: More uses for Cross-Validation (html, Quarto) | Companion Slides | |
| 24 | | Hyperparameters, Tuning, and Other Regressors Workshop | | |
| 25 | | Thanksgiving Modeling Competition (In Class, 75 minutes) | | |
| 26+ | Projects | Projects | Projects | Projects |
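
The feature-engineering steps named in the timeline (step_other(), step_dummy(), step_poly(), and step_interact()) are all functions from the {recipes} package, which loads with {tidymodels}. The sketch below shows one way they might be chained in a single recipe. It assumes {tidymodels} and {palmerpenguins} are installed. The choice of columns, the 0.2 pooling threshold, and the degree-2 polynomial are illustrative assumptions, not values taken from the course notebooks.

```r
# Minimal sketch: chaining the recipe steps mentioned in the timeline.
# Assumes tidymodels and palmerpenguins are installed.
library(tidymodels)
library(palmerpenguins)

# Keep a few columns and drop rows with missing values
penguins_complete <- penguins %>%
  select(body_mass_g, flipper_length_mm, bill_length_mm, species, island) %>%
  drop_na()

penguin_rec <- recipe(body_mass_g ~ ., data = penguins_complete) %>%
  # Pool island levels that make up less than 20% of rows into "other"
  step_other(island, threshold = 0.2) %>%
  # Replace flipper length with degree-2 orthogonal polynomial terms
  step_poly(flipper_length_mm, degree = 2) %>%
  # Convert the categorical predictors to dummy (indicator) variables
  step_dummy(all_nominal_predictors()) %>%
  # Interact bill length with each species dummy column
  step_interact(~ bill_length_mm:starts_with("species_"))

# Estimate the steps on the data, then return the processed data
baked <- penguin_rec %>%
  prep() %>%
  bake(new_data = NULL)

names(baked)
```

Step order matters here: step_other() runs before step_dummy() so that rare levels are pooled before indicator columns are created, and step_interact() runs after step_dummy() because it refers to the dummy columns by name.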




[1] DeCock, Dean (2011). Ames, Iowa: Alternative to the Boston Housing Data as an End of Semester Regression Project. Journal of Statistics Education, Volume 19, Number 3. http://www.amstat.org/publications/jse/v19n3/decock.pdf

[2] Horst AM, Hill AP, Gorman KB (2020). palmerpenguins: Palmer Archipelago (Antarctica) penguin data. doi:10.5281/zenodo.3960218, R package version 0.1.0, https://allisonhorst.github.io/palmerpenguins/.

[3] Pennington, Kate (2018). Bay Area Craigslist Rental Housing Posts, 2000-2018. Retrieved from https://github.com/katepennington/historic_bay_area_craigslist_housing_posts/blob/master/clean_2000_2018.csv.zip.



