
MAT 434 - Statistical Learning and Classification (with R)

Fall 2023 Syllabus

Course Description: Building on the foundational knowledge from MAT 240/241 and MAT 300, we continue our study of statistical models. This course moves beyond regression and into classification models, mixed models, and unsupervised learning. Like MAT 300, this course emphasizes cross-validation as an important method for tuning hyperparameters, identifying appropriate levels of model flexibility, approximating future model performance, and analyzing the utility of a model. This course covers logistic regression, support vector machines, k nearest neighbors, tree-based methods (bagging, boosting, and random forests), and neural networks. We also cover techniques for dimension reduction and for working with text-based features. In addition to the statistical modeling coursework, students will use GitHub for collaboration and version control and will use GitHub Pages to build and populate a professional profile for sharing their work on the web.

Note: Looking for a Python version of this course? I’ve got one! Head over here to see it. The Python version of the course is in its first iteration and I’m very interested in feedback. You can file issues on the Course GitHub Repo or reach out to me directly.

Course Timeline and Notebooks

Below is a tentative timeline for our course. The table lists preparatory work to complete before each class meeting, a description of what to expect during the meeting, and the assignments that follow it. I’m taking a more free-form approach in MAT 434 than in MAT 300, where I provided you with detailed notebooks prior to each class meeting. In MAT 434, we’ll build our notebooks in class, exploring several different data sets as the semester goes on. While the main topics for each class meeting are set, you will dictate the direction of our analyses, the choices we make during model construction, and the discussions that result.

| Class Meeting | Dataset | Before Class | During Class | After Class |
|---|---|---|---|---|
| 1 | | i) Review Syllabus; ii) Software Setup | i) Introduction and What to Expect; ii) Ethics and Data Models | HW 1 |
| 2 | FAA Airstrikes and Engine Damage, or MLB Hits and Homeruns | i) Ensure that git is working from RStudio; ii) Dual-Wielding Languages with {reticulate} | i) R Projects and Version Control; ii) Tidy Analyses in R (new students or returning students) | HW 2; CA 1 |
| 3 | | | R Markdown, inline commands, and semi-automated reporting | HW 3 |
| 4 | | | EDA and Data Viz InClass | CA 2 |
| 5 | | Set up a username.github.io Repository | GitHub Pages and a public-facing portfolio | HW 4 |
| 6 | | {tidymodels} Framework (Review) | {tidymodels} Framework Example | |
| 7 | | | Regression Versus Classification and Performance Metrics for Classifiers (html or rmd) | HW 5 |
| 8 | Spaceship Titanic | Intro to Logistic Regressors (html or rmd) | Binary Classifiers, Part I: Logistic Regression | |
| 9 | | Intro to Support Vector Classifiers (html or rmd) | Binary Classifiers, Part II: Support Vector Machines | HW 6 |
| 10 | Gene Expression and Cancerous Tumors | Intro to Principal Component Analysis (html or rmd) | Aside: High-Dimensional Data and Dimension Reduction | |
| 11 | Healthcare Analytics: Length of Stay | Intro to k Nearest Neighbors (html or rmd) | Multiclass Classifiers, Part I: Nearest Neighbors | CA 3 |
| 12 | | Intro to Decision Trees (html or rmd) | Multiclass Classifiers, Part II: Decision Tree Classifiers | |
| 13 | | Intro to Ensembles, Bagging, and Random Forests (html or rmd) | Ensembles, Part I: Bagging and Random Forests | |
| 14 | | Intro to Boosting (html or rmd) | Ensembles, Part II: Boosting | CA 4 |
| 15 | | Work on GitHub Page | Visit from Career Center | |
| 16 | Tweet Emotion | Intro to Text and Tokenization (html or rmd) | Text Features, Part I: Tokenization | |
| 17 | | Intro to Regular Expressions (html or rmd) | Text Features, Part II: Regex | |
| 18 | | Intro to Word Embeddings (html or rmd) | Text Features, Part III: Embeddings | CA 5 |
| 19 | Monster Classification | | Halloween Classification Challenge (InClass Kaggle Competition) | |
| 20 | Fashion MNIST | Install TensorFlow | Deep Learning, Part I: Architecture | |
| 21 | | | Deep Learning, Part II: Activation Functions | |
| 22 | | | Deep Learning, Part III: Training and Assessment | CA 6 |
| 23 | | | Final Project Topic Discussion | |
| 24 | | | Final Project Group Selection | |
| 25 | Turkey Pardoning | | Thanksgiving Classification Challenge (InClass Kaggle Competition) | |
| 26+ | | | Final Projects | |





