Competition Assignment 2

Author

Me, Scientist

Published

September 17, 2024

In your first competition assignment, you joined our In-Class Kaggle Competition, downloaded the data for the competition, read it into a Quarto Notebook, and wrote a first draft of a statement of purpose for an analytics project. You’ll add to that work here.

  1. Re-open the Quarto Notebook that contains your Statement of Purpose from Competition Assignment 1.

  2. Re-run the code necessary to read in your data files.

  3. Write code to split the data coming from data_train.csv into three sets: train, validation, and test. You’ll need two separate calls to initial_split() for this. Be sure to set a seed with set.seed() before each call so that your splits are reproducible.

  4. Remember that the validation and test sets should stay hidden until later parts of the analytics project. Conduct an exploratory analysis on the training data (train) only.

  5. Once you are done, render your Quarto document to HTML and submit both your Quarto and HTML files using the Competition Assignment 2 folder in BrightSpace. As a reminder, your submission should look like a partial report, including only the Statement of Purpose and Exploratory Data Analysis sections. Your report will mix text and code like you’ve seen, and built, in our class notebooks. All of your code should come with context. Be sure to answer the questions “What do the outputs mean and why do we care?”.

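The two-call split from step 3 might look something like the sketch below. Treat it as one possible layout rather than the required answer: the seed values, the 75/25 and 50/50 proportions, and the stand-in data frame are all assumptions made for illustration. In your notebook, data_train would be the data frame you already read in from data_train.csv.

```r
library(rsample)

# Stand-in for the competition data (hypothetical columns, for illustration only).
# In your notebook, use the data frame you read from data_train.csv instead.
data_train <- data.frame(id = 1:1000, x = seq_len(1000) / 10)

# First call: separate the training rows from everything else.
# The seed value and prop = 0.75 are assumptions, not course requirements.
set.seed(123)
first_split <- initial_split(data_train, prop = 0.75)
train <- training(first_split)
holdout <- testing(first_split)

# Second call: divide the held-out rows into validation and test sets.
# Again, the seed value and prop = 0.5 are assumptions.
set.seed(456)
second_split <- initial_split(holdout, prop = 0.5)
validation <- training(second_split)
test <- testing(second_split)

# Quick check on the sizes of the three sets.
c(train = nrow(train), validation = nrow(validation), test = nrow(test))
```

Setting the seed immediately before each call means each split is reproducible on its own, even if you later re-run only one of the two chunks. From here on, only train should appear in your exploratory analysis.
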
As always, reach out on Slack with questions.
– Dr. G