Competition Assignment 2

Author

Me, Scientist

Published

January 6, 2026

In your first competition assignment, you joined our In-Class Kaggle Competition, downloaded the competition data, read it into a Quarto Notebook, and wrote a first draft of a statement of purpose for an analytics project. You’ll add to that work here.

  1. Re-open the Quarto Notebook that contains your Statement of Purpose from Competition Assignment 1.

  2. Re-run the code necessary to read in your data files.

  3. Use the following code to split the data coming from data_train.csv into three sets: train, validation and test.

#Load {tidymodels}
#Put this line under library(tidyverse)
library(tidymodels)

# Set a seed for reproducibility
# Change the number if you'd like (recommended)
set.seed(123)

#Mark your rows as belonging to training or test (temporary) sets
#Change "data" to the name of your data frame.
#Change the proportion if you'd like.
#The hold-out sets need enough rows for you to
#test/assess models on, but any observations placed there
#are observations you don't get to "learn" from.
data_splits <- initial_split(data, prop = 0.75)

#Extract the training observations
train <- training(data_splits)
#We'll split these observations again
temp <- testing(data_splits)

#Again, change the seed if you like 
set.seed(456)

#Split the hold-out observations into validation 
#and test sets
temp_splits <- initial_split(temp, prop = 0.5)
validation <- training(temp_splits)
test <- testing(temp_splits)
Note

There’s a better way to achieve this three-set split, using initial_validation_split(). However, we’ll switch to a better approach later in the course, and at that point initial_split() is all we’ll need.
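For the curious, here is a minimal sketch of what the one-step split mentioned in the note might look like. It assumes a version of {rsample} recent enough to include initial_validation_split(), and it uses the built-in mtcars data purely as a stand-in for your own data frame. The proportions are chosen to mirror the 75% / 12.5% / 12.5% split produced by the two-step code above.

```r
# Sketch only: mtcars is a stand-in, not the competition data
library(rsample)

set.seed(123)

# prop gives the training and validation proportions;
# whatever remains becomes the test set
three_way_split <- initial_validation_split(mtcars, prop = c(0.75, 0.125))

train <- training(three_way_split)
validation <- validation(three_way_split)
test <- testing(three_way_split)

# Confirm that every row landed in exactly one set
c(train = nrow(train), validation = nrow(validation), test = nrow(test))
```

You do not need this for the assignment; the two-step initial_split() code is what we’ll build on.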

  4. Remember that the validation and test sets should stay hidden until later parts of the analytics project. Conduct an exploratory analysis on the training data (train) only.
  5. Once you are done, render your Quarto document to HTML and submit both your Quarto and HTML files using the Competition Assignment 2 folder in BrightSpace. As a reminder, your submission should look like a partial report, including only the Statement of Purpose and Exploratory Data Analysis sections. Your report will mix text and code like you’ve seen, and built, in our class notebooks. All of your code should come with context. Be sure to answer the questions “What do the outputs mean?” and “Why do we care?”
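If you are unsure where to begin the exploratory analysis, the sketch below shows one possible starting point. It is not a required recipe. The built-in mtcars data stands in for your train data frame, and the mpg column is a hypothetical placeholder for whichever variable you care about; swap in your own names.

```r
# Sketch only: replace the stand-in with your own training data
library(tidyverse)

train <- as_tibble(mtcars)  # stand-in for your `train` data frame

# Column names, types, and a preview of the values
glimpse(train)

# Number of missing values in each column
missing_counts <- train %>%
  summarize(across(everything(), ~ sum(is.na(.x))))
missing_counts

# Distribution of a single numeric variable (mpg is a placeholder)
mpg_hist <- ggplot(train, aes(x = mpg)) +
  geom_histogram(bins = 10) +
  labs(title = "Distribution of mpg in the training data")
mpg_hist
```

Each of these outputs should be followed by a sentence or two in your report explaining what it shows and why it matters for the project.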
As always, reach out on Slack with questions.
– Dr. G