<- slr_fit %>% #use your model name
slr_preds augment(comp) %>%
select(ID, .pred) %>%
rename(response_name = .pred) #Replace response_name
Competition Assignment 3
In your first two competition assignments, you got familiar with our competition and the related data. You wrote a Statement of Purpose and began an exploratory analysis to find associations between your available predictors and response. You’ll add to that work here by building your first linear regression model and submitting your first set of predictions to Kaggle for scoring.
Re-open the Quarto Notebook that contains your Statement of Purpose and Exploratory Data Analysis from Competition Assignments 1 and 2.
Re-run all of the code in your
setup
chunk as well as your code from the previous two competition assignments.Add a new section to your analytics report called Model Construction. In it, discuss the simple linear regression model you are choosing to build (and why).
Build a simple linear regression model using what you’ve learned about the
{tidymodels}
framework. Be sure to use the single predictor that you think is most important in predicting the time to deliver a Door Dash order.Assess your model’s performance using what you’ve learned about model assessment so far. You should use the
validation
data for this purpose, but please do not open thetest
data yet – we will be building more models and don’t want any knowledge of thetest
data to infect our modeling decisions.Return to the
setup
chunk and load thedata_comp.csv
data set into your Notebook. Be sure to store this data into a named variable so that you can access it later.
- This dataset contains new observations but is missing the response column. Your goal in the competition is to predict those missing responses – Kaggle will keep us informed about how we are doing.
- Use your model to
augment()
the competition data with predictions. Be sure to store the resulting data frame in a named variable – you may want to use a new variable name so that you don’t overwrite your competition data altogether. Additionally, the only columns you’ll need now are theID
andelapsed_time
columns. Remember that you can userename()
to rename the.pred
column to match the name of the response column that Kaggle is expecting. The chain of commands below may be helpful.
- Now we’ll need to extract our predictions from R so that we can submit them to Kaggle for scoring. We can do this with the
write.csv()
function. That function takes the data frame that we would like to write out as a csv file, the path and filename we’d like to save the csv file to on our computer, and we setrow.names = FALSE
to prevent R from adding an extra column of row numbers to the csv file. Below is what that command would look like on my computer, if I wanted to save my model’s predictions into the main project directory for MAT300. If you want to save it somewhere else, you’ll need to provide the file path relative to the location of your QMD file.
write.csv(slr_preds,
"slr_preds.csv",
row.names = FALSE)
Navigate to our competition site on Kaggle and use the Submission button to make your first submission for scoring on Kaggle. You’ll upload your csv file and then see where you fall on the leaderboard. This public leaderboard is scored using only 30% of the listings in our competition data set. The final, private leaderboard will be scored on the remaining 70% of those hidden listings – that is, your position on the leaderboard may change slightly when the competition closes on December 17. Note: Your position on the leaderboard is in no way related to your score on these assignments, but I hope that you are motivated to work your way towards the top of the leaderboard as our semester progresses.
Render your notebook and submit the HTML and qmd files as usual on BrightSpace using the Competition Assignment 3 folder.
– Dr. G