Complete the following:
Open RStudio and use File -> Recent Projects to select and open the R Project that manages your GitHub repository.
Use File -> Open File... to navigate to your R Markdown notebook from Competition Assignment 1, and open it.
Run all of the code in your existing notebook.
Use what you know about the competition and the data set to write an appropriate Statement of Purpose for the analysis and modeling you’ll be doing throughout this competition.
In your first Python code chunk, add the code needed to import train_test_split from sklearn.model_selection.
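A minimal version of that first chunk might look like the sketch below. Only the train_test_split import comes from the assignment. The pandas import is an assumption about how you load your competition data, so drop it if your notebook already imports pandas elsewhere.

```python
# Assumed: the competition data is handled as a pandas DataFrame
import pandas as pd

# Required by the assignment: scikit-learn's train/test splitting helper
from sklearn.model_selection import train_test_split
```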
Open a new code chunk and split your data into training and test sets. Don't forget to set a seed (the random_state argument) so that your split is reproducible. Think about whether you should use the stratify argument of train_test_split(), and use it if you believe you should.
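Here is a sketch of a seeded, stratified split. The data frame df and its columns (feature_a, feature_b, target) are hypothetical stand-ins for your competition data, and the 25% test size and seed of 123 are arbitrary choices. Stratifying on the response keeps its class proportions similar in the training and test sets, which matters most when the response is categorical and imbalanced. For a numeric response you could stratify on a binned version of it (for example with pd.qcut) instead.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data; replace with your own competition data frame
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "feature_a": rng.normal(size=200),
    "feature_b": rng.integers(0, 5, size=200),
    "target": rng.choice([0, 1], size=200, p=[0.8, 0.2]),  # imbalanced binary response
})

# Hold out 25% of the rows for testing.
# random_state fixes the shuffle so the split is the same every time the notebook runs.
# stratify keeps the proportion of each "target" class similar in both sets.
train, test = train_test_split(
    df,
    test_size=0.25,
    random_state=123,
    stratify=df["target"],
)

print(train.shape, test.shape)  # (150, 3) (50, 3)
```

Re-running the chunk with the same random_state reproduces exactly the same rows in each set, so the results of your later analysis stay consistent from one knit to the next.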
Conduct an exploratory data analysis using your training data to learn more about your data set. One of your major goals in this section of your analytics report should be to understand which of your available features (or engineered features) are most closely associated with your response variable.
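One way to start ranking features by their association with the response is sketched below. The training frame and its columns (x1, x2, x3, grp, and the numeric response y) are hypothetical and simulated here so the example runs on its own. Pearson correlation only captures linear relationships between numeric variables, so treat it as a first pass alongside plots. For a categorical feature, comparing the mean response across its levels is a simple analogue.

```python
import numpy as np
import pandas as pd

# Hypothetical training data with a numeric response "y"; substitute your own training set
rng = np.random.default_rng(0)
n = 300
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
train = pd.DataFrame({
    "x1": x1,
    "x2": x2,
    "x3": x3,
    "grp": rng.choice(["a", "b", "c"], size=n),  # a categorical feature
    "y": 3 * x1 - 0.5 * x2 + rng.normal(size=n),
})

# Correlation of each numeric feature with the response,
# sorted so the strongest associations (positive or negative) come first
numeric_features = train.drop(columns="y").select_dtypes("number")
assoc = numeric_features.corrwith(train["y"]).sort_values(
    key=lambda s: s.abs(), ascending=False
)
print(assoc)

# For a categorical feature, compare the average response across its levels
group_means = train.groupby("grp")["y"].mean()
print(group_means)
```

Because this uses only the training data, nothing you learn here leaks information from the test set into your later modeling choices.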
When you are done, use the blue ball of yarn button to knit the notebook into an HTML document.
Use the Git tab in the top-right pane of RStudio to Pull, Commit, and Push your new files to your remote repository on GitHub.
Stop by my office if you have any questions or need help.