Homework 2: Understanding Assumptions

Description and Purpose: In this homework assignment, you’ll build your own toy data set. You’ll fit a linear regressor to this toy data set using code provided to you. You’ll compute and plot residuals for your model and interpret the residual plots to identify shortcomings of your model. Understanding how your model performs and the types of errors it makes are important aspects of using that model responsibly.

Your Tasks

1. Build a toy dataset by completing the following:
    - Choose 50 `x` values randomly and uniformly over the interval \(\left[0, 100\right]\). The function `runif()` will be useful here.
    - Construct a collection of 50 values `y`, which are associated with `x` via at least a quadratic relationship. Your error term should be normally distributed with mean \(0\) and a constant standard deviation. The function `rnorm()` will be useful here.
    - Use the command `toydata <- tibble(x = x, y = y)` to package your data into a data frame.
    - Use `ggplot()` to plot the association between `x` and `y`.
    - Note: Smaller standard deviations will make your regression task “easier”, while larger standard deviations will make it more “difficult”. You may want to play around with a variety of standard deviation values until you get a data set that you like. Ideally, you’ll build a data set with a clear general trend but enough noise that your prediction errors will be noticeable. A sketch of one possible simulation appears just after the provided model code below.
2. Use the code cell below to fit a model to your data and attach predictions to your `toydata` set. The code utilized should be familiar from our introduction to `{tidymodels}` notebook. You’ll want to delete the `#| eval: false` option from the code chunk to ensure that this code is run when you knit your notebook.

```r
lr_spec <- linear_reg() %>%
  set_engine("lm")

lr_rec <- recipe(y ~ x, data = toydata)

lr_wf <- workflow() %>%
  add_model(lr_spec) %>%
  add_recipe(lr_rec)

lr_fit <- lr_wf %>%
  fit(toydata)

toydata <- lr_fit %>%
  augment(toydata)
```
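In case it helps with task 1, here is a minimal sketch of one way to simulate a toy data set like the one described above. The quadratic coefficients and the standard deviation of 15 are arbitrary illustrative choices, not requirements; adjust them until your scatterplot shows a clear trend with noticeable noise.

```r
# Sketch for task 1 (assumed coefficients and noise level -- adjust to taste)
library(tidyverse)

set.seed(123)                           # for reproducibility

x <- runif(50, min = 0, max = 100)      # 50 x-values, uniform on [0, 100]
y <- 2 + 0.5 * x + 0.03 * x^2 +         # a quadratic trend (arbitrary coefficients)
  rnorm(50, mean = 0, sd = 15)          # normal errors with constant standard deviation

toydata <- tibble(x = x, y = y)         # package the data into a data frame

# Plot the association between x and y
ggplot(toydata) +
  geom_point(aes(x = x, y = y))
```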
3. Use `ggplot()` to plot your original `x` and `y` pairs from your `toydata` set in “black” and add another layer to plot your model’s predictions in a different color. What do you notice?
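One possible way to layer the observed values and the model’s predictions, assuming the provided code above has already attached a `.pred` column to `toydata` via `augment()` (the color choice is arbitrary):

```r
# Sketch: observed data in black, model predictions in another color
ggplot(toydata) +
  geom_point(aes(x = x, y = y), color = "black") +
  geom_point(aes(x = x, y = .pred), color = "purple")
```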
4. Use `mutate()` to add another new column to your `toydata` set. That column should contain your model’s residuals – the true observed responses minus your model’s predictions.
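A minimal sketch of this step, assuming the predictions live in a `.pred` column and using `residuals` as an illustrative name for the new column:

```r
# Sketch: residual = observed response minus predicted response
toydata <- toydata %>%
  mutate(residuals = y - .pred)
```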
5. Use `ggplot()` to plot your model’s residuals against the true responses (`y`). What do you notice?
6. Use `ggplot()` to plot your model’s residuals against the value of your predictor variable (`x`). What do you notice?
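For items (5) and (6), both residual plots follow the same pattern; the sketch below assumes the residual column created in item (4) is named `residuals`.

```r
# Sketch: residuals against the true responses
ggplot(toydata) +
  geom_point(aes(x = y, y = residuals))

# Sketch: residuals against the predictor
ggplot(toydata) +
  geom_point(aes(x = x, y = residuals))
```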
7. Run the code below to obtain the Root Mean Squared Error for your model, and describe what it means. Again, you’ll want to remove the `#| eval: false` chunk option to ensure the code runs when you knit your document.
```r
toydata %>%
  rmse(y, .pred)
```
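For reference, the Root Mean Squared Error reported above is the square root of the average squared residual,

\[
\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2},
\]

where \(y_i\) are the observed responses and \(\hat{y}_i\) are the model’s predictions; it is measured in the same units as \(y\).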
8. Use your response from part (7) and your plots from parts (5) and (6) to discuss a severe shortcoming of this model. What assumptions are not being satisfied and how does that impact the confidence intervals we build for model predictions?
9. (Optional) Use the `predict()` function with the `type = "conf_int"` argument to generate lower and upper bounds on a confidence interval for model predictions. Build a single plot which includes your original `x` and `y` values from the `toydata` set, your model’s predictions in a different color, and the lower and upper bounds for a confidence interval for your model’s predictions. Use this plot as supporting evidence for your conclusion in part (8).
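If you attempt the optional item, one possible approach is sketched below. It assumes the fitted workflow is still named `lr_fit`; with `type = "conf_int"`, `predict()` returns `.pred_lower` and `.pred_upper` columns that can be bound onto `toydata` for plotting.

```r
# Sketch for the optional item: predictions with confidence bounds
ci_bounds <- predict(lr_fit, new_data = toydata, type = "conf_int")

plot_data <- toydata %>%
  bind_cols(ci_bounds)

ggplot(plot_data) +
  geom_point(aes(x = x, y = y), color = "black") +                # original data
  geom_line(aes(x = x, y = .pred), color = "purple") +            # model predictions
  geom_line(aes(x = x, y = .pred_lower), linetype = "dashed") +   # lower confidence bound
  geom_line(aes(x = x, y = .pred_upper), linetype = "dashed")     # upper confidence bound
```

A `geom_ribbon()` layer is another reasonable way to display the interval.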
10. When you are done, knit your notebook to an HTML document and submit both your HTML output and QMD file to BrightSpace using the Homework 2 submission folder.