September 30, 2024
Are our models useful?
Confidence Intervals for Coefficients
Intervals for Model Predictions
Open your notebook from last time
As we discuss the different hypothesis tests and interval analyses we'll encounter in the regression context, analyze the corresponding items for your models in that notebook
\[y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon\\ ~~~~\text{or}~~~~\\ \mathbb{E}\left[y\right] = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k\]
Does our model contain any useful information about predicting/explaining our response variable at all?
Hypotheses:
\[\begin{array}{lcl} H_0 & : & \beta_1 = \beta_2 = \cdots = \beta_k = 0\\ H_a & : & \text{At least one } \beta_i \text{ is non-zero}\end{array}\]
Are our sloped models better (more justifiable) models than the horizontal line?
Sloped models use predictor information
Horizontal models just predict average response, ignoring all observation-specific features
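In R, the global utility test (the \(F\)-test of \(H_0: \beta_1 = \cdots = \beta_k = 0\)) is reported at the bottom of `summary()` for a fitted `lm()` model. A minimal sketch using hypothetical simulated data, where `x1` carries real signal and `x2` is pure noise:

```r
# Hypothetical simulated data -- not from our course notebook
set.seed(123)
x1 <- runif(50, 0, 10)
x2 <- runif(50, 0, 10)            # unrelated to the response
y  <- 2 + 3 * x1 + rnorm(50, sd = 1)

fit <- lm(y ~ x1 + x2)

# Extract the F statistic and its degrees of freedom from the summary,
# then compute the p-value for H0: all slopes are zero
fs <- summary(fit)$fstatistic
p_value <- pf(fs["value"], fs["numdf"], fs["dendf"], lower.tail = FALSE)
p_value
```

A tiny \(p\)-value leads us to reject \(H_0\): the sloped model is more justifiable than the horizontal line.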
\[y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon\\ ~~~~\text{or}~~~~\\ \mathbb{E}\left[y\right] = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k\]
Okay, so our model has some utility. Do we really need all of those terms?
Hypotheses:
\[\begin{array}{lcl} H_0 & : & \beta_i = 0\\ H_a & : & \beta_i \neq 0\end{array}\]
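R runs this \(t\)-test for every coefficient automatically; the coefficient table from `summary()` holds the estimate, standard error, \(t\) statistic, and \(p\)-value for each \(H_0: \beta_i = 0\). A sketch with hypothetical simulated data in which only `x1` truly matters:

```r
# Hypothetical simulated data: x2 has no real relationship with y
set.seed(42)
x1 <- runif(40, 0, 10)
x2 <- runif(40, 0, 10)
y  <- 1 + 2 * x1 + rnorm(40)

fit <- lm(y ~ x1 + x2)

# Each row tests H0: beta_i = 0 vs Ha: beta_i != 0
summary(fit)$coefficients
```

We would expect a small \(p\)-value on the `x1` row (keep the term) and a large one on the `x2` row (the term may not be needed).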
Reminder: An approximate 95% confidence interval is between two standard errors below and above our point estimate.
\[\left(\text{point estimate}\right) \pm 2\cdot\left(\text{standard error}\right)~~~\textbf{or}~~~\left(\text{point estimate}\right) \pm t^*_{\text{df}}\cdot\left(\text{standard error}\right)\]
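R's `confint()` computes the exact \(t\)-based version of these intervals for every coefficient. A sketch with hypothetical simulated data, comparing the exact interval with the rough two-standard-error approximation:

```r
# Hypothetical simulated data with true slope 1.5
set.seed(7)
x <- 1:30
y <- 5 + 1.5 * x + rnorm(30, sd = 2)
fit <- lm(y ~ x)

# Exact t-based 95% confidence intervals for each coefficient
confint(fit, level = 0.95)

# Roughly the same as (point estimate) +/- 2 * (standard error)
coef(fit)["x"] + c(-2, 2) * summary(fit)$coefficients["x", "Std. Error"]
```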
They’re all wrong! A point prediction will essentially never match an observed response exactly, so we report intervals around our predictions instead.
The formula for confidence intervals on predictions is complex!
\[\displaystyle{\left(\tt{point~estimate}\right)\pm t^*_{\text{df}}\cdot \left(\tt{RMSE}\right)\left(\sqrt{\frac{1}{n} + \frac{(x_{new} - \bar{x})^2}{\sum{\left(x - \bar{x}\right)^2}}}\right)}\]
We’ll use R to construct these intervals for us.
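With a fitted `lm()` model, `predict()` with `interval = "confidence"` evaluates that formula for us. A sketch using hypothetical simulated data and a hypothetical new input value \(x = 5\):

```r
# Hypothetical simulated data
set.seed(1)
x <- runif(25, 0, 10)
y <- 3 + 2 * x + rnorm(25)
fit <- lm(y ~ x)

# 95% confidence interval for the *mean* response at x = 5
new_obs <- data.frame(x = 5)
predict(fit, new_obs, interval = "confidence", level = 0.95)
```

The output gives the fitted value (`fit`) along with the lower (`lwr`) and upper (`upr`) interval bounds.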
Are these wrong too? No – confidence intervals bound the average response across all observations sharing a given set of input features.
So, can we build intervals which contain predictions on the level of an individual observation?
Sure – but there’s added uncertainty in predicting an individual response, so these prediction intervals are wider than the corresponding confidence intervals
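R builds these individual-level intervals with `predict()` and `interval = "prediction"`. A sketch with hypothetical simulated data, comparing the two interval widths at the same new input value:

```r
# Hypothetical simulated data
set.seed(5)
x <- runif(30, 0, 10)
y <- 4 + 1.2 * x + rnorm(30)
fit <- lm(y ~ x)
new_obs <- data.frame(x = 5)

ci <- predict(fit, new_obs, interval = "confidence")   # mean response
pi <- predict(fit, new_obs, interval = "prediction")   # individual response

# The prediction interval is always wider: it also accounts for the
# variability of a single observation around the mean response
c(ci_width = ci[, "upr"] - ci[, "lwr"],
  pi_width = pi[, "upr"] - pi[, "lwr"])
```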
Below are the most common applications of statistical inference in regression modeling.
Hypothesis Tests: Is the model useful at all? Are individual terms needed?
Confidence Intervals: What is the plausible range for each parameter/coefficient? Can we make reliable predictions?
We’ll be utilizing all of these ideas throughout our course.
We’ll leverage R functionality to obtain intervals and to calculate test statistics and \(p\)-values, since that is much faster than doing any of this by hand.
Hypothesizing, Constructing, Assessing, and Interpreting Simple Linear Regression Models