Introduction to Regularization: Ridge Regression and the LASSO

Dr. Gilbert

November 11, 2024

Reminders from Last Time

Cross-validation is a procedure that we can use to obtain more reliable performance assessments for our models

In \(k\)-fold cross validation, we create \(k\) models and obtain \(k\) performance estimates – the average of these performance estimates can be referred to as the cross-validation performance estimate

Our cross-validation procedure does NOT result in a fitted model, but rather in the cross-validation performance estimate and an estimated standard error for that performance estimate

Cross-validation makes our choices and inferences less susceptible to random chance (the randomly chosen training and test observations)

Big Picture Recap

Our approach to linear regression so far has perhaps led us to the intuition that we should start with a large model and then reduce it down to include only statistically significant terms

This approach, called backward elimination, is commonly utilized

There is also an opposite approach, called forward selection

Playing Along

We’ll switch to using the ames dataset for this discussion

That dataset contains features and selling prices for 2,930 homes sold in Ames, Iowa between 2006 and 2010

  1. Open your MAT300 project in RStudio and create a new Quarto document

  2. Use a setup chunk to load the {tidyverse} and {tidymodels}

  3. The ames data set is contained in the {modeldata} package, which is loaded with {tidymodels} – take a preliminary look at this dataset

  4. Split your data into training and test sets

  5. Create five or ten cross-validation folds
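If you'd like a starting point, here is a minimal sketch of steps 2 through 5 – the seed, split proportion, and number of folds shown here are arbitrary choices, not requirements:

library(tidyverse)
library(tidymodels)  # loads {modeldata}, which contains the ames data

ames %>%
  glimpse()  # preliminary look at the dataset

set.seed(123)  # makes the random split and folds reproducible

ames_split <- initial_split(ames, prop = 0.9)
ames_train <- training(ames_split)
ames_test  <- testing(ames_split)

ames_folds <- vfold_cv(ames_train, v = 5)  # or v = 10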

A Shopping Analogy

Consider the model (or us, as modelers) as a shopper in a market that sells predictors

  • (Backward elimination) Our model begins by putting every item in the store into its shopping cart, and then puts back the items it doesn’t need

  • (Forward selection) Our model begins with an empty cart and wanders the store, finding the items it needs most to add to its cart one-by-one

Okay, So What?

At first, these approaches may seem reasonable, if inefficient

Statistical Standpoint: We’re evaluating lots of \(t\)-tests in determining statistical significance of predictors

  • The probability of making at least one Type I error (claiming a term is significant when it truly isn’t) becomes inflated – even with just three tests, the probability is over 14% when using the common \(\alpha = 0.05\)
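    • A quick check of that figure, treating the three tests as independent: \(P\left(\text{at least one Type I error}\right) = 1 - \left(1 - \alpha\right)^3 = 1 - 0.95^3 \approx 0.143\)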

Model-Fit Perspective: The more predictors a model has access to, the more flexible it is, the better it will fit the training data, and the more likely it is to become overfit

Back to the Shopping Analogy

By allowing a model to “shop” freely for its predictors, we are encouraging our model to become overfit

Giving our model a “budget” to spend on its shopping trip would force our model to be more selective about the predictors it chooses, and lowers the likelihood that it becomes overfit

A Look Under the Hood

We’ve hidden the math that fits our models up until this point, but it’s worth a look now

\[\mathbb{E}\left[y\right] = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k\]

The optimization procedure we’ve been using to find the \(\beta\)-coefficients is called Ordinary Least Squares

Ordinary Least Squares: Find \(\beta_0, \beta_1, \cdots, \beta_k\) in order to minimize

\[\sum_{i = 1}^{n}{\left(y_{\text{obs}_i} - y_{\text{pred}_i}\right)^2}\]

Writing out each prediction in terms of the model, this objective is

\[\sum_{i = 1}^{n}{\left(y_{\text{obs}_i} - \left(\beta_0 + \sum_{j = 1}^{k}{\beta_j x_{ij}}\right)\right)^2}\]

This is the procedure that allows our model to shop freely for predictors
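As a minimal sketch (assuming {tidymodels} is loaded so that the ames data from {modeldata} is available), we can evaluate this objective at the coefficients that lm() returns – Sale_Price, Gr_Liv_Area, and Year_Built are all columns of ames:

library(tidymodels)  # loads {modeldata}, which contains ames

ols_fit <- lm(Sale_Price ~ Gr_Liv_Area + Year_Built, data = ames)

# The OLS objective: the sum of squared differences between observed and predicted responses
sum((ames$Sale_Price - predict(ols_fit, newdata = ames))^2)

# Equivalently, using the residuals stored in the fitted model
sum(residuals(ols_fit)^2)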

Regularization

Regularization refers to techniques designed to constrain models and reduce the likelihood of overfitting

For linear regression, there are two commonly used methods

  • Ridge Regression
  • The LASSO (least absolute shrinkage and selection operator)

Each of these methods makes an adjustment to the Ordinary Least Squares procedure we just saw

Regularization: Ridge Regression

\[\mathbb{E}\left[y\right] = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k\]

Ridge Regression: Find \(\beta_0, \beta_1, \cdots, \beta_k\) in order to minimize

\[\sum_{i = 1}^{n}{\left(y_{\text{obs}_i} - \left(\beta_0 + \sum_{j = 1}^{k}{\beta_j x_{ij}}\right)\right)^2}\]

subject to the constraint

\[\sum_{j = 1}^{k}{\beta_j^2} \leq C\]

Note: \(C\) is a constant that can be thought of as our budget for coefficients

The Result: Ridge regression encourages very small coefficients on unimportant predictors
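An equivalent way to write this problem (for an appropriate \(\lambda \geq 0\) that corresponds to the budget \(C\)) is the penalized form: minimize

\[\sum_{i = 1}^{n}{\left(y_{\text{obs}_i} - \left(\beta_0 + \sum_{j = 1}^{k}{\beta_j x_{ij}}\right)\right)^2} + \lambda \sum_{j = 1}^{k}{\beta_j^2}\]

Larger values of \(\lambda\) correspond to smaller budgets \(C\); up to the scaling conventions the software uses, this \(\lambda\) is what the penalty argument controls later in these notes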

Regularization: The LASSO

\[\mathbb{E}\left[y\right] = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k\]

LASSO: Find \(\beta_0, \beta_1, \cdots, \beta_k\) in order to minimize

\[\sum_{i = 1}^{n}{\left(y_{\text{obs}_i} - \left(\beta_0 + \sum_{j = 1}^{k}{\beta_j x_{ij}}\right)\right)^2}\]

subject to the constraint

\[\sum_{j = 1}^{k}{\left|\beta_j\right|} \leq C\]

Note: Like with Ridge Regression, \(C\) is a constant that can be thought of as our budget for coefficients

The Result: The LASSO pushes coefficients of unimportant predictors to \(0\)
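The LASSO has an equivalent penalized form as well: for an appropriate \(\lambda \geq 0\), minimize

\[\sum_{i = 1}^{n}{\left(y_{\text{obs}_i} - \left(\beta_0 + \sum_{j = 1}^{k}{\beta_j x_{ij}}\right)\right)^2} + \lambda \sum_{j = 1}^{k}{\left|\beta_j\right|}\]

It is the absolute-value penalty, rather than the squared penalty, that allows coefficients to be pushed all the way to \(0\)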

LASSO or Ridge?

The choice between Ridge Regression and the LASSO depends on your goals

The LASSO is better at variable selection because it sends coefficients on unimportant predictors to exactly \(0\)

LASSO won’t always out-perform Ridge though, so you might try both and see which is better for your use case

Feature Preprocessing Requirements

If a predictor \(x_i\) is on a larger scale than the response \(y\), then that predictor becomes artificially cheap to include in a model

Similarly, if a predictor \(x_j\) is on a smaller scale than the response, then that predictor becomes artificially expensive to include in a model

We don’t want any of our predictors to be artificially advantaged or disadvantaged, so we must ensure that all of our numerical predictors are on the same scale as one another

Min-Max Scaling: Projects each numerical predictor down to the interval \(\left[0, 1\right]\) via \(\displaystyle{\frac{x - \min\left(x\right)}{\max\left(x\right) - \min\left(x\right)}}\)

  • We include step_range() in a recipe to utilize min-max scaling

Standard Scaling: Converts observed measurements into standard deviations (\(z\)-scores) via \(\displaystyle{\frac{x - \text{mean}\left(x\right)}{\text{sd}\left(x\right)}}\)

  • We include step_normalize() in a recipe to utilize standard scaling
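As a minimal sketch (assuming the ames_train training set created later in these notes), either scaling step slots into a recipe like this:

# Standard scaling, which the Ridge/LASSO recipes below use
scaling_rec <- recipe(Sale_Price ~ ., data = ames_train) %>%
  step_normalize(all_numeric_predictors())  # (x - mean(x)) / sd(x)

# The min-max alternative would instead use:
# step_range(all_numeric_predictors())      # rescales each predictor to [0, 1]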

Switching Engines

The {tidymodels} framework is great because it provides us with a standardized structure for defining and fitting models

Ridge and the LASSO are still linear regression models, but they’re no longer fit using OLS

This puts them in a class of models called Generalized Linear Models (GLMs)

We’ll need to change our fitting engine from "lm" to something that can fit these GLMs – we’ll use "glmnet"

Required Parameters for "glmnet":

  • mixture can be set to any value between \(0\) and \(1\)

    • Setting mixture = 0 results in Ridge Regression
    • Setting mixture = 1 results in the LASSO
  • penalty is the amount of regularization being applied

    • You can think of this parameter as being tied to our coefficient budget – a larger penalty corresponds to a smaller budget \(C\)

Choosing a penalty

For now, we’ll just pick a value and see how it performs

We can experiment with several if we like, and then choose the one that results in the best performance

We’ll talk about a better strategy next time
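As a rough sketch of that experiment (assuming the ridge_reg_rec recipe and the ames_folds cross-validation folds defined later in these notes), we could loop over a few hand-picked penalty values and compare their cross-validation metrics:

# Compare a few candidate penalty values via cross-validation
for (p in c(1e2, 1e3, 1e4)) {
  spec <- linear_reg(mixture = 0, penalty = p) %>%
    set_engine("glmnet")

  wf <- workflow() %>%
    add_model(spec) %>%
    add_recipe(ridge_reg_rec)

  wf %>%
    fit_resamples(ames_folds) %>%
    collect_metrics() %>%
    print()
}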

A Few Additional Concerns

  • The "glmnet" engine requires that no missing values are included in any fold

    • We could omit any rows with missing values
    • We could omit any features with missing entries
    • We could impute missing values with a step_impute_*() function added to a recipe
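A minimal sketch of those three options (the recipes later in these notes take the third approach, with step_impute_knn()):

# Option 1: omit any rows with missing values
ames_complete_rows <- ames %>%
  drop_na()

# Option 2: omit any features (columns) with missing entries
ames_complete_cols <- ames %>%
  select(where(~ !any(is.na(.x))))

# Option 3: impute missing values inside a recipe (assumes the ames_train split defined below)
impute_rec <- recipe(Sale_Price ~ ., data = ames_train) %>%
  step_impute_median(all_numeric_predictors())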

Implementing Ridge and LASSO with the Ames Data

We’ll use the ames housing data set that we’ve seen from time to time this semester

We’ll read it in, remove the rows with missing Sale_Price (our response), split the data into training and test sets, and create our cross-validation folds

ames_known_price <- ames %>%
  filter(!is.na(Sale_Price))   # keep only rows where the response is known

set.seed(123)                  # make the split and folds reproducible

ames_split <- initial_split(ames_known_price, prop = 0.9)
ames_train <- training(ames_split)
ames_test <- testing(ames_split)

ames_folds <- vfold_cv(ames_train, v = 5)

Ridge Regression

ridge_reg_spec <- linear_reg(mixture = 0, penalty = 1e4) %>%   # mixture = 0 gives Ridge Regression
  set_engine("glmnet")

ridge_reg_rec <- recipe(Sale_Price ~ ., data = ames_train) %>%
  step_impute_knn(all_predictors()) %>%           # fill in any missing values
  step_normalize(all_numeric_predictors()) %>%    # put numeric predictors on the same scale
  step_other(all_nominal_predictors()) %>%        # pool infrequent categorical levels into "other"
  step_dummy(all_nominal_predictors())            # encode categorical predictors as dummy variables

ridge_reg_wf <- workflow() %>%
  add_model(ridge_reg_spec) %>%
  add_recipe(ridge_reg_rec)

ridge_reg_results <- ridge_reg_wf %>%
  fit_resamples(ames_folds)

ridge_reg_results %>%
  collect_metrics()
.metric .estimator mean n std_err .config
rmse standard 33512.4783290 5 3528.8858182 Preprocessor1_Model1
rsq standard 0.8285515 5 0.0232516 Preprocessor1_Model1

Seeing the Estimated Ridge Regression Model

While we wouldn’t generally fit the Ridge Regression model at this time, you can see how to do that and examine the estimated model below.

ridge_reg_fit <- ridge_reg_wf %>%
  fit(ames_train)

ridge_reg_fit %>%
  tidy()
term estimate penalty
(Intercept) 170590.91440 10000
Lot_Frontage 1458.08999 10000
Lot_Area 1919.73420 10000
Year_Built 4443.44715 10000
Year_Remod_Add 5652.52473 10000
Mas_Vnr_Area 6469.12318 10000
BsmtFin_SF_1 -109.24172 10000
BsmtFin_SF_2 619.57697 10000
Bsmt_Unf_SF -1897.35902 10000
Total_Bsmt_SF 7471.51817 10000
First_Flr_SF 8231.08414 10000
Second_Flr_SF 8576.05018 10000
Gr_Liv_Area 13259.91733 10000
Bsmt_Full_Bath 2910.80533 10000
Bsmt_Half_Bath -872.28855 10000
Full_Bath 4132.02383 10000
Half_Bath 2520.03561 10000
Bedroom_AbvGr -2600.41177 10000
Kitchen_AbvGr -3311.98370 10000
TotRms_AbvGrd 4547.30499 10000
Fireplaces 4625.12646 10000
Garage_Cars 5750.22932 10000
Garage_Area 4248.50579 10000
Wood_Deck_SF 1899.00078 10000
Open_Porch_SF 224.51413 10000
Enclosed_Porch 1142.26429 10000
Three_season_porch 487.29049 10000
Screen_Porch 3569.00946 10000
Pool_Area -1114.39753 10000
Misc_Val -5085.90458 10000
Mo_Sold -203.12359 10000
Year_Sold -835.49586 10000
Longitude 650.40625 10000
Latitude 5328.30675 10000
MS_SubClass_One_Story_1945_and_Older -4126.27103 10000
MS_SubClass_One_and_Half_Story_Finished_All_Ages -1861.67028 10000
MS_SubClass_Two_Story_1946_and_Newer -1514.91540 10000
MS_SubClass_One_Story_PUD_1946_and_Newer -4117.65295 10000
MS_SubClass_other -1733.26832 10000
MS_Zoning_Residential_Medium_Density -5316.31128 10000
MS_Zoning_other -5176.60141 10000
Street_other -13837.22636 10000
Alley_other -1461.83621 10000
Lot_Shape_Slightly_Irregular 3700.17083 10000
Lot_Shape_other 961.05230 10000
Land_Contour_other 3230.49668 10000
Utilities_other -8536.41564 10000
Lot_Config_CulDSac 11616.63150 10000
Lot_Config_Inside 357.65609 10000
Lot_Config_other -5147.76184 10000
Land_Slope_other 3944.23456 10000
Neighborhood_College_Creek 1894.52790 10000
Neighborhood_Old_Town -1468.07869 10000
Neighborhood_Edwards -7179.02156 10000
Neighborhood_Somerset 12865.77564 10000
Neighborhood_Northridge_Heights 35341.47415 10000
Neighborhood_Gilbert -19167.19093 10000
Neighborhood_other 8049.59728 10000
Condition_1_Norm 10577.33452 10000
Condition_1_other 2555.17042 10000
Condition_2_other 7230.83188 10000
Bldg_Type_TwnhsE -11776.64761 10000
Bldg_Type_other -15016.92568 10000
House_Style_One_Story 2517.46869 10000
House_Style_Two_Story -1834.73368 10000
House_Style_other -3583.23800 10000
Overall_Cond_Above_Average 387.29301 10000
Overall_Cond_Good 5453.63364 10000
Overall_Cond_other -198.67916 10000
Roof_Style_Hip 9826.95184 10000
Roof_Style_other -6285.85166 10000
Roof_Matl_other 2956.86795 10000
Exterior_1st_MetalSd 2401.59277 10000
Exterior_1st_Plywood -423.31025 10000
Exterior_1st_VinylSd 430.87774 10000
Exterior_1st_Wd.Sdng 364.15513 10000
Exterior_1st_other 9216.05135 10000
Exterior_2nd_MetalSd 2810.59106 10000
Exterior_2nd_Plywood -4782.09685 10000
Exterior_2nd_VinylSd 1979.06774 10000
Exterior_2nd_Wd.Sdng 3586.14793 10000
Exterior_2nd_other 1975.61474 10000
Mas_Vnr_Type_None 7067.96660 10000
Mas_Vnr_Type_Stone 6432.96754 10000
Mas_Vnr_Type_other -12999.85758 10000
Exter_Cond_Typical -1223.58374 10000
Exter_Cond_other -8374.06611 10000
Foundation_CBlock -3752.43518 10000
Foundation_PConc 4777.91201 10000
Foundation_other 12.67181 10000
Bsmt_Cond_other -1096.68254 10000
Bsmt_Exposure_Gd 18454.01931 10000
Bsmt_Exposure_Mn -6292.37350 10000
Bsmt_Exposure_No -9993.27851 10000
Bsmt_Exposure_other -3349.21932 10000
BsmtFin_Type_1_BLQ -979.66229 10000
BsmtFin_Type_1_GLQ 9866.03456 10000
BsmtFin_Type_1_LwQ -4371.15616 10000
BsmtFin_Type_1_Rec -2205.86621 10000
BsmtFin_Type_1_Unf -2137.50633 10000
BsmtFin_Type_1_other -54.65570 10000
BsmtFin_Type_2_other -1995.42781 10000
Heating_other -1333.94062 10000
Heating_QC_Good -4130.03350 10000
Heating_QC_Typical -7280.75514 10000
Heating_QC_other -10421.28767 10000
Central_Air_Y 305.63110 10000
Electrical_SBrkr -843.81162 10000
Electrical_other 145.95150 10000
Functional_other -16837.88549 10000
Garage_Type_BuiltIn 2912.19245 10000
Garage_Type_Detchd -1782.91691 10000
Garage_Type_No_Garage 1405.71932 10000
Garage_Type_other -13424.85604 10000
Garage_Finish_No_Garage 2074.34345 10000
Garage_Finish_RFn -7304.15017 10000
Garage_Finish_Unf -3707.86157 10000
Garage_Cond_Typical 587.45190 10000
Garage_Cond_other -4325.25632 10000
Paved_Drive_Paved 1759.89335 10000
Paved_Drive_other 3091.17128 10000
Pool_QC_other 12835.93973 10000
Fence_No_Fence -1541.73821 10000
Fence_other -1034.11743 10000
Misc_Feature_other 6558.20291 10000
Sale_Type_WD. -6374.52035 10000
Sale_Type_other -7784.24102 10000
Sale_Condition_Normal 6003.00979 10000
Sale_Condition_Partial 13836.66751 10000
Sale_Condition_other 3630.29900 10000

The LASSO

lasso_reg_spec <- linear_reg(mixture = 1, penalty = 1e4) %>%   # mixture = 1 gives the LASSO
  set_engine("glmnet")

lasso_reg_rec <- recipe(Sale_Price ~ ., data = ames_train) %>%
  step_impute_knn(all_predictors()) %>%
  step_normalize(all_numeric_predictors()) %>%
  step_other(all_nominal_predictors()) %>%
  step_dummy(all_nominal_predictors())

lasso_reg_wf <- workflow() %>%
  add_model(lasso_reg_spec) %>%
  add_recipe(lasso_reg_rec)

lasso_reg_results <- lasso_reg_wf %>%
  fit_resamples(ames_folds)

lasso_reg_results %>%
  collect_metrics()
.metric .estimator mean n std_err .config
rmse standard 40537.9245852 5 3303.1924456 Preprocessor1_Model1
rsq standard 0.7821723 5 0.0229465 Preprocessor1_Model1

Seeing the Estimated LASSO Model

Again, we wouldn’t generally fit the LASSO model at this time; however, you can see how to do so and examine the estimated model below.

lasso_reg_fit <- lasso_reg_wf %>%
  fit(ames_train)

lasso_reg_fit %>%
  tidy()
term estimate penalty
(Intercept) 174537.449 10000
Lot_Frontage 0.000 10000
Lot_Area 0.000 10000
Year_Built 7478.097 10000
Year_Remod_Add 6606.480 10000
Mas_Vnr_Area 3398.018 10000
BsmtFin_SF_1 0.000 10000
BsmtFin_SF_2 0.000 10000
Bsmt_Unf_SF 0.000 10000
Total_Bsmt_SF 12271.407 10000
First_Flr_SF 0.000 10000
Second_Flr_SF 0.000 10000
Gr_Liv_Area 26200.798 10000
Bsmt_Full_Bath 0.000 10000
Bsmt_Half_Bath 0.000 10000
Full_Bath 0.000 10000
Half_Bath 0.000 10000
Bedroom_AbvGr 0.000 10000
Kitchen_AbvGr 0.000 10000
TotRms_AbvGrd 0.000 10000
Fireplaces 3226.727 10000
Garage_Cars 6992.504 10000
Garage_Area 4368.112 10000
Wood_Deck_SF 0.000 10000
Open_Porch_SF 0.000 10000
Enclosed_Porch 0.000 10000
Three_season_porch 0.000 10000
Screen_Porch 0.000 10000
Pool_Area 0.000 10000
Misc_Val 0.000 10000
Mo_Sold 0.000 10000
Year_Sold 0.000 10000
Longitude 0.000 10000
Latitude 0.000 10000
MS_SubClass_One_Story_1945_and_Older 0.000 10000
MS_SubClass_One_and_Half_Story_Finished_All_Ages 0.000 10000
MS_SubClass_Two_Story_1946_and_Newer 0.000 10000
MS_SubClass_One_Story_PUD_1946_and_Newer 0.000 10000
MS_SubClass_other 0.000 10000
MS_Zoning_Residential_Medium_Density 0.000 10000
MS_Zoning_other 0.000 10000
Street_other 0.000 10000
Alley_other 0.000 10000
Lot_Shape_Slightly_Irregular 0.000 10000
Lot_Shape_other 0.000 10000
Land_Contour_other 0.000 10000
Utilities_other 0.000 10000
Lot_Config_CulDSac 0.000 10000
Lot_Config_Inside 0.000 10000
Lot_Config_other 0.000 10000
Land_Slope_other 0.000 10000
Neighborhood_College_Creek 0.000 10000
Neighborhood_Old_Town 0.000 10000
Neighborhood_Edwards 0.000 10000
Neighborhood_Somerset 0.000 10000
Neighborhood_Northridge_Heights 22598.017 10000
Neighborhood_Gilbert 0.000 10000
Neighborhood_other 0.000 10000
Condition_1_Norm 0.000 10000
Condition_1_other 0.000 10000
Condition_2_other 0.000 10000
Bldg_Type_TwnhsE 0.000 10000
Bldg_Type_other 0.000 10000
House_Style_One_Story 0.000 10000
House_Style_Two_Story 0.000 10000
House_Style_other 0.000 10000
Overall_Cond_Above_Average 0.000 10000
Overall_Cond_Good 0.000 10000
Overall_Cond_other 0.000 10000
Roof_Style_Hip 0.000 10000
Roof_Style_other 0.000 10000
Roof_Matl_other 0.000 10000
Exterior_1st_MetalSd 0.000 10000
Exterior_1st_Plywood 0.000 10000
Exterior_1st_VinylSd 0.000 10000
Exterior_1st_Wd.Sdng 0.000 10000
Exterior_1st_other 0.000 10000
Exterior_2nd_MetalSd 0.000 10000
Exterior_2nd_Plywood 0.000 10000
Exterior_2nd_VinylSd 0.000 10000
Exterior_2nd_Wd.Sdng 0.000 10000
Exterior_2nd_other 0.000 10000
Mas_Vnr_Type_None 0.000 10000
Mas_Vnr_Type_Stone 0.000 10000
Mas_Vnr_Type_other 0.000 10000
Exter_Cond_Typical 0.000 10000
Exter_Cond_other 0.000 10000
Foundation_CBlock 0.000 10000
Foundation_PConc 3534.049 10000
Foundation_other 0.000 10000
Bsmt_Cond_other 0.000 10000
Bsmt_Exposure_Gd 10297.300 10000
Bsmt_Exposure_Mn 0.000 10000
Bsmt_Exposure_No 0.000 10000
Bsmt_Exposure_other 0.000 10000
BsmtFin_Type_1_BLQ 0.000 10000
BsmtFin_Type_1_GLQ 7603.684 10000
BsmtFin_Type_1_LwQ 0.000 10000
BsmtFin_Type_1_Rec 0.000 10000
BsmtFin_Type_1_Unf 0.000 10000
BsmtFin_Type_1_other 0.000 10000
BsmtFin_Type_2_other 0.000 10000
Heating_other 0.000 10000
Heating_QC_Good 0.000 10000
Heating_QC_Typical 0.000 10000
Heating_QC_other 0.000 10000
Central_Air_Y 0.000 10000
Electrical_SBrkr 0.000 10000
Electrical_other 0.000 10000
Functional_other 0.000 10000
Garage_Type_BuiltIn 0.000 10000
Garage_Type_Detchd 0.000 10000
Garage_Type_No_Garage 0.000 10000
Garage_Type_other 0.000 10000
Garage_Finish_No_Garage 0.000 10000
Garage_Finish_RFn 0.000 10000
Garage_Finish_Unf 0.000 10000
Garage_Cond_Typical 0.000 10000
Garage_Cond_other 0.000 10000
Paved_Drive_Paved 0.000 10000
Paved_Drive_other 0.000 10000
Pool_QC_other 0.000 10000
Fence_No_Fence 0.000 10000
Fence_other 0.000 10000
Misc_Feature_other 0.000 10000
Sale_Type_WD. 0.000 10000
Sale_Type_other 0.000 10000
Sale_Condition_Normal 0.000 10000
Sale_Condition_Partial 0.000 10000
Sale_Condition_other 0.000 10000

Non-zero LASSO Coefficients

Here are only the predictors with non-zero coefficients

lasso_reg_fit %>%
  tidy() %>%
  filter(estimate != 0)
term estimate penalty
(Intercept) 174537.449 10000
Year_Built 7478.097 10000
Year_Remod_Add 6606.480 10000
Mas_Vnr_Area 3398.018 10000
Total_Bsmt_SF 12271.407 10000
Gr_Liv_Area 26200.798 10000
Fireplaces 3226.727 10000
Garage_Cars 6992.504 10000
Garage_Area 4368.112 10000
Neighborhood_Northridge_Heights 22598.017 10000
Foundation_PConc 3534.049 10000
Bsmt_Exposure_Gd 10297.300 10000
BsmtFin_Type_1_GLQ 7603.684 10000

Summary

  • The more predictors we include in a model, the more flexible that model is

  • We can use regularization methods to constrain our models and make overfitting less likely

  • Two techniques commonly used with linear regression models are Ridge Regression and the LASSO

  • These methods alter the optimization problem that obtains the estimated \(\beta\)-coefficients for our model

  • Ridge Regression attaches very small coefficients to uninformative predictors, while the LASSO attaches coefficients of \(0\) to them

    • This means that the LASSO can be used for variable selection
  • Both Ridge Regression and the LASSO require all numerical predictors to be scaled

  • We can fit/cross-validate these models in nearly the same way that we have been working with ordinary linear regression models

    • We set_engine("glmnet") rather than set_engine("lm") for ridge and LASSO
    • We set mixture = 0 for Ridge Regression and mixture = 1 for the LASSO
    • We define a penalty parameter which determines the amount of regularization (constraint) applied to the model

Next Time…


Other Classes of Regression Models