Introduction to Regularization: Ridge Regression and the LASSO

Dr. Gilbert

September 1, 2024

Reminders from Last Time

Cross-validation is a procedure that we can use to obtain more reliable performance assessments for our models

In \(k\)-fold cross validation, we create \(k\) models and obtain \(k\) performance estimates – the average of these performance estimates can be referred to as the cross-validation performance estimate

Our cross-validation procedure does NOT result in a fitted model – it results in the cross-validation performance estimate and an estimated standard error for that performance estimate

Cross-validation makes our choices and inferences less susceptible to random chance (the randomly chosen training and test observations)

Big Picture Recap

Our approach to linear regression so far has perhaps led us to the intuition that we should start with a large model and then reduce it down to include only statistically significant terms

This approach, called backward elimination, is commonly utilized

There is also an opposite approach, called forward selection

Playing Along

We’ll switch to using the ames dataset for this discussion

That dataset contains features and selling prices for 2,930 homes sold in Ames, Iowa between 2006 and 2010

  1. Open your MAT300 project in RStudio and create a new Quarto document

  2. Use a setup chunk to load the {tidyverse} and {tidymodels}

  3. The ames data set is contained in the {modeldata} package, which is loaded with {tidymodels} – take a preliminary look at this dataset

  4. Split your data into training and test sets

  5. Create five or ten cross-validation folds
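
If you’d like to check your work, here’s a minimal sketch of those steps – it closely mirrors the code we’ll walk through later in these slides

library(tidyverse)
library(tidymodels)

# ames is loaded with {modeldata}, which {tidymodels} attaches for us
ames %>%
  glimpse()

set.seed(123)

# 90/10 training/test split, then cross-validation folds from the training set
ames_split <- initial_split(ames, prop = 0.9)
ames_train <- training(ames_split)
ames_test <- testing(ames_split)

ames_folds <- vfold_cv(ames_train, v = 5)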

A Shopping Analogy

Consider the model (or us, as modelers) as a shopper in a market that sells predictors

  • (Backward elimination) Our model begins by putting every item in the store into its shopping cart, and then puts back the items it doesn’t need

  • (Forward selection) Our model begins with an empty cart and wanders the store, finding the items it needs most to add to its cart one-by-one

Okay, So What?

At first, these approaches may seem reasonable, if inefficient

Statistical Standpoint: We’re evaluating many \(t\)-tests when determining the statistical significance of predictors

  • The probability of making at least one Type I error (claiming a term is significant when it truly isn’t) becomes inflated – even with just three tests, the probability is over 14% when using the common \(\alpha = 0.05\)
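
As a quick check of that number (assuming the three tests are independent, each using \(\alpha = 0.05\)):

# Probability of at least one Type I error across three independent tests
1 - (1 - 0.05)^3  # 0.142625, i.e. about 14.3%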

Model-Fit Perspective: The more predictors a model has access to, the more flexible it is, the better it will fit the training data, and the more likely it is to become overfit

Back to the Shopping Analogy

By allowing a model to “shop” freely for its predictors, we are encouraging our model to become overfit

Giving our model a “budget” to spend on its shopping trip would force it to be more selective about the predictors it chooses and would lower the likelihood that it becomes overfit

A Look Under the Hood

We’ve hidden the math that fits our models up until this point, but it’s worth a look now

\[\mathbb{E}\left[y\right] = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k\]

The optimization procedure we’ve been using to find the \(\beta\)-coefficients is called Ordinary Least Squares

Ordinary Least Squares: Find \(\beta_0, \beta_1, \cdots, \beta_k\) in order to minimize

\[\sum_{i = 1}^{n}{\left(y_{\text{obs}_i} - y_{\text{pred}_i}\right)^2}\]

A Look Under the Hood

We’ve hidden the math that fits our models up until this point, but it’s worth a look now

\[\mathbb{E}\left[y\right] = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k\]

The optimization procedure we’ve been using to find the \(\beta\)-coefficients is called Ordinary Least Squares

Ordinary Least Squares: Find \(\beta_0, \beta_1, \cdots, \beta_k\) in order to minimize

\[\sum_{i = 1}^{n}{\left(y_{\text{obs}_i} - \left(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik}\right)\right)^2}\]

A Look Under the Hood

We’ve hidden the math that fits our models up until this point, but it’s worth a look now

\[\mathbb{E}\left[y\right] = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k\]

The optimization procedure we’ve been using to find the \(\beta\)-coefficients is called Ordinary Least Squares

Ordinary Least Squares: Find \(\beta_0, \beta_1, \cdots, \beta_k\) in order to minimize

\[\sum_{i = 1}^{n}{\left(y_{\text{obs}_i} - \left(\beta_0 + \sum_{j = 1}^{k}{\beta_j x_{ij}}\right)\right)^2}\]

This is the procedure that allows our model to shop freely for predictors
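
This is exactly what the "lm" engine we’ve been using carries out behind the scenes – here’s a minimal sketch, with an arbitrary pair of predictors chosen just for illustration

# The "lm" engine estimates the beta-coefficients by Ordinary Least Squares
ols_spec <- linear_reg() %>%
  set_engine("lm")

ols_fit <- ols_spec %>%
  fit(Sale_Price ~ Gr_Liv_Area + Year_Built, data = ames_train)

ols_fit %>%
  tidy()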

Regularization

Regularization refers to techniques designed to constrain models and reduce the likelihood of overfitting

For linear regression, there are two commonly used methods

  • Ridge Regression
  • The LASSO (least absolute shrinkage and selection operator)

Each of these methods makes an adjustment to the Ordinary Least Squares procedure we just saw

Regularization: Ridge Regression

\[\mathbb{E}\left[y\right] = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k\]

Ridge Regression: Find \(\beta_0, \beta_1, \cdots, \beta_k\) in order to minimize

\[\sum_{i = 1}^{n}{\left(y_{\text{obs}_i} - \left(\beta_0 + \sum_{j = 1}^{k}{\beta_j x_{ij}}\right)\right)^2}\]

subject to the constraint

\[\sum_{j = 1}^{k}{\beta_j^2} \leq C\]

Note: \(C\) is a constant that we can think of as our budget for coefficients

The Result: Ridge regression encourages very small coefficients on unimportant predictors
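
An equivalent way to write this problem (and, up to scaling, the form that software like {glmnet} actually optimizes) is as a penalized one: minimize

\[\sum_{i = 1}^{n}{\left(y_{\text{obs}_i} - \left(\beta_0 + \sum_{j = 1}^{k}{\beta_j x_{ij}}\right)\right)^2} + \lambda\sum_{j = 1}^{k}{\beta_j^2}\]

where \(\lambda \geq 0\) controls the amount of regularization – a larger \(\lambda\) corresponds to a smaller budget \(C\), and \(\lambda = 0\) recovers Ordinary Least Squares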

Regularization: The LASSO

\[\mathbb{E}\left[y\right] = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k\]

LASSO: Find \(\beta_0, \beta_1, \cdots, \beta_k\) in order to minimize

\[\sum_{i = 1}^{n}{\left(y_{\text{obs}_i} - \left(\beta_0 + \sum_{j = 1}^{k}{\beta_j x_{ij}}\right)\right)^2}\]

subject to the constraint

\[\sum_{j = 1}^{k}{\left|\beta_j\right|} \leq C\]

Note: As with Ridge Regression, \(C\) is a constant that we can think of as our budget for coefficients

The Result: The LASSO pushes coefficients of unimportant predictors to \(0\)
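
The LASSO has the analogous penalized form, with squared coefficients replaced by absolute values: minimize

\[\sum_{i = 1}^{n}{\left(y_{\text{obs}_i} - \left(\beta_0 + \sum_{j = 1}^{k}{\beta_j x_{ij}}\right)\right)^2} + \lambda\sum_{j = 1}^{k}{\left|\beta_j\right|}\]

Again, a larger \(\lambda\) corresponds to a smaller budget \(C\), which pushes more coefficients to exactly \(0\)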

LASSO or Ridge?

The choice between Ridge Regression and the LASSO depends on your goals

The LASSO is better at variable selection because it sends coefficients on unimportant predictors to exactly \(0\)

The LASSO won’t always outperform Ridge Regression, though, so you might try both and see which works better for your use case

Feature Preprocessing Requirements

If a predictor \(x_j\) is measured on a much larger scale than the response \(y\), its coefficient will naturally be small, so that predictor becomes artificially cheap to include in a model

Similarly, if a predictor \(x_j\) is measured on a much smaller scale than the response, its coefficient will naturally be large, so that predictor becomes artificially expensive to include in a model

We don’t want any of our predictors to be artificially advantaged or disadvantaged, so we must ensure that all of our numerical predictors are on the same scale as one another

Min-Max Scaling: Rescales each numerical predictor to the interval \(\left[0, 1\right]\) via \(\displaystyle{\frac{x - \min\left(x\right)}{\max\left(x\right) - \min\left(x\right)}}\)

  • We include step_range() in a recipe to utilize min-max scaling

Standard Scaling: Converts each observed measurement into a \(z\)-score (its number of standard deviations from the mean) via \(\displaystyle{\frac{x - \text{mean}\left(x\right)}{\text{sd}\left(x\right)}}\)

  • We include step_normalize() in a recipe to utilize standard scaling
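
Here’s a minimal sketch of what each option looks like inside a recipe (the formula and data are placeholders for whatever model we’re building)

# Min-max scaling: rescale every numeric predictor to [0, 1]
minmax_rec <- recipe(Sale_Price ~ ., data = ames_train) %>%
  step_range(all_numeric_predictors())

# Standard scaling: convert every numeric predictor to z-scores
standard_rec <- recipe(Sale_Price ~ ., data = ames_train) %>%
  step_normalize(all_numeric_predictors())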

Switching Engines

The {tidymodels} framework is great because it provides a standardized structure for defining and fitting models

Ridge and the LASSO are still linear regression models, but they’re no longer fit using OLS

This puts them in a class of models called Generalized Linear Models (GLMs)

We’ll need to change our fitting engine from "lm" to something that can fit these GLMs – we’ll use "glmnet"

Required Parameters for "glmnet":

  • mixture can be set to any value between \(0\) and \(1\)

    • Setting mixture = 0 results in Ridge Regression
    • Setting mixture = 1 results in the LASSO
  • penalty is the amount of regularization being applied

    • You can think of this parameter as being tied to our coefficient budget
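
As a quick sketch of those settings (the penalty value here is arbitrary – note that values of mixture strictly between \(0\) and \(1\) blend the two penalties, giving a so-called elastic net)

# Ridge Regression: mixture = 0
ridge_spec <- linear_reg(mixture = 0, penalty = 100) %>%
  set_engine("glmnet")

# The LASSO: mixture = 1
lasso_spec <- linear_reg(mixture = 1, penalty = 100) %>%
  set_engine("glmnet")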

Choosing a penalty

For now, we’ll just pick a value and see how it performs

We can experiment with several if we like, and then choose the one that results in the best performance

We’ll talk about a better strategy next time
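
For example, here’s a rough, by-hand sketch of that experiment – it assumes a recipe (lasso_reg_rec) and folds (ames_folds) like the ones we build later in these slides, and the tuning tools we’ll meet next time handle this far more gracefully

# Try a handful of penalty values and compare cross-validation performance
penalties <- c(100, 1000, 10000)

penalty_results <- purrr::map_dfr(penalties, function(pen) {
  spec <- linear_reg(mixture = 1, penalty = pen) %>%
    set_engine("glmnet")

  workflow() %>%
    add_model(spec) %>%
    add_recipe(lasso_reg_rec) %>%
    fit_resamples(ames_folds) %>%
    collect_metrics() %>%
    mutate(penalty = pen)
})

penalty_results %>%
  filter(.metric == "rmse") %>%
  arrange(mean)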

A Few Additional Concerns

  • The "glmnet" engine requires that no missing values are included in any fold

    • We could omit any rows with missing values
    • We could omit any features with missing entries
    • We could impute missing values with a step_impute_*() function added to a recipe

Implementing Ridge and LASSO with the Ames Data

We’ll use the ames housing data set that we’ve seen from time to time this semester

We’ll read it in, remove the rows with missing Sale_Price (our response), split the data into training and test sets, and create our cross-validation folds

# Keep only the rows where the response is known
ames_known_price <- ames %>%
  filter(!is.na(Sale_Price))

set.seed(123)

# 90/10 training/test split, then five cross-validation folds from the training set
ames_split <- initial_split(ames_known_price, prop = 0.9)
ames_train <- training(ames_split)
ames_test <- testing(ames_split)

ames_folds <- vfold_cv(ames_train, v = 5)

Ridge Regression

# Ridge Regression: mixture = 0, with an (arbitrarily chosen) penalty of 1e4
ridge_reg_spec <- linear_reg(mixture = 0, penalty = 1e4) %>%
  set_engine("glmnet")

# Impute missing values, scale numeric predictors, then pool rare categorical
# levels and encode the nominal predictors as dummy variables
ridge_reg_rec <- recipe(Sale_Price ~ ., data = ames_train) %>%
  step_impute_knn(all_predictors()) %>%
  step_normalize(all_numeric_predictors()) %>%
  step_other(all_nominal_predictors()) %>%
  step_dummy(all_nominal_predictors())

ridge_reg_wf <- workflow() %>%
  add_model(ridge_reg_spec) %>%
  add_recipe(ridge_reg_rec)

# Cross-validate the workflow rather than fitting a single model
ridge_reg_results <- ridge_reg_wf %>%
  fit_resamples(ames_folds)

ridge_reg_results %>%
  collect_metrics()
.metric   .estimator   mean            n   std_err        .config
rmse      standard     33512.4783290   5   3528.8858182   Preprocessor1_Model1
rsq       standard     0.8285515       5   0.0232516      Preprocessor1_Model1

Seeing the Estimated Ridge Regression Model

While we wouldn’t generally fit the Ridge Regression model at this time, you can see how to do that and examine the estimated model below.

ridge_reg_fit <- ridge_reg_wf %>%
  fit(ames_train)

ridge_reg_fit %>%
  tidy()
term estimate penalty
(Intercept) 170590.91440 10000
Lot_Frontage 1458.08999 10000
Lot_Area 1919.73420 10000
Year_Built 4443.44715 10000
Year_Remod_Add 5652.52473 10000
Mas_Vnr_Area 6469.12318 10000
BsmtFin_SF_1 -109.24172 10000
BsmtFin_SF_2 619.57697 10000
Bsmt_Unf_SF -1897.35902 10000
Total_Bsmt_SF 7471.51817 10000
First_Flr_SF 8231.08414 10000
Second_Flr_SF 8576.05018 10000
Gr_Liv_Area 13259.91733 10000
Bsmt_Full_Bath 2910.80533 10000
Bsmt_Half_Bath -872.28855 10000
Full_Bath 4132.02383 10000
Half_Bath 2520.03561 10000
Bedroom_AbvGr -2600.41177 10000
Kitchen_AbvGr -3311.98370 10000
TotRms_AbvGrd 4547.30499 10000
Fireplaces 4625.12646 10000
Garage_Cars 5750.22932 10000
Garage_Area 4248.50579 10000
Wood_Deck_SF 1899.00078 10000
Open_Porch_SF 224.51413 10000
Enclosed_Porch 1142.26429 10000
Three_season_porch 487.29049 10000
Screen_Porch 3569.00946 10000
Pool_Area -1114.39753 10000
Misc_Val -5085.90458 10000
Mo_Sold -203.12359 10000
Year_Sold -835.49586 10000
Longitude 650.40625 10000
Latitude 5328.30675 10000
MS_SubClass_One_Story_1945_and_Older -4126.27103 10000
MS_SubClass_One_and_Half_Story_Finished_All_Ages -1861.67028 10000
MS_SubClass_Two_Story_1946_and_Newer -1514.91540 10000
MS_SubClass_One_Story_PUD_1946_and_Newer -4117.65295 10000
MS_SubClass_other -1733.26832 10000
MS_Zoning_Residential_Medium_Density -5316.31128 10000
MS_Zoning_other -5176.60141 10000
Street_other -13837.22636 10000
Alley_other -1461.83621 10000
Lot_Shape_Slightly_Irregular 3700.17083 10000
Lot_Shape_other 961.05230 10000
Land_Contour_other 3230.49668 10000
Utilities_other -8536.41564 10000
Lot_Config_CulDSac 11616.63150 10000
Lot_Config_Inside 357.65609 10000
Lot_Config_other -5147.76184 10000
Land_Slope_other 3944.23456 10000
Neighborhood_College_Creek 1894.52790 10000
Neighborhood_Old_Town -1468.07869 10000
Neighborhood_Edwards -7179.02156 10000
Neighborhood_Somerset 12865.77564 10000
Neighborhood_Northridge_Heights 35341.47415 10000
Neighborhood_Gilbert -19167.19093 10000
Neighborhood_other 8049.59728 10000
Condition_1_Norm 10577.33452 10000
Condition_1_other 2555.17042 10000
Condition_2_other 7230.83188 10000
Bldg_Type_TwnhsE -11776.64761 10000
Bldg_Type_other -15016.92568 10000
House_Style_One_Story 2517.46869 10000
House_Style_Two_Story -1834.73368 10000
House_Style_other -3583.23800 10000
Overall_Cond_Above_Average 387.29301 10000
Overall_Cond_Good 5453.63364 10000
Overall_Cond_other -198.67916 10000
Roof_Style_Hip 9826.95184 10000
Roof_Style_other -6285.85166 10000
Roof_Matl_other 2956.86795 10000
Exterior_1st_MetalSd 2401.59277 10000
Exterior_1st_Plywood -423.31025 10000
Exterior_1st_VinylSd 430.87774 10000
Exterior_1st_Wd.Sdng 364.15513 10000
Exterior_1st_other 9216.05135 10000
Exterior_2nd_MetalSd 2810.59106 10000
Exterior_2nd_Plywood -4782.09685 10000
Exterior_2nd_VinylSd 1979.06774 10000
Exterior_2nd_Wd.Sdng 3586.14793 10000
Exterior_2nd_other 1975.61474 10000
Mas_Vnr_Type_None 7067.96660 10000
Mas_Vnr_Type_Stone 6432.96754 10000
Mas_Vnr_Type_other -12999.85758 10000
Exter_Cond_Typical -1223.58374 10000
Exter_Cond_other -8374.06611 10000
Foundation_CBlock -3752.43518 10000
Foundation_PConc 4777.91201 10000
Foundation_other 12.67181 10000
Bsmt_Cond_other -1096.68254 10000
Bsmt_Exposure_Gd 18454.01931 10000
Bsmt_Exposure_Mn -6292.37350 10000
Bsmt_Exposure_No -9993.27851 10000
Bsmt_Exposure_other -3349.21932 10000
BsmtFin_Type_1_BLQ -979.66229 10000
BsmtFin_Type_1_GLQ 9866.03456 10000
BsmtFin_Type_1_LwQ -4371.15616 10000
BsmtFin_Type_1_Rec -2205.86621 10000
BsmtFin_Type_1_Unf -2137.50633 10000
BsmtFin_Type_1_other -54.65570 10000
BsmtFin_Type_2_other -1995.42781 10000
Heating_other -1333.94062 10000
Heating_QC_Good -4130.03350 10000
Heating_QC_Typical -7280.75514 10000
Heating_QC_other -10421.28767 10000
Central_Air_Y 305.63110 10000
Electrical_SBrkr -843.81162 10000
Electrical_other 145.95150 10000
Functional_other -16837.88549 10000
Garage_Type_BuiltIn 2912.19245 10000
Garage_Type_Detchd -1782.91691 10000
Garage_Type_No_Garage 1405.71932 10000
Garage_Type_other -13424.85604 10000
Garage_Finish_No_Garage 2074.34345 10000
Garage_Finish_RFn -7304.15017 10000
Garage_Finish_Unf -3707.86157 10000
Garage_Cond_Typical 587.45190 10000
Garage_Cond_other -4325.25632 10000
Paved_Drive_Paved 1759.89335 10000
Paved_Drive_other 3091.17128 10000
Pool_QC_other 12835.93973 10000
Fence_No_Fence -1541.73821 10000
Fence_other -1034.11743 10000
Misc_Feature_other 6558.20291 10000
Sale_Type_WD. -6374.52035 10000
Sale_Type_other -7784.24102 10000
Sale_Condition_Normal 6003.00979 10000
Sale_Condition_Partial 13836.66751 10000
Sale_Condition_other 3630.29900 10000

The LASSO

# The LASSO: mixture = 1, with the same (arbitrarily chosen) penalty of 1e4
lasso_reg_spec <- linear_reg(mixture = 1, penalty = 1e4) %>%
  set_engine("glmnet")

# The preprocessing recipe is identical to the one used for Ridge Regression
lasso_reg_rec <- recipe(Sale_Price ~ ., data = ames_train) %>%
  step_impute_knn(all_predictors()) %>%
  step_normalize(all_numeric_predictors()) %>%
  step_other(all_nominal_predictors()) %>%
  step_dummy(all_nominal_predictors())

lasso_reg_wf <- workflow() %>%
  add_model(lasso_reg_spec) %>%
  add_recipe(lasso_reg_rec)

lasso_reg_results <- lasso_reg_wf %>%
  fit_resamples(ames_folds)

lasso_reg_results %>%
  collect_metrics()
.metric   .estimator   mean            n   std_err        .config
rmse      standard     40537.9245852   5   3303.1924456   Preprocessor1_Model1
rsq       standard     0.7821723       5   0.0229465      Preprocessor1_Model1

Seeing the Estimated LASSO Model

Again, we wouldn’t generally fit the LASSO model at this time; however, you can see how to do that and examine the estimated model below.

lasso_reg_fit <- lasso_reg_wf %>%
  fit(ames_train)

lasso_reg_fit %>%
  tidy()
term estimate penalty
(Intercept) 174537.449 10000
Lot_Frontage 0.000 10000
Lot_Area 0.000 10000
Year_Built 7478.097 10000
Year_Remod_Add 6606.480 10000
Mas_Vnr_Area 3398.018 10000
BsmtFin_SF_1 0.000 10000
BsmtFin_SF_2 0.000 10000
Bsmt_Unf_SF 0.000 10000
Total_Bsmt_SF 12271.407 10000
First_Flr_SF 0.000 10000
Second_Flr_SF 0.000 10000
Gr_Liv_Area 26200.798 10000
Bsmt_Full_Bath 0.000 10000
Bsmt_Half_Bath 0.000 10000
Full_Bath 0.000 10000
Half_Bath 0.000 10000
Bedroom_AbvGr 0.000 10000
Kitchen_AbvGr 0.000 10000
TotRms_AbvGrd 0.000 10000
Fireplaces 3226.727 10000
Garage_Cars 6992.504 10000
Garage_Area 4368.112 10000
Wood_Deck_SF 0.000 10000
Open_Porch_SF 0.000 10000
Enclosed_Porch 0.000 10000
Three_season_porch 0.000 10000
Screen_Porch 0.000 10000
Pool_Area 0.000 10000
Misc_Val 0.000 10000
Mo_Sold 0.000 10000
Year_Sold 0.000 10000
Longitude 0.000 10000
Latitude 0.000 10000
MS_SubClass_One_Story_1945_and_Older 0.000 10000
MS_SubClass_One_and_Half_Story_Finished_All_Ages 0.000 10000
MS_SubClass_Two_Story_1946_and_Newer 0.000 10000
MS_SubClass_One_Story_PUD_1946_and_Newer 0.000 10000
MS_SubClass_other 0.000 10000
MS_Zoning_Residential_Medium_Density 0.000 10000
MS_Zoning_other 0.000 10000
Street_other 0.000 10000
Alley_other 0.000 10000
Lot_Shape_Slightly_Irregular 0.000 10000
Lot_Shape_other 0.000 10000
Land_Contour_other 0.000 10000
Utilities_other 0.000 10000
Lot_Config_CulDSac 0.000 10000
Lot_Config_Inside 0.000 10000
Lot_Config_other 0.000 10000
Land_Slope_other 0.000 10000
Neighborhood_College_Creek 0.000 10000
Neighborhood_Old_Town 0.000 10000
Neighborhood_Edwards 0.000 10000
Neighborhood_Somerset 0.000 10000
Neighborhood_Northridge_Heights 22598.017 10000
Neighborhood_Gilbert 0.000 10000
Neighborhood_other 0.000 10000
Condition_1_Norm 0.000 10000
Condition_1_other 0.000 10000
Condition_2_other 0.000 10000
Bldg_Type_TwnhsE 0.000 10000
Bldg_Type_other 0.000 10000
House_Style_One_Story 0.000 10000
House_Style_Two_Story 0.000 10000
House_Style_other 0.000 10000
Overall_Cond_Above_Average 0.000 10000
Overall_Cond_Good 0.000 10000
Overall_Cond_other 0.000 10000
Roof_Style_Hip 0.000 10000
Roof_Style_other 0.000 10000
Roof_Matl_other 0.000 10000
Exterior_1st_MetalSd 0.000 10000
Exterior_1st_Plywood 0.000 10000
Exterior_1st_VinylSd 0.000 10000
Exterior_1st_Wd.Sdng 0.000 10000
Exterior_1st_other 0.000 10000
Exterior_2nd_MetalSd 0.000 10000
Exterior_2nd_Plywood 0.000 10000
Exterior_2nd_VinylSd 0.000 10000
Exterior_2nd_Wd.Sdng 0.000 10000
Exterior_2nd_other 0.000 10000
Mas_Vnr_Type_None 0.000 10000
Mas_Vnr_Type_Stone 0.000 10000
Mas_Vnr_Type_other 0.000 10000
Exter_Cond_Typical 0.000 10000
Exter_Cond_other 0.000 10000
Foundation_CBlock 0.000 10000
Foundation_PConc 3534.049 10000
Foundation_other 0.000 10000
Bsmt_Cond_other 0.000 10000
Bsmt_Exposure_Gd 10297.300 10000
Bsmt_Exposure_Mn 0.000 10000
Bsmt_Exposure_No 0.000 10000
Bsmt_Exposure_other 0.000 10000
BsmtFin_Type_1_BLQ 0.000 10000
BsmtFin_Type_1_GLQ 7603.684 10000
BsmtFin_Type_1_LwQ 0.000 10000
BsmtFin_Type_1_Rec 0.000 10000
BsmtFin_Type_1_Unf 0.000 10000
BsmtFin_Type_1_other 0.000 10000
BsmtFin_Type_2_other 0.000 10000
Heating_other 0.000 10000
Heating_QC_Good 0.000 10000
Heating_QC_Typical 0.000 10000
Heating_QC_other 0.000 10000
Central_Air_Y 0.000 10000
Electrical_SBrkr 0.000 10000
Electrical_other 0.000 10000
Functional_other 0.000 10000
Garage_Type_BuiltIn 0.000 10000
Garage_Type_Detchd 0.000 10000
Garage_Type_No_Garage 0.000 10000
Garage_Type_other 0.000 10000
Garage_Finish_No_Garage 0.000 10000
Garage_Finish_RFn 0.000 10000
Garage_Finish_Unf 0.000 10000
Garage_Cond_Typical 0.000 10000
Garage_Cond_other 0.000 10000
Paved_Drive_Paved 0.000 10000
Paved_Drive_other 0.000 10000
Pool_QC_other 0.000 10000
Fence_No_Fence 0.000 10000
Fence_other 0.000 10000
Misc_Feature_other 0.000 10000
Sale_Type_WD. 0.000 10000
Sale_Type_other 0.000 10000
Sale_Condition_Normal 0.000 10000
Sale_Condition_Partial 0.000 10000
Sale_Condition_other 0.000 10000

Non-zero LASSO Coefficients

Here are only the predictors with non-zero coefficients

lasso_reg_fit %>%
  tidy() %>%
  filter(estimate != 0)
term estimate penalty
(Intercept) 174537.449 10000
Year_Built 7478.097 10000
Year_Remod_Add 6606.480 10000
Mas_Vnr_Area 3398.018 10000
Total_Bsmt_SF 12271.407 10000
Gr_Liv_Area 26200.798 10000
Fireplaces 3226.727 10000
Garage_Cars 6992.504 10000
Garage_Area 4368.112 10000
Neighborhood_Northridge_Heights 22598.017 10000
Foundation_PConc 3534.049 10000
Bsmt_Exposure_Gd 10297.300 10000
BsmtFin_Type_1_GLQ 7603.684 10000

Summary

  • The more predictors we include in a model, the more flexible that model is

  • We can use regularization methods to constrain our models and make overfitting less likely

  • Two techniques commonly used with linear regression models are Ridge Regression and the LASSO

  • These methods alter the optimization problem that obtains the estimated \(\beta\)-coefficients for our model

  • Ridge Regression attaches very small coefficients to uninformative predictors, while the LASSO attaches coefficients of \(0\) to them

    • This means that the LASSO can be used for variable selection
  • Both Ridge Regression and the LASSO require all numerical predictors to be scaled

  • We can fit/cross-validate these models in nearly the same way that we have been working with ordinary linear regression models

    • We set_engine("glmnet") rather than set_engine("lm") for ridge and LASSO
    • We set mixture = 0 for Ridge Regression and mixture = 1 for the LASSO
    • We define a penalty parameter which determines the amount of regularization (constraint) applied to the model

Next Time…


Other Classes of Regression Model