What is Ridge Regression? [Updated]

January 21, 2024

Ridge regression is a model-tuning method used to analyze data that suffers from multicollinearity. It performs L2 regularization. When multicollinearity occurs, least-squares estimates remain unbiased but their variances are large, so predicted values can be far from the actual values.

The cost function for ridge regression:

Min(||Y – X(theta)||^2 + λ||theta||^2)

λ controls the strength of the penalty. In scikit-learn's Ridge class, λ is set through the alpha parameter, so changing alpha changes the penalty. The higher the value of alpha, the bigger the penalty and the smaller the magnitude of the coefficients.

It shrinks the parameters, and is therefore used to mitigate the effects of multicollinearity.
It reduces model complexity through coefficient shrinkage.
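
The effect of alpha on coefficient size can be checked with a small experiment. The sketch below is only an illustration on synthetic data, not the restaurant data set used later; the variable names and the alpha grid are assumptions chosen for the example.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Synthetic data with two nearly collinear predictors (illustrative only)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)  # almost a copy of x1
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.5, size=n)

alphas = [0.001, 0.1, 10.0, 1000.0]
coef_norms = []
for alpha in alphas:
    model = Ridge(alpha=alpha).fit(X, y)
    # Record the overall size (L2 norm) of the fitted coefficients
    coef_norms.append(float(np.linalg.norm(model.coef_)))
    print(alpha, model.coef_.round(3), round(coef_norms[-1], 3))
```

As alpha grows, the printed norm of the coefficient vector gets smaller, which is the shrinkage described above.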

Ridge Regression Models

Any regression model builds on the usual regression equation, which is written as:

Y = XB + e

Where Y is the dependent variable, X represents the independent variables, B is the vector of regression coefficients to be estimated, and e represents the errors (residuals).
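
For this model, minimizing the ridge cost ||Y - XB||^2 + λ||B||^2 has the well-known closed-form solution B = (X'X + λI)^(-1) X'Y. The sketch below checks that formula against scikit-learn on synthetic data; the data, the value of λ, and the choice to omit the intercept (to match Y = XB + e literally) are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
true_B = np.array([2.0, -1.0, 0.5])  # assumed coefficients for the demo
Y = X @ true_B + rng.normal(scale=0.1, size=50)

lam = 1.0
p = X.shape[1]

# Closed-form ridge estimate: solve (X'X + lam * I) B = X'Y
B_closed = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Y)

# Same estimate from scikit-learn, without an intercept to match Y = XB + e
B_sklearn = Ridge(alpha=lam, fit_intercept=False).fit(X, Y).coef_

print(B_closed)
print(B_sklearn)
```

The two coefficient vectors agree up to numerical precision, which also shows that alpha in scikit-learn plays the role of λ.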

Adding the λ penalty to this equation addresses the variance that the ordinary least-squares model does not control. Once the data is ready and L2 regularization has been judged appropriate, the following steps apply.

Standardization

In ridge regression, the first step is to standardize the variables (both dependent and independent) by subtracting their means and dividing by their standard deviations. This causes a challenge in notation since we must somehow indicate whether the variables in a particular formula are standardized or not. As far as standardization is concerned, all ridge regression calculations are based on standardized variables. When the final regression coefficients are displayed, they are adjusted back into their original scale. However, the ridge trace is on a standardized scale.
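
A minimal sketch of this workflow, assuming synthetic data and an arbitrary alpha: standardize both the predictors and the response, fit ridge on the standardized scale, then convert the coefficients back to the original units. The conversion used here is the standard one for z-scored variables (multiply by the response's standard deviation and divide by each predictor's standard deviation).

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(2)
# Two predictors on very different scales (illustrative only)
X = rng.normal(loc=[10.0, 200.0], scale=[2.0, 50.0], size=(80, 2))
y = 5.0 + 1.5 * X[:, 0] - 0.02 * X[:, 1] + rng.normal(scale=0.3, size=80)

# Standardize: subtract the mean and divide by the standard deviation
x_mean, x_std = X.mean(axis=0), X.std(axis=0)
y_mean, y_std = y.mean(), y.std()
Xz = (X - x_mean) / x_std
yz = (y - y_mean) / y_std

# Fit on the standardized scale; no intercept is needed after centering
model = Ridge(alpha=1.0, fit_intercept=False).fit(Xz, yz)
b_std = model.coef_

# Adjust the coefficients back into the original scale
b_orig = b_std * y_std / x_std
intercept_orig = y_mean - b_orig @ x_mean

# Both routes give the same predictions in original units
pred_from_std = model.predict(Xz) * y_std + y_mean
pred_from_orig = intercept_orig + X @ b_orig
print(b_std, b_orig, intercept_orig)
```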

Bias and variance trade-off

The bias-variance trade-off is generally complicated when building ridge regression models on an actual dataset. However, the general trend to remember is:

The bias increases as λ increases.
The variance decreases as λ increases.
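
These two trends can be verified numerically. For a fixed design matrix X, the ridge estimator has expectation B - λ(X'X + λI)^(-1)B and covariance σ²(X'X + λI)^(-1) X'X (X'X + λI)^(-1). The sketch below evaluates both expressions for a few values of λ; the design matrix, the true coefficients, and the noise variance are assumptions made up for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(60, 4))            # fixed design matrix (illustrative)
beta = np.array([1.0, -2.0, 0.5, 3.0])  # assumed true coefficients
sigma2 = 1.0                            # assumed noise variance

XtX = X.T @ X
p = X.shape[1]
lambdas = [0.0, 1.0, 10.0, 100.0]
sq_bias, total_var = [], []
for lam in lambdas:
    A = np.linalg.inv(XtX + lam * np.eye(p))
    # Bias of the ridge estimator: E[B_hat] - B = -lam * A @ beta
    bias = -lam * (A @ beta)
    # Covariance of the ridge estimator: sigma^2 * A X'X A
    cov = sigma2 * (A @ XtX @ A)
    sq_bias.append(float(bias @ bias))
    total_var.append(float(np.trace(cov)))
    print(f"lambda={lam:6.1f}  squared bias={sq_bias[-1]:.4f}  total variance={total_var[-1]:.4f}")
```

At λ = 0 the estimator is ordinary least squares and the bias is zero; as λ grows, the squared bias rises while the total variance falls.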

Assumptions of Ridge Regression

The assumptions of ridge regression are the same as those of linear regression: linearity, constant variance, and independence. However, as ridge regression does not provide confidence limits, the errors need not be assumed to be normally distributed.

Now, let’s take an example of a linear regression problem and see how ridge regression, if implemented, helps reduce the error.

We shall consider a data set on food restaurants that are trying to find the best combination of food items to improve their sales in a particular region.

Import Required Libraries

import os
import warnings
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.style
from sklearn.linear_model import LinearRegression
plt.style.use('classic')
warnings.filterwarnings("ignore")
# Load the restaurant data set
df = pd.read_excel("food.xlsx")

After conducting all the EDA on the data, and treatment of missing values, we shall now go ahead with creating dummy variables, as we cannot have categorical variables in the dataset.

df = pd.get_dummies(df, columns=cat, drop_first=True)

Where cat is the list of all the categorical columns in the data set.
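
To see what this call does, here is a small sketch on a made-up data frame. The column names and values are hypothetical and only mirror the style of the restaurant data set.

```python
import pandas as pd

# Hypothetical toy data; not the article's food.xlsx
toy = pd.DataFrame({
    "cuisine": ["Indian", "Italian", "Thai", "Italian"],
    "final_price": [120.0, 250.0, 180.0, 300.0],
})
cat = ["cuisine"]  # list of categorical column names

# One indicator column per category, dropping the first to avoid redundancy
encoded = pd.get_dummies(toy, columns=cat, drop_first=True)
print(encoded.columns.tolist())
print(encoded)
```

With drop_first=True the first category (here Indian) gets no column of its own and becomes the baseline against which the other categories are measured.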

After this, we need to standardize the data set for the Linear Regression method.

Scaling the variables, as the continuous variables have different scales

# Scale the data: essentially returns the z-scores of every attribute
from sklearn.preprocessing import StandardScaler
std_scale = StandardScaler()
df['week'] = std_scale.fit_transform(df[['week']])
df['final_price'] = std_scale.fit_transform(df[['final_price']])
df['area_range'] = std_scale.fit_transform(df[['area_range']])
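
The per-column calls above refit the same scaler each time, so the scaler object only remembers the last column it saw. An alternative, sketched below on hypothetical toy data, is to scale all continuous columns in one call, which keeps a separate mean and scale per column inside a single fitted scaler.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data with the same kind of columns as the article
toy = pd.DataFrame({
    "week": [1.0, 2.0, 3.0, 4.0, 5.0],
    "final_price": [120.0, 250.0, 180.0, 300.0, 210.0],
    "home_delivery": [0, 1, 1, 0, 1],  # indicator column, left unscaled
})
num_cols = ["week", "final_price"]

scaler = StandardScaler()
toy[num_cols] = scaler.fit_transform(toy[num_cols])

# The fitted scaler stores one mean and one scale per column
print(scaler.mean_, scaler.scale_)
print(toy)
```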

Train-Test Split

# Copy all the predictor variables into the X dataframe
X = df.drop('orders', axis=1)
# Copy the target into the y dataframe. The target variable is converted to log.
y = np.log(df[['orders']])
# Split X and y into training and test sets in a 75:25 ratio
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)

Linear Regression Model

Ridge Regression versus Lasso Regression: Understanding the Key Differences

In the world of linear regression models, Ridge and Lasso Regression stand out as two fundamental techniques, both designed to enhance the prediction accuracy and interpretability of the models, particularly in situations with complex and high-dimensional data. The core difference between the two lies in their approach to regularization, which is a method to prevent overfitting by adding a penalty to the loss function. Ridge Regression, also known as Tikhonov regularization, adds a penalty term that is proportional to the square of the magnitude of the coefficients. This method shrinks the coefficients towards zero but never exactly to zero, thereby reducing model complexity and multicollinearity. In contrast, Lasso Regression (Least Absolute Shrinkage and Selection Operator) includes a penalty term that is the absolute value of the magnitude of the coefficients. This distinctive approach not only shrinks coefficients but can also reduce some of them to zero, effectively performing feature selection and resulting in simpler, more interpretable models.

The decision to use Ridge or Lasso Regression hinges on the specific requirements of the dataset and the underlying problem to be solved. Ridge Regression is preferred when all the features are assumed to be relevant or when we have a dataset with multicollinearity, as it can handle correlated inputs more effectively by distributing coefficients among them. Lasso Regression, meanwhile, excels in situations where parsimony is advantageous—when it’s beneficial to reduce the number of features contributing to the model. This is particularly useful in high-dimensional datasets where feature selection becomes essential. However, Lasso can be inconsistent in cases of highly correlated features. Therefore, the choice between Ridge and Lasso should be informed by the nature of the data, the desired model complexity, and the specific goals of the analysis, often determined through cross-validation and comparative model performance assessment.
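
The contrast described above, where ridge shrinks coefficients without zeroing them while lasso can set some exactly to zero, is easy to observe. The following is a minimal sketch on synthetic data in which only two of ten features matter; the data and the alpha values are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 10))
true_coef = np.zeros(10)
true_coef[:2] = [3.0, -2.0]  # only the first two features matter (assumed)
y = X @ true_coef + rng.normal(scale=0.1, size=200)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)

# Count coefficients that are exactly zero under each penalty
n_zero_ridge = int(np.sum(ridge.coef_ == 0))
n_zero_lasso = int(np.sum(lasso.coef_ == 0))
print("ridge:", ridge.coef_.round(3), "zeros:", n_zero_ridge)
print("lasso:", lasso.coef_.round(3), "zeros:", n_zero_lasso)
```

Ridge keeps small non-zero weights on the irrelevant features, while lasso removes them from the model, which is the feature-selection behavior mentioned above.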

Ridge Regression in Machine Learning

Ridge regression is a key technique in machine learning, indispensable for creating robust models in scenarios prone to overfitting and multicollinearity. This method modifies standard linear regression by introducing a penalty term proportional to the square of the coefficients, which proves particularly useful when dealing with highly correlated independent variables. Among its primary benefits, ridge regression effectively reduces overfitting through added complexity penalties, manages multicollinearity by balancing effects among correlated variables, and enhances model generalization to improve performance on unseen data.

The implementation of ridge regression in practical settings involves the crucial step of selecting the right regularization parameter, commonly known as lambda. This selection, typically done using cross-validation techniques, is vital for balancing the bias-variance tradeoff inherent in model training. Ridge regression enjoys widespread support across various machine learning libraries, with Python’s scikit-learn being a notable example. Here, implementation entails defining the model, setting the lambda value, and employing built-in functions for fitting and predictions. Its utility is particularly notable in sectors like finance and healthcare analytics, where precise predictions and robust model construction are paramount. Ultimately, ridge regression’s capacity to improve accuracy and handle complex data sets solidifies its ongoing importance in the dynamic field of machine learning.
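
In scikit-learn, these steps can be sketched with RidgeCV, which fits the model and selects the regularization strength by cross-validation in one object. The data and the candidate alphas below are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(5)
X = rng.normal(size=(120, 5))
y = X @ np.array([1.0, 0.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.5, size=120)

# Candidate values for the regularization parameter (lambda, called alpha here)
alphas = [0.01, 0.1, 1.0, 10.0, 100.0]

# Define the model, let 5-fold cross-validation pick alpha, then fit
model = RidgeCV(alphas=alphas, cv=5).fit(X, y)
print("selected alpha:", model.alpha_)

# Predict with the fitted model
preds = model.predict(X[:3])
print(preds)
```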

The larger the absolute value of a beta coefficient, the larger the impact of that predictor.

Dishes such as Rice Bowl, Pizza, and Desert, along with features such as home delivery and website_homepage_mention, play an important role in demand, that is, in how frequently orders are placed.

Variables showing a negative effect in the regression model for predicting restaurant orders: cuisine_Indian, food_category_Soup, food_category_Pasta, food_category_Other_Snacks.

Final_price has a negative effect on orders, as expected.

Dishes in the Soup, Pasta, and Other_Snacks food categories, and Indian cuisine, are associated with fewer orders being placed at restaurants, keeping all other predictors constant.

Some variables that hardly affect the predicted order frequency are week and night_service.

The model suggests that the categorical (object-type) variables are more influential than the continuous variables.

Regularization

Alpha is a hyperparameter of Ridge: it is not learned automatically by the model and has to be set manually. To find the optimum alpha for ridge regularization, we run a grid search using GridSearchCV.

from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
ridge = Ridge()
parameters = {'alpha': [1e-15, 1e-10, 1e-8, 1e-3, 1e-2, 1, 5, 10, 20, 30, 35, 40, 45, 50, 55, 100]}
ridge_regressor = GridSearchCV(ridge, parameters, scoring='neg_mean_squared_error', cv=5)
ridge_regressor.fit(X, y)
print(ridge_regressor.best_params_)
print(ridge_regressor.best_score_)

Output:

{'alpha': 0.01}
-0.3751867421112124

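The score is negative because scikit-learn's scorers follow a higher-is-better convention, so the mean squared error is reported with its sign flipped. This is by design rather than an error; the cross-validated mean squared error here is about 0.375.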
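
A short sketch of how the sign convention works in practice, using cross_val_score on synthetic data (the data and alpha are assumptions for illustration): the returned scores are negated mean squared errors, so flipping the sign recovers the usual error.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(6)
X = rng.normal(size=(100, 4))
y = X @ np.array([1.0, -1.0, 2.0, 0.0]) + rng.normal(scale=0.5, size=100)

# Each fold returns the negated mean squared error (higher is better)
scores = cross_val_score(Ridge(alpha=1.0), X, y,
                         scoring="neg_mean_squared_error", cv=5)

mse = -scores                      # flip the sign to get the usual MSE per fold
rmse = float(np.sqrt(mse.mean()))  # root mean squared error across folds
print(scores)
print(mse.mean(), rmse)
```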

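# Fit ridge with the alpha found above (assumed to be how ridgeReg was created)
ridgeReg = Ridge(alpha=0.01).fit(X_train, y_train)
predictors = X_train.columns
coef = pd.Series(ridgeReg.coef_.flatten(), predictors).sort_values()
plt.figure(figsize=(10, 8))
coef.plot(kind='bar', title='Model Coefficients')
plt.show()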

From the above analysis we can decide that the final model can be defined as:

log(Orders) = 4.65 + 1.02*home_delivery_1.0 + 0.46*website_homepage_mention_1 - 0.40*final_price + 0.17*area_range + 0.57*food_category_Desert - 0.22*food_category_Extras - 0.73*food_category_Pasta + 0.49*food_category_Pizza + 1.6*food_category_Rice_Bowl + 0.22*food_category_Salad + 0.37*food_category_Sandwich - 1.05*food_category_Soup - 0.37*food_category_Starters - 1.13*cuisine_Indian - 0.16*center_type_Gurgaon
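
Because the target was transformed with np.log before fitting, a prediction from this equation is on the log scale and has to be exponentiated to get a number of orders. The sketch below evaluates the equation for one hypothetical menu item, assuming the reported coefficients come from the model fitted on the log target; the item itself is invented for illustration, and every predictor not listed is taken to be 0.

```python
import math

# Intercept and a subset of coefficients copied from the equation above
intercept = 4.65
coef = {
    "home_delivery_1.0": 1.02,
    "final_price": -0.40,
    "food_category_Rice_Bowl": 1.6,
}

# Hypothetical item: a Rice Bowl with home delivery at average price
# (final_price was standardized, so the average price corresponds to 0)
features = {
    "home_delivery_1.0": 1,
    "final_price": 0.0,
    "food_category_Rice_Bowl": 1,
}

# Linear prediction on the log scale
log_orders = intercept + sum(coef[name] * value for name, value in features.items())

# Undo the natural-log transform of the target
orders = math.exp(log_orders)
print(log_orders, orders)
```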

The top 5 variables with the largest positive influence on the model are:

food_category_Rice_Bowl
home_delivery_1.0
food_category_Desert
food_category_Pizza
website_homepage_mention_1

The larger the absolute value of the beta coefficient, the more influential the predictor, provided the predictors are on comparable scales. Hence, with a certain level of model tuning, we can find the variables that most influence a business problem.
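
Ranking the reported coefficients can be done in a few lines. The sketch below copies the values from the final equation into a pandas Series and ranks them two ways: by absolute value, which also surfaces strong negative effects such as cuisine_Indian and food_category_Soup, and by signed value, which gives the largest positive effects.

```python
import pandas as pd

# Coefficients copied from the final model equation
coef = pd.Series({
    "home_delivery_1.0": 1.02,
    "website_homepage_mention_1": 0.46,
    "final_price": -0.40,
    "area_range": 0.17,
    "food_category_Desert": 0.57,
    "food_category_Extras": -0.22,
    "food_category_Pasta": -0.73,
    "food_category_Pizza": 0.49,
    "food_category_Rice_Bowl": 1.6,
    "food_category_Salad": 0.22,
    "food_category_Sandwich": 0.37,
    "food_category_Soup": -1.05,
    "food_category_Starters": -0.37,
    "cuisine_Indian": -1.13,
    "center_type_Gurgaon": -0.16,
})

# Rank by magnitude, regardless of sign
by_magnitude = coef.reindex(coef.abs().sort_values(ascending=False).index)

# Rank by signed value to get the strongest positive effects
top_positive = coef.sort_values(ascending=False).head(5)

print(by_magnitude.head(5))
print(top_positive)
```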

What is Ridge Regression?

Ridge regression is a linear regression method that adds an L2 penalty on the coefficients, accepting a small amount of bias to reduce overfitting and improve prediction accuracy.

How Does Ridge Regression Differ from Ordinary Least Squares?

Unlike ordinary least squares, ridge regression includes a penalty on the magnitude of coefficients to reduce model complexity.

When Should You Use Ridge Regression?

Use ridge regression when dealing with multicollinearity or when there are more predictors than observations.

What is the Role of the Regularization Parameter in Ridge Regression?

The regularization parameter controls the extent of coefficient shrinkage, influencing model simplicity.

Can Ridge Regression Handle Non-Linear Relationships?

While primarily for linear relationships, ridge regression can include polynomial terms for non-linearities.
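
A minimal sketch of this idea, assuming synthetic data with a purely quadratic relationship: a plain ridge model on x fits poorly, while a pipeline that first expands x into polynomial terms fits well. The degree and alpha are arbitrary choices for the example.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(7)
x = rng.uniform(-3, 3, size=(200, 1))
y = x[:, 0] ** 2 + rng.normal(scale=0.2, size=200)  # quadratic relationship

# Ridge on the raw feature: a straight line cannot follow the curve
linear_ridge = Ridge(alpha=1.0).fit(x, y)

# Ridge on polynomial terms (x and x^2): still linear in the coefficients
poly_ridge = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    Ridge(alpha=1.0),
).fit(x, y)

r2_linear = linear_ridge.score(x, y)
r2_poly = poly_ridge.score(x, y)
print(r2_linear, r2_poly)
```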

How is Ridge Regression Implemented in Software?

Most statistical software offers built-in functions for ridge regression; the user specifies the variables and the value of the regularization parameter.

How to Choose the Best Regularization Parameter?

The best parameter is often found through cross-validation, using techniques like grid or random search.

What are the Limitations of Ridge Regression?

It includes all predictors, which can complicate interpretation, and choosing the optimal parameter can be challenging.

Source: GreatLearning Blog
