Regularization of ML models

Regularizing linear ML models - Ridge Regression

Training an ML model becomes a difficult task when the model works well with the training set but not with the testing set. At that point, we start wondering what we can do so that it performs better.

So here in this blog, we will see how to deal with such a situation.

Hello guys, there are situations when our model works well with the training set but not with the test dataset. Such a situation is called overfitting: the model has overfitted the data given to it. In such a case, the model has too many degrees of freedom, so it wanders through a large space of possible solutions instead of settling on the region where a good, general solution can be found.

In such a case, the solution is to regularize the model to prevent it from overfitting the data. In simple terms, regularization means constraining the model. This is a good way to reduce overfitting: the fewer degrees of freedom the model has, the harder it is for it to overfit the data.

For a linear model, regularization is performed by constraining the weights of the model. There are several methods to regularize such a model: Ridge regression, Lasso regression and Elastic Net, which implement three different ways of constraining the weights.

In this blog, we will describe Ridge regularization and look at its implementation.

Ridge Regularization

This is also called Tikhonov regularization, named after Andrey Tikhonov.

It is particularly useful for training a model whose features are strongly correlated with each other. In other words, Ridge regression is used to estimate the coefficients of multiple-regression models in situations where the features of a linear regression suffer from multicollinearity.

https://en.wikipedia.org/wiki/Ridge_regression

For Ridge regression, a term called the regularization term is added to the cost function of the regression. It constrains the model and keeps it from overfitting by limiting its degrees of freedom.

Regularization term:

$$α \frac{1}{2}\sum_{i=1}^{n} θ_i^2$$

Note that this term is added to the cost function only during training. Once the model is trained, its performance on the test dataset should be evaluated using the unregularized performance measure.

The hyperparameter α controls the amount of regularization.

$$Ridge\ Regression\ Cost\ function: \quad J(θ) = MSE(θ) + α \frac{1}{2}\sum_{i=1}^{n} θ_i^2$$

Observe this cost function carefully: the sum regularizes the weights θi for i = 1 to n, but the bias term θ0 is not regularized. In vectorized form, if the feature weights (θ1 to θn) are collected into a weight vector w, the Ridge regression formula takes a new form.

Note: only the form of the formula changes here; the formula itself remains the same.

Regularization term:

$$α \frac{1}{2}\ (|| w ||_{2})^2$$

Here ||w||2 represents the l2 norm of the weight vector. Thus, the cost function in vectorized form is:

$$Vectorized\ Ridge\ Reg.\ Cost\ function: J(θ) = MSE(θ) + α \frac{1}{2}\ (|| w ||_{2})^2$$

What if we are doing gradient descent? In that case, we simply add αw to the MSE gradient vector (where w again excludes the bias term).
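To make the two forms of the cost function and the gradient descent update concrete, here is a minimal NumPy sketch. The function names, the bias-included design matrix X_b and the learning rate eta are my own illustrative assumptions, not taken from scikit-learn:

import numpy as np

def ridge_cost(theta, X_b, y, alpha):
    # X_b has a leading column of 1s, so theta[0] is the bias term
    mse = np.mean((X_b @ theta - y) ** 2)
    w = theta[1:]                                        # feature weights only, bias excluded
    penalty_sum = alpha * 0.5 * np.sum(w ** 2)           # summation form
    penalty_l2 = alpha * 0.5 * np.linalg.norm(w) ** 2    # vectorized l2-norm form
    assert np.isclose(penalty_sum, penalty_l2)           # the two forms are identical
    return mse + penalty_sum

def ridge_gd_step(theta, X_b, y, alpha, eta=0.1):
    m = len(y)
    mse_grad = (2 / m) * X_b.T @ (X_b @ theta - y)       # gradient of the MSE
    reg_grad = alpha * np.r_[0.0, theta[1:]]             # alpha * w, with 0 for the bias term
    return theta - eta * (mse_grad + reg_grad)           # one batch gradient descent step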

Implementation of Ridge Regression

Ridge regression, like linear regression, can be performed using two different techniques. As I have already discussed in one of my blogs (Blog Link), we can perform regression either by using a closed-form solution or by performing gradient descent, and the pros and cons of the two approaches remain the same here.

$$Ridge\ Regression\ closed\ form\ solution: \quad \hat{\theta} = \left( X^{T} X + \alpha A \right)^{-1} X^{T} y$$

Here A is the (n+1)×(n+1) identity matrix, except that its top-left cell is 0, which corresponds to leaving the bias term unregularized.
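As a small sketch of this closed-form solution (the helper name and variables are illustrative assumptions, not part of scikit-learn), note how the top-left entry of A is set to 0 so the bias term stays unregularized:

import numpy as np

def ridge_closed_form(X, y, alpha=1.0):
    m, n = X.shape
    X_b = np.c_[np.ones((m, 1)), X]     # add x0 = 1 for the bias term
    A = np.eye(n + 1)
    A[0, 0] = 0.0                       # leave the bias term unregularized
    # theta_hat = (X^T X + alpha * A)^(-1) X^T y
    return np.linalg.inv(X_b.T @ X_b + alpha * A) @ X_b.T @ y

With alpha = 0 this reduces to the ordinary Normal Equation for linear regression.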

https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html#sklearn-linear-model-ridge

Scikit-learn provides a way to perform Ridge regression using the closed-form solution; in fact, it uses a variant of the equation above based on a matrix factorization technique developed by André-Louis Cholesky.

from sklearn.linear_model import Ridge

# X, y: training data; X_new: new instances to predict (assumed to be defined)
ridge_reg = Ridge(alpha=1, solver="cholesky")  # alpha controls the regularization strength
ridge_reg.fit(X, y)
ridge_reg.predict(X_new)
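If you want to look at the parameters the model has learned, the fitted Ridge estimator exposes them through its standard scikit-learn attributes:

ridge_reg.intercept_   # the bias term θ0 (not regularized)
ridge_reg.coef_        # the feature weights θ1 ... θn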

Scikit-learn also provides a way to perform Ridge regression using stochastic gradient descent:

from sklearn.linear_model import SGDRegressor

# penalty="l2" adds an l2 regularization term to the cost function, i.e. Ridge regression
sgd_reg = SGDRegressor(penalty="l2")
sgd_reg.fit(X, y.ravel())   # ravel() flattens y to the 1-D shape SGDRegressor expects
sgd_reg.predict(X_new)
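A small, illustrative variation (the specific values are my own, not from the source): the strength of this l2 penalty is set through SGDRegressor's own alpha hyperparameter, which defaults to 0.0001.

sgd_reg = SGDRegressor(penalty="l2", alpha=0.1, max_iter=1000)  # stronger l2 penalty than the default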

https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDRegressor.html

So in this way we can perform Ridge regression, which regularizes regression models and helps overcome the problem of overfitting the training data.

Thank you

Akhil Soni

You can connect with me - LinkedIn