Training Linear Models

Working with the closed-form solution

Hello guys, if you have ever built a machine learning model to tackle a problem statement, you have had to train it. There are many methods to train a linear model. One such method is the closed-form solution, which is given by the normal equation.

Linear Regression: ŷ = θ · X

The Normal Equation

This is the equation we will use to find the parameters of the linear model:

$$\hat{\theta} = \left( X^{T}\cdot X \right)^{-1}\cdot X^{T}\cdot y$$

Now let us dig deeper and understand what it means to train a linear regression model with the closed-form solution.

Training a model means setting its parameters so that the model best fits the training set, that is, so that it gives the smallest error. To evaluate how well a model is performing, we use a performance measure known as the Root Mean Square Error (RMSE), which needs to be minimized. In practice it is easier to minimize the Mean Square Error (MSE), and because the square root is an increasing function, the parameters that minimize the MSE also minimize the RMSE. Thus, we will choose the parameters of the linear regression model so that they minimize its MSE.

Suppose we have a linear regression model having the following equation:

$$\hat{y} = {\theta}\cdot X$$

where ŷ is the predicted value

θ is the model's parameter vector

X is the instance's feature vector.

Training the linear model therefore means finding the parameter vector θ that minimizes the MSE of the model on the training set.

The MSE of a linear regression model on a training set X is calculated using

$$MSE(\theta) = \frac{1}{m} \sum_{i=1}^{m} (\theta^{T}\cdot X^{(i)} - y^{(i)})^{2}$$
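To make the formula concrete, here is a minimal sketch that evaluates the MSE for two candidate parameter vectors on a tiny hand-made data set. The function name `mse`, the variable names, and the data (three points on the line y = 1 + 2x) are assumptions made for illustration only. Each row of `X_b` is one instance with a leading 1 for the intercept term.

```python
import numpy as np

def mse(theta, X_b, y):
    """Mean squared error of the predictions X_b @ theta against the targets y."""
    errors = X_b @ theta - y          # one prediction error per instance
    return float(np.mean(errors ** 2))

# Hypothetical data: three points lying exactly on y = 1 + 2x.
# The first column of ones multiplies the intercept parameter.
X_b = np.array([[1.0, 0.0],
                [1.0, 1.0],
                [1.0, 2.0]])
y = np.array([[1.0], [3.0], [5.0]])

theta_exact = np.array([[1.0], [2.0]])   # the parameters that generated the data
theta_off = np.array([[0.0], [2.0]])     # intercept is wrong by 1

mse_exact = mse(theta_exact, X_b, y)     # every error is 0
mse_off = mse(theta_off, X_b, y)         # every error is -1, so the mean of squares is 1
print(mse_exact, mse_off)
```

The parameter vector that generated the data gives the smallest possible MSE, and that minimum is exactly what training looks for.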

There are various methods to train a linear regression model, but here I will talk about the one called the closed-form solution.

The Normal Equation

You may be wondering why I have written The Normal Equation here when we are going to learn about the closed-form solution. To find the value of θ that minimizes the cost function, there is a closed-form solution, in other words a mathematical equation that gives the result directly. This is called the normal equation.

The normal equation is given by

$$\hat{\theta} = \left( X^{T}\cdot X \right)^{-1}\cdot X^{T}\cdot y$$

where θ̂ is the value of θ that minimizes the cost function

y is the vector of target values
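For readers who want to see where this equation comes from, here is a short derivation sketch. It assumes that X denotes the matrix whose rows are the instances X^(i), and that XᵀX is invertible. First write the MSE in matrix form:

$$MSE(\theta) = \frac{1}{m}\left( X\cdot\theta - y \right)^{T}\cdot\left( X\cdot\theta - y \right)$$

Then set its gradient with respect to θ to zero:

$$\nabla_{\theta} MSE(\theta) = \frac{2}{m} X^{T}\cdot\left( X\cdot\theta - y \right) = 0$$

Rearranging gives the normal equation:

$$X^{T}\cdot X\cdot\hat{\theta} = X^{T}\cdot y \;\Rightarrow\; \hat{\theta} = \left( X^{T}\cdot X \right)^{-1}\cdot X^{T}\cdot y$$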

Let us see the code for training a linear regression model on data whose target values are generated from the following equation:

y = 4 + 3x

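import numpy as np
import matplotlib.pyplot as plt

# Generate 100 noisy points around the line y = 4 + 3x
X = 2*np.random.rand(100, 1)
y = 4 + 3*X + np.random.randn(100, 1)

# Add x0 = 1 to each instance so that theta_0 acts as the intercept
X_b = np.c_[np.ones((100, 1)), X]

# Normal equation: theta = (X^T X)^-1 X^T y
theta_best = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)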

Here we have created X and y, where X holds the feature values of the instances and y is the target vector.

Next we add x0 = 1 to every instance, that is, a column of ones at the front of the matrix, so that the intercept is handled by the same matrix product and the shapes line up. The dot() method then performs the matrix multiplications, and the inv() function from NumPy's linear algebra module (np.linalg) computes the inverse of a matrix. The actual function we used to generate the data is y = 4 + 3x + Gaussian noise, so we expect the computed parameters to be close to 4 and 3, but not exactly equal because of the noise.
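As a quick check, the sketch below runs the same computation with a fixed random seed and then uses the resulting parameters to predict at x = 0 and x = 2. The seed, the use of `np.random.default_rng`, and the names `X_new`, `X_new_b`, and `y_predict` are assumptions added for illustration; they are not part of the code above.

```python
import numpy as np

rng = np.random.default_rng(42)            # fixed seed, assumed here for reproducibility
X = 2 * rng.random((100, 1))
y = 4 + 3 * X + rng.standard_normal((100, 1))

X_b = np.c_[np.ones((100, 1)), X]          # prepend the column of ones (x0 = 1)
theta_best = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y

# Predict at both ends of the feature range using the fitted parameters
X_new = np.array([[0.0], [2.0]])
X_new_b = np.c_[np.ones((2, 1)), X_new]    # new instances need the column of ones too
y_predict = X_new_b @ theta_best

print(theta_best.ravel())                  # should be near [4, 3], not exact because of noise
print(y_predict.ravel())
```

The prediction at x = 0 is simply the fitted intercept, since the slope term vanishes there.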

Performing Linear Regression using scikit-learn is quite simple.

from sklearn.linear_model import LinearRegression

lin_reg = LinearRegression()
lin_reg.fit(X, y)
print(lin_reg.intercept_, lin_reg.coef_)

The LinearRegression class is based on the scipy.linalg.lstsq() function (the name stands for "least squares"). NumPy provides an equivalent function, np.linalg.lstsq(), which can be called directly:

theta_best_svd, residuals, rank, s = np.linalg.lstsq(X_b, y, rcond=1e-6)

This function computes θ̂ = X⁺ y, where X⁺ is the pseudoinverse of X (specifically the Moore-Penrose inverse).

The pseudoinverse can also be calculated directly with NumPy's np.linalg.pinv() function, for example np.linalg.pinv(X_b).dot(y).
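Here is a minimal, self-contained sketch of that computation. It regenerates data of the same form as before (the seed and the variable names `theta_pinv` and `theta_lstsq` are assumptions for illustration) and compares the pseudoinverse result with the one returned by np.linalg.lstsq().

```python
import numpy as np

rng = np.random.default_rng(0)             # fixed seed, assumed for reproducibility
X = 2 * rng.random((100, 1))
y = 4 + 3 * X + rng.standard_normal((100, 1))
X_b = np.c_[np.ones((100, 1)), X]          # column of ones for the intercept

# theta = X^+ y, with the Moore-Penrose pseudoinverse computed by NumPy
theta_pinv = np.linalg.pinv(X_b) @ y

# Same least-squares problem solved by lstsq
theta_lstsq, residuals, rank, s = np.linalg.lstsq(X_b, y, rcond=1e-6)

print(theta_pinv.ravel())
print(theta_lstsq.ravel())
```

Both approaches should agree up to floating-point rounding.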


The pseudoinverse itself is computed using a standard matrix factorization technique called Singular Value Decomposition (SVD), which decomposes the training set matrix X into the product of three matrices U Σ Vᵀ.

The pseudoinverse is computed as X⁺ = V Σ⁺ Uᵀ.

To calculate the matrix Σ⁺, the algorithm takes Σ and sets to zero all values that are smaller than a tiny threshold value, then replaces all the non-zero values with their inverse, and finally transposes the resulting matrix.
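The steps above can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the library's actual implementation: the function name `pinv_via_svd` and the choice of a threshold relative to the largest singular value are assumptions. Because the reduced SVD is used, Σ is stored as a vector of singular values, so the transpose step needs no explicit code.

```python
import numpy as np

def pinv_via_svd(A, rcond=1e-6):
    """Pseudoinverse of A built from its SVD (illustrative sketch)."""
    # Reduced SVD: A = U @ diag(s) @ Vt
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Zero out singular values below the threshold (assumed relative to the largest one)
    threshold = rcond * s.max()
    keep = s > threshold
    # Invert only the singular values that were kept
    s_inv = np.zeros_like(s)
    s_inv[keep] = 1.0 / s[keep]
    # A^+ = V @ Sigma^+ @ U^T
    return Vt.T @ np.diag(s_inv) @ U.T

# Small example matrix: a column of ones plus one feature
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
A_pinv = pinv_via_svd(A)
print(A_pinv)
```

For a well-conditioned matrix like this one, the result should match np.linalg.pinv(A) up to rounding.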

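This approach is more efficient than computing the normal equation, and it also handles edge cases well. In particular, the normal equation does not work when XᵀX is not invertible (for example when there are fewer instances than features, or when some features are redundant), whereas the pseudoinverse is always defined.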
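The following sketch illustrates the redundant-feature case. The data set is hypothetical: the third column is exactly twice the second, so XᵀX is singular and the normal equation cannot be relied on, while np.linalg.lstsq() still returns a solution that fits the data. The targets are noise-free here so that the fit can be checked exactly.

```python
import numpy as np

rng = np.random.default_rng(1)             # fixed seed, assumed for reproducibility
x = 2 * rng.random((50, 1))
y = 4 + 3 * x                              # noise-free targets for an exact check

# Redundant design matrix: the third column duplicates the second (scaled by 2)
X_b = np.c_[np.ones((50, 1)), x, 2 * x]

# lstsq reports the effective rank and returns the minimum-norm solution
theta, residuals, rank, s = np.linalg.lstsq(X_b, y, rcond=1e-6)
max_abs_error = float(np.max(np.abs(X_b @ theta - y)))

print(rank, X_b.shape[1])                  # rank is smaller than the number of columns
print(max_abs_error)                       # predictions still match the targets
```

The reported rank is lower than the number of columns, which is the signal that XᵀX has no inverse, yet the predictions remain accurate.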

This is how we can train a linear model.

Thank you, I hope you liked this.

Akhil Soni

You can connect with me on LinkedIn.