PyTorch Training Setup

Training a deep learning model

Hey guys!

Have you ever been fascinated by how a model gets trained in PyTorch?

In this article, we will get to know the main concept with which every model works, learns, and finally produces outputs for its inputs that can amaze us. Here we will build a simple linear model and see how to train it. Let us start with the setup process.

Dataset

To train a model, we need some data from which it learns how to map inputs to outputs. Since this is a simple model setup, let us take two tensors as our dataset, of which one will be the input and the other the expected output.

import torch

# input values and the actual outputs the model should learn to produce for them
input = torch.tensor([35.7, 55.9, 58.2, 81.9, 56.3, 48.9, 33.9, 21.8, 48.4, 60.4, 68.4])
output = torch.tensor([0.5, 14, 15, 28, 11, 8, 3, -4, 6, 13, 21])

In the real world of working with PyTorch, we can work with all kinds of datasets. Since we are building a linear model here, we have one input tensor that will be fed to the model, and one output tensor holding the actual values the model should produce, after training, for each given input.

Sample Model

Let us now define a linear model. Since this is a simple linear model, it is just the equation of a straight line; our model is basically nothing but the graph of a linear equation.

Let y = wx+b be our model

where w is the weight and b is the bias.

Now, training the model means finding values of w and b that fit the equation and produce outputs exactly the same as, or very close to, the given output values. The output generated by the model is known as the predicted output, so the predicted values must be close to the actual outputs we need.

Here the model will be defined as

def model(input_t, w, b):
  # straight-line equation: y = w*x + b
  return w*input_t + b

Here w is the weight of the model, b is the bias, and input_t is the input tensor.
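
As a quick check, let us call the model with an initial guess for the parameters. The starting values w = 1 and b = 0 below are just my own illustrative guesses:

w = torch.ones(())   # initial guess for the weight
b = torch.zeros(())  # initial guess for the bias
predicted = model(input, w, b)
print(predicted)  # same as input, since y = 1*x + 0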

Setting up a loss function for the model

You may wonder why a loss function is needed at all. The loss function lets the model know how far it is from producing the actual output: it tells us how far the predicted output is from the actual one, and hence whether it is correct. There are various loss functions; here we will use the mean squared error to calculate the loss between the actual and predicted outputs.

def loss_fn(predicted, output):
  # mean squared error between predicted and actual outputs
  sq_diff = (predicted - output)**2
  return sq_diff.mean()
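
Continuing the example above with the initial guess w = 1, b = 0, we can compute how far off the model currently is. The result is a large number, which tells us the initial predictions are poor:

loss = loss_fn(model(input, w, b), output)
print(loss)  # a large value, since predictions are far from the actual outputs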

Estimating the derivative of the loss function

It might be surprising that we calculate the derivative of the loss function. The need comes from a mathematical property of derivatives: they tell us about a function's optimum points, where the function is minimized or maximized. With the help of the derivative, we will update the values of the weight and bias, so we need the derivative of the loss function with respect to the weight and with respect to the bias. The weight and bias are known as the parameters of the model. After getting the gradient, we update each parameter by subtracting the product of the gradient and a learning rate, e.g. w = w - lr*(dL/dw). The learning rate is a hyperparameter that controls how quickly the model adapts to the problem during training.
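
Since we will estimate these derivatives numerically rather than analytically, we can approximate each one with a central difference around the current parameter value, where delta is a small step size:

dL/dw ≈ (loss(w+delta, b) - loss(w-delta, b)) / (2*delta)
dL/db ≈ (loss(w, b+delta) - loss(w, b-delta)) / (2*delta)

This is exactly what the following function computes.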

def grad_fn(input, output, w, b):
  delta = 0.1  # small step size for the finite-difference approximation

  # central differences: rate of change of the loss w.r.t. w and b
  loss_rate_of_change_w = (loss_fn(model(input, w+delta, b), output) - loss_fn(model(input, w-delta, b), output))/(2.0*delta)
  loss_rate_of_change_b = (loss_fn(model(input, w, b+delta), output) - loss_fn(model(input, w, b-delta), output))/(2.0*delta)
  return torch.stack([loss_rate_of_change_w, loss_rate_of_change_b])

Here I have calculated the derivatives by coding them from scratch, without using any built-in function, but there are built-in PyTorch functions that calculate derivatives for us, which we will discuss in detail later.
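
As a small preview, here is a minimal sketch of how the same gradients could be obtained with PyTorch's autograd, assuming the model and loss_fn defined above (the numbers will differ slightly from the finite-difference estimate):

params = torch.tensor([1.0, 0.0], requires_grad=True)  # w and b in one tensor
loss = loss_fn(model(input, params[0], params[1]), output)
loss.backward()  # fills params.grad with the derivative of the loss w.r.t. params
print(params.grad)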

Updating the model parameters

We have now calculated the rate of change of the loss with respect to the weight and the bias, the parameters of our model, so we can update them. This is what we need in the end: values of the weight and bias that fit our dataset, i.e. produce the minimum loss.

lr = 1e-4  # learning rate; 1e-2 overshoots on this data and makes the loss blow up
loss_rate_of_change_w, loss_rate_of_change_b = grad_fn(input, output, w, b)
w = w - lr*loss_rate_of_change_w
b = b - lr*loss_rate_of_change_b

With this code, we change the weight and bias a little. We then repeat the process in a loop, checking whether the loss keeps decreasing and the predicted output gets closer to the actual output.

params = torch.tensor([1.0, 0.0])  # initial guesses for w and b
nepochs = 100
lr = 1e-4  # again, 1e-2 diverges on this unnormalized data
for epoch in range(nepochs):
  w, b = params
  predicted = model(input, w, b)      # forward pass
  loss = loss_fn(predicted, output)   # how far off are we?
  print('Epoch: ', epoch, ', Loss: ', float(loss))
  grad = grad_fn(input, output, w, b) # numerical gradient
  print('Params: ', params)
  print('Grad: ', grad)
  params = params - lr*grad           # gradient-descent step

This is how a PyTorch training setup is done. Our model is now ready to train: it goes through the loop, and the parameters we get at the end are what the model has learned. These parameters help it reach an answer for every input, though for some inputs it may still not give the correct output, and there can be various reasons for that.
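
As a small usage sketch, we can plug the learned parameters back into the model to predict the output for new data. The value 50.0 below is just a made-up input for illustration:

w, b = params
new_input = torch.tensor([50.0])  # hypothetical unseen input
print(model(new_input, w, b))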

That’s all in this article.

- Akhil Soni