Hello guys! In this post we will look at feature scaling. When working with ML algorithms, you have probably run into a situation where an algorithm does not perform well on your data. The reason can be the data itself. Before running any algorithm, the data needs to be preprocessed. Cleaning the data is the most important preprocessing step; another is feature scaling, which applies to numerical features. Let us look at it from a practical angle.

Problem: Many machine learning algorithms don't perform well when the input numerical attributes have very different scales.

Given: Data with numerical attributes having different scales.

Solution: Feature Scaling

### What is feature scaling?

When the numerical input attributes in the data have different scales, feature scaling is the process of bringing all of them onto the same scale.

**Example**

Suppose we have school data in which the total number of students ranges from about 6 to 41,860, while the median number of teachers ranges only from 0 to 15.

In such a case, feature scaling is required.

### Ways of Feature Scaling

**Min-max Scaling**

Min-max scaling (often called normalization) shifts and rescales values so that they end up ranging from 0 to 1. It works by subtracting the min value and dividing by the max minus the min. Scikit-learn provides a transformer for this called MinMaxScaler. Its feature_range hyperparameter lets us change the range if we don't want 0 to 1.

Transform features by scaling each feature to a given range.

This estimator scales and translates each feature individually such that it is in the given range on the training set, e.g. between zero and one.

The transformation is given by:

```
X_std = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
X_scaled = X_std * (max - min) + min
```

where min, max = feature_range.
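To make the formula concrete, here is a minimal sketch in plain Python that applies it to a single feature column. The helper name `min_max_scale`, the sample values, and the handling of a constant column are assumptions made for illustration; this is not how scikit-learn implements it internally.

```python
def min_max_scale(column, feature_range=(0.0, 1.0)):
    """Rescale one feature column into feature_range using the formula above."""
    lo, hi = feature_range
    col_min, col_max = min(column), max(column)
    span = col_max - col_min
    if span == 0:
        # Assumption for this sketch: a constant column maps to the lower bound.
        return [lo for _ in column]
    return [(x - col_min) / span * (hi - lo) + lo for x in column]


# Illustrative values only, chosen to match the ranges in the example above.
students = [6, 500, 1200, 41860]
teachers = [0, 3, 7, 15]

scaled_students = min_max_scale(students)                       # default 0 to 1
scaled_teachers = min_max_scale(teachers, feature_range=(-1.0, 1.0))
print(scaled_students)
print(scaled_teachers)
```

After scaling, the smallest value of each column sits at the lower bound of the range and the largest at the upper bound, no matter how different the original scales were.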

This transformation is often used as an alternative to zero mean, unit variance scaling.

```
from sklearn.preprocessing import MinMaxScaler
data = [[-1, 2], [-0.5, 6], [0, 10], [1, 18]]
scaler = MinMaxScaler()
print(scaler.fit(data))    # prints: MinMaxScaler()
print(scaler.data_max_)    # per-feature maximum seen during fit: [ 1. 18.]
```

Methods

| Method | Description |
| --- | --- |
| `fit` | Compute the minimum and maximum to be used for later scaling. |
| `fit_transform` | Fit to data, then transform it. |
| `get_feature_names_out` | Get output feature names for transformation. |
| `get_params` | Get parameters for this estimator. |
| `inverse_transform` | Undo the scaling of X according to feature_range. |
| `partial_fit` | Online computation of min and max on X for later scaling. |
| `set_output` | Set output container. |
| `set_params` | Set the parameters of this estimator. |
| `transform` | Scale features of X according to feature_range. |
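As a rough sketch of how fitting, transforming, and undoing the scaling fit together, the snippet below reuses the small dataset from the earlier example. The `feature_range=(-1, 1)` value is an arbitrary choice for illustration.

```python
from sklearn.preprocessing import MinMaxScaler

data = [[-1, 2], [-0.5, 6], [0, 10], [1, 18]]

# feature_range=(-1, 1) is an arbitrary choice for illustration.
scaler = MinMaxScaler(feature_range=(-1, 1))

scaled = scaler.fit_transform(data)           # learn min/max, then rescale
restored = scaler.inverse_transform(scaled)   # map back to the original units

print(scaled)
print(restored)
```

Each column of `scaled` now spans -1 to 1, and `restored` should match the original `data` up to floating-point rounding.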

**Standardization**

Standardization works quite differently. It first subtracts the mean value (so standardized values always have a zero mean), and then divides by the standard deviation so that the resulting distribution has unit variance. Unlike min-max scaling, standardization does not bound values to a specific range.

Standardize features by removing the mean and scaling to unit variance.

The standard score of a sample `x` is calculated as:

z = (x - u) / s

where `u` is the mean of the training samples or zero if `with_mean=False`, and `s` is the standard deviation of the training samples or one if `with_std=False`.
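The standard-score formula can be tried by hand on one feature column. This is a minimal sketch using only Python's `statistics` module; the helper name `standardize` and the handling of a constant column are assumptions for illustration. It uses the population standard deviation (`pstdev`), which is the convention StandardScaler follows.

```python
from statistics import fmean, pstdev


def standardize(column):
    """Return z-scores for one column using the population standard deviation."""
    u = fmean(column)
    s = pstdev(column)
    if s == 0:
        # Assumption for this sketch: a constant column becomes all zeros.
        return [0.0 for _ in column]
    return [(x - u) / s for x in column]


z_scores = standardize([0, 0, 1, 1])
print(z_scores)
```

The resulting values have a mean of zero and unit variance, but nothing forces them into a fixed range such as 0 to 1.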

```
from sklearn.preprocessing import StandardScaler
data = [[0, 0], [0, 0], [1, 1], [1, 1]]
scaler = StandardScaler()
print(scaler.fit(data))    # prints: StandardScaler()
print(scaler.mean_)        # per-feature mean learned during fit: [0.5 0.5]
```

Methods

| Method | Description |
| --- | --- |
| `fit` | Compute the mean and std to be used for later scaling. |
| `fit_transform` | Fit to data, then transform it. |
| `get_feature_names_out` | Get output feature names for transformation. |
| `get_params` | Get parameters for this estimator. |
| `inverse_transform` | Scale back the data to the original representation. |
| `partial_fit` | Online computation of mean and std on X for later scaling. |
| `set_output` | Set output container. |
| `set_params` | Set the parameters of this estimator. |
| `transform` | Perform standardization by centering and scaling. |
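The following sketch shows one plausible way to use StandardScaler end to end: fit on training data, transform it, reuse the learned statistics on a new point, and map the result back. The new point `[[2, 2]]` is an illustrative input.

```python
from sklearn.preprocessing import StandardScaler

data = [[0, 0], [0, 0], [1, 1], [1, 1]]

scaler = StandardScaler()
scaled = scaler.fit_transform(data)            # learn mean/std, then standardize
new_point = scaler.transform([[2, 2]])         # reuse the training mean and std
restored = scaler.inverse_transform(scaled)    # back to the original units

print(scaled)
print(new_point)
```

Note that `new_point` falls outside the range of the scaled training data, which illustrates that standardized values are not bounded.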

**Comparing standardization with min-max scaling**

Problem: A system needs data to be scaled in a custom range of values.

Solution: Min-max scaling

In this problem, standardization does not work because it does not bound values to a specific range, which may be a problem for some algorithms (e.g., neural networks often expect input values ranging from 0 to 1). Min-max scaling, with its feature_range hyperparameter, can target the required range.

On the other hand, standardization is much less affected by outliers than min-max scaling.

For example:

Suppose a class had a median number of teachers equal to 100 (by mistake). Min-max scaling would then crush all the other values from 0-15 down to 0-0.15, whereas standardization would not be much affected.
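The outlier effect can be checked with a small experiment. This sketch uses made-up data (1,600 values spread evenly over 0 to 15, plus one erroneous 100); the helper names `min_max` and `z_score` are hypothetical and the numbers are for illustration only.

```python
from statistics import fmean, pstdev


def min_max(x, column):
    """Min-max scale a single value x against the given column."""
    return (x - min(column)) / (max(column) - min(column))


def z_score(x, column):
    """Standardize a single value x against the given column."""
    return (x - fmean(column)) / pstdev(column)


# Hypothetical data: 1,600 classes with 0 to 15 teachers, evenly spread.
clean = [i % 16 for i in range(1600)]
# The same data plus one erroneous entry of 100.
dirty = clean + [100]

minmax_clean = min_max(15, clean)
minmax_dirty = min_max(15, dirty)
z_clean = z_score(15, clean)
z_dirty = z_score(15, dirty)

print(minmax_clean, minmax_dirty)
print(z_clean, z_dirty)
```

In this sketch the min-max value for 15 teachers drops from 1.0 to 0.15 once the outlier is present, while the z-score shifts by a much smaller amount. The shift is not zero, because the outlier also moves the mean and standard deviation slightly.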

Scikit-learn has provided a transformer for standardization called StandardScaler.

I hope you like it!!

Thank you

-Akhil Soni

You can connect with me on LinkedIn.