GridSearch: For Hyperparameter Tuning

Helping you with model selection and optimization

While preparing any machine learning model, you have probably noticed that it comes with many parameters, and that the values you choose for them can have a big impact on the model's performance. For each parameter there may be a whole range of candidate values, and trying them one by one would be time-consuming and computationally expensive. The number of combinations across all of these parameters quickly becomes too large to test manually. This is where GridSearch comes in.

A hyperparameter is a parameter of a learning algorithm (not of the model). It must be set prior to training and remains constant during training. To make a model generalize better, we should train it with different hyperparameter values and then decide which values make it perform best.
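
As a quick illustration (a minimal sketch, not tied to any dataset from this post): for a decision tree, max_depth is a hyperparameter set before training, while the quantities learned during fit are model parameters.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# max_depth is a hyperparameter: chosen before training, constant during it
clf = DecisionTreeClassifier(max_depth=3)
clf.fit(X, y)

# feature_importances_ comes out of training, so it is a learned model
# parameter rather than a hyperparameter
print(clf.get_params()['max_depth'])
print(clf.feature_importances_)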

GridSearch performs an exhaustive search over specified parameter values for an estimator. The parameters of the estimator are optimized by cross-validated grid search over a parameter grid. In other words, it is a technique for finding the hyperparameter values that let a model generalize best, by systematically testing every combination in the grid.
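
To get a feel for how exhaustive this is, sklearn's ParameterGrid can list exactly which combinations a given grid expands into. Here is a small sketch using the same grid as the example further below:

from sklearn.model_selection import ParameterGrid

# 3 x 3 x 3 = 27 combinations, each of which GridSearchCV would also cross-validate
param_grid = {'max_depth': [1, 10, 100], 'min_samples_split': [5, 10, 100], 'min_samples_leaf': [5, 10, 100]}

combinations = list(ParameterGrid(param_grid))
print(len(combinations))   # 27
print(combinations[0])     # one concrete setting, e.g. {'max_depth': 1, 'min_samples_leaf': 5, 'min_samples_split': 5}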

GridSearch evaluates the performance of a model with different scoring metrics, such as accuracy, precision, recall and F1, using cross-validation.
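
GridSearchCV can even be given several metrics at once; when scoring is a dict, refit must name the metric used to pick the best estimator. The sketch below assumes x_train and y_train already exist:

from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Two metrics evaluated together; refit names the one used to choose best_estimator_
scoring = {'accuracy': 'accuracy', 'f1_macro': 'f1_macro'}

grid = GridSearchCV(DecisionTreeClassifier(), param_grid={'max_depth': [3, 5, 10]},
                    scoring=scoring, refit='accuracy', cv=5)
# grid.fit(x_train, y_train)   # x_train / y_train assumed to be defined
# grid.cv_results_ then contains mean_test_accuracy and mean_test_f1_macro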

To perform a grid search, sklearn provides GridSearchCV, which takes the following important input parameters:

estimator - the model to train and tune (estimator object)

param_grid - a grid of parameters over which the model is evaluated (dict or list of dictionaries)

scoring - the evaluation metric or metrics (str, callable, list, tuple or dict, default=None)

cv - number of folds for cross-validation (int, cross-validation generator or an iterable, default=None)

Let us look at an example:

from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

param_grid = {'max_depth': [1, 10, 100], 'min_samples_split': [5, 10, 100], 'min_samples_leaf': [5, 10, 100]}
tree_clf = DecisionTreeClassifier()
grid_search = GridSearchCV(tree_clf, param_grid, cv=5, scoring='accuracy', return_train_score=True)
grid_search.fit(x_train, y_train)

In the example above, we use a DecisionTreeClassifier and tune its max_depth, min_samples_split and min_samples_leaf hyperparameters, with accuracy as the scoring metric to be optimized. We then run the search over the training data by calling fit.

best_params = grid_search.best_params_
best_model = grid_search.best_estimator_
print(best_params)
print(best_model)

After fitting, GridSearchCV exposes two useful attributes: best_params_, the parameter combination from param_grid that scored best, and best_estimator_, the corresponding best model.

💡
If GridSearchCV is initialized with refit=True (the default), then once it finds the best estimator using cross-validation, it retrains it on the whole training set.
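
Once the search has run, best_score_ (the mean cross-validated score of the best combination) and cv_results_ (per-combination details) are also worth inspecting. A small sketch continuing the example above:

import pandas as pd

# Mean cross-validated accuracy of the best parameter combination
print(grid_search.best_score_)

# cv_results_ is a dict of arrays with one entry per parameter combination;
# a DataFrame makes it easy to sort and inspect
results = pd.DataFrame(grid_search.cv_results_)
print(results[['params', 'mean_test_score', 'rank_test_score']].head())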

The best model can then be used to predict on the test dataset with the predict method, which returns the predicted values:

y_pred = best_model.predict(x_test)
print(y_pred)
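
To see how well the retrained best model generalizes, the predictions can be compared with the held-out labels. The sketch below assumes y_test exists alongside x_test:

from sklearn.metrics import accuracy_score, classification_report

# y_test is assumed to be the held-out labels matching x_test
print(accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))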

The link below points to the sklearn documentation for GridSearchCV, where you can learn more about it and how to use it.

Link for sklearn GridSearchCV

Methods

decision_function(X) - Call decision_function on the estimator with the best found parameters.

fit(X[, y, groups]) - Run fit with all sets of parameters.

get_metadata_routing() - Get metadata routing of this object.

get_params([deep]) - Get parameters for this estimator.

inverse_transform(Xt) - Call inverse_transform on the estimator with the best found params.

predict(X) - Call predict on the estimator with the best found parameters.

predict_log_proba(X) - Call predict_log_proba on the estimator with the best found parameters.

predict_proba(X) - Call predict_proba on the estimator with the best found parameters.

score(X[, y]) - Return the score on the given data, if the estimator has been refit.

score_samples(X) - Call score_samples on the estimator with the best found parameters.

set_fit_request(*[, groups]) - Request metadata passed to the fit method.

set_params(**params) - Set the parameters of this estimator.

transform(X) - Call transform on the estimator with the best found parameters.
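
Because refit=True by default, a fitted GridSearchCV object simply forwards these calls to best_estimator_, so it can be used like the underlying model directly. A brief sketch continuing the earlier example (y_test is assumed to exist):

# Both calls are delegated to best_estimator_, the refitted best model
probs = grid_search.predict_proba(x_test)
test_accuracy = grid_search.score(x_test, y_test)

print(probs[:5])
print(test_accuracy)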

GridSearch helps in automating the process of hyperparameter tuning and avoids the need for manual trial and error. It provides a systematic and efficient way to search for the optimal hyperparameter values for a given model and dataset.

However, it is important to note that GridSearch can be computationally expensive, especially when dealing with a large number of hyperparameters or a large dataset. In such cases, other techniques like RandomizedSearchCV or Bayesian optimization may be more suitable.
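
For comparison, RandomizedSearchCV samples a fixed number of settings from ranges or distributions instead of trying every combination. A minimal sketch with assumed ranges:

from scipy.stats import randint
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

# Sample 20 random combinations instead of exhaustively evaluating the full grid
param_distributions = {'max_depth': randint(1, 100),
                       'min_samples_split': randint(2, 100),
                       'min_samples_leaf': randint(1, 100)}

random_search = RandomizedSearchCV(DecisionTreeClassifier(), param_distributions,
                                   n_iter=20, cv=5, scoring='accuracy', random_state=42)
# random_search.fit(x_train, y_train)   # same training data as before
# random_search.best_params_ then holds the best sampled combination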

Overall, GridSearch is a valuable tool for finding the best hyperparameter values to improve the performance of machine learning models, leading to better generalization and more accurate predictions.

Thank you guys

That's all in this blog....

You can connect with me through

Akhil Soni

LinkedIn