Strategies of Multiclass Classification

Binary classifier is ready!!

In this post, we will look at classification, a type of supervised learning in machine learning, which is itself a branch of Artificial Intelligence. Classification is a supervised learning task because the model learns from labeled examples and then predicts a class for new data values. Many classification models and algorithms can do this.

Binary classifiers distinguish between just two classes. One example is scikit-learn's SGDClassifier (Stochastic Gradient Descent classifier), which can be trained to tell two classes apart. Support Vector Machine classifiers and linear classifiers are other examples of binary classifiers.

Binary classifiers like these can be combined to perform multiclass classification. Other algorithms, such as Random Forest classifiers and naive Bayes classifiers, can handle multiple classes directly.
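As a minimal sketch of "handling multiple classes directly", assuming scikit-learn and its bundled iris dataset (three classes) purely as example data, RandomForestClassifier and GaussianNB can be fitted on a three-class target without any one versus all or one versus one wrapper:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB

# Assumption: iris is used only as convenient example data; it has 3 classes (0, 1, 2)
X, y = load_iris(return_X_y=True)

# Both models accept the three-class target as-is; no binary wrapper is involved
forest = RandomForestClassifier(random_state=42).fit(X, y)
nb = GaussianNB().fit(X, y)

print(forest.classes_)                # [0 1 2]
print(nb.predict_proba(X[:2]).shape)  # (2, 3): one probability column per class
```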

Multiclass Classification

The classification of data by distinguishing between more than two classes is called multiclass classification.

A multiclass classifier, also called a multinomial classifier, can distinguish between more than two classes. There are also various strategies for performing multiclass classification with several binary classifiers.

A binary classifier checks whether a given sample belongs to one particular class or not. It answers a single yes/no question about its input, whether that input is an image or text.

For example, suppose we have images of digits and need to identify whether a given image shows a 0 or not. Such a classifier is a 0-detector. Similarly, for the 10 digits from 0–9 we need 10 binary classifiers, one for each digit. To classify an image, we run all 10 detectors and select the class whose classifier outputs the highest score.
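A single 0-detector can be sketched as follows, assuming scikit-learn's bundled 8×8 digits dataset (load_digits) stands in for the digit images: the labels are turned into True for 0 and False for every other digit, and SGDClassifier is trained on that two-class problem.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

# Assumption: load_digits stands in for the digit images described above
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Binary targets: True if the image is a 0, False for any other digit
y_train_0 = y_train == 0
y_test_0 = y_test == 0

zero_detector = SGDClassifier(random_state=42).fit(X_train, y_train_0)

# The detector only answers "is this a 0?"
print(zero_detector.predict(X_test[:5]))
print("accuracy:", np.mean(zero_detector.predict(X_test) == y_test_0))
```

Note that roughly 90% of the images are not zeros, so accuracy alone flatters a detector like this; it is shown here only to illustrate the yes/no nature of a binary classifier.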

There are two common strategies for multiclass classification:

One versus all strategy

In this technique, each class is treated as a binary classification task against all the other classes combined, so N classes need N binary classifiers. A given input sample is passed through every binary classifier, and the class with the highest probability or score is predicted as the final class. This strategy is also known as one versus rest (OvR) and is abbreviated here as OvA.
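The one versus all procedure can be written out by hand as a sketch, assuming the digits dataset and SGDClassifier as the binary classifier: one detector per digit is trained against all other digits, and each test image gets the label whose detector returns the highest decision score. This only illustrates the idea; scikit-learn's OneVsRestClassifier, covered later in this post, handles the same bookkeeping for you.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Standardize pixel values; SGD tends to behave better on scaled features
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

classes = np.unique(y_train)

# One binary classifier per class: "this digit" (True) vs "all other digits" (False)
detectors = [
    SGDClassifier(random_state=42).fit(X_train, y_train == c) for c in classes
]

# Score every test sample with every detector: shape (n_samples, n_classes)
scores = np.column_stack([d.decision_function(X_test) for d in detectors])

# The predicted class is the one whose detector gives the highest score
y_pred = classes[np.argmax(scores, axis=1)]
print("accuracy:", np.mean(y_pred == y_test))
```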

One versus one strategy

In this strategy, a binary classifier is trained for every pair of classes, learning to distinguish the instances of one class in the pair from the instances of the other. To classify a sample, it is passed through all the pairwise classifiers, and the class with the most votes across all of them is assigned as the final prediction. This strategy is abbreviated OvO.

For N classes, the number of classifiers is N*(N-1)/2. For the 10 digits, that is 10*9/2 = 45 classifiers.
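The pairwise voting can be sketched the same way, assuming the digits dataset and LogisticRegression as the binary classifier purely for illustration: each classifier is trained only on the samples of its two classes, every classifier casts one vote per test sample, and the class with the most votes wins. Ties are broken here by the lowest class index, which is a simplification of this sketch; scikit-learn's OneVsOneClassifier instead uses the classifiers' summed confidence scores to break ties.

```python
from itertools import combinations

import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

classes = np.unique(y_train)
n_classes = len(classes)

# One classifier per pair (a, b), trained only on samples of those two classes
pair_models = {}
for a, b in combinations(classes, 2):
    mask = (y_train == a) | (y_train == b)
    pair_models[(a, b)] = LogisticRegression(max_iter=1000).fit(
        X_train[mask], y_train[mask]
    )

# Each pairwise classifier votes for one of its two classes
votes = np.zeros((len(X_test), n_classes), dtype=int)
for model in pair_models.values():
    winners = model.predict(X_test)  # every prediction is either a or b
    votes[np.arange(len(X_test)), np.searchsorted(classes, winners)] += 1

# Most votes wins; np.argmax breaks ties in favour of the lowest class index
y_pred = classes[np.argmax(votes, axis=1)]
print("classifiers trained:", len(pair_models))  # 45 = 10*9/2
print("accuracy:", np.mean(y_pred == y_test))
```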

OvA is preferred for most binary classification algorithms, but OvO is preferred for algorithms such as Support Vector Machine classifiers, which scale poorly with the size of the training set. For these algorithms, it is faster to train many classifiers on small training sets than to train a few classifiers on large training sets.
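scikit-learn's own SVC follows this advice: its documentation states that SVC implements the one versus one approach internally for multiclass problems. A small sketch, assuming the digits dataset, makes this visible by asking SVC for the raw pairwise scores with decision_function_shape="ovo", which returns one column per pair of classes:

```python
from sklearn.datasets import load_digits
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)  # 10 classes

# decision_function_shape="ovo" exposes one score per pair of classes
svm_ovo = SVC(decision_function_shape="ovo").fit(X, y)
pairwise_scores = svm_ovo.decision_function(X[:3])
print(pairwise_scores.shape)  # (3, 45): 10*9/2 pairwise classifiers

# The default setting, "ovr", reshapes the same pairwise results into one column per class
svm_ovr = SVC().fit(X, y)
print(svm_ovr.decision_function(X[:3]).shape)  # (3, 10)
```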

Scikit-learn provides OneVsOneClassifier for the one versus one strategy and OneVsRestClassifier for the one versus all strategy. To use either, we create an instance and pass a binary classifier to its constructor.

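from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import LinearSVC
# Load the three-class iris dataset and hold out a third of it for testing
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, shuffle=True, random_state=0)
# Wrap a binary LinearSVC: one classifier is trained per pair of classes (3 pairs here)
# Note: dual="auto" requires scikit-learn 1.3 or later
clf = OneVsOneClassifier(LinearSVC(dual="auto", random_state=0)).fit(X_train, y_train)
# Predicted class labels for the first 10 test samples
print(clf.predict(X_test[:10]))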

The one versus one approach can be advantageous for algorithms, such as kernel algorithms, that don't scale well with the number of samples. This is because each individual learning problem involves only a small subset of the data, whereas with one versus rest the complete dataset is used once per class, that is, as many times as there are classes.

On the other hand, the number of classifiers in one versus one grows as O(n²), where n is the number of classes, so it is usually slower than one versus rest when there are many classes.
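To make the trade-off concrete, here is a sketch, assuming the digits dataset and an RBF-kernel SVC as the kernel algorithm, that fits both wrappers and compares how many classifiers each trains and how many samples an individual classifier sees. The per-pair training set sizes are recomputed from the labels rather than read from the fitted estimators.

```python
from itertools import combinations

import numpy as np
from sklearn.datasets import load_digits
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
classes = np.unique(y)

ovo = OneVsOneClassifier(SVC()).fit(X, y)
ovr = OneVsRestClassifier(SVC()).fit(X, y)

# One versus one: N*(N-1)/2 classifiers; one versus rest: N classifiers
print(len(ovo.estimators_), len(ovr.estimators_))  # 45 10

# Each one versus one classifier sees only the samples of its two classes,
# while every one versus rest classifier is trained on the full dataset
pair_sizes = [np.sum((y == a) | (y == b)) for a, b in combinations(classes, 2)]
print("largest OvO training set:", max(pair_sizes))
print("every OvR training set:", len(y))
```

So OvO trains more classifiers, but each one on roughly a fifth of the data here, which is what makes it attractive for kernel methods whose cost grows quickly with the number of samples.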

Methods of OneVsOneClassifier

decision_function(X): Decision function for the OneVsOneClassifier.

fit(X, y): Fit underlying estimators.

get_metadata_routing(): Get metadata routing of this object.

get_params([deep]): Get parameters for this estimator.

partial_fit(X, y[, classes]): Partially fit underlying estimators.

predict(X): Estimate the best class label for each sample in X.

score(X, y[, sample_weight]): Return the mean accuracy on the given test data and labels.

set_params(**params): Set the parameters of this estimator.

set_partial_fit_request(*[, classes]): Request metadata passed to the partial_fit method.

set_score_request(*[, sample_weight]): Request metadata passed to the score method.
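A short sketch of some OneVsOneClassifier methods in use, assuming the digits dataset and SGDClassifier as the base estimator (partial_fit only works when the wrapped estimator itself supports partial_fit): the model is trained incrementally in mini-batches, then decision_function returns one score per class and score returns the mean accuracy. The number of batches and passes are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsOneClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

clf = OneVsOneClassifier(SGDClassifier(random_state=0))
all_classes = np.unique(y)

# classes must be given on the first partial_fit call, because a single
# mini-batch is not guaranteed to contain every class
for epoch in range(3):  # 3 passes over the data (arbitrary choice)
    for X_batch, y_batch in zip(np.array_split(X_train, 5), np.array_split(y_train, 5)):
        clf.partial_fit(X_batch, y_batch, classes=all_classes)

print(clf.decision_function(X_test[:2]).shape)  # (2, 10): one column per class
print("test accuracy:", clf.score(X_test, y_test))
```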

OneVsRestClassifier implements the one versus all strategy, and it can also be used for multilabel classification. For multilabel targets, it uses the binary relevance method, which trains one binary classifier independently for each label. In the multiclass case, each classifier fits its class against all the other classes. Besides its computational efficiency (only as many classifiers as there are classes are needed), one advantage of this approach is its interpretability: since each class is represented by one and only one classifier, it is possible to gain knowledge about the class by inspecting its corresponding classifier. This is the most commonly used strategy for multiclass classification and is a fair default choice.

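import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC
# Six 2-D points forming three well-separated groups, labelled 0, 1 and 2
X = np.array([
     [10, 10],
     [8, 10],
     [-5, 5.5],
     [-5.4, 5.5],
     [-20, -20],
     [-15, -20]
])
y = np.array([0, 0, 1, 1, 2, 2])
# One SVC is trained per class, each against the other two classes combined
clf = OneVsRestClassifier(SVC()).fit(X, y)
# Each query point lies next to one of the three groups
print(clf.predict([[-19, -20], [9, 9], [-5, 5]]))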
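The multilabel use and the interpretability point mentioned above can be sketched together, assuming synthetic data from make_multilabel_classification and LogisticRegression as the binary classifier: the target is an indicator matrix with one 0/1 column per label, OneVsRestClassifier trains one classifier per column, and because each label owns exactly one classifier, that classifier's weights can be inspected on their own.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Assumption: synthetic data; Y is a (n_samples, n_labels) 0/1 matrix,
# so a single sample may carry several labels at once
X, Y = make_multilabel_classification(n_samples=200, n_classes=3, random_state=0)

clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)

print(clf.predict(X[:3]))    # one 0/1 column per label
print(len(clf.estimators_))  # one binary classifier per label: 3

# Interpretability: each label's classifier has its own weight vector
for label, est in enumerate(clf.estimators_):
    top_feature = abs(est.coef_[0]).argmax()
    print(f"label {label}: most influential feature index {top_feature}")
```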

The scikit-learn documentation for OneVsRestClassifier explains its parameters and how it works in more detail:

https://scikit-learn.org/stable/modules/generated/sklearn.multiclass.OneVsRestClassifier.html#sklearn.multiclass.OneVsRestClassifier

Methods of OneVsRestClassifier

decision_function(X): Decision function for the OneVsRestClassifier.

fit(X, y): Fit underlying estimators.

get_metadata_routing(): Get metadata routing of this object.

get_params([deep]): Get parameters for this estimator.

partial_fit(X, y[, classes]): Partially fit underlying estimators.

predict(X): Predict multi-class targets using underlying estimators.

predict_proba(X): Probability estimates.

score(X, y[, sample_weight]): Return the mean accuracy on the given test data and labels.

set_params(**params): Set the parameters of this estimator.

set_partial_fit_request(*[, classes]): Request metadata passed to the partial_fit method.

set_score_request(*[, sample_weight]): Request metadata passed to the score method.
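A brief sketch of predict_proba and decision_function, assuming the iris dataset and LogisticRegression as the base estimator: in the single-label multiclass case, OneVsRestClassifier normalizes the per-class probabilities so that each row sums to 1, while decision_function returns one raw score per class.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

proba = clf.predict_proba(X[:3])       # shape (3, n_classes), rows normalized
scores = clf.decision_function(X[:3])  # one raw score per class

print(np.round(proba, 3))
print(proba.sum(axis=1))  # each row sums to 1 (up to floating-point error)
print(clf.predict(X[:3]))
```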

These are the two strategies, along with their use cases, advantages, the conditions in which each is preferred, and their scikit-learn implementations.

That’s it here!

Thank you guys!

-Akhil Soni

You can connect with me through linkedin

linkedin.com/in/akhil-soni-9827181a1