DecisionTreeClassifier is certainly capable of multiclass classification. The "larger than" comparison is simply how it is illustrated in that link; arriving at that decision rule is a consequence of the effect it has on information gain or gini (see further down that page). Decision tree nodes usually have binary rules, so they usually do take the form of some value being larger than another. The trick is transforming your data so it has good predictive values to compare.
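A minimal sketch of the point above, assuming scikit-learn and using the iris dataset purely as a stand-in three-class problem (all parameter values here are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)  # 3 classes: 0, 1, 2
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Multiclass works out of the box; each internal node still
# splits on a binary "feature <= threshold" rule.
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)
print(clf.predict(X_test[:5]))  # predictions come straight from {0, 1, 2}
```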
To be clear, multiclass means that your data (e.g. a document) should be classified as exactly one of many possible classes. This differs from multilabel classification, where a document can be assigned several classes from the set of possible classes. Most scikit-learn classifiers support multiclass natively, and scikit-learn also has several meta-wrappers for multilabel. You can also use probabilities (models with a predict_proba method) or decision distances (models with a decision_function method) to derive multilabel output, as sketched below.
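A quick sketch of the two interfaces, again on iris as a stand-in dataset; thresholding these per-class scores is one way to build multilabel output, and the choice of models here is illustrative only:

```python
from sklearn.datasets import load_iris
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

tree = DecisionTreeClassifier(random_state=0).fit(X, y)
print(tree.predict_proba(X[:2]))  # one probability per class

svc = LinearSVC().fit(X, y)  # no predict_proba, but...
print(svc.decision_function(X[:2]))  # ...signed distances to each class hyperplane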
If you mean that you need to apply several labels to each data point (for example, ['red', 'sport', 'fast'] for a car), then to use trees / forests you would have to create a unique label for each possible combination, which becomes your set of [0 ... K-1] classes. However, this only works if there is some predictive correlation in the data (between the combined color, type, and speed in the car example). For cars there may be red / yellow fast sports cars, but other 3-way combinations are unlikely; the data may be predictive for those few combinations and very weak for the rest. It is better to use SVM or LinearSVC and / or wrap them with OneVsRestClassifier or similar (see the sketch below).
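A sketch of that suggestion, with entirely made-up car features and labels; MultiLabelBinarizer turns the label sets into the binary indicator matrix that OneVsRestClassifier expects, one LinearSVC per label, instead of enumerating every label combination:

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

X = np.array([[220, 1], [90, 0], [240, 1], [60, 0]])  # toy car features
y_sets = [['red', 'sport', 'fast'], ['yellow'],
          ['red', 'fast'], ['yellow', 'slow']]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(y_sets)  # one binary column per label

# One binary classifier per label; each sample can get several labels.
clf = OneVsRestClassifier(LinearSVC()).fit(X, Y)
pred = clf.predict([[230, 1]])
print(mlb.inverse_transform(pred))  # e.g. [('fast', 'red', 'sport')]
```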