How to get class labels from cross_val_predict used with predict_proba in scikit-learn

I need to train a random forest classifier using 3-fold cross-validation. For each sample, I need the predicted probabilities obtained when that sample is in the test set.

I am using scikit-learn version 0.18.dev0.

This new version adds the ability to call cross_val_predict() with an additional parameter, method, which determines what kind of prediction is requested from the estimator.

In my case, I want to use the predict_proba() method, which returns the probability of each class in a multiclass scenario.

However, when I run the method, I get a matrix of predicted probabilities, where each row represents a sample and each column holds the predicted probability of a particular class.

The problem is that the method does not indicate which class corresponds to each column.

I need the same information that (in my case, using RandomForestClassifier) is exposed by the classes_ attribute, defined as:

classes_ : array of shape = [n_classes] or a list of such arrays
    The class labels (single-output problem), or a list of arrays of class labels (multi-output problem).

This is needed to interpret the output of predict_proba(), because its documentation says:

The order of the classes corresponds to that in the attribute classes_.

Minimal example:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict

clf = RandomForestClassifier()

X = np.random.randn(10, 10)
y = np.array([1] * 4 + [0] * 3 + [2] * 3)

# how to get classes from here?
proba = cross_val_predict(estimator=clf, X=X, y=y, method="predict_proba")

# using the classifier without cross-validation
# it is possible to get the classes in this way:
clf.fit(X, y)
proba = clf.predict_proba(X)
classes = clf.classes_

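Yes, the columns are in sorted order; this is because DecisionTreeClassifier (the base_estimator of RandomForestClassifier) uses np.unique to build the classes_ attribute, and np.unique returns the sorted unique values of its input.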
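A minimal sketch of recovering the labels for the cross-validated probabilities. It assumes that every class occurs in each training fold, so that the columns follow the sorted unique values of y (the order np.unique returns). The sample size, n_estimators, random_state, and the explicit StratifiedKFold are illustrative choices, not part of the question.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_predict

rng = np.random.RandomState(0)
X = rng.randn(30, 10)
y = np.array([1] * 10 + [0] * 10 + [2] * 10)

clf = RandomForestClassifier(n_estimators=10, random_state=0)

# Stratified folds keep every class present in each training split
# (an assumption this sketch relies on).
cv = StratifiedKFold(n_splits=3)
proba = cross_val_predict(clf, X, y, cv=cv, method="predict_proba")

# Column j of proba is taken to correspond to classes[j].
classes = np.unique(y)

# Sanity check against a classifier fitted on the full data.
clf.fit(X, y)
same_order = np.array_equal(clf.classes_, classes)

# Map each row's most probable column back to its class label.
predicted_labels = classes[np.argmax(proba, axis=1)]
```

If some class could be missing from a training fold (for example with very few samples per class), this column-to-label mapping should be checked rather than assumed.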

Source: https://habr.com/ru/post/1653207/

