Mapping scikit-learn DecisionTreeClassifier.tree_.value to the predicted class

I am using scikit-learn's DecisionTreeClassifier on a 3-class dataset. After fitting the classifier, I go through all leaf nodes of the tree_ attribute to get, for each class, the number of instances that fall into each node.

    from sklearn import tree

    # X, y: the training features and class labels
    clf = tree.DecisionTreeClassifier(max_depth=5)
    clf.fit(X, y)
    # let's assume there is a leaf node with id 5
    print(clf.tree_.value[5])

This will print:

 >>> array([[ 0., 1., 68.]]) 

But how do I know which position in this array belongs to which class? The classifier has a classes_ attribute, which is also an array.

 >>> clf.classes_
 array(['CLASS_1', 'CLASS_2', 'CLASS_3'], dtype=object)

Maybe index 1 in the value array corresponds to the class at index 1 of the classes_ array, and so on?

2 answers

I asked about this on the scikit-learn mailing list, and my hunch was correct: index 1 in the value array corresponds to the class at index 1 of clf.classes_, and so on.
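
A minimal sketch of that positional mapping, assuming a single-output classifier. The helper name class_values_at_node is made up for illustration, and fake_clf is only a stand-in that holds the values printed in the question so the snippet runs without training data; with a real model you would pass the fitted classifier instead.

```python
from types import SimpleNamespace


def class_values_at_node(clf, node_id):
    """Pair each class label with its entry in tree_.value for one node.

    Assumes a single-output classifier, so tree_.value[node_id] has one row.
    """
    row = clf.tree_.value[node_id][0]
    return {label: float(v) for label, v in zip(clf.classes_, row)}


# Stand-in for a fitted classifier (an assumption for this sketch), filled
# with the values from the question. Replace with the real fitted clf.
fake_clf = SimpleNamespace(
    classes_=["CLASS_1", "CLASS_2", "CLASS_3"],
    tree_=SimpleNamespace(value={5: [[0.0, 1.0, 68.0]]}),
)

per_class = class_values_at_node(fake_clf, 5)
print(per_class)

# The class the leaf predicts is the one with the largest entry.
predicted = max(per_class, key=per_class.get)
print(predicted)
```

Picking the largest entry rather than reading the numbers as raw counts keeps the lookup valid whether tree_.value holds per-class counts or per-class weights.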


No, it's not clf.classes_ but clf.tree_.feature, which contains, for each node, the index of the X column used for the split. If X is a pandas DataFrame, X.columns contains the column names. You can find more information in a similar question.
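
For reference, tree_.feature describes the splits rather than the order of the classes: each entry is the column index a node splits on, and leaf nodes hold the placeholder -2. A rough sketch of turning those indices into column names follows. The helper split_column_names, the stand-in fake_clf, and the column names are all hypothetical, chosen only so the snippet runs on its own.

```python
from types import SimpleNamespace

TREE_UNDEFINED = -2  # value scikit-learn stores in tree_.feature for leaf nodes


def split_column_names(clf, columns):
    """Return, per node, the name of the column it splits on (None for leaves)."""
    return [
        None if idx == TREE_UNDEFINED else columns[idx]
        for idx in clf.tree_.feature
    ]


# Hypothetical stand-in for a fitted tree: the root splits on column 2,
# its left child splits on column 0, and the remaining nodes are leaves.
fake_clf = SimpleNamespace(tree_=SimpleNamespace(feature=[2, 0, -2, -2, -2]))
columns = ["age", "height", "weight"]  # hypothetical; e.g. list(X.columns)

names = split_column_names(fake_clf, columns)
print(names)
```

With a real model, columns would be list(X.columns) for a DataFrame, and the same lookup applies to the fitted classifier's tree_.feature array.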


Source: https://habr.com/ru/post/1204067/

