Is there a way to get samples under each leaf of the decision tree?

I trained a decision tree on a dataset. Now I want to see which samples fall under each leaf of the tree.

From this tree, I want the samples circled in red.


I am using the decision tree implementation from Python's scikit-learn.

1 answer

If you only want the leaf for each sample, you can simply use

clf.apply(iris.data)

array([ 1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  5,
        5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,
        5,  5, 14,  5,  5,  5,  5,  5,  5, 10,  5,  5,  5,  5,  5, 10,  5,
        5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5, 16, 16,
       16, 16, 16, 16,  6, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
        8, 16, 16, 16, 16, 16, 16, 15, 16, 16, 11, 16, 16, 16,  8,  8, 16,
       16, 16, 15, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16])
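The leaf indices returned by apply can be grouped into one list of samples per leaf, which is what the question asks for. The following is a minimal sketch, assuming the same iris classifier used in this answer; the name samples_per_leaf is illustrative and not part of scikit-learn.

```python
import collections

import sklearn.datasets
import sklearn.tree

iris = sklearn.datasets.load_iris()
clf = sklearn.tree.DecisionTreeClassifier(random_state=42)
clf = clf.fit(iris.data, iris.target)

# apply() returns, for every sample, the index of the leaf it ends up in.
leaf_ids = clf.apply(iris.data)

# Illustrative name: maps leaf index -> list of sample indices in that leaf.
samples_per_leaf = collections.defaultdict(list)
for sample_idx, leaf_id in enumerate(leaf_ids):
    samples_per_leaf[int(leaf_id)].append(sample_idx)

for leaf_id in sorted(samples_per_leaf):
    print(leaf_id, samples_per_leaf[leaf_id])
```

Each sample appears in exactly one list, because a sample always ends in a single leaf. The exact leaf numbers may differ between scikit-learn versions.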

If you want all the samples for each node, you can compute all the decision paths with

dec_paths = clf.decision_path(iris.data)

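Then loop over the decision paths, convert each one to an array with toarray(), and check whether the sample passes through each node. The results are stored in a defaultdict, where the key is the node number and the values are the sample numbers.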

for d, dec in enumerate(dec_paths):
    # Convert the sparse row to a dense array once per sample
    row = dec.toarray()[0]
    for i in range(clf.tree_.node_count):
        if row[i] == 1:
            samples[i].append(d)

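Complete code:

import sklearn.datasets
import sklearn.tree
import collections

# Train a decision tree on the iris dataset
clf = sklearn.tree.DecisionTreeClassifier(random_state=42)
iris = sklearn.datasets.load_iris()
clf = clf.fit(iris.data, iris.target)

# Maps node number -> list of sample numbers that pass through that node
samples = collections.defaultdict(list)
# Sparse matrix of shape (n_samples, n_nodes); 1 where a sample visits a node
dec_paths = clf.decision_path(iris.data)

for d, dec in enumerate(dec_paths):
    # Convert the sparse row to a dense array once per sample
    row = dec.toarray()[0]
    for i in range(clf.tree_.node_count):
        if row[i] == 1:
            samples[i].append(d)

print(samples[13])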

Output:

[70, 126, 138]
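The loop above collects samples for every node, internal nodes included. To keep only the leaves, and to avoid converting each row to a dense array, one option is to read the sparse matrix column by column. This is a sketch under some assumptions: it relies on SciPy's tocsc() conversion of the matrix returned by decision_path, and on scikit-learn marking leaves with children_left == -1; the names samples_per_node and leaf_samples are illustrative.

```python
import sklearn.datasets
import sklearn.tree

iris = sklearn.datasets.load_iris()
clf = sklearn.tree.DecisionTreeClassifier(random_state=42)
clf = clf.fit(iris.data, iris.target)

# Column-oriented copy of the (n_samples, n_nodes) indicator matrix:
# column i lists the samples whose decision path visits node i.
node_indicator = clf.decision_path(iris.data).tocsc()

# In scikit-learn's tree structure a leaf has no children (child index -1).
is_leaf = clf.tree_.children_left == -1

# Illustrative name: node number -> sorted list of sample numbers.
samples_per_node = {}
for node_id in range(clf.tree_.node_count):
    start = node_indicator.indptr[node_id]
    end = node_indicator.indptr[node_id + 1]
    samples_per_node[node_id] = sorted(node_indicator.indices[start:end].tolist())

# Keep only the leaves, which is what the question asks for.
leaf_samples = {n: s for n, s in samples_per_node.items() if is_leaf[n]}

for node_id, sample_ids in leaf_samples.items():
    print(node_id, sample_ids)
```

The root (node 0) contains every sample, and each leaf list should match the grouping obtained from clf.apply.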


Source: https://habr.com/ru/post/1682603/
