How to detect overfitting in a multiclass classifier

Question


I want to track the loss during training of a multiclass gradient boosting classifier as a way to find out whether it is overfitting. Here is my code:

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.cross_validation import train_test_split
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor

iris = datasets.load_iris()
X, y = iris.data, iris.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

n_est = 100
clf = GradientBoostingClassifier(n_estimators=n_est, max_depth=3, random_state=2)
clf.fit(X_train, y_train)


test_score = np.empty(len(clf.estimators_))
for i, pred in enumerate(clf.staged_predict(X_test)):
    test_score[i] = clf.loss_(y_test, pred)
plt.plot(np.arange(n_est) + 1, test_score, label='Test')
plt.plot(np.arange(n_est) + 1, clf.train_score_, label='Train')
plt.legend()
plt.show()

However, I get the following ValueError:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-33-27194f883893> in <module>()
     22 test_score = np.empty(len(clf.estimators_))
     23 for i, pred in enumerate(clf.staged_predict(X_test)):
---> 24     test_score[i] = clf.loss_(y_test, pred)
     25 plt.plot(np.arange(n_est) + 1, test_score, label='Test')
     26 plt.plot(np.arange(n_est) + 1, clf.train_score_, label='Train')

C:\Documents and Settings\Philippe\Anaconda\lib\site-packages\sklearn\ensemble\gradient_boosting.pyc in __call__(self, y, pred)
    396             Y[:, k] = y == k
    397 
--> 398         return np.sum(-1 * (Y * pred).sum(axis=1) +
    399                       logsumexp(pred, axis=1))
    400 

ValueError: operands could not be broadcast together with shapes (45,3) (45)

I know this code works fine if I use GradientBoostingRegressor, but I can't figure out how to get it working with a multiclass classifier such as GradientBoostingClassifier. Thank you for your help.


python-2.7 scikit-learn

user3329302 May 6, '14 at 15:54


2 answers

For a k-NN classifier (knn), you can track training and test accuracy over different values of k:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier

# X_train, X_test, y_train, y_test come from the split in the question

# Setup arrays to store train and test accuracies
neighbors = np.arange(1, 9)
train_accuracy = np.empty(len(neighbors))
test_accuracy = np.empty(len(neighbors))

# Loop over different values of k
for i, k in enumerate(neighbors):
    # Setup a k-NN Classifier with k neighbors: knn
    knn = KNeighborsClassifier(n_neighbors=k)

    # Fit the classifier to the training data
    knn.fit(X_train, y_train)

    # Compute accuracy on the training set
    train_accuracy[i] = knn.score(X_train, y_train)

    # Compute accuracy on the testing set
    test_accuracy[i] = knn.score(X_test, y_test)

# Generate plot
plt.title('k-NN: Varying Number of Neighbors')
plt.plot(neighbors, test_accuracy, label='Testing Accuracy')
plt.plot(neighbors, train_accuracy, label='Training Accuracy')
plt.legend()
plt.xlabel('Number of Neighbors')
plt.ylabel('Accuracy')
plt.show()


sera 10 . '17 13:15
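The same train-versus-test comparison can be applied across the boosting stages of the GradientBoostingClassifier from the question. A minimal sketch, assuming a recent scikit-learn (sklearn.model_selection in place of sklearn.cross_validation): staged_predict yields one predicted label per sample after each stage, which is exactly the input accuracy_score expects. Plotting is left out; the two arrays can be passed to plt.plot as in the k-NN answer.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Same data, split and model settings as in the question
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)
clf = GradientBoostingClassifier(n_estimators=100, max_depth=3, random_state=2)
clf.fit(X_train, y_train)

# staged_predict yields class labels of shape (n_samples,) after each stage
train_accuracy = np.array(
    [accuracy_score(y_train, pred) for pred in clf.staged_predict(X_train)])
test_accuracy = np.array(
    [accuracy_score(y_test, pred) for pred in clf.staged_predict(X_test)])

# A gap that keeps widening (train accuracy rising while test accuracy
# stalls or drops) is a sign of overfitting
gap = train_accuracy - test_accuracy
print("largest train/test gap: %.3f at stage %d" % (gap.max(), gap.argmax() + 1))
```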

mbatchkarov · Accepted Answer · 2014-05-07T13:57:23+0000

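loss_ expects an array of shape (n_samples, k), where k is the number of classes, whereas staged_predict returns an array of shape [n_samples] (one predicted label per sample). That is why Y * pred fails to broadcast, with shapes (45,3) and (45). Pass the output of staged_decision_function to loss_ instead: it has shape (n_samples, k) and holds the raw per-class scores the deviance is computed from. staged_predict_proba has the same shape, but it returns probabilities rather than raw scores, so loss_ would not give the deviance on it.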
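A quick way to see the mismatch is to inspect the first output of each staged_* generator. This sketch reuses the question's iris split but fits only 10 stages to keep it fast (an arbitrary choice), and assumes a scikit-learn version that provides sklearn.model_selection. The one-hot matrix mimics the Y array built inside loss_.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Same data split as in the question (iris, 30% test set)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)
clf = GradientBoostingClassifier(n_estimators=10, random_state=2)
clf.fit(X_train, y_train)

# Output after the first boosting stage from each staged_* generator
labels = next(clf.staged_predict(X_test))
scores = next(clf.staged_decision_function(X_test))
proba = next(clf.staged_predict_proba(X_test))

print(labels.shape)  # (45,): one predicted label per sample
print(scores.shape)  # (45, 3): one raw score per sample and class
print(proba.shape)   # (45, 3): one probability per sample and class

# One-hot (n_samples, n_classes) matrix times labels is the failing operation
one_hot = np.eye(3)[y_test]
try:
    one_hot * labels
    broadcast_ok = True
except ValueError:
    broadcast_ok = False
print(broadcast_ok)  # False
```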

To track both the test and the training loss, something like this should work:

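# clf, X_train, X_test, y_train, y_test, np and plt as in the question
test_score = np.empty(len(clf.estimators_))
train_score = np.empty(len(clf.estimators_))

# staged_decision_function yields raw scores of shape (n_samples, n_classes)
for i, pred in enumerate(clf.staged_decision_function(X_test)):
    test_score[i] = clf.loss_(y_test, pred)

for i, pred in enumerate(clf.staged_decision_function(X_train)):
    train_score[i] = clf.loss_(y_train, pred)

plt.plot(test_score)
plt.plot(train_score)
plt.legend(['test score', 'train score'])
plt.show()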

The resulting plot shows loss_ per boosting iteration on both the test and the training set.
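In newer scikit-learn releases the loss_ attribute is no longer available (it was deprecated in version 1.1), so the loop above fails there with an AttributeError. A minimal sketch of the same diagnostic under that assumption swaps in sklearn.metrics.log_loss applied to staged_predict_proba. log_loss averages over samples, so its values are on a different scale from the summed deviance computed by the loss_ in the traceback (np.sum), but the shape of the curves is what matters for spotting overfitting.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

# Same data, split and model settings as in the question
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)
clf = GradientBoostingClassifier(n_estimators=100, max_depth=3, random_state=2)
clf.fit(X_train, y_train)

# staged_predict_proba yields an (n_samples, n_classes) array per stage;
# log_loss averages the multinomial negative log-likelihood over samples
test_loss = np.array([log_loss(y_test, proba, labels=clf.classes_)
                      for proba in clf.staged_predict_proba(X_test)])
train_loss = np.array([log_loss(y_train, proba, labels=clf.classes_)
                       for proba in clf.staged_predict_proba(X_train)])

# Stage with the lowest test loss (1-based)
best_stage = int(np.argmin(test_loss)) + 1
print("lowest test loss %.4f at stage %d" % (test_loss.min(), best_stage))
```

If the test loss bottoms out at best_stage and then climbs while the training loss keeps falling, the later stages are fitting noise; refitting with n_estimators=best_stage is one simple way to act on that.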

