I want to track the loss while training a multiclass gradient boosting classifier, as a way to find out whether overfitting is taking place. Here is my code:
%matplotlib inline
import numpy as np
import matplotlib.pylab as plt
from sklearn import datasets
from sklearn.cross_validation import train_test_split
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
iris = datasets.load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
n_est = 100
clf = GradientBoostingClassifier(n_estimators=n_est, max_depth=3, random_state=2)
clf.fit(X_train, y_train)
test_score = np.empty(len(clf.estimators_))
for i, pred in enumerate(clf.staged_predict(X_test)):
    test_score[i] = clf.loss_(y_test, pred)
plt.plot(np.arange(n_est) + 1, test_score, label='Test')
plt.plot(np.arange(n_est) + 1, clf.train_score_, label='Train')
plt.show()
However, I get the following ValueError:
ValueError Traceback (most recent call last)
<ipython-input-33-27194f883893> in <module>()
22 test_score = np.empty(len(clf.estimators_))
23 for i, pred in enumerate(clf.staged_predict(X_test)):
25 plt.plot(np.arange(n_est) + 1, test_score, label='Test')
26 plt.plot(np.arange(n_est) + 1, clf.train_score_, label='Train')
C:\Documents and Settings\Philippe\Anaconda\lib\site-packages\sklearn\ensemble\gradient_boosting.pyc in __call__(self, y, pred)
396 Y[:, k] = y == k
397
399 logsumexp(pred, axis=1))
400
ValueError: operands could not be broadcast together with shapes (45,3) (45)
I know this code works fine if I use GradientBoostingRegressor, but I can't figure out how to get it working with a multiclass classifier such as GradientBoostingClassifier. Thank you for your help.
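For reference, here is a minimal sketch of what the shapes in the traceback suggest, and one possible way around it. `staged_predict` yields hard class labels of shape `(n_samples,)`, while the multiclass loss in the traceback works on per-class scores of shape `(n_samples, n_classes)`, which would explain the `(45,3)` versus `(45)` mismatch. In the scikit-learn version from the question, swapping in `staged_decision_function` is probably the closest change. The sketch below instead uses `staged_predict_proba` with `sklearn.metrics.log_loss`, on the assumption that log loss is an acceptable stand-in for the deviance and that a newer scikit-learn is installed (where `train_test_split` lives in `sklearn.model_selection` and the `loss_` attribute may no longer be available). The helper name `staged_log_loss` is made up for this example.

```python
import numpy as np
from sklearn import datasets
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split


def staged_log_loss(clf, X, y):
    """Hypothetical helper: log loss of `clf` on (X, y) after each boosting stage."""
    return np.array([
        log_loss(y, proba, labels=clf.classes_)
        for proba in clf.staged_predict_proba(X)
    ])


iris = datasets.load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)

n_est = 100
clf = GradientBoostingClassifier(n_estimators=n_est, max_depth=3, random_state=2)
clf.fit(X_train, y_train)

# Compute both curves with the same metric so they are directly comparable.
train_loss = staged_log_loss(clf, X_train, y_train)
test_loss = staged_log_loss(clf, X_test, y_test)

# Shape check: hard labels are 1-D, class probabilities are 2-D.
first_labels = next(clf.staged_predict(X_test))
first_proba = next(clf.staged_predict_proba(X_test))
print(first_labels.shape, first_proba.shape)
```

The two arrays can then be plotted against `np.arange(n_est) + 1` exactly as in the original code. A test curve that turns upward while the training curve keeps falling would be the usual sign of overfitting.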