Why is the log loss negative?

I applied sklearn's log loss for logistic regression: http://scikit-learn.org/stable/modules/generated/sklearn.metrics.log_loss.html

My code looks something like this:

```python
def perform_cv(clf, X, Y, scoring):
    kf = KFold(X.shape[0], n_folds=5, shuffle=True)
    kf_scores = []
    for train, _ in kf:
        X_sub = X[train, :]
        Y_sub = Y[train]
        # apply 'log_loss' as the scoring function
        scores = cross_validation.cross_val_score(clf, X_sub, Y_sub,
                                                  cv=5, scoring='log_loss')
        kf_scores.append(scores.mean())
    return kf_scores
```

However, I wonder why the resulting log losses are negative. I expected them to be positive, since in the documentation (see the link above) the log loss is multiplied by -1 to turn it into a positive number.

Am I doing something wrong here?
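For illustration, here is a minimal sketch of the behavior in question (assuming a current scikit-learn, where the scorer is named `'neg_log_loss'` rather than `'log_loss'`): the metric itself is positive, while the cross-validation scorer returns its negation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, random_state=0)
clf = LogisticRegression().fit(X, y)

# the metric itself is positive by definition
direct = log_loss(y, clf.predict_proba(X))

# the scorer used by cross_val_score negates it, so all values are negative
cv = cross_val_score(clf, X, y, cv=5, scoring="neg_log_loss")

print(direct > 0)        # True
print(np.all(cv < 0))    # True
```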

4 answers

A similar discussion can be found here.

In short: a higher score means better performance (less loss).


Yes, this is supposed to happen. It is not a "bug", as others have suggested. The actual log loss is simply the positive version of the number you get.

The scikit-learn unified scoring API always maximizes the score, so scores that need to be minimized are negated in order for the unified API to work correctly. The returned score is therefore negated when it is a metric that should be minimized, and left positive when it is a metric that should be maximized.
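This convention can be checked directly (a small sketch, assuming a current scikit-learn where `sklearn.metrics.get_scorer` exposes the named scorers): the `neg_log_loss` scorer returns exactly the negated metric.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss, get_scorer

X, y = make_classification(n_samples=100, random_state=0)
clf = LogisticRegression().fit(X, y)

loss = log_loss(y, clf.predict_proba(X))         # positive loss
score = get_scorer("neg_log_loss")(clf, X, y)    # scorer output

# the scorer is the negated loss, so that "greater is better" holds uniformly
print(np.isclose(score, -loss))   # True
```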

This is also described in "sklearn GridSearchCV with Pipeline" and "scikit-learn cross-validation, negative values with mean squared error".


The log loss should be near zero for a good prediction algorithm; a large negative score means the predictive model is far off and needs to be rethought.


I cross-checked sklearn's implementation against several other methods. This seems to be a real bug in the framework. Instead, consider the following code to calculate log loss:

```python
import numpy as np  # modern SciPy no longer re-exports these NumPy functions

def llfun(act, pred):
    """Binary log loss; act and pred are Nx1 column vectors."""
    epsilon = 1e-15
    # clip predictions away from 0 and 1 to avoid log(0)
    pred = np.clip(pred, epsilon, 1 - epsilon)
    ll = np.sum(act * np.log(pred) + (1 - act) * np.log(1 - pred))
    return -ll / len(act)
```

Also note that act and pred must be Nx1 column vectors.
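A quick sanity check (the loss function is restated here in NumPy form so the snippet is self-contained): with column-vector inputs the hand-rolled loss is positive and agrees with `sklearn.metrics.log_loss`.

```python
import numpy as np
from sklearn.metrics import log_loss

def llfun(act, pred):
    """Binary log loss; act and pred are Nx1 column vectors."""
    epsilon = 1e-15
    pred = np.clip(pred, epsilon, 1 - epsilon)
    return -np.mean(act * np.log(pred) + (1 - act) * np.log(1 - pred))

act = np.array([[1], [0], [1], [1]])
pred = np.array([[0.9], [0.2], [0.7], [0.6]])

loss = llfun(act, pred)
print(loss)                                            # ~0.299, positive
print(np.isclose(loss, log_loss(act.ravel(), pred.ravel())))  # True
```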


Source: https://habr.com/ru/post/976505/

