Keras: class weight (class_weight) for one-hot encoding

I would like to use the class_weight argument in Keras model.fit to handle unbalanced training data. Looking at the documentation, I see that we can pass a dictionary like this:

 class_weight = {0: 1, 1: 1, 2: 5}

(In this example, class-2 will receive a higher penalty in the loss function.)
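For reference, here is a minimal sketch of this standard usage with integer labels (the model, data and shapes are purely illustrative, not part of the original question):

    import numpy as np
    from keras.models import Sequential
    from keras.layers import Dense

    # Toy 3-class model; all names and shapes are placeholders.
    model = Sequential([
        Dense(16, activation='relu', input_shape=(4,)),
        Dense(3, activation='softmax'),
    ])
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

    x_train = np.random.rand(60, 4)
    y_train = np.random.randint(0, 3, size=60)  # integer labels 0..2

    # Class 2 contributes 5x as much to the loss as classes 0 and 1.
    model.fit(x_train, y_train, epochs=1, class_weight={0: 1, 1: 1, 2: 5})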

The problem is that my network's output is one-hot encoded, i.e. class-0 = (1, 0, 0), class-1 = (0, 1, 0) and class-2 = (0, 0, 1).

How can we use class_weight with one-hot encoded output?

After looking at some of the Keras source, it looks like _feed_output_names should contain a list of output classes, but in my case model.output_names / model._feed_output_names returns ['dense_1'].

Related: How to set class weight for unbalanced classes in Keras?

3 answers

I think we can use sample_weight instead. Inside Keras, class_weight is actually converted to sample_weight.

sample_weight: optional array of the same length as x, containing weights to apply to the model's loss for each sample. In the case of temporal data, you can pass a 2D array with shape (samples, sequence_length) to apply a different weight to every timestep of every sample. In this case you should make sure to specify sample_weight_mode="temporal" in compile().

https://github.com/fchollet/keras/blob/d89afdfd82e6e27b850d910890f4a4059ddea331/keras/engine/training.py#L1392
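A minimal sketch of that conversion done by hand: recover the integer class from each one-hot row, then look up its weight (the labels and weight values here are illustrative):

    import numpy as np

    # Illustrative one-hot labels and per-class weights.
    y_onehot = np.array([[1, 0, 0],
                         [0, 1, 0],
                         [0, 0, 1],
                         [0, 0, 1]])
    class_weight = {0: 1.0, 1: 1.0, 2: 5.0}

    # Recover integer classes, then look up each sample's weight.
    y_classes = y_onehot.argmax(axis=1)
    sample_weight = np.array([class_weight[c] for c in y_classes])

    print(sample_weight)  # [1. 1. 5. 5.]
    # model.fit(x_train, y_onehot, sample_weight=sample_weight)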


A bit of a convoluted answer, but the best I have found so far. This assumes your data is one-hot encoded and multi-class, and operates on a DataFrame of labels df_y:

    import pandas as pd
    import numpy as np
    from sklearn.preprocessing import LabelEncoder
    from sklearn.utils.class_weight import compute_class_weight, compute_sample_weight

    # Create a pd.Series holding the categorical class of each one-hot encoded row
    y_classes = df_y.idxmax(axis=1, skipna=False)

    # Fit a label encoder to our label series
    le = LabelEncoder()
    le.fit(list(y_classes))

    # Create an integer-based label Series
    y_integers = le.transform(list(y_classes))

    # Create a dict of label : integer representation
    labels_and_integers = dict(zip(y_classes, y_integers))

    # Balanced per-class and per-sample weights
    class_weights = compute_class_weight(class_weight='balanced',
                                         classes=np.unique(y_integers),
                                         y=y_integers)
    sample_weights = compute_sample_weight('balanced', y_integers)

    class_weights_dict = dict(zip(le.transform(list(le.classes_)), class_weights))

This results in a sample_weights vector, computed to balance an unbalanced dataset, that can be passed to Keras's sample_weight argument, and a class_weights_dict that can be passed to the class_weight argument of the .fit method. You really do not want to use both, just pick one. I am using class_weight right now because it is complicated to get sample_weight working with fit_generator.
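For example, picking one of the two (a sketch; model and df_X are placeholders for your own model and feature frame, the rest comes from the code above):

    # Option A: per-sample weights computed above.
    model.fit(df_X.values, df_y.values, sample_weight=sample_weights, epochs=10)

    # Option B: per-class weights; keys are the integer-encoded classes.
    model.fit(df_X.values, df_y.values, class_weight=class_weights_dict, epochs=10)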


In _standardize_weights, Keras does:

    if y.shape[1] > 1:
        y_classes = y.argmax(axis=1)

So basically, if you decide to use one-hot encoding, the classes are the column indices.
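A quick demo of that behaviour (the labels are illustrative):

    import numpy as np

    # One-hot targets: the column index is the class Keras recovers internally.
    y = np.array([[1, 0, 0],
                  [0, 0, 1]])
    print(y.argmax(axis=1))  # [0 2] -> the class_weight keys 0 and 2 apply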

You may also ask yourself how to map the column indices back to the original classes of your data. Well, if you use scikit-learn's LabelBinarizer class, which learns to perform the one-hot encoding, the column index corresponds to the order of the unique labels computed by the .fit method. The docs say it holds

an ordered array of unique labels

Example:

    from sklearn.preprocessing import LabelBinarizer

    y = [4, 1, 2, 8]
    l = LabelBinarizer()
    y_transformed = l.fit_transform(y)

    y_transformed
    > array([[0, 0, 1, 0],
    >        [1, 0, 0, 0],
    >        [0, 1, 0, 0],
    >        [0, 0, 0, 1]])

    l.classes_
    > array([1, 2, 4, 8])

In conclusion, the keys of the class_weight dictionary should reflect the order of the classes_ attribute of the encoder.
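For example, to up-weight the original label 8, which LabelBinarizer placed in the last column, key the dictionary by column index (a sketch building on the example above; the weight values are illustrative):

    from sklearn.preprocessing import LabelBinarizer

    y = [4, 1, 2, 8]
    lb = LabelBinarizer()
    y_onehot = lb.fit_transform(y)

    # lb.classes_ is array([1, 2, 4, 8]); column i corresponds to classes_[i].
    class_weight = {i: (5.0 if label == 8 else 1.0)
                    for i, label in enumerate(lb.classes_)}
    print(class_weight)  # {0: 1.0, 1: 1.0, 2: 1.0, 3: 5.0}
    # model.fit(x_train, y_onehot, class_weight=class_weight)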


Source: https://habr.com/ru/post/1266881/

