Keras flow_from_directory: over- or undersample a class

I am working on a binary classification problem with Keras, using the ImageDataGenerator.flow_from_directory method to generate batches. However, my classes are very imbalanced: one class has about 8x or 9x more images than the other, which causes the model to get stuck predicting the same output class for every example. Is there a way to configure flow_from_directory to either oversample my small class or undersample my large class in each epoch? At the moment I have simply created several copies of each image in my smaller class, but I would like a little more flexibility.

2 answers

With the current version of Keras, it is not possible to balance your dataset using only the built-in Keras methods. flow_from_directory simply builds a list of all files and their classes, shuffles it (if requested), and then iterates over it.

But you can use a different trick: write your own generator that does the balancing in Python:

    def balanced_flow_from_directory(flow_from_directory, options):
        for x, y in flow_from_directory:
            yield custom_balance(x, y, options)

Here custom_balance should be a function that takes a batch (x, y), balances it, and returns a balanced batch (x', y'). For most applications the batch size does not have to stay the same, but there are some unusual use cases (e.g. a stateful RNN) where batches must have a fixed size.
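For illustration, here is one possible custom_balance, written as a minimal sketch rather than a definitive implementation. It assumes binary labels (0/1, as produced with class_mode='binary') and that x and y are NumPy arrays. It undersamples the majority class inside each batch. The "rng" key in options is a hypothetical convention introduced only for this example.

```python
import numpy as np


def custom_balance(x, y, options=None):
    """Undersample the majority class within a single batch.

    Assumes binary labels (0 or 1) and NumPy arrays for x and y.
    options may carry an optional NumPy random Generator under the
    (hypothetical) key "rng".
    """
    rng = options.get("rng") if options else None
    if rng is None:
        rng = np.random.default_rng()

    labels = np.asarray(y).reshape(-1).astype(int)
    pos = np.flatnonzero(labels == 1)
    neg = np.flatnonzero(labels == 0)

    # If one class is missing from this batch, there is nothing to balance
    # against, so the batch is returned unchanged.
    if len(pos) == 0 or len(neg) == 0:
        return x, y

    # Keep the same number of examples from each class.
    n = min(len(pos), len(neg))
    keep = np.concatenate([
        rng.choice(pos, n, replace=False),
        rng.choice(neg, n, replace=False),
    ])
    rng.shuffle(keep)
    return x[keep], y[keep]
```

Note that the returned batches are smaller than the requested batch size and vary in size from batch to batch, so this variant is not suitable for the fixed-size cases mentioned above. Oversampling the minority class within the batch (sampling with replacement) would be an alternative.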


You can also count the number of files in each class and derive normalized class weights from the counts:

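    import os

    # Count the files in each class subfolder. Sort the names, because
    # flow_from_directory assigns class indices in alphanumeric order.
    files_per_class = []
    for folder in sorted(os.listdir(input_folder)):
        path = os.path.join(input_folder, folder)
        if os.path.isdir(path):
            files_per_class.append(len(os.listdir(path)))
    total_files = sum(files_per_class)

    # Weight each class by the share of files that belong to the other classes.
    class_weights = {}
    for i in range(len(files_per_class)):
        class_weights[i] = 1 - (float(files_per_class[i]) / total_files)
    print(class_weights)
    ...
    model.fit_generator(..., class_weight=class_weights)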
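To see what this formula gives for an imbalance like the one in the question, here is a small worked example. The counts 800 and 100 are made-up numbers for illustration, and class_weights_from_counts is a hypothetical helper, not part of Keras.

```python
def class_weights_from_counts(files_per_class):
    """Apply the formula above: weight = 1 - (class count / total count)."""
    total_files = sum(files_per_class)
    return {i: 1 - count / total_files
            for i, count in enumerate(files_per_class)}


# Hypothetical 8:1 split: 800 images in class 0, 100 images in class 1.
weights = class_weights_from_counts([800, 100])
print(weights)
```

With these counts the large class gets a weight of about 0.11 and the small class about 0.89, so each example of the small class counts roughly eight times as much in the loss.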

Source: https://habr.com/ru/post/1263220/