Keras Data Augmentation Options

I have read some material about data augmentation in Keras, but it is still a bit vague to me. Is there a parameter to control the number of images created from each input image during the augmentation step? In the example below, I do not see any parameter that controls how many images are created from each input image.

For example, in the code below, I can use a variable (num_imgs) to control the number of images created from each input image and have them stored in a folder called preview; but with real-time data augmentation there seems to be no parameter for this purpose.

    from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img

    num_imgs = 20

    datagen = ImageDataGenerator(
        rotation_range=40,
        width_shift_range=0.2,
        height_shift_range=0.2,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True,
        fill_mode='nearest')

    img = load_img('data/train/cats/cat.0.jpg')  # this is a PIL image
    x = img_to_array(img)  # this is a NumPy array with shape (3, 150, 150)
    x = x.reshape((1,) + x.shape)  # this is a NumPy array with shape (1, 3, 150, 150)

    # the .flow() command below generates batches of randomly transformed images
    # and saves the results to the `preview/` directory
    i = 0
    for batch in datagen.flow(x, batch_size=1,
                              save_to_dir='preview', save_prefix='cat', save_format='jpeg'):
        i += 1
        if i >= num_imgs:
            break  # otherwise the generator would loop indefinitely
+5
2 answers

Data augmentation works as follows: during each training epoch, transformations with randomly chosen parameters (within the specified ranges) are applied to all of the original images in the training set. After an epoch completes, that is, after the learning algorithm has been fed the entire training set, the next training epoch begins, and the training data is augmented again by applying the transformations to the original training data.
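To make this concrete, here is a minimal sketch (assuming the Keras 1.x-era preprocessing API used in the question, and channels-first image ordering as in the question's shape comments) showing that the same input image comes back differently transformed each time the generator passes over the data:

    import numpy as np
    from keras.preprocessing.image import ImageDataGenerator

    datagen = ImageDataGenerator(rotation_range=40, horizontal_flip=True)

    # a single dummy "image" so we can watch what the generator yields
    # (channels-first shape, as in the question's comments)
    x = np.random.rand(1, 3, 150, 150)

    flow = datagen.flow(x, batch_size=1)
    first_pass = next(flow)   # random transform applied on the first pass over the data
    second_pass = next(flow)  # a different random transform of the same image
    print(np.allclose(first_pass, second_pass))  # almost certainly False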

Thus, the number of times each image is augmented is equal to the number of training epochs. Recall the example you linked to:

    # Fit the model on the batches generated by datagen.flow().
    model.fit_generator(datagen.flow(X_train, Y_train, batch_size=batch_size),
                        samples_per_epoch=X_train.shape[0],
                        nb_epoch=nb_epoch,
                        validation_data=(X_test, Y_test))

Here, the datagen object will expose the training set to the model nb_epoch times, so each image will be augmented nb_epoch times. This way, the learning algorithm almost never sees two exactly identical training examples, because at each epoch the training examples are randomly transformed.
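If you want each original image to be augmented more than once per epoch, one workaround (a sketch that reuses the variables from the snippet above; augment_factor is a hypothetical name, not a Keras parameter) is to enlarge samples_per_epoch, so that the generator loops over the training data several times within a single epoch:

    # augment_factor is a made-up knob: how many randomly transformed
    # variants of each original image one "epoch" should contain
    augment_factor = 5

    model.fit_generator(datagen.flow(X_train, Y_train, batch_size=batch_size),
                        samples_per_epoch=augment_factor * X_train.shape[0],
                        nb_epoch=nb_epoch,
                        validation_data=(X_test, Y_test))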

+7

Basically, the way it works is that it generates only one transformed image for each input image; after all the input images have been used once, it starts over again.

In your example, since there is only one input image, it will keep generating different versions of that image until there are twenty of them.
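If you would rather keep those copies in memory as a NumPy array instead of writing JPEGs to the preview/ folder, a small variation of the question's loop (same sample file path assumed) looks like this:

    import numpy as np
    from keras.preprocessing.image import ImageDataGenerator, img_to_array, load_img

    num_imgs = 20
    datagen = ImageDataGenerator(rotation_range=40, horizontal_flip=True)

    x = img_to_array(load_img('data/train/cats/cat.0.jpg'))
    x = x.reshape((1,) + x.shape)  # add a batch axis for .flow()

    augmented = []
    for batch in datagen.flow(x, batch_size=1):
        augmented.append(batch[0])  # drop the batch axis again
        if len(augmented) >= num_imgs:
            break  # .flow() loops forever, so stop once we have enough

    X_aug = np.stack(augmented)  # num_imgs randomly transformed copies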

You can see the source code here: https://github.com/fchollet/keras/blob/master/keras/preprocessing/image.py

+3
