Splitting a data directory into training and test directories while preserving the subdirectory structure

I am interested in using ImageDataGenerator in Keras for data augmentation. But this requires the training and validation directories, each with subdirectories per class, to be loaded separately, as shown below (this is from the Keras documentation). I have one directory with 2 subdirectories for 2 classes (Data/Class1 and Data/Class2). How can I randomly split it into training and validation directories?

    train_datagen = ImageDataGenerator(
        rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True)

    test_datagen = ImageDataGenerator(rescale=1./255)

    train_generator = train_datagen.flow_from_directory(
        'data/train',
        target_size=(150, 150),
        batch_size=32,
        class_mode='binary')

    validation_generator = test_datagen.flow_from_directory(
        'data/validation',
        target_size=(150, 150),
        batch_size=32,
        class_mode='binary')

    model.fit_generator(
        train_generator,
        steps_per_epoch=2000,
        epochs=50,
        validation_data=validation_generator,
        validation_steps=800)

I would like to re-run my algorithm several times, with a different random training/validation split each time.

4 answers

Thanks guys! I was able to write my own function to create the training and test data sets. Here is the code for anyone who is looking for it.

    import os
    import shutil
    import numpy as np

    source1 = "/source_dir"
    dest11 = "/dest_dir"
    files = os.listdir(source1)

    # Move roughly 20% of the files into the destination directory.
    for f in files:
        if np.random.rand(1) < 0.2:
            shutil.move(os.path.join(source1, f), os.path.join(dest11, f))
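The snippet above splits a single flat directory. For the layout in the question (Data/Class1, Data/Class2), the same idea can be applied once per class subdirectory so the structure is preserved. A sketch using only the standard library; the function name, directory names, and fraction are my own, not from the answer:

```python
import os
import random
import shutil

def split_per_class(source_root, dest_root, fraction=0.2, seed=None):
    """Move roughly `fraction` of the files in each class subdirectory
    of source_root into a matching subdirectory under dest_root."""
    rng = random.Random(seed)
    for class_name in os.listdir(source_root):
        src_dir = os.path.join(source_root, class_name)
        if not os.path.isdir(src_dir):
            continue
        dst_dir = os.path.join(dest_root, class_name)
        os.makedirs(dst_dir, exist_ok=True)
        for fname in os.listdir(src_dir):
            if rng.random() < fraction:
                shutil.move(os.path.join(src_dir, fname),
                            os.path.join(dst_dir, fname))
```

Re-running this with a different seed (or no seed) produces a fresh random split each time, which matches the goal of restarting the algorithm with new splits.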

Unfortunately, this is not possible with the current implementation of keras.preprocessing.image.ImageDataGenerator (as of October 14, 2017), but since this is a frequently requested feature, I expect it to be added soon.

But you can do this using standard Python os operations. Depending on the size of your dataset, you could also load all of the images into RAM first and then use the classic fit method, which can split your data randomly.
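The in-RAM route the answer mentions boils down to a random index split over arrays that are already loaded. A minimal sketch of such a split using only the standard library; the function name and fraction are illustrative, not part of any Keras API:

```python
import random

def random_split(samples, labels, val_fraction=0.2, seed=None):
    """Randomly partition parallel lists of samples and labels into
    (train_samples, train_labels, val_samples, val_labels)."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    n_val = int(len(indices) * val_fraction)
    val_idx = set(indices[:n_val])
    train_samples = [s for i, s in enumerate(samples) if i not in val_idx]
    train_labels = [l for i, l in enumerate(labels) if i not in val_idx]
    val_samples = [samples[i] for i in indices[:n_val]]
    val_labels = [labels[i] for i in indices[:n_val]]
    return train_samples, train_labels, val_samples, val_labels
```

With the arrays split like this you can pass validation_data=(x_val, y_val) to fit; note that Keras' own fit(..., validation_split=0.2) slices the tail of the data without shuffling, so shuffling first (as above) is what makes the split random.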


You need to either manually copy some of the training data into the validation directory, or write a script that randomly moves data from the training directory into your validation directory. Either way, you then pass the path to the validation directory to ImageDataGenerator().flow_from_directory().

Details on organizing your data in the directory structure are covered in this video.


Here is my approach:

    import os
    import random
    import shutil
    from tempfile import TemporaryDirectory

    # Create a temporary validation set.
    with TemporaryDirectory(dir=train_image_folder) as valid_image_folder, \
         TemporaryDirectory(dir=train_label_folder) as valid_label_folder:
        train_images = os.listdir(train_image_folder)
        train_labels = os.listdir(train_label_folder)
        for img_name in train_images:
            single_name, ext = os.path.splitext(img_name)
            label_name = single_name + '.png'
            if label_name not in train_labels:
                continue
            if random.uniform(0, 1) <= train_val_split:
                # Move the image and its matching label together.
                shutil.move(os.path.join(train_image_folder, img_name),
                            os.path.join(valid_image_folder, img_name))
                shutil.move(os.path.join(train_label_folder, label_name),
                            os.path.join(valid_label_folder, label_name))

Remember to move everything back before the with block exits: TemporaryDirectory deletes the directory and everything in it when the context closes.
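That restore step can be sketched as a small helper; the function name is my own, and it assumes the flat file layout used in the answer above:

```python
import os
import shutil

def restore_files(valid_folder, train_folder):
    """Move every file in valid_folder back into train_folder, so the
    temporary validation directory can be deleted without losing data."""
    for fname in os.listdir(valid_folder):
        shutil.move(os.path.join(valid_folder, fname),
                    os.path.join(train_folder, fname))
```

Calling restore_files(valid_image_folder, train_image_folder) (and likewise for the label folders) as the last statement inside the with block returns the dataset to its original state for the next run.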


Source: https://habr.com/ru/post/1272569/

