Splitting a data directory into training and test directories while preserving the subdirectory structure

I am interested in using ImageDataGenerator in Keras for data augmentation. But this requires the training and validation directories, each with subdirectories per class, to be loaded separately, as shown below (this is from the Keras documentation). I have one directory with 2 subdirectories for 2 classes (Data/Class1 and Data/Class2). How can I randomly split it into training and validation directories?

    train_datagen = ImageDataGenerator(
        rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True)

    test_datagen = ImageDataGenerator(rescale=1./255)

    train_generator = train_datagen.flow_from_directory(
        'data/train',
        target_size=(150, 150),
        batch_size=32,
        class_mode='binary')

    validation_generator = test_datagen.flow_from_directory(
        'data/validation',
        target_size=(150, 150),
        batch_size=32,
        class_mode='binary')

    model.fit_generator(
        train_generator,
        steps_per_epoch=2000,
        epochs=50,
        validation_data=validation_generator,
        validation_steps=800)

I would like to re-run my algorithm several times, with a different random training/validation split each time.

4 answers

Thanks guys! I was able to write my own function to create the training and test data sets. Here is the code for anyone who is looking for it.

    import os
    import shutil
    import numpy as np

    source1 = "/source_dir"
    dest11 = "/dest_dir"
    files = os.listdir(source1)

    # Move roughly 20% of the files into the destination directory.
    for f in files:
        if np.random.rand(1) < 0.2:
            shutil.move(os.path.join(source1, f), os.path.join(dest11, f))
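The snippet above splits a single flat directory. For the layout in the question (Data/Class1, Data/Class2), the same idea can be applied once per class subdirectory so the structure is preserved. A sketch using only the standard library; the function name, directory names, and fraction are my own, not from the answer:

```python
import os
import random
import shutil

def split_per_class(source_root, dest_root, fraction=0.2, seed=None):
    """Move roughly `fraction` of the files in each class subdirectory
    of source_root into a matching subdirectory under dest_root."""
    rng = random.Random(seed)
    for class_name in os.listdir(source_root):
        src_dir = os.path.join(source_root, class_name)
        if not os.path.isdir(src_dir):
            continue
        dst_dir = os.path.join(dest_root, class_name)
        os.makedirs(dst_dir, exist_ok=True)
        for fname in os.listdir(src_dir):
            if rng.random() < fraction:
                shutil.move(os.path.join(src_dir, fname),
                            os.path.join(dst_dir, fname))
```

Re-running this with a different seed (or no seed) produces a fresh random split each time, which matches the goal of restarting the algorithm with new splits.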

Unfortunately, this is not possible with the current implementation of keras.preprocessing.image.ImageDataGenerator (as of October 14, 2017), but since this is a frequently requested feature, I expect it to be added soon.

But you can do this using standard Python os operations. Depending on the size of your dataset, you could also load all of the images into RAM first and then use the classic fit method, which can split your data randomly.
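The in-RAM route the answer mentions boils down to a random index split over arrays that are already loaded. A minimal sketch of such a split using only the standard library; the function name and fraction are illustrative, not part of any Keras API:

```python
import random

def random_split(samples, labels, val_fraction=0.2, seed=None):
    """Randomly partition parallel lists of samples and labels into
    (train_samples, train_labels, val_samples, val_labels)."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    n_val = int(len(indices) * val_fraction)
    val_idx = set(indices[:n_val])
    train_samples = [s for i, s in enumerate(samples) if i not in val_idx]
    train_labels = [l for i, l in enumerate(labels) if i not in val_idx]
    val_samples = [samples[i] for i in indices[:n_val]]
    val_labels = [labels[i] for i in indices[:n_val]]
    return train_samples, train_labels, val_samples, val_labels
```

With the arrays split like this you can pass validation_data=(x_val, y_val) to fit; note that Keras' own fit(..., validation_split=0.2) slices the tail of the data without shuffling, so shuffling first (as above) is what makes the split random.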


You need to either manually copy some of the training data into the validation directory, or write a script that randomly moves data from the training directory into your validation directory. Either way, you then pass the path to the validation directory to ImageDataGenerator().flow_from_directory().

Details on organizing your data in the directory structure are covered in this video.


Here is my approach:

    import os
    import random
    import shutil
    from tempfile import TemporaryDirectory

    # Create a temporary validation set.
    with TemporaryDirectory(dir=train_image_folder) as valid_image_folder, \
         TemporaryDirectory(dir=train_label_folder) as valid_label_folder:
        train_images = os.listdir(train_image_folder)
        train_labels = os.listdir(train_label_folder)
        for img_name in train_images:
            single_name, ext = os.path.splitext(img_name)
            label_name = single_name + '.png'
            if label_name not in train_labels:
                continue
            if random.uniform(0, 1) <= train_val_split:
                # Move the image and its matching label together.
                shutil.move(os.path.join(train_image_folder, img_name),
                            os.path.join(valid_image_folder, img_name))
                shutil.move(os.path.join(train_label_folder, label_name),
                            os.path.join(valid_label_folder, label_name))

Remember to move everything back before the with block exits: TemporaryDirectory deletes the directory and everything in it when the context closes.
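That restore step can be sketched as a small helper; the function name is my own, and it assumes the flat file layout used in the answer above:

```python
import os
import shutil

def restore_files(valid_folder, train_folder):
    """Move every file in valid_folder back into train_folder, so the
    temporary validation directory can be deleted without losing data."""
    for fname in os.listdir(valid_folder):
        shutil.move(os.path.join(valid_folder, fname),
                    os.path.join(train_folder, fname))
```

Calling restore_files(valid_image_folder, train_image_folder) (and likewise for the label folders) as the last statement inside the with block returns the dataset to its original state for the next run.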


Source: https://habr.com/ru/post/1272569/

