Using Keras ImageDataGenerator in a regression model

I want to use the flow_from_directory method of ImageDataGenerator to generate training data for a regression model, where the target value can be any floating-point value between -1 and 1.

flow_from_directory has a "class_mode" parameter, which the documentation describes as follows:

class_mode: one of "categorical", "binary", "sparse" or None. Default: "categorical". Determines the type of label arrays that are returned: "categorical" will be 2D one-hot encoded labels, "binary" will be 1D binary labels, "sparse" will be 1D integer labels.

Which of these values should I use? None of them seem to fit...

2 answers

Currently (as of the latest Keras version on January 21, 2017), flow_from_directory only works as follows:

  • You must have directories structured as follows:

     directory with images\
         1st label\
             1st picture from 1st label
             2nd picture from 1st label
             3rd picture from 1st label
             ...
         2nd label\
             1st picture from 2nd label
             2nd picture from 2nd label
             3rd picture from 2nd label
             ...
         ...

  • flow_from_directory returns batches of a fixed size in the format (picture, label).
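To make the label formats concrete, here is a minimal pure-Python sketch of how labels relate to the sub-directory names. It is an illustration only, not Keras code: labels_for is a hypothetical helper, and it assumes that class indices follow the sorted order of the sub-directory names.

```python
def labels_for(class_dirs, picture_dirs, class_mode):
    """Mimic how labels are derived from sub-directory names (illustration only)."""
    # Assumption: class indices follow the sorted order of the directory names.
    class_indices = {name: i for i, name in enumerate(sorted(class_dirs))}
    indices = [class_indices[d] for d in picture_dirs]
    if class_mode == "sparse":
        return indices                       # 1D integer labels
    if class_mode == "binary":
        return [float(i) for i in indices]   # 1D 0/1 labels (two classes only)
    if class_mode == "categorical":
        n = len(class_indices)
        # 2D one-hot labels: one row per picture, one column per class
        return [[1.0 if j == i else 0.0 for j in range(n)] for i in indices]
    return None                              # class_mode=None: no labels


dirs = ["dog", "cat"]
pictures = ["dog", "cat", "dog"]  # the directory each picture lives in
print(labels_for(dirs, pictures, "sparse"))       # [1, 0, 1]
print(labels_for(dirs, pictures, "categorical"))  # [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
```

In every mode the label is just a class index in some encoding, which is why none of the modes yields a floating-point target directly.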

So, as you can see, it can only be used for classification, and all the class_mode options in the documentation only control how the class is presented to your classifier. But there is a neat hack that makes flow_from_directory usable for a regression task:

  • You need to structure your directory as follows:

     directory with images\
         1st value (e.g. -0.95423)\
             1st picture from 1st value
             2nd picture from 1st value
             3rd picture from 1st value
             ...
         2nd value (e.g. -0.9143242)\
             1st picture from 2nd value
             2nd picture from 2nd value
             3rd picture from 2nd value
             ...
         ...

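  • You also need a list list_of_values = [1st value, 2nd value, ...], ordered so that list_of_values[i] is the target for class index i. Note that class indices follow the directory names sorted as strings, which is not necessarily numeric order. Then your generator is defined as follows: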

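     import numpy as np

     def regression_flow_from_directory(flow_from_directory_gen, list_of_values):
         # list_of_values[i] must hold the target for class index i
         values = np.asarray(list_of_values, dtype='float32')
         for x, y in flow_from_directory_gen:
             # y is an array of class indices, so index a NumPy array, not a list
             yield x, values[np.asarray(y).astype(int)]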

It is important that flow_from_directory_gen is created with class_mode='sparse' for this to work. Of course, this is a little cumbersome, but it works (I have used this solution myself :))
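One way to avoid ordering mistakes is to build the value list from the generator's class_indices attribute (a dict mapping directory name to class index) rather than by hand. The sketch below assumes every directory name parses as a float. values_from_class_indices and regression_flow are hypothetical helper names, and the batches are a stand-in for a real flow_from_directory generator.

```python
import numpy as np


def values_from_class_indices(class_indices):
    """Return targets ordered by class index, parsing directory names as floats."""
    values = np.empty(len(class_indices), dtype="float32")
    for dir_name, index in class_indices.items():
        values[index] = float(dir_name)
    return values


def regression_flow(sparse_gen, values):
    """Replace the sparse class indices in each batch with regression targets."""
    for x, y in sparse_gen:
        yield x, values[np.asarray(y).astype(int)]


# Stand-in for gen.class_indices: names sorted as strings, not numerically.
names = ["-0.95423", "-0.5", "0.25"]
class_indices = {name: i for i, name in enumerate(sorted(names))}
values = values_from_class_indices(class_indices)

# Stand-in for one batch from flow_from_directory(..., class_mode='sparse').
fake_batches = iter([(np.zeros((2, 4, 4, 3)), np.array([1.0, 2.0]))])
for x, y in regression_flow(fake_batches, values):
    print(x.shape, y)
```

Here "-0.5" sorts before "-0.95423" as a string, so a hand-written list in numeric order would silently pair images with the wrong targets.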


I think organizing your data differently, using a DataFrame (without having to move your images to new locations), will allow you to run a regression model. In short, create columns in your DataFrame containing the file path of each image and the target value. This lets your generator keep the regression values and images correctly in sync, even if you shuffle your data every epoch.

Here is an example showing how to associate images with binomial targets, multinomial targets, and regression targets, to show that "a target is a target is a target" and only the model needs to change:

     df['path'] = df.object_id.apply(file_path_from_db_id)
     df

            object_id   bi  multi                                    path     target
     index
     0         461756  dog  white    /path/to/imgs/756/61/blah_461756.png   0.166831
     1        1161756  cat  black   /path/to/imgs/756/61/blah_1161756.png   0.058793
     2        3303651  dog  white   /path/to/imgs/651/03/blah_3303651.png   0.582970
     3        3367756  dog   grey   /path/to/imgs/756/67/blah_3367756.png  -0.421429
     4        3767756  dog   grey   /path/to/imgs/756/67/blah_3767756.png  -0.706608
     5        5467756  cat  black   /path/to/imgs/756/67/blah_5467756.png  -0.415115
     6        5561756  dog  white   /path/to/imgs/756/61/blah_5561756.png  -0.631041
     7       31255756  cat   grey  /path/to/imgs/756/55/blah_31255756.png  -0.148226
     8       35903651  cat  black  /path/to/imgs/651/03/blah_35903651.png  -0.785671
     9       44603651  dog  black  /path/to/imgs/651/03/blah_44603651.png  -0.538359
     10      49557622  cat  black  /path/to/imgs/622/57/blah_49557622.png  -0.295279
     11      58164756  dog   grey  /path/to/imgs/756/64/blah_58164756.png   0.407096
     12      95403651  cat  white  /path/to/imgs/651/03/blah_95403651.png   0.790274
     13      95555756  dog   grey  /path/to/imgs/756/55/blah_95555756.png   0.060669
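As a rough sketch of the idea (not the multithreaded generator from the linked post), a plain Python generator can walk such a DataFrame and yield image batches together with their targets. The name dataframe_batches, its parameters, and the fake_load_image stand-in are assumptions made for illustration; a real loader would read and preprocess the file at each path.

```python
import numpy as np
import pandas as pd


def dataframe_batches(df, load_image, batch_size=2,
                      x_col="path", y_col="target", seed=0):
    """Yield (images, targets) batches forever, reshuffling rows every epoch."""
    rng = np.random.default_rng(seed)
    while True:
        order = rng.permutation(len(df))  # new row order for this epoch
        for start in range(0, len(order), batch_size):
            rows = df.iloc[order[start:start + batch_size]]
            # Path and target come from the same rows, so they stay in sync.
            x = np.stack([load_image(p) for p in rows[x_col]])
            y = rows[y_col].to_numpy(dtype="float32")
            yield x, y


df = pd.DataFrame({
    "path": ["/path/to/imgs/756/61/blah_461756.png",
             "/path/to/imgs/756/61/blah_1161756.png",
             "/path/to/imgs/651/03/blah_3303651.png"],
    "target": [0.166831, 0.058793, 0.582970],
})


def fake_load_image(path):
    # Stand-in loader: fills a tiny "image" with the row position of the path.
    return np.full((4, 4, 3), float(df.index[df["path"] == path][0]))


batches = dataframe_batches(df, fake_load_image, batch_size=2)
x, y = next(batches)
print(x.shape, y.shape)
```

Because each batch is cut from whole rows, shuffling changes the order of the rows but never the pairing of an image with its target.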

I describe how to do this in great detail with examples here:

https://techblog.appnexus.com/a-keras-multithreaded-dataframe-generator-for-millions-of-image-files-84d3027f6f43


Source: https://habr.com/ru/post/1014239/

