Keras' `model.fit_generator()` behaves differently than `model.fit()`

I have a huge dataset that I have to feed to Keras as a generator, because it does not fit in memory. However, using fit_generator, I cannot reproduce the results I get during regular training with model.fit. Each epoch also lasts considerably longer.

I implemented a minimal example. Maybe someone can show me where the problem is.

    import random

    import numpy
    from keras.layers import Dense
    from keras.models import Sequential

    random.seed(23465298)
    numpy.random.seed(23465298)

    no_features = 5
    no_examples = 1000


    def get_model():
        network = Sequential()
        network.add(Dense(8, input_dim=no_features, activation='relu'))
        network.add(Dense(1, activation='sigmoid'))
        network.compile(loss='binary_crossentropy', optimizer='adam')
        return network


    def get_data():
        example_input = [[float(f_i == e_i % no_features) for f_i in range(no_features)]
                         for e_i in range(no_examples)]
        example_target = [[float(t_i % 2)] for t_i in range(no_examples)]
        return example_input, example_target


    def data_gen(all_inputs, all_targets, batch_size=10):
        input_batch = numpy.zeros((batch_size, no_features))
        target_batch = numpy.zeros((batch_size, 1))
        while True:
            for example_index, each_example in enumerate(zip(all_inputs, all_targets)):
                each_input, each_target = each_example
                wrapped = example_index % batch_size
                input_batch[wrapped] = each_input
                target_batch[wrapped] = each_target
                if wrapped == batch_size - 1:
                    yield input_batch, target_batch


    if __name__ == "__main__":
        input_data, target_data = get_data()
        g = data_gen(input_data, target_data, batch_size=10)
        model = get_model()
        model.fit(input_data, target_data, epochs=15, batch_size=10)  # 15 * (1000 / 10) * 10
        # model.fit_generator(g, no_examples // 10, epochs=15)        # 15 * (1000 / 10) * 10

On my machine, model.fit always finishes the 10th epoch with a loss of 0.6939, after approx. 2-3 seconds.

The model.fit_generator method, however, runs considerably longer and finishes the last epoch with a different loss (0.6931).

I don't understand why the results differ between the two approaches. It may not seem like a big difference, but I need to be sure that the same data with the same network produces the same result, regardless of whether I train the usual way or via the generator.

Update: @Alex R. answered part of the original problem (some of the performance issues, as well as the results changing on each run). However, since the main problem remains, I have adjusted the question and the title accordingly.

+7
6 answers

I don't see how the loss can be more unstable with a larger batch size, since there should be less fluctuation with larger batches. However, looking at the Keras documentation, the fit() signature is:

    fit(self, x, y, batch_size=32, epochs=10, verbose=1, callbacks=None,
        validation_split=0.0, validation_data=None, shuffle=True,
        class_weight=None, sample_weight=None, initial_epoch=0)

with defaults batch_size=32 and epochs=10, whereas fit_generator() looks like this:

    fit_generator(self, generator, steps_per_epoch, epochs=1, verbose=1,
                  callbacks=None, validation_data=None, validation_steps=None,
                  class_weight=None, max_queue_size=10, workers=1,
                  use_multiprocessing=False, initial_epoch=0)

In particular, steps_per_epoch is defined as:

steps_per_epoch: the total number of steps (batches of samples) to draw from the generator before declaring one epoch finished and starting the next epoch. It should typically be equal to the number of unique samples in your dataset divided by the batch size.

So, for starters, it sounds like your fit_generator is consuming a different number of samples per epoch than your fit() call. See the fit_generator documentation for more details.
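As a sanity check, here is a sketch using the question's numbers: with matching batch sizes, steps_per_epoch has to be the dataset size divided by the batch size, so that both methods consume the same number of samples per epoch.

    batch_size = 10
    steps_per_epoch = no_examples // batch_size   # 1000 // 10 = 100

    # fit():           15 epochs * 1000 samples              = 15000 samples
    # fit_generator(): 15 epochs * 100 steps * 10 per batch  = 15000 samples
    model.fit_generator(g, steps_per_epoch=steps_per_epoch, epochs=15)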

+4

Batch sizes

  • In fit you are using the default batch size of 32.
  • In fit_generator you are using a batch size of 10.

Keras runs a weight update after each batch, so with different batch sizes the two methods compute different gradients. And once the weights have been updated differently, the two models can never meet again.

Try using fit with batch_size=10, or use the generator with batch_size=32; both matched configurations are sketched below.
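A sketch of the two matched configurations, reusing data_gen and the variables from the question (the point is only that both calls use the same batch size):

    # Option 1: give fit() the generator's batch size.
    model.fit(input_data, target_data, epochs=15, batch_size=10)

    # Option 2: give the generator fit()'s default batch size of 32.
    # 1000 examples are not divisible by 32, so epochs no longer line up
    # exactly with full passes over the data (31 * 32 = 992 examples/epoch).
    g32 = data_gen(input_data, target_data, batch_size=32)
    model.fit_generator(g32, steps_per_epoch=no_examples // 32, epochs=15)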


Seed problem?

Are you creating a new model with get_model() for each case?

If so, the initial weights in the two models are different, and of course you will get different results from them. (Well, you did set the seed, but if you are using TensorFlow you may have run into this problem.)

Eventually, though, the two will converge. The difference between them doesn't seem that big.
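To rule initialization out, a minimal sketch (reusing the question's variables) is to build the model once, snapshot the freshly initialized weights, and restore them before each run; get_weights() and set_weights() are standard Keras model methods:

    model = get_model()
    initial_weights = model.get_weights()   # snapshot right after initialization

    # Run 1: plain fit()
    model.fit(input_data, target_data, epochs=15, batch_size=10)

    # Run 2: restart from the identical initial weights, then train via the generator
    model.set_weights(initial_weights)
    g = data_gen(input_data, target_data, batch_size=10)
    model.fit_generator(g, steps_per_epoch=no_examples // 10, epochs=15)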


Data validation

If you are not sure that your generator yields the data you expect, write a simple loop and print/compare/check what it produces:

    for i in range(numberOfBatches):
        x, y = g.next()  # or next(g)
        # print or compare x, y here
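For instance, a sketch assuming the question's data_gen and a batch size of 10: each yielded batch should match the corresponding slice of the original data.

    import numpy

    g = data_gen(input_data, target_data, batch_size=10)
    for i in range(no_examples // 10):
        x, y = next(g)
        # Each batch should equal the matching slice of the source lists.
        assert numpy.array_equal(x, numpy.asarray(input_data[i * 10:(i + 1) * 10]))
        assert numpy.array_equal(y, numpy.asarray(target_data[i * 10:(i + 1) * 10]))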

+2

Make sure you shuffle your batches inside your generator.

This thread suggests enabling shuffling in your iterator: https://github.com/keras-team/keras/issues/2389. I had the same problem, and shuffling solved it. A sketch of one way to do this in a custom generator follows.
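For a hand-rolled generator like the one in the question, a minimal sketch of per-pass shuffling (an illustration, not the exact code from the linked issue) is to permute the example indices on every pass:

    import numpy

    def shuffling_data_gen(all_inputs, all_targets, batch_size=10):
        all_inputs = numpy.asarray(all_inputs)
        all_targets = numpy.asarray(all_targets)
        n = len(all_inputs)
        while True:
            order = numpy.random.permutation(n)   # new example order each pass
            for start in range(0, n - batch_size + 1, batch_size):
                idx = order[start:start + batch_size]
                yield all_inputs[idx], all_targets[idx]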

+1

As for the loss, it is probably due to the batch-size difference that has already been discussed.

Regarding the difference in training time, model.fit_generator() lets you specify a number of workers. This parameter controls how many workers pull batches from your generator in parallel while the model trains. If your machine is set up for it, changing workers to 4 or 8 should noticeably reduce training time.
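Note that a plain Python generator is not safe to read from several workers at once; the usual way to combine parallel loading with Keras is a keras.utils.Sequence. A sketch, reusing the question's variables and assuming your Keras version provides Sequence:

    import numpy
    from keras.utils import Sequence

    class BatchSequence(Sequence):
        def __init__(self, inputs, targets, batch_size=10):
            self.x = numpy.asarray(inputs)
            self.y = numpy.asarray(targets)
            self.batch_size = batch_size

        def __len__(self):
            # number of batches per epoch
            return len(self.x) // self.batch_size

        def __getitem__(self, i):
            s = slice(i * self.batch_size, (i + 1) * self.batch_size)
            return self.x[s], self.y[s]

    seq = BatchSequence(input_data, target_data, batch_size=10)
    model.fit_generator(seq, steps_per_epoch=len(seq), epochs=15, workers=4)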

0

Hope I'm not late for the party. The most important thing I would like to add:

In Keras, fit() works well for small datasets that can be loaded into memory. In practice, though, most datasets are large and cannot be loaded into memory all at once.

For large datasets, we must use fit_generator() .
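For illustration, a minimal sketch of a generator that streams batches from a file too large for memory; the file name and row layout here are hypothetical:

    import numpy

    def file_batch_gen(path="big_dataset.csv", batch_size=32):
        # Hypothetical layout: one example per line, comma-separated
        # feature values with the label in the last column.
        while True:                     # fit_generator needs an endless generator
            with open(path) as f:
                batch = []
                for line in f:
                    batch.append([float(v) for v in line.split(",")])
                    if len(batch) == batch_size:
                        data = numpy.asarray(batch)
                        yield data[:, :-1], data[:, -1:]   # features, labels
                        batch = []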

0

There are more differences between fit and fit_generator than meet the eye. The two functions are by no means interchangeable, whatever the Keras team would have you believe. The way they update gradients may be the cause of their different behavior.

In any case, for real neural-network problems, fit() is of little use. If your problem can be solved with fit(), it either belongs in a classroom or is a test run. Otherwise, you probably need to collect more data.

0
