Why do I need two training runs to fine-tune InceptionV3 in Keras?

I don't understand why I have to call the fit() / fit_generator() function twice to fine-tune InceptionV3 (or any other pre-trained model) in Keras (version 2.0.0). The documentation suggests the following:

Fine-tune InceptionV3 on a new set of classes

    from keras.applications.inception_v3 import InceptionV3
    from keras.preprocessing import image
    from keras.models import Model
    from keras.layers import Dense, GlobalAveragePooling2D
    from keras import backend as K

    # create the base pre-trained model
    base_model = InceptionV3(weights='imagenet', include_top=False)

    # add a global spatial average pooling layer
    x = base_model.output
    x = GlobalAveragePooling2D()(x)
    # let's add a fully-connected layer
    x = Dense(1024, activation='relu')(x)
    # and a logistic layer -- let's say we have 200 classes
    predictions = Dense(200, activation='softmax')(x)

    # this is the model we will train
    model = Model(input=base_model.input, output=predictions)

    # first: train only the top layers (which were randomly initialized)
    # i.e. freeze all convolutional InceptionV3 layers
    for layer in base_model.layers:
        layer.trainable = False

    # compile the model (should be done *after* setting layers to non-trainable)
    model.compile(optimizer='rmsprop', loss='categorical_crossentropy')

    # train the model on the new data for a few epochs
    model.fit_generator(...)

    # at this point, the top layers are well trained and we can start fine-tuning
    # convolutional layers from inception V3. We will freeze the bottom N layers
    # and train the remaining top layers.

    # let's visualize layer names and layer indices to see how many layers
    # we should freeze:
    for i, layer in enumerate(base_model.layers):
        print(i, layer.name)

    # we chose to train the top 2 inception blocks, i.e. we will freeze
    # the first 172 layers and unfreeze the rest:
    for layer in model.layers[:172]:
        layer.trainable = False
    for layer in model.layers[172:]:
        layer.trainable = True

    # we need to recompile the model for these modifications to take effect
    # we use SGD with a low learning rate
    from keras.optimizers import SGD
    model.compile(optimizer=SGD(lr=0.0001, momentum=0.9), loss='categorical_crossentropy')

    # we train our model again (this time fine-tuning the top 2 inception blocks
    # alongside the top Dense layers)
    model.fit_generator(...)
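(The docs leave the arguments of both fit_generator(...) calls elided. For concreteness, here is roughly how I would wire them up with ImageDataGenerator; the data/train directory layout, batch size, step counts and epoch counts below are just placeholders for illustration, not part of the docs.)

    # Hypothetical data pipeline for the two elided fit_generator(...) calls.
    # Assumes images are stored as data/train/<class_name>/*.jpg with 200 classes.
    from keras.applications.inception_v3 import preprocess_input
    from keras.preprocessing.image import ImageDataGenerator

    train_datagen = ImageDataGenerator(preprocessing_function=preprocess_input)
    train_generator = train_datagen.flow_from_directory(
        'data/train',              # assumed path
        target_size=(299, 299),    # InceptionV3's default input size
        batch_size=32,
        class_mode='categorical')

    # first training run: only the new top layers are trainable
    model.fit_generator(train_generator, steps_per_epoch=1000, epochs=3)

    # ... unfreeze the top two inception blocks and recompile as shown above ...

    # second training run: fine-tune the unfrozen layers with a low learning rate
    model.fit_generator(train_generator, steps_per_epoch=1000, epochs=10)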

Why don't we call fit() / fit_generator() only once? As always, thanks for your help!

EDIT:

Below are the answers from Nassim Ben and David de la Iglesia. I highly recommend the link provided by David de la Iglesia: Transfer Learning

+5
2 answers

InceptionV3 is a very deep and complex network. It was trained to recognize certain things, but you are using it for a different classification task. This means that, used as-is, it is not perfectly suited to what you are doing.

So the goal here is to reuse some of the features already learned by the trained network and to slightly adapt the top of the network (the highest-level features, the ones closest to your task).

So they removed the very top layer and added a few new, untrained ones. They want to train this big model for their task, using the first 172 layers for feature extraction and learning the last ones so that they are adapted to your task.

In the part they want to train, one subpart has parameters that are already learned and the other has new, randomly initialized parameters. The point is that the already-learned layers should only be fine-tuned, not retrained from scratch... but the model has no way to distinguish layers it should merely nudge from layers that must be learned completely. If you do a single fit on model.layers[172:] together with the fresh top, you will lose the interesting features learned on the huge ImageNet dataset. You don't want that, so what you do is:

  • Train the new last layers until they are "good enough" by setting all of the original InceptionV3 layers to non-trainable; this already gives a decent result.
  • Those freshly trained layers are now good, so when you "unfreeze" some of the top InceptionV3 layers they are not disturbed too much; they only get fine-tuned, which is exactly what you want.

So, to summarize: when you want to train a mixture of "already learned" layers and new layers, you first bring the new ones up to speed, and then train everything together to fine-tune them.
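You can see this mechanism directly in code: the trainable flags only reach the optimizer when compile() is called, so each phase updates a different set of weights. A minimal sketch, assuming the model and base_model from the snippet in the question (the 172 cutoff follows the docs' example):

    # Sketch: inspect what each training phase will actually update.
    # Assumes `model` and `base_model` from the question's snippet already exist.
    from keras.optimizers import SGD

    # Phase 1: freeze the whole InceptionV3 base, train only the new top.
    for layer in base_model.layers:
        layer.trainable = False
    model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
    print('phase 1 trainable tensors:', len(model.trainable_weights))  # only the new Dense layers

    # Phase 2: unfreeze the top two inception blocks as well.
    for layer in model.layers[172:]:
        layer.trainable = True
    # Without this second compile(), the optimizer would still update only the Dense layers.
    model.compile(optimizer=SGD(lr=0.0001, momentum=0.9), loss='categorical_crossentropy')
    print('phase 2 trainable tensors:', len(model.trainable_weights))  # Dense layers + unfrozen blocks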

+3

If you add 2 randomly initialized layers on top of an already trained convnet and you try to fine-tune some convolutional layers without "warming up" the new layers first, the high gradients of these new layers will blow away the (useful) things those convolutional layers have learned.

That's why your first fit only trains these two new layers, using the pre-trained convnet as a kind of "fixed" feature extractor.

After that, your 2 Dense layers no longer produce high gradients, and you can fine-tune some of the pre-trained convolutional layers. That is what you do in the second fit.
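Put together, the whole recipe is just two fits separated by an unfreeze and a recompile. A condensed sketch of that schedule as a helper function; the function name, the generator argument, and the epoch/step counts are illustrative, while the 172 cutoff and the optimizers follow the docs' example:

    # Hypothetical helper packaging the two-phase schedule described above.
    from keras.optimizers import SGD

    def two_phase_fine_tune(model, base_model, train_generator,
                            n_freeze=172, warmup_epochs=3, finetune_epochs=10,
                            steps_per_epoch=1000):
        # Phase 1 ("warm-up"): train only the randomly initialized top layers,
        # so their large gradients cannot wreck the pre-trained weights.
        for layer in base_model.layers:
            layer.trainable = False
        model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
        model.fit_generator(train_generator, steps_per_epoch=steps_per_epoch,
                            epochs=warmup_epochs)

        # Phase 2: unfreeze the top convolutional blocks and fine-tune them
        # together with the new top, using a small learning rate.
        for layer in model.layers[n_freeze:]:
            layer.trainable = True
        model.compile(optimizer=SGD(lr=0.0001, momentum=0.9),
                      loss='categorical_crossentropy')
        model.fit_generator(train_generator, steps_per_epoch=steps_per_epoch,
                            epochs=finetune_epochs)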

+2

Source: https://habr.com/ru/post/1265619/

