InceptionV3 is a very deep and complex network. It has been trained to recognize certain things (the ImageNet classes), but you are using it for a different classification task, so out of the box it is not a perfect fit for what you are doing.
The goal here is therefore to reuse the features the trained network has already learned, and only slightly change the top of the network (the highest-level features, the ones closest to your task).
So they removed the topmost layer and added a few new, untrained ones. The idea is to adapt this large model to the task at hand: the first 172 layers are used as a fixed feature extractor, and the remaining layers are trained so that they adapt to your task.
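A minimal sketch of that setup with the Keras applications API is shown below; the new top layers and `num_classes` are placeholders for your own task, not the exact layers from the example being discussed:

```python
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model

num_classes = 10  # placeholder: the number of classes in your own task

# Base network with pretrained weights; include_top=False drops the topmost
# (ImageNet classification) layer
base_model = InceptionV3(weights='imagenet', include_top=False)

# New, randomly initialized layers on top, adapted to your task
x = GlobalAveragePooling2D()(base_model.output)
x = Dense(1024, activation='relu')(x)
predictions = Dense(num_classes, activation='softmax')(x)

model = Model(inputs=base_model.input, outputs=predictions)
```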
The model you want to train now mixes layers whose parameters have already been learned with layers that have new, randomly initialized parameters. The already-trained layers should only be fine-tuned, not retrained from scratch, but the model itself has no way to distinguish layers it should merely nudge from layers it must learn completely. If you simply made `model.layers[172:]` trainable and started training everything right away, the large updates driven by the random new layers would destroy the interesting features learned on the huge ImageNet dataset. You do not want that, so what you do is:
- First train the new last layers until they are "good enough", with all the original InceptionV3 layers frozen (set to non-trainable); this already gives a decent result.
- The new layers are now reasonably trained, so when you "unfreeze" some of the upper layers they will not be disturbed too much; they will only be fine-tuned toward your task (see the sketch after this list).
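Here is a rough sketch of that two-stage procedure, assuming the `model` and `base_model` from the snippet above; the optimizers, learning rate, epoch counts, and the commented-out `train_data` are illustrative placeholders, not prescriptions.

```python
from tensorflow.keras.optimizers import SGD

# Stage 1: freeze the entire pretrained base so only the new top layers learn
for layer in base_model.layers:
    layer.trainable = False
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
# model.fit(train_data, epochs=5)  # train the new layers until "good enough"

# Stage 2: unfreeze the top of the network (layers from index 172 onward)
# and fine-tune them together with the new layers, using a low learning rate
for layer in model.layers[:172]:
    layer.trainable = False
for layer in model.layers[172:]:
    layer.trainable = True
model.compile(optimizer=SGD(learning_rate=1e-4, momentum=0.9),
              loss='categorical_crossentropy')
# model.fit(train_data, epochs=5)  # fine-tune the unfrozen layers
```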
So, to summarize: when you want to train a mix of already-learned layers and new layers, you first train the new ones on their own, and only then unfreeze and fine-tune the top of the pretrained network around them.