Caffe: Softmax with temperature

I am implementing Hinton's knowledge distillation. The first step is to save the soft targets of the "cumbersome" (large) model at a higher temperature. That is, I do not need to train the network; I only need to run a forward pass on each image and keep the soft targets at temperature T.
Is there a way to get the output of a stock AlexNet or GoogLeNet, but at a different temperature?
I need to change the softmax to p_i = exp(z_i/T) / sum_j exp(z_j/T).
In other words, the outputs of the final fully connected layer must be divided by the temperature T. I need this only for the forward pass (not for training).
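For reference, the formula above can be sketched in NumPy as follows. The function name `softmax_with_temperature` is hypothetical (it is not part of Caffe); the per-row maximum is subtracted before exponentiating, which cancels out in the ratio but avoids overflow.

```python
import numpy as np


def softmax_with_temperature(z, T=1.0):
    """Return p_i = exp(z_i/T) / sum_j exp(z_j/T) along the last axis."""
    if T <= 0:
        raise ValueError("temperature T must be positive")
    scaled = np.asarray(z, dtype=np.float64) / T
    # Subtracting the row-wise maximum does not change the softmax
    # (it cancels between numerator and denominator) but prevents overflow.
    scaled -= scaled.max(axis=-1, keepdims=True)
    e = np.exp(scaled)
    return e / e.sum(axis=-1, keepdims=True)


logits = np.array([2.0, 1.0, 0.1])
p1 = softmax_with_temperature(logits, T=1.0)  # ordinary softmax
p5 = softmax_with_temperature(logits, T=5.0)  # softer targets
print(p1)
print(p5)
```

A higher T keeps the ranking of the classes but flattens the distribution, which is what makes the soft targets informative for distillation.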

1 answer

I believe there are four ways to solve this problem.

1. Add your own Softmax layer with a temperature parameter. It should be fairly simple to modify the code in softmax_layer.cpp to take the temperature T into account. You may also need to edit caffe.proto so that the parameters of the Softmax layer can include the extra temperature field.

2. Implement the layer as a Python layer.

3. If you only need a forward pass, i.e. "feature extraction", you can simply output the "top" of the layer just before the softmax layer as features and compute the softmax with temperature outside Caffe altogether.

4. Use a Scale layer followed by a standard Softmax layer:

layer {
  type: "Scale"
  name: "temperature"
  bottom: "zi"  # the logits, e.g. "fc8" in the BVLC AlexNet deploy.prototxt
  top: "zi/T"
  scale_param { 
    filler: { type: "constant" value: 0.5 }  # 0.5 = 1/T for T = 2; replace with your own value of 1/T
  }
  param { lr_mult: 0 decay_mult: 0 } # make sure temperature is fixed
}
layer {
  type: "Softmax"
  name: "prob"
  bottom: "zi/T"
  top: "pi"
}
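The arithmetic behind options 3 and 4 can be checked with a small NumPy sketch. It assumes the logits have already been extracted (with pycaffe they would typically be read from the blob feeding the final softmax after `net.forward()`, e.g. `net.blobs["fc8"].data`; random numbers stand in for them here). The helper names `scale_layer`, `softmax` and `soft_targets` are hypothetical. It shows that multiplying by a fixed 1/T and then applying an ordinary softmax gives the same result as the temperature softmax computed directly.

```python
import numpy as np


def scale_layer(z, inv_T):
    """Mimic the fixed Scale layer: multiply every logit by the constant 1/T."""
    return z * inv_T


def softmax(x):
    """Ordinary softmax over the last (class) axis."""
    x = x - x.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


def soft_targets(logits, T):
    """Option 3: temperature softmax computed directly, outside Caffe."""
    z = logits / T
    z -= z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


T = 2.0
rng = np.random.default_rng(0)
# Stand-in logits for a batch of 4 images over 1000 classes (assumption).
logits = rng.normal(scale=5.0, size=(4, 1000))

via_scale = softmax(scale_layer(logits, 1.0 / T))  # option 4: Scale + Softmax
direct = soft_targets(logits, T)                   # option 3: outside Caffe
print(np.abs(via_scale - direct).max())
```

The soft targets could then be stored, for example with `np.save`, for the later training of the small model. This sketch only checks the math; it does not run Caffe itself.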

Source: https://habr.com/ru/post/1615160/
