Training of quantized models in TensorFlow

I would like to train a quantized network, i.e. use quantized weights during the forward pass to compute the loss, and then update the underlying full-precision floating-point weights during the backward pass.

This question has already been asked here, but has not been answered.

Note that in my case, "fake quantization" is sufficient. This means that the weights can still be stored as 32-bit floating-point values, as long as they are quantized to a low bit width.
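To make the scheme I have in mind concrete, here is a minimal pure-Python sketch for a scalar model `y = w * x`. The `quantize` grid, the `train_step` helper, and the learning rate are hypothetical names and values chosen for illustration. Passing the gradient straight through the rounding step (a straight-through estimator) is my assumption about how the backward pass would be handled; it is not something the TensorFlow documentation or the blog post confirms.

```python
def quantize(w, step=0.25):
    """Snap a float weight to the nearest multiple of `step` (hypothetical grid)."""
    return round(w / step) * step


def train_step(w_float, x, y_true, lr=0.1):
    """One SGD step for the scalar model y = w * x with squared-error loss."""
    w_q = quantize(w_float)                 # forward pass uses the quantized weight
    y_pred = w_q * x
    loss = (y_pred - y_true) ** 2
    grad_w_q = 2.0 * (y_pred - y_true) * x  # dLoss/dw_q
    # Assumption: straight-through estimator, i.e. treat d(w_q)/d(w_float) as 1,
    # so the gradient updates the full-precision weight directly.
    w_float = w_float - lr * grad_w_q
    return w_float, loss


w = 0.1  # full-precision weight, kept as a float throughout training
for _ in range(50):
    w, loss = train_step(w, x=1.0, y_true=0.8)

print("float weight:", w, "quantized weight:", quantize(w), "last loss:", loss)
```

The float weight keeps accumulating small updates even while its quantized value stays the same, which is the reason for keeping a full-precision copy.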

In a blog post, Pete Warden states:

"[...] we do have support for "fake quantization" operators. If you include these in your graphs at the points where quantization is expected to occur (for example, after convolutions), then in the forward pass the float values will be rounded to the specified number of levels (typically 256) to simulate the effects of quantization."

The operators he mentions can be found in the TensorFlow API.
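As I understand the quoted description, such an operator clips a float to a range and rounds it to one of a fixed number of evenly spaced levels, while the result stays a float. The sketch below is a simplified pure-Python illustration of that idea, not the TensorFlow implementation. The function name `fake_quantize`, its parameters, and the default range are assumptions, and the real operators may treat the range differently.

```python
def fake_quantize(x, x_min=-1.0, x_max=1.0, num_bits=8):
    """Round a float to one of 2**num_bits evenly spaced levels in [x_min, x_max].

    The result is still a Python float, which is what makes the quantization "fake".
    """
    steps = 2 ** num_bits - 1             # number of intervals between the levels
    scale = (x_max - x_min) / steps       # width of one quantization step
    clipped = min(max(x, x_min), x_max)   # values outside the range saturate
    return x_min + round((clipped - x_min) / scale) * scale


# With 2 bits there are only 4 levels: -1, -1/3, 1/3 and 1.
values = [-1.5, -0.3, 0.1, 0.2, 0.9]
quantized = [fake_quantize(v, num_bits=2) for v in values]
print(quantized)
```

With the default `num_bits=8` the same function yields the 256 levels mentioned in the quote.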

Can someone tell me how to use these operators? If I call them, for example, after a conv layer in my model definition, why would this quantize the weights of the layer rather than the outputs (activations) of that layer?
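To illustrate the distinction I am asking about, here is a toy pure-Python sketch in which the position of the quantizer decides what gets quantized. The `dense` layer and the coarse `fake_quantize` grid are hypothetical stand-ins. I assume the TensorFlow equivalent would be to apply the fake-quantization op to the kernel tensor before it is passed to the convolution, instead of applying it to the convolution's result, but I have not confirmed this.

```python
def fake_quantize(x, step=0.5):
    """Snap a float to the nearest multiple of `step` (coarse hypothetical grid)."""
    return round(x / step) * step


def dense(inputs, weights):
    """Toy layer: dot product of inputs and weights."""
    return sum(i * w for i, w in zip(inputs, weights))


inputs = [1.0, 2.0]
weights = [0.2, 0.6]

# Quantizer placed after the layer: the activation is quantized, the weights are not.
act_quantized = fake_quantize(dense(inputs, weights))

# Quantizer placed on the weights before the layer: the weights are quantized,
# and the activation is whatever the quantized weights produce.
weight_quantized = dense(inputs, [fake_quantize(w) for w in weights])

print("activation-quantized output:", act_quantized)
print("weight-quantized output:", weight_quantized)
```

The two placements give different outputs for the same inputs, so calling the operator after the layer would not by itself quantize the weights.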


Source: https://habr.com/ru/post/1688776/

