After thinking about it for a while, I think there is at least one reason for doing it this way: it helps avoid mistakes when converting weights between data formats.
TensorFlow/slim, like other high-level libraries, allows tensors to be laid out either as NHWC (batch_size, height, width, channels; same notation below), which is the default, or as NCHW (for better performance on GPUs).
When converting weights between the two formats, the [in_channel, out_channel] weight matrix of the first fc layer (the fully connected layer immediately after the last conv layer) must be handled specially: going from NCHW to NHWC, for example, it must be reshaped to [last_conv_channel, height, width, out_channel], transposed to [height, width, last_conv_channel, out_channel], and then reshaped back to [in_channel, out_channel].
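A minimal NumPy sketch of that reshape/transpose/reshape step. The concrete sizes here are hypothetical (VGG-16-like: the last conv layer yields 512 channels of 7x7 feature maps, and the fc layer has 4096 outputs); the answer itself names no specific layer:

```python
import numpy as np

# Hypothetical sizes: last conv output is (C, H, W) = (512, 7, 7), fc has 4096 outputs.
C, H, W, OUT = 512, 7, 7, 4096

# fc weights saved from an NCHW model: input rows follow the (channel, height, width)
# flatten order of that format.
w_nchw = np.random.randn(C * H * W, OUT).astype(np.float32)

w = w_nchw.reshape(C, H, W, OUT)    # [last_conv_channel, height, width, out_channel]
w = w.transpose(1, 2, 0, 3)         # [height, width, last_conv_channel, out_channel]
w_nhwc = w.reshape(H * W * C, OUT)  # back to [in_channel, out_channel], NHWC row order
```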
If you use conv weights rather than fully connected weights, no such conversion needs to be applied explicitly to the fc layer (which is then actually a conv weight): the weights keep their [height, width, in_channel, out_channel] spatial layout, and the framework handles the data format by itself. This, of course, avoids errors.
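A sketch of that fc-as-conv pattern, assuming TF 1.x with tf.contrib.slim and the same hypothetical VGG-16-like sizes as above (slim's VGG implementation uses conv layers for fc6/fc7 in this way):

```python
import tensorflow as tf
slim = tf.contrib.slim

# Hypothetical 7x7x512 feature map from the last conv/pool block.
net = tf.placeholder(tf.float32, [None, 7, 7, 512])

# "fc6" implemented as a conv layer: its weights are [7, 7, 512, 4096], a layout
# that survives an NHWC/NCHW switch with no manual reshape/transpose.
net = slim.conv2d(net, 4096, [7, 7], padding='VALID', scope='fc6')
net = slim.conv2d(net, 4096, [1, 1], scope='fc7')
```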