Add batch normalization immediately before or after non-linearity in Keras?

    def conv2d_bn(x, nb_filter, nb_row, nb_col,
                  border_mode='same', subsample=(1, 1), name=None):
        '''Utility function to apply conv + BN.'''
        if name is not None:
            conv_name = name + '_conv'
            bn_name = name + '_bn'
        else:
            conv_name = None
            bn_name = None
        # bn_axis is the channel axis, set at module level in inception_v3.py
        x = Convolution2D(nb_filter, nb_row, nb_col,
                          subsample=subsample,
                          activation='relu',
                          border_mode=border_mode,
                          name=conv_name)(x)
        x = BatchNormalization(axis=bn_axis, name=bn_name)(x)
        return x

When I use the official inception_v3 model in Keras, I see that it applies BatchNormalization after the 'relu' non-linearity, as in the code above.

But in the original Batch Normalization paper, the authors state that

we add the BN transform immediately before the nonlinearity, by normalizing x = Wu + b.

Then I looked at the Inception implementation in TensorFlow, which adds BN just before the non-linearity, as the paper says. For more details, see ops.py in the TensorFlow Inception model.

I'm confused. Why do people use the style above in Keras instead of the following?

    def conv2d_bn(x, nb_filter, nb_row, nb_col,
                  border_mode='same', subsample=(1, 1), name=None):
        '''Utility function to apply conv + BN.'''
        # conv_name, bn_name and bn_axis as in the function above
        x = Convolution2D(nb_filter, nb_row, nb_col,
                          subsample=subsample,
                          border_mode=border_mode,
                          name=conv_name)(x)
        x = BatchNormalization(axis=bn_axis, name=bn_name)(x)
        x = Activation('relu')(x)
        return x

In the dense case:

    x = Dense(1024, name='fc')(x)
    x = BatchNormalization(axis=bn_axis, name=bn_name)(x)
    x = Activation('relu')(x)
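One detail from the paper worth noting: when BN directly follows the linear transform, the bias b becomes redundant, since BN subtracts the batch mean and its learned beta offset takes over the role of b. Below is a minimal sketch of the dense case with the bias dropped, written against the Keras 2 API (so the argument names differ slightly from the Keras 1 code above); the input size and layer names are just placeholders:

    from keras.layers import Input, Dense, BatchNormalization, Activation
    from keras.models import Model

    inputs = Input(shape=(2048,))
    # Linear transform Wu only; BN's beta offset replaces the bias b
    x = Dense(1024, use_bias=False, name='fc')(inputs)
    # Normalize the pre-activation, then apply the non-linearity
    x = BatchNormalization(name='fc_bn')(x)
    x = Activation('relu', name='fc_relu')(x)
    model = Model(inputs, x)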
1 answer

I also use it before the activation, and that is indeed how it was designed; other libraries do the same, for example Lasagne's batch_norm: http://lasagne.readthedocs.io/en/latest/modules/layers/normalization.html#lasagne.layers.batch_norm

However, in practice placing it after the activation seems to work a little better:

https://github.com/ducha-aiki/caffenet-benchmark/blob/master/batchnorm.md (this is just one test)
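Since the evidence is mixed, an easy way to decide for a particular model is to make the placement switchable and benchmark both. The helper below is a hypothetical sketch, not from the Keras source; it uses the Keras 2 Conv2D API, and the 'relu' activation and 'same' padding are just example choices:

    from keras.layers import Conv2D, BatchNormalization, Activation

    def conv_block(x, filters, kernel_size, bn_before_activation=True, name=None):
        """Hypothetical conv block with switchable BN placement."""
        conv_name = None if name is None else name + '_conv'
        bn_name = None if name is None else name + '_bn'
        # When BN comes right after the conv, the conv bias is redundant
        x = Conv2D(filters, kernel_size, padding='same',
                   use_bias=not bn_before_activation,
                   name=conv_name)(x)
        if bn_before_activation:
            # Paper ordering: normalize the pre-activation, then apply ReLU
            x = BatchNormalization(name=bn_name)(x)
            x = Activation('relu')(x)
        else:
            # Post-activation ordering (as in the Keras inception_v3 snippet above)
            x = Activation('relu')(x)
            x = BatchNormalization(name=bn_name)(x)
        return x

Flipping bn_before_activation lets the two variants be compared on the same data with no other changes.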



