According to the original dropout paper, this regularization method can also be applied to convolutional layers, often improving their performance. The TensorFlow function tf.nn.dropout supports this through its noise_shape parameter, which lets the user choose which parts of the tensor are dropped out independently. However, neither the paper nor the documentation gives a clear explanation of which dimensions should be dropped independently, and TensorFlow's explanation of how noise_shape works is rather unclear:
Only dimensions with noise_shape[i] == shape(x)[i] will make independent decisions.
I would assume that for a typical CNN layer output of shape [batch_size, height, width, channels], we do not want individual rows or columns to be dropped on their own, but rather entire channels (which would be equivalent to dropping a node in a fully connected NN), independently across examples (i.e. different channels can be dropped for different examples in the batch). Am I correct in this assumption?
If so, how would dropout with exactly this behavior be implemented using the noise_shape parameter? Would it be:
noise_shape=[batch_size, 1, 1, channels]
or
noise_shape=[1, height, width, 1]
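To see what each option would do, here is a small sketch of the documented noise_shape semantics (a NumPy re-implementation for illustration, not TensorFlow's actual code): the keep/drop mask is sampled at noise_shape and broadcast to the shape of x, so any dimension where noise_shape is 1 shares a single decision.

```python
import numpy as np

def dropout_with_noise_shape(x, rate, noise_shape, rng):
    """Sketch of tf.nn.dropout's noise_shape behavior using NumPy.

    The mask is sampled with shape `noise_shape` and broadcast to
    x.shape, so size-1 dimensions share one keep/drop decision.
    """
    keep = rng.random(noise_shape) >= rate        # one decision per mask entry
    mask = np.broadcast_to(keep, x.shape)         # replicate over size-1 dims
    return np.where(mask, x / (1.0 - rate), 0.0)  # scale kept units as dropout does

rng = np.random.default_rng(0)
x = np.ones((2, 4, 4, 3))  # [batch_size, height, width, channels]

# Option 1: independent decisions per (example, channel); each whole
# height x width feature map is kept or dropped together.
y = dropout_with_noise_shape(x, 0.5, (2, 1, 1, 3), rng)
```

With noise_shape=(2, 1, 1, 3) every (example, channel) slice comes out either all zero or all scaled, which matches "drop entire channels, independently per example". With noise_shape=(1, height, width, 1), by the same broadcasting rule, each spatial position would instead be kept or dropped across all channels and all examples at once.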