I am trying to understand some basics of TensorFlow, and I got stuck while reading the documentation for the 2D max pooling layer: https://www.tensorflow.org/tutorials/layers#pooling_layer_1
This is how max_pooling2d is specified:
pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[2, 2], strides=2)
where conv1 is a tensor with shape [batch_size, image_width, image_height, channels]; in this case it is [batch_size, 28, 28, 32].
So our input is a tensor with shape [batch_size, 28, 28, 32].
My understanding of 2D max pooling is that a filter of size pool_size (2x2 in this case) is applied, with the sliding window moved by strides (also 2 in each direction). This means that both the width and the height of the image are halved, i.e. we end up with 14x14 pixels per channel (32 channels in total), so the output should be a tensor of shape [batch_size, 14, 14, 32].
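To make the shape arithmetic I have in mind concrete, here is a minimal sketch in plain Python. It assumes no padding ('valid' padding), so each spatial dimension becomes floor((size - pool) / stride) + 1. The helper name pooled_shape and the batch size of 100 are made up for illustration; this is not TensorFlow code.

```python
def pooled_shape(input_shape, pool_size, strides):
    """Shape after 2D max pooling of a [batch, width, height, channels] tensor.

    Assumption: 'valid' padding (no padding), so each spatial dimension
    becomes floor((size - pool) / stride) + 1.
    """
    batch, width, height, channels = input_shape
    pool_w, pool_h = pool_size
    stride_w, stride_h = strides
    out_w = (width - pool_w) // stride_w + 1
    out_h = (height - pool_h) // stride_h + 1
    # Pooling is applied to each channel separately, so the
    # channel count is carried over unchanged.
    return (batch, out_w, out_h, channels)


batch_size = 100  # arbitrary value, chosen only for illustration

print(pooled_shape((batch_size, 28, 28, 32), (2, 2), (2, 2)))  # (100, 14, 14, 32)
print(pooled_shape((batch_size, 14, 14, 64), (2, 2), (2, 2)))  # (100, 7, 7, 64)
```

In this sketch only the two spatial dimensions shrink; nothing in the arithmetic touches the last (channel) dimension.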
However, according to the link above, the shape of the output tensor is [batch_size, 14, 14, 1]:
Our output tensor produced by max_pooling2d() (pool1) has a shape of
[batch_size, 14, 14, 1]: the 2x2 filter reduces width and height by 50%.
What am I missing here?
How did the 32 channels become 1?
Later in the same tutorial they apply the same logic: https://www.tensorflow.org/tutorials/layers#convolutional_layer_2_and_pooling_layer_2
but this time it is correct, i.e. [batch_size, 14, 14, 64] becomes [batch_size, 7, 7, 64] (the number of channels stays the same).