Caffe reshape / upsample fully connected layer

Suppose we have a layer like this:

 layer {
   name: "fully-connected"
   type: "InnerProduct"
   bottom: "bottom"
   top: "top"
   inner_product_param {
     num_output: 1
   }
 }

The output is batch_size x 1. In several papers (for example link1 page 3 above, or link2 page 4 above) I saw such a layer used at the end to produce a 2-D image for pixel-wise prediction. How can I convert this output into a two-dimensional image? I was thinking about reshaping or deconvolution, but I can't figure out how either would work here. A simple example would be helpful.
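A minimal sketch of the idea (hypothetical sizes, plain Python rather than Caffe): an InnerProduct layer with num_output = h*w produces one flat vector per image, and a Reshape layer then views that same vector as an h x w map. The reshape is just a row-major reinterpretation of the values:

```python
h, w = 10, 10                       # hypothetical target image size
flat = list(range(h * w))           # stand-in for one fc-layer output vector

# Row-major reshape, which is what Caffe's Reshape layer does:
# element (r, c) of the 2-D map is flat[r * w + c]
image = [flat[r * w:(r + 1) * w] for r in range(h)]

print(len(image), len(image[0]))    # 10 10
print(image[3][7])                  # 37 == 3 * w + 7
```

In prototxt terms this corresponds to an InnerProduct layer with num_output: 100 followed by a Reshape layer whose shape dims are 1 x 1 x 10 x 10; no new values are computed, the flat vector is only reinterpreted.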

UPDATE: My input images are 304x228, and my ground truth (depth images) are 75x55.

 ################# Main net ##################
 layer { name: "conv1" type: "Convolution" bottom: "data" top: "conv1" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 96 kernel_size: 11 stride: 4 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } } }
 layer { name: "relu1" type: "ReLU" bottom: "conv1" top: "conv1" }
 layer { name: "norm1" type: "LRN" bottom: "conv1" top: "norm1" lrn_param { local_size: 5 alpha: 0.0001 beta: 0.75 } }
 layer { name: "pool1" type: "Pooling" bottom: "norm1" top: "pool1" pooling_param { pool: MAX kernel_size: 3 stride: 2 } }
 layer { name: "conv2" type: "Convolution" bottom: "pool1" top: "conv2" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 256 pad: 2 kernel_size: 5 group: 2 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0.1 } } }
 layer { name: "relu2" type: "ReLU" bottom: "conv2" top: "conv2" }
 layer { name: "norm2" type: "LRN" bottom: "conv2" top: "norm2" lrn_param { local_size: 5 alpha: 0.0001 beta: 0.75 } }
 layer { name: "pool2" type: "Pooling" bottom: "norm2" top: "pool2" pooling_param { pool: MAX kernel_size: 3 stride: 2 } }
 layer { name: "conv3" type: "Convolution" bottom: "pool2" top: "conv3" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 384 pad: 1 kernel_size: 3 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } } }
 layer { name: "relu3" type: "ReLU" bottom: "conv3" top: "conv3" }
 layer { name: "conv4" type: "Convolution" bottom: "conv3" top: "conv4" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 384 pad: 1 kernel_size: 3 group: 2 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0.1 } } }
 layer { name: "relu4" type: "ReLU" bottom: "conv4" top: "conv4" }
 layer { name: "conv5" type: "Convolution" bottom: "conv4" top: "conv5" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 256 pad: 1 kernel_size: 3 group: 2 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0.1 } } }
 layer { name: "relu5" type: "ReLU" bottom: "conv5" top: "conv5" }
 layer { name: "pool5" type: "Pooling" bottom: "conv5" top: "pool5" pooling_param { pool: MAX kernel_size: 3 stride: 2 } }
 layer { name: "fc6" type: "InnerProduct" bottom: "pool5" top: "fc6" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } inner_product_param { num_output: 4096 weight_filler { type: "gaussian" std: 0.005 } bias_filler { type: "constant" value: 0.1 } } }
 layer { name: "relufc6" type: "ReLU" bottom: "fc6" top: "fc6" }
 layer { name: "drop6" type: "Dropout" bottom: "fc6" top: "fc6" dropout_param { dropout_ratio: 0.5 } }
 layer { name: "fc7" type: "InnerProduct" bottom: "fc6" top: "fc7" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } inner_product_param { num_output: 4070 weight_filler { type: "gaussian" std: 0.005 } bias_filler { type: "constant" value: 0.1 } } }
 layer { type: "Reshape" name: "reshape" bottom: "fc7" top: "fc7_reshaped" reshape_param { shape { dim: 1 dim: 1 dim: 55 dim: 74 } } }
 layer { name: "deconv1" type: "Deconvolution" bottom: "fc7_reshaped" top: "deconv1"
   convolution_param { num_output: 64 kernel_size: 5 pad: 2 stride: 1
     # group: 256
     weight_filler { type: "bilinear" } bias_term: false } }
 #########################
 layer { name: "conv6" type: "Convolution" bottom: "data" top: "conv6" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 63 kernel_size: 9 stride: 2 pad: 1 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } } }
 layer { name: "relu6" type: "ReLU" bottom: "conv6" top: "conv6" }
 layer { name: "pool6" type: "Pooling" bottom: "conv6" top: "pool6" pooling_param { pool: MAX kernel_size: 3 stride: 2 } }
 ########################
 layer { name: "concat" type: "Concat" bottom: "deconv1" bottom: "pool6" top: "concat" concat_param { concat_dim: 1 } }
 layer { name: "conv7" type: "Convolution" bottom: "concat" top: "conv7" convolution_param { num_output: 64 kernel_size: 5 pad: 2 stride: 1 weight_filler { type: "gaussian" std: 0.011 } bias_filler { type: "constant" value: 0 } } }
 layer { name: "relu7" type: "ReLU" bottom: "conv7" top: "conv7" relu_param { negative_slope: 0.01 engine: CUDNN } }
 layer { name: "conv8" type: "Convolution" bottom: "conv7" top: "conv8" convolution_param { num_output: 64 kernel_size: 5 pad: 2 stride: 1 weight_filler { type: "gaussian" std: 0.011 } bias_filler { type: "constant" value: 0 } } }
 layer { name: "relu8" type: "ReLU" bottom: "conv8" top: "conv8" relu_param { negative_slope: 0.01 engine: CUDNN } }
 layer { name: "conv9" type: "Convolution" bottom: "conv8" top: "conv9" convolution_param { num_output: 1 kernel_size: 5 pad: 2 stride: 1 weight_filler { type: "gaussian" std: 0.011 } bias_filler { type: "constant" value: 0 } } }
 layer { name: "relu9" type: "ReLU" bottom: "conv9" top: "result" relu_param { negative_slope: 0.01 engine: CUDNN } }

Log:

 I1108 19:34:57.239722  4277 data_layer.cpp:41] output data size: 1,1,228,304
 I1108 19:34:57.243340  4277 data_layer.cpp:41] output data size: 1,1,55,74
 I1108 19:34:57.247392  4277 net.cpp:150] Setting up conv1
 I1108 19:34:57.247407  4277 net.cpp:157] Top shape: 1 96 55 74 (390720)
 I1108 19:34:57.248191  4277 net.cpp:150] Setting up pool1
 I1108 19:34:57.248196  4277 net.cpp:157] Top shape: 1 96 27 37 (95904)
 I1108 19:34:57.253263  4277 net.cpp:150] Setting up conv2
 I1108 19:34:57.253276  4277 net.cpp:157] Top shape: 1 256 27 37 (255744)
 I1108 19:34:57.254202  4277 net.cpp:150] Setting up pool2
 I1108 19:34:57.254220  4277 net.cpp:157] Top shape: 1 256 13 18 (59904)
 I1108 19:34:57.269943  4277 net.cpp:150] Setting up conv3
 I1108 19:34:57.269961  4277 net.cpp:157] Top shape: 1 384 13 18 (89856)
 I1108 19:34:57.285303  4277 net.cpp:150] Setting up conv4
 I1108 19:34:57.285338  4277 net.cpp:157] Top shape: 1 384 13 18 (89856)
 I1108 19:34:57.294801  4277 net.cpp:150] Setting up conv5
 I1108 19:34:57.294841  4277 net.cpp:157] Top shape: 1 256 13 18 (59904)
 I1108 19:34:57.295207  4277 net.cpp:150] Setting up pool5
 I1108 19:34:57.295210  4277 net.cpp:157] Top shape: 1 256 6 9 (13824)
 I1108 19:34:57.743222  4277 net.cpp:150] Setting up fc6
 I1108 19:34:57.743259  4277 net.cpp:157] Top shape: 1 4096 (4096)
 I1108 19:34:57.881680  4277 net.cpp:150] Setting up fc7
 I1108 19:34:57.881718  4277 net.cpp:157] Top shape: 1 4070 (4070)
 I1108 19:34:57.881826  4277 net.cpp:150] Setting up reshape
 I1108 19:34:57.881846  4277 net.cpp:157] Top shape: 1 1 55 74 (4070)
 I1108 19:34:57.884768  4277 net.cpp:150] Setting up conv6
 I1108 19:34:57.885309  4277 net.cpp:150] Setting up pool6
 I1108 19:34:57.885327  4277 net.cpp:157] Top shape: 1 63 55 74 (256410)
 I1108 19:34:57.885395  4277 net.cpp:150] Setting up concat
 I1108 19:34:57.885412  4277 net.cpp:157] Top shape: 1 64 55 74 (260480)
 I1108 19:34:57.886759  4277 net.cpp:150] Setting up conv7
 I1108 19:34:57.886786  4277 net.cpp:157] Top shape: 1 64 55 74 (260480)
 I1108 19:34:57.897269  4277 net.cpp:150] Setting up conv8
 I1108 19:34:57.897303  4277 net.cpp:157] Top shape: 1 64 55 74 (260480)
 I1108 19:34:57.899129  4277 net.cpp:150] Setting up conv9
 I1108 19:34:57.899138  4277 net.cpp:157] Top shape: 1 1 55 74 (4070)
2 answers

The num_output of the last fully connected layer will not be 1 for pixel-wise prediction. It will be equal to w*h of the input image.

What made you think the value would be 1?

EDIT 1:

Below are the sizes of each layer mentioned in link 1, page 3:

 LAYER     OUTPUT DIM [c*h*w]
 coarse1   96*h1*w1     conv layer
 coarse2   256*h2*w2    conv layer
 coarse3   384*h3*w3    conv layer
 coarse4   384*h4*w4    conv layer
 coarse5   256*h5*w5    conv layer
 coarse6   4096*1*1     fc layer
 coarse7   X*1*1        fc layer, where 'X' can be interpreted as w*h

To understand this further, suppose we have a network that predicts image pixels and the images are 10x10. Then the final fc layer will also have an output dimension of 100*1*1 (as in coarse7), which can be interpreted as 10x10.

Now the question is how a 1-D array can correctly predict a 2-D image. Note that the loss for this output is computed against labels that correspond to the pixel data, so during training the weights learn to predict the pixel data.
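A small sketch of this point (plain Python, made-up numbers): since a reshape only reinterprets the same values, a Euclidean loss computed on the flat fc output against a flattened label is identical to the loss computed on the reshaped 2-D map, so nothing is lost by predicting in 1-D:

```python
h, w = 5, 4
pred_flat = [0.1 * i for i in range(h * w)]          # fc output, length h*w
label_2d = [[0.1 * (r * w + c) + 0.05 for c in range(w)] for r in range(h)]

# Loss on the reshaped 2-D prediction...
pred_2d = [pred_flat[r * w:(r + 1) * w] for r in range(h)]
loss_2d = sum((pred_2d[r][c] - label_2d[r][c]) ** 2
              for r in range(h) for c in range(w))

# ...equals the loss on the flat prediction against the flattened label.
label_flat = [v for row in label_2d for v in row]
loss_flat = sum((p - t) ** 2 for p, t in zip(pred_flat, label_flat))

assert abs(loss_2d - loss_flat) < 1e-12
```

The gradients flowing back into the fc weights are therefore the same either way; the 2-D structure lives entirely in how the labels are laid out.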

EDIT 2:

Drawing the net with draw_net.py in Caffe gives the following: [net visualization image]

The ReLU layers attached to conv6 and fc6 have the same name, which leads to tangled connectivity in the rendered graph. I'm not sure whether this causes problems during training, but I would suggest renaming one of the ReLU layers to a unique name to avoid unforeseen issues.

Returning to your question: it seems that no upsampling takes place after the fully connected layers. As you can see from the log:

 I1108 19:34:57.881680  4277 net.cpp:150] Setting up fc7
 I1108 19:34:57.881718  4277 net.cpp:157] Top shape: 1 4070 (4070)
 I1108 19:34:57.881826  4277 net.cpp:150] Setting up reshape
 I1108 19:34:57.881846  4277 net.cpp:157] Top shape: 1 1 55 74 (4070)
 I1108 19:34:57.884768  4277 net.cpp:150] Setting up conv6
 I1108 19:34:57.885309  4277 net.cpp:150] Setting up pool6
 I1108 19:34:57.885327  4277 net.cpp:157] Top shape: 1 63 55 74 (256410)

fc7 has an output size of 4070*1*1. It is reshaped to 1*55*74 so that it can be treated as a 2-D feature map and combined with the conv6/pool6 branch.

The output of the whole network is produced by conv9, which has an output size of 1*55*74, exactly matching the size of the labels (depth data).
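The 55x74 size survives conv7 through conv9 because each uses kernel_size 5, pad 2, stride 1; this can be checked with the standard convolution output-size formula, out = (in + 2*pad - kernel) / stride + 1. A quick sanity check in Python:

```python
def conv_out(size, kernel, pad=0, stride=1):
    """Output size of a convolution along one spatial dimension."""
    return (size + 2 * pad - kernel) // stride + 1

# conv7..conv9: kernel_size 5, pad 2, stride 1 keep the map at 55x74
print(conv_out(55, kernel=5, pad=2, stride=1))  # 55
print(conv_out(74, kernel=5, pad=2, stride=1))  # 74

# And fc7's 4070 outputs are exactly one value per predicted pixel:
print(55 * 74)                                  # 4070
```

The same formula explains the rest of the log above, e.g. conv1 with kernel 11 and stride 4 mapping 228x304 down to 55x74.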

If my answer is still not clear, please point out exactly where you think upsampling should be happening.

+2

If you just need a fully connected network, like a regular multilayer perceptron, use 2-D blobs (shape (N, D)) and the InnerProduct layer.
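For reference, on a 2-D blob of shape (N, D) an inner-product (fully connected) layer computes output[n][o] = sum_d input[n][d] * W[o][d] + b[o], giving shape (N, num_output). A plain-Python sketch with made-up numbers:

```python
N, D, num_output = 2, 3, 4                      # hypothetical sizes
x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]          # input blob, shape (N, D)
W = [[1.0] * D for _ in range(num_output)]      # weights, shape (num_output, D)
b = [0.5] * num_output                          # bias, shape (num_output,)

# out[n][o] = sum_d x[n][d] * W[o][d] + b[o]
out = [[sum(x[n][d] * W[o][d] for d in range(D)) + b[o]
        for o in range(num_output)] for n in range(N)]

print(out[0])  # [6.5, 6.5, 6.5, 6.5]   (1 + 2 + 3 + 0.5)
print(out[1])  # [15.5, 15.5, 15.5, 15.5]
```

This is why the fc6/fc7 log lines above show two-dimensional top shapes like "1 4096" rather than four-dimensional ones.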

-1

Source: https://habr.com/ru/post/1244411/
