What is the RoI layer in Fast R-CNN?

What happens, mathematically, when region proposals are resized according to the activations of the final convolutional layer? In a tutorial about object detection with CNNs, Fast R-CNN is mentioned, along with the RoI layer and what it does. But I don't understand what happens mathematically when region proposals are resized against the final conv layer activations in each cell.

2 answers

Region of Interest (RoI) pooling:

It is a type of pooling layer that performs max pooling on inputs (here, convnet feature maps) of non-uniform size and produces a small feature map of a fixed size (say, 7x7). The choice of this fixed size is a network hyperparameter and is predefined.

The main purpose of this pooling is to speed up training and test time, and also to train the whole system end-to-end (in a joint manner).

Thanks to this pooling layer, training and testing are faster compared to the original (vanilla) R-CNN architecture, hence the name Fast R-CNN.

A simple example (from Region of Interest Pooling Explained by deepsense.io):

RoI Pool Visualization
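To make the resizing step concrete, here is a minimal sketch of RoI max pooling in Python/NumPy. The function name, the feature-map shape, and the example RoI are made up for illustration, and the box is assumed to already be in feature-map coordinates (the actual Fast R-CNN layer also scales proposals from image coordinates and processes batches of RoIs):

```python
import numpy as np

def roi_max_pool(feature_map, roi, output_size=(7, 7)):
    """Max-pool an arbitrary-sized RoI of a conv feature map into a fixed grid.

    feature_map: (H, W, C) output of the last conv layer.
    roi: (x1, y1, x2, y2) box, already in feature-map coordinates (assumed).
    output_size: the fixed (rows, cols) of the pooled output, a hyperparameter.
    """
    x1, y1, x2, y2 = roi
    region = feature_map[y1:y2, x1:x2, :]            # arbitrary h x w x C crop
    h, w, c = region.shape
    rows, cols = output_size
    # Split the crop into a rows x cols grid of roughly equal sub-windows
    # and keep only the maximum activation inside each sub-window.
    row_edges = np.linspace(0, h, rows + 1).astype(int)
    col_edges = np.linspace(0, w, cols + 1).astype(int)
    out = np.zeros((rows, cols, c), dtype=feature_map.dtype)
    for i in range(rows):
        for j in range(cols):
            r0, r1 = row_edges[i], max(row_edges[i + 1], row_edges[i] + 1)
            c0, c1 = col_edges[j], max(col_edges[j + 1], col_edges[j] + 1)
            out[i, j] = region[r0:r1, c0:c1].max(axis=(0, 1))
    return out

# Example: a 32x48 feature map with 256 channels and one 21x14 proposal.
fmap = np.random.rand(32, 48, 256).astype(np.float32)
pooled = roi_max_pool(fmap, roi=(5, 3, 19, 24), output_size=(7, 7))
print(pooled.shape)   # (7, 7, 256), regardless of the proposal's original size
```

Every proposal, whatever its size, comes out as the same fixed-size tensor, which is what lets the subsequent fully connected layers accept it.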


The RoI (Region of Interest) layer was introduced in Fast R-CNN and is a special case of the spatial pyramid pooling layer, introduced in Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. The main function of the RoI layer is to reshape inputs of arbitrary size into a fixed-length output, because of the fixed-size constraint of the fully connected layers.

How the RoI layer works:

Spatial pyramid pooling visualization

In this image, a feature map of arbitrary size is fed into the layer, which pools over 3 different grids: 4x4 (blue), 2x2 (green), and 1x1 (gray), producing outputs of fixed size 16 x F, 4 x F, and 1 x F respectively, where F is the number of filters. These outputs are then concatenated into a single vector, which is fed to the fully connected layer.
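The same mechanics can be sketched for the pyramid case, just to show how the 16 x F + 4 x F + 1 x F = 21 x F fixed-length vector arises from feature maps of different sizes. The function name, grid sizes, and channel count below are only illustrative assumptions:

```python
import numpy as np

def spp_layer(feature_map, grids=(4, 2, 1)):
    """Spatial pyramid pooling: max-pool one feature map over several grid
    resolutions and concatenate the results into one fixed-length vector.

    feature_map: (H, W, F) array with arbitrary H and W.
    grids: pyramid levels; (4, 2, 1) yields 16F + 4F + 1F = 21F outputs.
    """
    h, w, f = feature_map.shape
    parts = []
    for g in grids:
        row_edges = np.linspace(0, h, g + 1).astype(int)
        col_edges = np.linspace(0, w, g + 1).astype(int)
        for i in range(g):
            for j in range(g):
                r0, r1 = row_edges[i], max(row_edges[i + 1], row_edges[i] + 1)
                c0, c1 = col_edges[j], max(col_edges[j + 1], col_edges[j] + 1)
                # One fixed-length (F,) vector per grid cell.
                parts.append(feature_map[r0:r1, c0:c1].max(axis=(0, 1)))
    return np.concatenate(parts)   # length (16 + 4 + 1) * F, independent of H, W

# Two feature maps of different spatial sizes produce identical output lengths.
fmap_a = np.random.rand(13, 17, 256)
fmap_b = np.random.rand(40, 25, 256)
print(spp_layer(fmap_a).shape, spp_layer(fmap_b).shape)   # both (5376,) = 21 * 256
```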


Source: https://habr.com/ru/post/1266770/

