I am using the CNTK Fast R-CNN implementation (released on GitHub).
Selective search did not give me good region proposals, so I wrote something that works better for my data (I deal with scanned documents). My task is to identify watermarks in documents and draw a tight box around them. Extending the CNTK Object Detection Tutorial to identify horizontally aligned watermarks was pretty straightforward and gave me decent accuracy. Even though the network uses AlexNet conv weights (transfer learning), it seems to generalize quite well to images containing text. Now I am faced with the problem of identifying rotated watermarks.
I have a few questions on this issue:
Conclusion "outside the box" regression output 4 numbers -> (topX, topY, width, height);
However, this representation does not allow the rotation of rectangles. I understand that when creating my boxes of truth I must draw rotating rectangles, as well as have sentences with rotating regions. How can I change the network architecture to predict such boxes? 5 numbers -> (topX, topY, width, height, angle): similar to function cv2.minAreaRect()? 8 numbers -> (x1, y1, x2, y2, x3, y3, x4, y4)?
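For illustration, here is a minimal sketch (Python/OpenCV; the corner coordinates are a made-up example, not from my data) of the two encodings I have in mind:

```python
import numpy as np
import cv2

# Hypothetical ground-truth corners of a rotated watermark.
corners = np.array([[120, 40], [300, 90], [280, 160], [100, 110]], dtype=np.float32)

# 5-number encoding: minimum-area rotated rectangle, as cv2.minAreaRect() returns it.
(cx, cy), (w, h), angle = cv2.minAreaRect(corners)
target_5 = [cx, cy, w, h, angle]  # what a 5-output regression head would predict

# 8-number encoding: the four corner points themselves.
box_pts = cv2.boxPoints(((cx, cy), (w, h), angle))  # 4 x 2 array of corners
target_8 = box_pts.flatten().tolist()  # (x1, y1, x2, y2, x3, y3, x4, y4)
```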
I apologize if this is a trivial question, but I am having trouble wrapping my head around it.
Would data augmentation help with rotated objects, or am I making this harder than it needs to be? I have read about others purposely applying image augmentation (scaling and rotation) to get a more robust model. When such augmentation is performed, is the model then capable of recognizing rotated objects and fitting tight rotated rectangles/squares around the object of interest? A sketch of the augmentation I mean follows below.
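In case it clarifies the question, here is a minimal sketch (Python/OpenCV; rotate_sample is a hypothetical helper, not part of CNTK) of rotating an image together with its ground-truth corner points:

```python
import numpy as np
import cv2

def rotate_sample(image, corners, angle_deg):
    """Rotate an image and its ground-truth corner points (N x 2 array)
    by angle_deg around the image center."""
    h, w = image.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle_deg, 1.0)
    rotated = cv2.warpAffine(image, M, (w, h))
    # Apply the same affine transform to the box corners (homogeneous coords).
    pts = np.hstack([corners, np.ones((len(corners), 1), dtype=corners.dtype)])
    rotated_corners = pts @ M.T
    return rotated, rotated_corners
```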