Computer Vision Shape / Template Matching Approach

I am currently facing a fairly common problem that should be fairly easy to solve, but so far all my attempts have failed, so I am turning to you for help.

I think the problem is best explained with some illustrations. I have several templates like these two:

[Image: Pattern 1] [Image: Pattern 3]

I also have an image like this (it could probably be better, since the photo it was derived from was rather poorly lit):

[Image: input picture]

(Note how the template was scaled to fit the image size)

The ultimate goal is a tool that determines whether the user is showing a thumbs up or thumbs down, as well as the angles in between. So I want to compare the templates with the image and see which one is most similar to it (or, more precisely, which angle the hand is showing). I know which direction the thumb points in each template, so if I find the template that looks most similar, I also have the angle.

I work with OpenCV (via the Python bindings) and have already tried cvMatchTemplate and MatchShapes, but so far neither works reliably.
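For reference, here is a minimal sketch of the kind of calls I have been trying (file names are placeholders, and I am using the modern cv2 API rather than the old cv bindings):

```python
import cv2

# Placeholder file names; both images are assumed to be binarized
# already (white blob on black), as in the pictures above.
image = cv2.imread("hand.png", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("pattern1.png", cv2.IMREAD_GRAYSCALE)

# Template matching: slide the template over the image, score each spot.
result = cv2.matchTemplate(image, template, cv2.TM_CCORR_NORMED)
min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(result)
print(f"best matchTemplate score {max_val:.3f} at {max_loc}")

# matchShapes compares Hu moments of contours instead of raw pixels.
# Assumes the blob is the first (or only) external contour.
cnts_img, _ = cv2.findContours(image, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts_tpl, _ = cv2.findContours(template, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
score = cv2.matchShapes(cnts_img[0], cnts_tpl[0], cv2.CONTOURS_MATCH_I1, 0.0)
print(f"matchShapes distance {score:.4f}")  # lower = more similar
```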

I can only guess why MatchTemplate failed, but my suspicion is that a template with a smaller white area fits completely inside the white area of the image, producing the best match score even though the two obviously do not look alike.

Are there any methods hidden in OpenCV that I have not yet found, or is there a well-known algorithm for this kind of problem that I would have to implement myself?

Happy New Year.

3 answers

A few simple methods may work:

  • After binarization and segmentation, find the blob's Feret diameter (i.e., the farthest distance between any two points on it, also known as the major axis).
  • Find the convex hull of the point set, fill it in, and treat it as a connected region. Subtract the original blob (hand with thumb) from it. The difference will lie in the region between the thumb and the fist, and the position of that region relative to the blob's center of mass should indicate the rotation (a rough sketch of this one follows after the list).
  • Apply the watershed algorithm to the distance transform (the distance of each point to the blob's edge). This can help identify the thin connected region (the thumb).
  • Fit the largest circle (or largest inscribed polygon) inside the blob. Dilate this circle or polygon until part of its edge overlaps the background. Subtract the dilated shape from the original image; only the thumb remains.
  • If the hand size is consistent (or reasonably so), you could also perform N morphological erosions until the thumb disappears, then N dilations to grow the fist back to roughly its original size. Subtract this fist-only image from the original blob to get the thumb. Then use the direction of the thumb's Feret diameter and/or its center of mass relative to the fist's center of mass to determine the pointing direction.
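A minimal sketch of the convex hull difference idea (second bullet), assuming a binarized mask in a placeholder file and OpenCV 4's two-value findContours signature:

```python
import cv2
import numpy as np

# Placeholder file name; the binarized hand blob (white on black).
mask = cv2.imread("hand_mask.png", cv2.IMREAD_GRAYSCALE)
_, mask = cv2.threshold(mask, 127, 255, cv2.THRESH_BINARY)

# Largest external contour = the hand blob.
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
hand = max(contours, key=cv2.contourArea)

# Fill the convex hull and subtract the original blob: what remains are
# the concave pockets, the biggest of which sits between thumb and fist.
hull = cv2.convexHull(hand)
hull_img = np.zeros_like(mask)
cv2.drawContours(hull_img, [hull], -1, 255, cv2.FILLED)
pockets = cv2.subtract(hull_img, mask)

# The largest pocket's position relative to the blob's centroid hints
# at the rotation of the hand.
pk_contours, _ = cv2.findContours(pockets, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
pocket = max(pk_contours, key=cv2.contourArea)
m_hand, m_pocket = cv2.moments(hand), cv2.moments(pocket)
cx_h, cy_h = m_hand["m10"] / m_hand["m00"], m_hand["m01"] / m_hand["m00"]
cx_p, cy_p = m_pocket["m10"] / m_pocket["m00"], m_pocket["m01"] / m_pocket["m00"]
angle = np.degrees(np.arctan2(cy_p - cy_h, cx_p - cx_h))
print(f"pocket direction relative to centroid: {angle:.1f} deg")
```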

Methods that look for critical points (regions of strong direction change) are more involved. In the simplest case, you can use corner detectors and then check the distances between corners to determine where the fist meets the inner edge of the thumb.
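As an illustration, a bare-bones Shi-Tomasi corner sketch (the parameter values and file name are guesses):

```python
import cv2

# Placeholder file name; the binarized hand blob.
mask = cv2.imread("hand_mask.png", cv2.IMREAD_GRAYSCALE)

# Corners are points of strong direction change along the silhouette;
# two corners close together often mark where the thumb meets the fist.
corners = cv2.goodFeaturesToTrack(mask, maxCorners=20,
                                  qualityLevel=0.1, minDistance=10)
for x, y in corners.reshape(-1, 2):
    print(f"corner at ({x:.0f}, {y:.0f})")
```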

For more sophisticated methods, look at papers on shape decomposition by authors such as Kimia, Siddiqi, and Xiaofeng Mi.


MatchTemplate seems appropriate for the problem you are describing. How does it fail for you? If you can really mask out the thumbs-up / thumbs-down / in-between gestures as cleanly as you show in your templates, then you have already done the hardest part.

MatchTemplate does not include rotation or scaling in its search space, so you should generate additional templates from your reference image for all the rotations you want to detect, and you should scale your templates to match the overall size of the thumbs-up / thumbs-down blob you found.

[edit] The result array from MatchTemplate contains, at each location, a value indicating how well the template matches the image there. If you use CV_TM_SQDIFF, the lowest value in the result array is the best match location; if you use CV_TM_CCORR or CV_TM_CCOEFF, it is the highest value. If your scaled and rotated template images all have the same number of white pixels, you can compare the best value across all templates, and the template with the winning score is the one you want to select.
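Putting the last two paragraphs together, here is a rough sketch of the brute-force rotation search (file names, the 15-degree step, and the template set are assumptions):

```python
import cv2

image = cv2.imread("hand.png", cv2.IMREAD_GRAYSCALE)
templates = {  # reference templates with known thumb direction
    "thumb_up": cv2.imread("pattern1.png", cv2.IMREAD_GRAYSCALE),
    "thumb_down": cv2.imread("pattern3.png", cv2.IMREAD_GRAYSCALE),
}

best = None  # (score, name, angle)
for name, tpl in templates.items():
    h, w = tpl.shape
    center = (w / 2, h / 2)
    for angle in range(0, 360, 15):
        # Rotate the template in place (corners may clip for non-square
        # templates; padding first would avoid that).
        M = cv2.getRotationMatrix2D(center, angle, 1.0)
        rotated = cv2.warpAffine(tpl, M, (w, h))
        result = cv2.matchTemplate(image, rotated, cv2.TM_SQDIFF)
        min_val, _, _, _ = cv2.minMaxLoc(result)
        # TM_SQDIFF: the lowest value is the best match. Raw scores are
        # only comparable across templates with equal white-pixel counts.
        if best is None or min_val < best[0]:
            best = (min_val, name, angle)

score, name, angle = best
print(f"best: {name} rotated {angle} deg (score {score:.0f})")
```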

There are plenty of rotation- and scale-invariant detection methods that could help you, but normalizing your problem down to something MatchTemplate can handle is a lot easier.

For more advanced approaches, look at SIFT, Haar feature-based classifiers, or one of the other detectors available in OpenCV.
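For instance, a bare-bones SIFT sketch (SIFT is in the main OpenCV package as of 4.4; older versions need opencv-contrib, where it lives under xfeatures2d; the file name is a placeholder):

```python
import cv2

img = cv2.imread("hand.png", cv2.IMREAD_GRAYSCALE)
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)
print(f"{len(keypoints)} keypoints, descriptors shape {descriptors.shape}")
```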


I think you can get good results if you simply find the two points with the longest path between them through the white area. The direction the thumb points is just the direction of the line connecting those two points.

You can do this easily by sampling points in the white area and running Floyd-Warshall on the resulting graph.
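A rough sketch of this idea using SciPy's Floyd-Warshall implementation (file name, sampling stride, and connection radius are all guesses):

```python
import cv2
import numpy as np
from scipy.sparse.csgraph import floyd_warshall
from scipy.spatial.distance import cdist

mask = cv2.imread("hand_mask.png", cv2.IMREAD_GRAYSCALE)  # placeholder
ys, xs = np.nonzero(mask > 127)
pts = np.column_stack([xs, ys])[::50]  # subsample white pixels for speed

# Connect samples within `radius` pixels of each other; if the sampling
# is dense relative to the blob's thickness, paths stay inside the blob.
d = cdist(pts, pts)
graph = np.where(d <= 25.0, d, 0.0)  # 0 = no edge in the dense format

dist = floyd_warshall(graph, directed=False)
dist[np.isinf(dist)] = -1.0  # ignore disconnected sample pairs
i, j = np.unravel_index(np.argmax(dist), dist.shape)

# The farthest-apart pair approximates the two ends of the hand;
# their connecting line gives the pointing direction.
p, q = pts[i], pts[j]
angle = np.degrees(np.arctan2(q[1] - p[1], q[0] - p[0]))
print(f"farthest pair {tuple(p)} -> {tuple(q)}, direction {angle:.1f} deg")
```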

