Detecting whether an object from one image appears in another image using OpenCV

I have a sample image that contains an object, for example the earrings in the following image:

http://imgur.com/luj2Z

Then I have a large set of candidate images, and I need to determine which one most likely contains the object, for example:

http://imgur.com/yBWgc

So, I need to produce a score for each candidate, where the highest score corresponds to the image most likely to contain the target object. Here are the conditions/restrictions I have to work with/around:

1) I can get several sample images from different angles.

2) Sample images are likely to have different resolutions, angles, and distances than candidate images.

3) There are many candidate images (> 10,000), so the matching has to be reasonably fast.

4) I'm willing to sacrifice some accuracy for speed, so if it means we have to look through the top 100 hits instead of just the top 10, that's fine and can be done manually.

5) I can manipulate the sample images manually, for example outlining the object that I want to detect; the candidate images cannot be manipulated manually, as there are too many of them.

6) I have no real background in OpenCV or computer vision at all, so I'm starting from scratch here.

My initial thought is to start by drawing a rough outline around the object in the sample. Then I could detect corners in the object and corners in the candidate image, profile the pixels around each corner to see whether they are similar, and rank candidates by the sum of each corner's maximum similarity score. I'm also not sure how to quantify pixel similarity; maybe just the Euclidean distance between their RGB values?
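For example, here is a rough numpy sketch of what I mean by pixel similarity (the patches are placeholder arrays; I haven't settled on a real patch size):

import numpy

# Hypothetical 9x9 RGB patches sampled around a corner in each image
patch_a = numpy.random.rand(9, 9, 3)
patch_b = numpy.random.rand(9, 9, 3)

# Mean per-pixel Euclidean distance in RGB space; lower = more similar
distance = numpy.sqrt(((patch_a - patch_b) ** 2).sum(axis=2)).mean()
print distance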

The problem with that is that it ignores the center of the object. In the examples above, if the corners of the earrings all sit on the gold frame, it will never look at the red, green, and blue stones inside the earring. I suppose I could improve this by taking every pair of corners and sampling a few points along the line between them to compare as well.

So, I have a few questions:

A) Is this line of thinking generally reasonable, or is there something I'm missing?

B) Which specific OpenCV algorithms should I look into? I know there are several corner detection algorithms, but I only need one, and if the differences between them are all marginal, I'm fine with the fastest.

C) Any sample code using those algorithms that would help my understanding?

My language options are Python or C#.

+6
4 answers

Check out the SURF features that are part of OpenCV. The idea here is that you have an algorithm for finding "interest points" in two images. You also have an algorithm for computing a descriptor of the image patch around each interest point. Typically this descriptor captures the distribution of edge orientations in the patch. Then you try to find point correspondences, i.e. for each interest point in image A, you try to find the corresponding interest point in image B. This is done by comparing the descriptors and looking for the closest matches. Then, if you have a set of correspondences that are related by some geometric transformation, you have a detection.
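For illustration, here is a minimal sketch of that detect/describe/match/verify pipeline using the OpenCV 2.4-era Python bindings (filenames and thresholds are placeholders; the ratio test and RANSAC parameters are common defaults, not something specific to this answer):

import numpy
import cv2

img_a = cv2.imread('sample.jpg', 0)       # placeholder filenames
img_b = cv2.imread('candidate.jpg', 0)

detector = cv2.SURF(400)                  # 400 = Hessian threshold
kp_a, desc_a = detector.detectAndCompute(img_a, None)
kp_b, desc_b = detector.detectAndCompute(img_b, None)

# For each descriptor in A, find its two nearest neighbours in B and keep
# the match only if the best one is clearly better than the runner-up.
matches = cv2.BFMatcher(cv2.NORM_L2).knnMatch(desc_a, desc_b, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

# Geometric verification: count matches consistent with a single homography.
if len(good) >= 4:
    p_a = numpy.float32([kp_a[m.queryIdx].pt for m in good])
    p_b = numpy.float32([kp_b[m.trainIdx].pt for m in good])
    H, status = cv2.findHomography(p_a, p_b, cv2.RANSAC, 5.0)
    print '%d of %d matches are geometrically consistent' % (status.sum(), len(good))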

Of course, this is a very high-level explanation. The devil is in the details, and for those you should read some papers. Start with David Lowe's "Distinctive Image Features from Scale-Invariant Keypoints", and then read the papers on SURF.

Also, consider migrating this question to the Signal and Image Processing Stack Exchange.

+4

Luckily, the folks at OpenCV have just done this for you. Check the samples folder: opencv/samples/cpp/matching_to_many_images.cpp. Compile it and try it on the default images.

The algorithm can easily be adapted to make it faster or more accurate.

Basically, object recognition algorithms of this kind are divided into two parts: keypoint detection & description, and object matching. There are many algorithms/variants for both, which you can play with directly in OpenCV.

Detection/description can be performed by: SIFT/SURF/ORB/GFTT/STAR/FAST and others.

For matching there are: brute force, Hamming, etc. (some matching methods are specific to a particular detection algorithm).
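As an illustrative note (not from the original answer): the matcher norm has to fit the descriptor type. Binary descriptors such as ORB and BRIEF are compared with the Hamming distance, while float descriptors such as SIFT and SURF use L2:

import cv2

matcher_float = cv2.BFMatcher(cv2.NORM_L2)        # for SIFT / SURF descriptors
matcher_binary = cv2.BFMatcher(cv2.NORM_HAMMING)  # for ORB / BRIEF descriptors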

TIPS:

  • Crop the original image so that the interesting object covers as much of the image area as possible. Use that as the training image.

  • SIFT is the most accurate and the slowest descriptor. FAST is a good combination of speed and accuracy. GFTT is old and rather unreliable. ORB was recently added to OpenCV and is very promising in both speed and accuracy.

  • The results depend on the object's pose in the other image. If it is scaled, rotated, squeezed, partially occluded, etc., try SIFT. If it is a simple task (i.e. the object appears at almost the same size/rotation/etc.), most of the descriptors will do well.
  • ORB may not be in an official OpenCV release yet. Try downloading the latest version from the OpenCV trunk and compiling it: https://code.ros.org/svn/opencv/trunk

This way, you can find the best combination for your case by trial and error.
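As a hedged sketch of that trial-and-error process: the OpenCV 2.4-era Python bindings expose the same by-name factories that matching_to_many_images.cpp uses, so you can loop over combinations (the ones listed below are just examples):

import cv2

# (detector, extractor, matcher) combinations to try; illustrative only
combos = [
    ('SURF', 'SURF', 'BruteForce'),
    ('ORB', 'ORB', 'BruteForce-Hamming'),
    ('FAST', 'BRIEF', 'BruteForce-Hamming'),
]

img1 = cv2.imread('sample.jpg', 0)        # placeholder filenames
img2 = cv2.imread('candidate.jpg', 0)

for det_name, ext_name, match_name in combos:
    detector = cv2.FeatureDetector_create(det_name)
    extractor = cv2.DescriptorExtractor_create(ext_name)
    matcher = cv2.DescriptorMatcher_create(match_name)

    kp1 = detector.detect(img1)
    kp1, d1 = extractor.compute(img1, kp1)
    kp2 = detector.detect(img2)
    kp2, d2 = extractor.compute(img2, kp2)

    matches = matcher.match(d1, d2)
    print '%s + %s + %s: %d raw matches' % (det_name, ext_name, match_name, len(matches))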

For detailed information about each implementation, you should read the original papers/tutorials. Google Scholar is a good place to start.

+5

In case someone comes along in the future: here is a small example that does this with OpenCV. It is based on the opencv sample, but (in my opinion) it is a little clearer, so I'm including it as well.

Tested with OpenCV 2.4.4

#!/usr/bin/env python
'''
Uses SURF to match two images.

Finds common features between two images and draws them.

Based on the sample code from opencv: samples/python2/find_obj.py

USAGE
    find_obj.py <image1> <image2>
'''

import sys

import numpy
import cv2

###############################################################################
# Image Matching
###############################################################################

def match_images(img1, img2, img1_features=None, img2_features=None):
    """Given two images, returns the matches"""
    detector = cv2.SURF(3200)
    matcher = cv2.BFMatcher(cv2.NORM_L2)

    if img1_features is None:
        kp1, desc1 = detector.detectAndCompute(img1, None)
    else:
        kp1, desc1 = img1_features

    if img2_features is None:
        kp2, desc2 = detector.detectAndCompute(img2, None)
    else:
        kp2, desc2 = img2_features

    #print 'img1 - %d features, img2 - %d features' % (len(kp1), len(kp2))

    raw_matches = matcher.knnMatch(desc1, trainDescriptors=desc2, k=2)
    kp_pairs = filter_matches(kp1, kp2, raw_matches)
    return kp_pairs


def filter_matches(kp1, kp2, matches, ratio=0.75):
    """Filters features that are common to both images"""
    mkp1, mkp2 = [], []
    for m in matches:
        if len(m) == 2 and m[0].distance < m[1].distance * ratio:
            m = m[0]
            mkp1.append(kp1[m.queryIdx])
            mkp2.append(kp2[m.trainIdx])
    kp_pairs = zip(mkp1, mkp2)
    return kp_pairs

###############################################################################
# Match Displaying
###############################################################################

def draw_matches(window_name, kp_pairs, img1, img2):
    """Draws the matches"""
    mkp1, mkp2 = zip(*kp_pairs)

    H = None
    status = None

    if len(kp_pairs) >= 4:
        p1 = numpy.float32([kp.pt for kp in mkp1])
        p2 = numpy.float32([kp.pt for kp in mkp2])
        H, status = cv2.findHomography(p1, p2, cv2.RANSAC, 5.0)

    if len(kp_pairs):
        explore_match(window_name, img1, img2, kp_pairs, status, H)


def explore_match(win, img1, img2, kp_pairs, status=None, H=None):
    """Draws lines between the matched features"""
    h1, w1 = img1.shape[:2]
    h2, w2 = img2.shape[:2]
    vis = numpy.zeros((max(h1, h2), w1 + w2), numpy.uint8)
    vis[:h1, :w1] = img1
    vis[:h2, w1:w1 + w2] = img2
    vis = cv2.cvtColor(vis, cv2.COLOR_GRAY2BGR)

    if H is not None:
        corners = numpy.float32([[0, 0], [w1, 0], [w1, h1], [0, h1]])
        reshaped = cv2.perspectiveTransform(corners.reshape(1, -1, 2), H)
        reshaped = reshaped.reshape(-1, 2)
        corners = numpy.int32(reshaped + (w1, 0))
        cv2.polylines(vis, [corners], True, (255, 255, 255))

    if status is None:
        status = numpy.ones(len(kp_pairs), numpy.bool_)
    p1 = numpy.int32([kpp[0].pt for kpp in kp_pairs])
    p2 = numpy.int32([kpp[1].pt for kpp in kp_pairs]) + (w1, 0)

    green = (0, 255, 0)
    red = (0, 0, 255)
    for (x1, y1), (x2, y2), inlier in zip(p1, p2, status):
        if inlier:
            col = green
            cv2.circle(vis, (x1, y1), 2, col, -1)
            cv2.circle(vis, (x2, y2), 2, col, -1)
        else:
            col = red
            r = 2
            thickness = 3
            cv2.line(vis, (x1 - r, y1 - r), (x1 + r, y1 + r), col, thickness)
            cv2.line(vis, (x1 - r, y1 + r), (x1 + r, y1 - r), col, thickness)
            cv2.line(vis, (x2 - r, y2 - r), (x2 + r, y2 + r), col, thickness)
            cv2.line(vis, (x2 - r, y2 + r), (x2 + r, y2 - r), col, thickness)
    vis0 = vis.copy()
    for (x1, y1), (x2, y2), inlier in zip(p1, p2, status):
        if inlier:
            cv2.line(vis, (x1, y1), (x2, y2), green)

    cv2.imshow(win, vis)

###############################################################################
# Test Main
###############################################################################

if __name__ == '__main__':
    if len(sys.argv) < 3:
        print "No filenames specified"
        print "USAGE: find_obj.py <image1> <image2>"
        sys.exit(1)

    fn1 = sys.argv[1]
    fn2 = sys.argv[2]

    img1 = cv2.imread(fn1, 0)
    img2 = cv2.imread(fn2, 0)

    if img1 is None:
        print 'Failed to load fn1:', fn1
        sys.exit(1)

    if img2 is None:
        print 'Failed to load fn2:', fn2
        sys.exit(1)

    kp_pairs = match_images(img1, img2)

    if kp_pairs:
        draw_matches('find_obj', kp_pairs, img1, img2)
    else:
        print "No matches found"

    cv2.waitKey()
    cv2.destroyAllWindows()
+2

As mentioned, algorithms such as SIFT and SURF consist of a feature detector, which is invariant to a number of distortions, and a descriptor, which aims to robustly model the image around each feature point.

The latter is increasingly used for image categorization and identification, in what is commonly referred to as the "bag of words" or "visual words" approach.

In its simplest form, you collect all the descriptors from all images and cluster them, for example with k-means. Every source image then has descriptors that contribute to a number of clusters. The centroids of those clusters, i.e. the visual words, can be used as a new descriptor for the image, and they can then be used in an inverted-file architecture.

This approach allows for soft comparison and generalizes to a certain extent, for example retrieving all images that contain airplanes.
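A very rough sketch of that pipeline, assuming the OpenCV 2.4-era Python bindings (the vocabulary size, filenames, and SURF threshold are arbitrary placeholders):

import numpy
import cv2

surf = cv2.SURF(400)
filenames = ['a.jpg', 'b.jpg', 'c.jpg']          # placeholder image set
images = [cv2.imread(fn, 0) for fn in filenames]
descs = [surf.detectAndCompute(im, None)[1] for im in images]

# 1. Pool all descriptors and cluster them into K visual words.
K = 50
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
_, _, vocab = cv2.kmeans(numpy.vstack(descs), K, criteria, 5,
                         cv2.KMEANS_RANDOM_CENTERS)

# 2. Describe each image as a histogram over the visual words: assign
#    every descriptor to its nearest cluster centroid and count.
matcher = cv2.BFMatcher(cv2.NORM_L2)

def bow_histogram(desc):
    words = [m.trainIdx for m in matcher.match(desc, vocab)]
    hist, _ = numpy.histogram(words, bins=K, range=(0, K))
    return hist.astype(numpy.float32)

histograms = [bow_histogram(d) for d in descs]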

  • The VLFeat website contains, next to an excellent SIFT library, a nice demonstration of this approach that classifies the Caltech-101 dataset.

  • Caltech itself offers Matlab/C++ software along with related publications.

  • LEAR is also a good place to start.

0
