Similar images - how to compare them

I have more than 1.3 million images that I have to compare with each other, and a few hundred more are added per day.

My company takes an image and creates a version that can be used by our suppliers.

The files are often very similar to each other. For example, two different companies may send us two different images, a JPG and a GIF, both showing the McDonald's logo, with months between the submissions.

What happens is that we end up creating the same logo twice, when we could simply copy and paste the one already created, or at least offer it to the artists as a possible starting point.

I have been looking for algorithms that create a fingerprint, or something similar, that would let me do a simple lookup whenever a new image is uploaded. Time is not much of a problem: at 1 second per fingerprint, fingerprinting all 1.3 million images would take about 15 days, and the savings would be large enough that we could even dedicate 3 or 4 servers to the job.

I am fluent in PHP, but if the algorithm is in pseudocode or even C, I can read it and try to translate it (as long as it does not rely on C-specific libraries).

I am currently computing an MD5 of every image in order to catch the ones that are exactly the same. This question arose when I thought of resizing each image and running MD5 on the resized version, to catch images that were saved in a different format or at a different size. Even then, though, I would not get good enough recognition.

In case I did not mention it: I would be happy simply to be able to offer possible "similar" images.

EDIT

Keep in mind that the check will run several times per minute, so the best solution would give me some values for each image that I can store and later compare against the image I am looking at, without having to re-scan the entire server.

I have read several pages that mention histograms, or resizing the image to a very small size, stripping out possible labels, converting it to grayscale, hashing the result and using that hash for comparison. If I succeed, I will post the code / answer here.
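The "shrink, convert to grayscale, hash" idea mentioned above is commonly known as an average hash. The following is only a minimal sketch of that idea, not a tested production method: it assumes the image has already been decoded into rows of (r, g, b) tuples by whatever image library is available, and the names `average_hash` and `hamming_distance` are made up for this illustration.

```python
def to_gray(pixel):
    """Convert an (r, g, b) tuple to one luminance value (BT.601 weights)."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def shrink(gray, size=8):
    """Downscale a 2D grid of gray values to size x size by averaging blocks."""
    height, width = len(gray), len(gray[0])
    small = []
    for by in range(size):
        y0 = by * height // size
        y1 = max((by + 1) * height // size, y0 + 1)  # at least one source row
        row = []
        for bx in range(size):
            x0 = bx * width // size
            x1 = max((bx + 1) * width // size, x0 + 1)  # at least one source column
            block = [gray[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


def average_hash(pixels, size=8):
    """Return a size*size-bit integer: 1 where a cell is brighter than the mean."""
    gray = [[to_gray(p) for p in row] for row in pixels]
    flat = [v for row in shrink(gray, size) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a, b):
    """Number of differing bits between two hashes (0 means same fingerprint)."""
    return bin(a ^ b).count("1")
```

The 64-bit integer can be stored next to each image, and a new upload is then compared by Hamming distance: a small distance suggests a "possibly similar" image. The distance threshold is something that would have to be tuned on real data.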

+4
4 answers

Try using file_get_contents and hash_file: http://www.php.net/manual/en/function.hash-file.php

If the hashes match, then you know that they are exactly the same.

EDIT: If possible, I think storing the image hashes together with the image paths in a database table can help you limit server load. Run the hash algorithm once over your existing images and save each hash in the table. Then, when a new image is submitted, hash it and look the hash up in the table. If the hash already exists, discard the new image. You can make the hash the index of the table, so once you find a match, you don't need to check the rest.

Another option is not to use a database. But then every check becomes an O(n) lookup: you hash the incoming image and then run a linear in-memory search against the hashes of all stored images.
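As a rough illustration of the hash-table idea above, here is a minimal sketch. It assumes SQLite and SHA-256 purely for the sake of a self-contained example (the approach works the same with MD5 and any other database), and the table name `image_hashes` and the function names are hypothetical.

```python
import hashlib
import sqlite3


def open_index(db_path=":memory:"):
    """Open (or create) the lookup table; the hash column is the primary key."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS image_hashes ("
        " hash TEXT PRIMARY KEY,"
        " path TEXT NOT NULL)"
    )
    return conn


def register_image(conn, data, path):
    """Return the path of an identical stored image, or None after storing this one."""
    digest = hashlib.sha256(data).hexdigest()
    row = conn.execute(
        "SELECT path FROM image_hashes WHERE hash = ?", (digest,)
    ).fetchone()
    if row is not None:
        return row[0]  # exact duplicate: the caller can drop the new upload
    conn.execute(
        "INSERT INTO image_hashes (hash, path) VALUES (?, ?)", (digest, path)
    )
    conn.commit()
    return None
```

Because the hash is the primary key, each lookup is an index search rather than a scan over every stored image. Note that this only catches byte-for-byte identical files.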

EDIT #2: Please see the solution here: Image Comparison - Fast Algorithm

+2

A question similar to yours already exists; check whether it works for you: Compare 2 images in PHP

+1

There is a PHP ImageMagick extension that you could use.

+1

To speed up the process, sort all files by size and compare their contents only when two files have the same size. Comparing hashes is also the fastest way to compare the contents. Hope this helps.
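The size-first approach above could look roughly like the sketch below. It is only an illustration under assumptions: the function names are invented here, SHA-256 stands in for whichever hash is actually used, and it only finds byte-for-byte identical files.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Hash a file in chunks so large images are never loaded whole."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_exact_duplicates(paths):
    """Group files by size first, then hash only files that share a size."""
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a file with a unique size cannot have an exact duplicate
        by_digest = defaultdict(list)
        for path in same_size:
            by_digest[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_digest.values() if len(g) > 1)
    return groups
```

Reading the file size is much cheaper than hashing, so files with a unique size are ruled out without ever being read.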

0

Source: https://habr.com/ru/post/1257791/

