I have more than 1.3 million images that I have to compare with each other, and a few hundred more are added per day.
My company takes an image and creates a version that can be used by our suppliers.
Files are often very similar to each other. For example, two different companies can send us two different images, one JPG and one GIF, both showing the McDonald's logo, with months between the submissions.
What happens is that we end up creating the same logo twice, when we could just copy / paste the one already created, or at least offer it to the artists as a possible starting point.
I am looking for an algorithm that creates a fingerprint, or something similar, that would let me run a simple query when a new image is uploaded. Time is relatively not a problem: if a fingerprint takes 1 second to create, fingerprinting the existing collection would take about 15 days on one machine (roughly 1.3 million seconds), but the savings would be large enough that we could even get 3 or 4 servers to do it.
I am fluent in PHP, but if the algorithm is in pseudocode or even C, I can read it and try to translate it (as long as it does not depend on C-specific libraries).
I am currently computing the MD5 of every image in order to catch the ones that are exactly the same. This question arose when I thought of resizing the images and running MD5 on the resized version, to catch images that are the same but were saved in a different format or at a different size. Even then, I still would not get good enough recognition.
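To make the current approach concrete, here is a rough sketch of the exact-duplicate check. It is in Python rather than PHP only for brevity, and the names `md5_of` and `group_exact_duplicates` are made up for illustration. It also shows the limitation: changing a single byte gives a completely different digest, so MD5 only finds byte-for-byte copies.

```python
import hashlib


def md5_of(data: bytes) -> str:
    """Return the hex MD5 digest of the raw file contents."""
    return hashlib.md5(data).hexdigest()


def group_exact_duplicates(files: dict) -> list:
    """Group file names whose contents are byte-for-byte identical.

    `files` maps a file name to its raw bytes (hypothetical input shape).
    """
    groups = {}
    for name, data in files.items():
        groups.setdefault(md5_of(data), []).append(name)
    # Only digests shared by more than one file are duplicates.
    return [names for names in groups.values() if len(names) > 1]


if __name__ == "__main__":
    files = {
        "logo_a.jpg": b"same bytes",
        "logo_b.jpg": b"same bytes",
        "logo_c.gif": b"same bytez",  # one byte differs, digest is unrelated
    }
    print(group_exact_duplicates(files))
```

A re-saved or resized logo never has identical bytes, which is why this check misses the cases I care about.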
In case I did not mention it: I would be happy simply to offer a list of possible “similar” images.
EDIT
Keep in mind that the check has to run several times per minute, so the best solution would give me some values for each image that I can store and use later to compare against the image I am looking at, without having to re-scan the entire server.
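What I have in mind is roughly the following sketch (again Python for brevity). It assumes each stored value is a fixed-size integer fingerprint and that "similar" means "few differing bits" (Hamming distance). Both are assumptions on my part, and `find_similar`, the in-memory `stored` dict and the `max_distance` threshold are hypothetical.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def find_similar(new_fp: int, stored: dict, max_distance: int = 10) -> list:
    """Return (distance, image_id) pairs within max_distance, closest first.

    `stored` maps image_id -> fingerprint, computed once per image and
    saved, so nothing on disk has to be re-scanned at lookup time.
    """
    hits = []
    for image_id, fp in stored.items():
        distance = hamming(new_fp, fp)
        if distance <= max_distance:
            hits.append((distance, image_id))
    return sorted(hits)


if __name__ == "__main__":
    stored = {
        "logo_a.jpg": 0xF0F0,
        "logo_b.gif": 0xF0F1,  # differs from logo_a in one bit
        "photo.png": 0x0F0F,   # differs in every bit of the low 16
    }
    print(find_similar(0xF0F0, stored, max_distance=2))
```

In practice the stored values would live in the database rather than in a dict, and the threshold would have to be tuned on real images.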
I have read several pages that mention histograms, or resizing the image to a very small size, stripping out possible labels, converting it to grayscale, and then hashing the result and using that hash for comparison. If I succeed, I will post the code / answer here.
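My understanding of the "shrink, grayscale, hash" idea is something like the sketch below, often called an average hash. This is my own reading of those pages, not a tested solution. It assumes the image has already been decoded into rows of `(r, g, b)` tuples (decoding, and stripping labels, are left out), and the function names and the 8x8 size are illustrative choices.

```python
def to_gray(pixels):
    """Convert rows of (r, g, b) tuples to rows of 0-255 luminance values."""
    return [[(299 * r + 587 * g + 114 * b) // 1000 for (r, g, b) in row]
            for row in pixels]


def shrink(gray, size=8):
    """Downscale to size x size by averaging the pixels in each tile."""
    h, w = len(gray), len(gray[0])
    out = []
    for ty in range(size):
        y0 = ty * h // size
        y1 = max((ty + 1) * h // size, y0 + 1)  # at least one source row
        row = []
        for tx in range(size):
            x0 = tx * w // size
            x1 = max((tx + 1) * w // size, x0 + 1)  # at least one column
            tile = [gray[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(tile) / len(tile))
        out.append(row)
    return out


def average_hash(pixels, size=8):
    """Return a size*size-bit integer: 1 where a tile is brighter than the mean."""
    flat = [v for row in shrink(to_gray(pixels), size) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


if __name__ == "__main__":
    black, white = (0, 0, 0), (255, 255, 255)
    # The same "logo" (left half black, right half white) at two sizes.
    small = [[black] * 8 + [white] * 8 for _ in range(16)]
    large = [[black] * 16 + [white] * 16 for _ in range(32)]
    print(average_hash(small) == average_hash(large))
```

Because the hash depends only on the coarse brightness layout, the two sizes of this synthetic image get the same value, which is what MD5 cannot give me. How well it holds up on real logos with compression artifacts is something I would still have to test.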