Maybe you should use Python to do some or all of the comparison work?
One improvement would be to run cmp only if the file sizes are the same; if they differ, the file has obviously changed. Instead of running cmp, you might consider generating a hash for each file with MD5, SHA-1, or SHA-256, or whatever appeals to you (using a Python module or extension, if that is the right term). If you don't think you will be dealing with malicious intent, MD5 is probably sufficient to detect differences.
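Here is a minimal sketch of that idea in Python, comparing two files pairwise; the function names are my own and the paths are placeholders, not anything from your setup:

```python
import hashlib
import os

def file_hash(path, algo="md5", chunk_size=1 << 20):
    """Hash a file in chunks so large files never need to fit in memory."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def files_differ(old_path, new_path):
    """Cheap size comparison first; only hash when the sizes match."""
    if os.path.getsize(old_path) != os.path.getsize(new_path):
        return True
    return file_hash(old_path) != file_hash(new_path)
```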
Even in a shell script, you could run an external hash command, give it the names of all the files in one directory, and then give it the names of all the files in the other directory. You can then read the two sets of hash values plus the file names and decide which files have changed.
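A hedged sketch of that same idea driven from Python, assuming GNU md5sum is available and that the two directories are called old and new (both names are placeholders):

```python
import subprocess
from pathlib import Path

def hash_directory(directory):
    """Launch a single md5sum process over every file in `directory`
    and return a {filename: digest} mapping parsed from its output."""
    files = sorted(p for p in Path(directory).iterdir() if p.is_file())
    if not files:
        return {}
    result = subprocess.run(
        ["md5sum"] + [str(p) for p in files],
        capture_output=True, text=True, check=True,
    )
    hashes = {}
    for line in result.stdout.splitlines():
        digest, name = line.split(maxsplit=1)
        hashes[Path(name).name] = digest
    return hashes

old_hashes = hash_directory("old")   # placeholder directory names
new_hashes = hash_directory("new")
changed = sorted(name for name, digest in new_hashes.items()
                 if old_hashes.get(name) != digest)
```

The point is that only two external processes are launched in total, no matter how many files there are.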
Yes, it sounds like it is taking too long, but the problem is launching 1000 copies of cmp, plus the other processing. The common idea behind both the Python suggestion above and the shell-script one is that they do not run a program 1000 times; they try to minimize the number of processes launched. That reduction in the number of processes started should give you a pretty big speedup, I expect.
If you can save the hashes from the "current set of files" and simply generate new hashes for the new set of files, then compare them, you will win. Obviously, if the file holding the "old hashes" (for the current set of files) is missing, you will have to regenerate it from the existing files. This just fills out the information in the comments a bit.
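A rough sketch of that caching idea, assuming the saved hashes live in a JSON file called hashes.json and using placeholder directory names; none of these names come from your question:

```python
import hashlib
import json
import os

HASH_CACHE = "hashes.json"   # hypothetical name for the saved-hash file

def hash_directory(directory):
    """Return {filename: MD5 hex digest} for every regular file in directory."""
    hashes = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            with open(path, "rb") as f:
                hashes[name] = hashlib.md5(f.read()).hexdigest()
    return hashes

def load_old_hashes(current_dir):
    """Use the saved hashes if present; otherwise regenerate them from the
    existing files, as noted above."""
    if os.path.exists(HASH_CACHE):
        with open(HASH_CACHE) as f:
            return json.load(f)
    return hash_directory(current_dir)

old_hashes = load_old_hashes("current")   # "current" and "new" are placeholders
new_hashes = hash_directory("new")
changed = sorted(n for n in new_hashes if old_hashes.get(n) != new_hashes[n])
with open(HASH_CACHE, "w") as f:
    json.dump(new_hashes, f, indent=2)    # the new set becomes next run's baseline
```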
Another possibility: you could track changes in the data that you use to generate these files, and use that to tell you which files will have changed (or at least to limit the set of files that may have changed and therefore need to be compared, since your comments indicate that most files are the same each time).