I am building a system that needs to detect whether blobs of bytes have been updated. Rather than storing each entire blob (they can be up to 5 MB), I think I should compute a checksum, store it, and compute the same checksum again a little later to see whether the blob has been updated.
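To make the idea concrete, here is a minimal sketch of the store-and-compare approach. It assumes CRC-32 from Python's standard `zlib` module purely as a placeholder checksum; the names `checksum` and `has_changed` are hypothetical and not part of any existing system.

```python
import zlib

def checksum(blob: bytes) -> int:
    # CRC-32 is only a placeholder here; any checksum function could be swapped in.
    # The mask keeps the result an unsigned 32-bit value.
    return zlib.crc32(blob) & 0xFFFFFFFF

def has_changed(blob: bytes, stored_checksum: int) -> bool:
    # The blob counts as updated when its checksum differs from the stored one.
    # A collision would make this return False for a blob that was in fact modified.
    return checksum(blob) != stored_checksum

original = b"hello world" * 1000
stored = checksum(original)           # only these 4 bytes are kept, not the blob

modified = original[:-1] + b"!"       # the same blob with its last byte changed
print(has_changed(original, stored))
print(has_changed(modified, stored))
```

Only the small integer is stored between checks, which is the point of the scheme.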
The goal is to minimize the following (in that order):
- checksum size
- time to calculate
- chance of collisions (two identical checksums even though the content has been modified).
It is acceptable for our system to have a collision probability of no more than 1 in 1,000,000. The concern is not security but simply update/error detection, so rare collisions are acceptable. (That is why I put it last in the list of things to minimize.)
In addition, we cannot modify the blobs ourselves.
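The 1-in-1,000,000 tolerance stated above puts a lower bound on the checksum size. The sketch below assumes an idealized checksum whose outputs are uniformly distributed, so that a modified blob matches the stored n-bit value with probability about 2^-n per comparison. Real checksums only approximate this, and the helper name `min_checksum_bits` is hypothetical.

```python
import math

def min_checksum_bits(max_collision_probability: float) -> int:
    # Smallest n such that 2**-n <= max_collision_probability,
    # assuming uniformly distributed checksum values (an idealization).
    return math.ceil(math.log2(1 / max_collision_probability))

bits = min_checksum_bits(1e-6)
print(bits)        # minimum checksum width in bits under this assumption
print(2 ** -32)    # per-comparison collision chance of an ideal 32-bit checksum
```

Under that assumption about 20 bits would already meet the target, so a 32-bit checksum leaves a wide margin for a single comparison.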
Of course, md5, crc, or sha1 come to mind, and if I wanted a quick solution I would just pick one of them. However, more than a quick solution, I am looking for a comparison of the different methods, along with their pros and cons.
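As a starting point for such a comparison, the following sketch measures the first two criteria (checksum size and time to calculate) for the three candidates named above, using only Python's standard `zlib` and `hashlib` modules. The all-zero 5 MB test blob and the helper names are assumptions for illustration, and the timings will vary by machine, so no figures are quoted here.

```python
import hashlib
import time
import zlib

def crc32_digest(data: bytes) -> bytes:
    # zlib.crc32 returns an int; pack it into 4 bytes to compare sizes fairly.
    return (zlib.crc32(data) & 0xFFFFFFFF).to_bytes(4, "big")

# Candidate checksum functions, each mapping bytes to a digest (bytes).
CANDIDATES = {
    "crc32": crc32_digest,
    "md5": lambda data: hashlib.md5(data).digest(),
    "sha1": lambda data: hashlib.sha1(data).digest(),
}

def compare(blob: bytes) -> dict:
    # Returns {name: (digest size in bytes, seconds taken)} for one run each.
    results = {}
    for name, fn in CANDIDATES.items():
        start = time.perf_counter()
        digest = fn(blob)
        elapsed = time.perf_counter() - start
        results[name] = (len(digest), elapsed)
    return results

blob = bytes(5 * 1024 * 1024)  # hypothetical worst-case 5 MB blob of zero bytes
results = compare(blob)
for name, (size, seconds) in results.items():
    print(f"{name}: {size} bytes, {seconds * 1000:.2f} ms")
```

A single run per function is a rough measurement; repeating each call several times and taking the minimum would give steadier numbers.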
crc md5 sha1 checksum
Julien Genestoux, Nov 20 '10 at 14:09