Transfer files using checksums only?

Is it possible to transfer large files using only the checksum system and then restore the original file using calculations?

Say you pass the MD5 checksum of the file and the file size. Having made a “virtual file” and calculated its checksum, having tried each bit combination, you should eventually “reach” the source file. But along the way, you will also get a lot of “collisions”, where the checksum will also match.

So, we change the first byte of the source file to some given value, again calculate the checksum and send it too. If we do the same lookup in a virtual file, we can check each “collision” to make sure it still matches. That should narrow it down a bit, and we can do it a few times.

Of course, the computing power for this would be enormous. But theoretically this is possible, and how many checksums would you need to transmit something (say 1mb)? Or maybe the amount of data needed to transfer checksums, the size of a file, which makes it pointless?

+4
source share
5 answers

The amount of data needed for transfer will probably be the same size as the file. Consider: if you could transfer an n byte file with n-1 bytes of data, this means that you have 256^(n-1) possible data patterns that you could send, but choose from a space of size 256^n . This means that one out of every 256 files will not be expressed using this method - this is often called the pidegonhole principle .

Now, even if this is not a problem, there is no guarantee that you will not have a collision after any number of checksums. Checksum algorithms are designed to prevent collisions, but for most checksum / hash algorithms there is no strong evidence that after X hashes you cannot guarantee any collisions in space with N bytes.

Finally, hashing algorithms are at least designed to be hard to reverse, so even if that were possible, it would require a huge amount of CPU power.

However, for such an approach, you might be interested to read about Error Correction Codes in the forward direction - they are not hash algorithms at all, but I think you may find them interesting.

+2
source

You have an information problem here. The checksum is not necessarily unique to a particular data set, in fact, in order to effectively have many bits of information as a source. What this may indicate is that the data obtained is not the exact data from which the checksum was obtained, but in most cases it cannot prove it.

0
source

Short answer: not in any meaningful way.

Long answer:

Assume an arbitrary file file.bin with a size of 1000 bytes. There are 2^(8*1000) different combinations that may be its actual contents. By sending, for example, a 1000-bit checksum, you still have about 2^(7*1000) alternatives.

By sending one extra bit, you can reduce them to half ... and you still have 2^6999 conflicts. By the time collisions are resolved, you will send at least 8000 bits, i.e. An amount equal to or larger than the file size.

The only way for this to be theoretically possible (Note: I did not say “doable”, not to mention “practical”) would be if the file really did not contain random data, and you could use this knowledge to trim the alternatives. In this case, you would be better off using compression, ayway. Content compression algorithms (e.g., FLAC for audio) use a priori knowledge of input data properties to improve compression.

0
source

In short, no.

To take a hypothetical example, consider a 24-bit photo with 6 pixels - there are 2 ^ (24 * 6) (2 ^ 144) possible combinations of intensities for each color channel at these six pixels, so you can gaurantee what if you evaluate each possibility, you are guaranteed an MD5 collision (since MD5 is a 128-bit number).

0
source

I think that what you think is actually an interesting topic, but you have not pressed on the correct method. If I can try to rephrase your question, you ask if there is a way to apply the function to some data, pass the result of the function and then restore the original data from the result of the terser function. For one MD5 checksum, the answer is no, but with other functions, if you are willing to send multiple function results, this is possible. In general, this area of ​​research is called concise sensation . Accurate reconstruction is sometimes possible, but more often it is used as a lossy compression scheme for images and other visual or audio data.

0
source

Source: https://habr.com/ru/post/1336457/


All Articles