Issue of compression / encryption algorithms

Question

My question here is about compression / encryption algorithms in general, and it will probably sound like a complete newbie question. Now, I understand that "in general" "it all depends", but suppose we are talking about algorithms that have a reference implementation or a published specification and are generally well standardized. To be more specific, I am using the .NET implementations of AES-256 and GZip / Deflate.

So here it is: can it be assumed that, given exactly the same input, both kinds of algorithms will produce exactly the same output?

For example, will the output of aes(gzip("hello"), key, initVector) on .NET be identical to the output on Mac or Linux?

+4

algorithm implementation encryption compression

Anton Gogolev Nov 25 '11 at 10:36

2 answers

AES is defined by a standard, so any conforming implementation will indeed give the same result. GZip is a program, so it is possible that different versions of the program will produce different output. I would expect a later version to be able to decompress the output of an earlier version, but the reverse may not be possible.

As others have said, if you are going to compress, then compress the plaintext, not the ciphertext from AES. Ciphertext will not compress well, as it is designed to appear random.
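The point about ciphertext can be checked with a small Python sketch. It makes two assumptions that are not in the answer: random bytes from `os.urandom` stand in for AES output (no real encryption is performed), and the repetitive plaintext is made up for the demonstration.

```python
import os
import zlib

# Hypothetical plaintext: highly repetitive, so it compresses well.
plaintext = b"hello world, " * 400

# Stand-in for AES ciphertext (assumption: good ciphertext looks like
# random bytes of the same length as the plaintext).
pseudo_ciphertext = os.urandom(len(plaintext))

compressed_plain = zlib.compress(plaintext, 9)
compressed_cipher = zlib.compress(pseudo_ciphertext, 9)

# Compare original size with compressed size for both inputs.
print(len(plaintext), len(compressed_plain))
print(len(pseudo_ciphertext), len(compressed_cipher))
```

The first pair of numbers should show a large reduction, the second essentially none, which is why the usual order is compress first, then encrypt.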

+2

rossum Nov 25 '11 at 12:35

Cyan · Accepted Answer · 2011-11-25T15:38:52+0000

AES is strictly defined, so given the same input, the same algorithm and the same key, you will get the same result.

The same cannot be said for zip.

The problem is not a missing standard. The formats are clearly specified: the Deflate stream in IETF RFC 1951 (with its zlib wrapper in RFC 1950) and the gzip stream in IETF RFC 1952, so anyone can write a compatible zip compressor / decoder starting from these definitions.

But zip belongs to the large family of LZ compressors, which by construction are neither bijective nor injective. This means that, for a single input, there are many ways to describe it, all of them valid although different.

Example. Say my input is: ABCABCABC

Valid outputs can be:

9 literals
3 literals followed by one 6-byte copy starting at offset -3
3 literals followed by two copies of 3 bytes each, starting at offset -3
6 literals followed by one copy of 3 bytes long, starting at offset -6
etc.
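The list above can be checked mechanically. The following is a minimal Python sketch of an LZ-style decoder; the token format (`"lit"` and `"copy"` tuples) is invented for illustration and is not the actual Deflate bit format.

```python
def lz_decode(tokens):
    """Rebuild the input from a list of LZ-style tokens.

    Hypothetical token format, for illustration only:
      ("lit", b"...")          -> emit these bytes as-is
      ("copy", offset, length) -> copy `length` bytes starting
                                  `offset` bytes back in the output
    """
    out = bytearray()
    for token in tokens:
        if token[0] == "lit":
            out += token[1]
        else:
            _, offset, length = token
            # Copy byte by byte so that a copy may overlap the bytes
            # it is producing (length > offset), as LZ77 allows.
            for _ in range(length):
                out.append(out[-offset])
    return bytes(out)


# The four descriptions of "ABCABCABC" listed above.
descriptions = [
    [("lit", b"ABCABCABC")],
    [("lit", b"ABC"), ("copy", 3, 6)],
    [("lit", b"ABC"), ("copy", 3, 3), ("copy", 3, 3)],
    [("lit", b"ABCABC"), ("copy", 6, 3)],
]

for tokens in descriptions:
    print(lz_decode(tokens))  # b'ABCABCABC' each time
```

Four different token sequences, one decoded result: a decoder accepts all of them, and the format does not dictate which one a compressor must emit.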

All these outputs are valid and describe (regenerate) the same input. Obviously, one of them is more efficient (compresses more) than the others. But this is where implementations may differ: some are stronger than others. For example, kzip and 7zip are known to generate better (more compressed) zip files than gzip. Even gzip itself has several compression options that generate different compressed streams from the same input.
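The effect of compression options can be seen even within a single library. A short sketch using Python's `zlib` module (the sample input and the choice of levels 0, 1 and 9 are assumptions made for the demonstration):

```python
import zlib

data = b"ABCABCABC" * 1000  # hypothetical sample input

# Same library, same input, three different compression levels.
streams = {level: zlib.compress(data, level) for level in (0, 1, 9)}

for level, stream in streams.items():
    # Every stream must decompress back to the original input.
    assert zlib.decompress(stream) == data
    print(level, len(stream))

# Level 0 stores the data uncompressed, so its bytes cannot match
# the bytes produced at level 9.
print(streams[0] == streams[9])
```

All three streams are valid and round-trip to the same data, yet they are not byte-identical.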

Now, if you want to consistently get exactly the same binary output, you need more than "zip": you need to pin an exact implementation of zip and an exact compression parameter. Then you can be sure that you always generate the same binary.
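As a rough illustration of pinning parameters, here is a sketch with Python's `gzip` module. It assumes Python 3.8 or later (for the `mtime` argument of `gzip.compress`) and only demonstrates repeatability within one implementation; it says nothing about byte-identical output from .NET or another gzip library.

```python
import gzip

data = b"hello"

# Pin every parameter we control: compression level and the timestamp
# that the gzip format stores in its header.
first = gzip.compress(data, compresslevel=9, mtime=0)
second = gzip.compress(data, compresslevel=9, mtime=0)
print(first == second)  # same implementation, same parameters

# Changing only the header timestamp changes the output bytes,
# even though the decompressed content is identical.
other = gzip.compress(data, compresslevel=9, mtime=1)
print(first == other)
print(gzip.decompress(first) == gzip.decompress(other) == data)
```

This matters for the `aes(gzip("hello"), key, initVector)` case in the question: any difference in the gzip bytes, even in the header, produces a completely different ciphertext.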
