What is the best integer compression algorithm?

I want a better compression algorithm for a list of random numbers.

Example list:

224.19 225.57 226.09 222.74 222.20 222.11 223.14 540.56 538.96 540.14 540.44 336.45 338.47 340.78 156.73 160.02 158.56 156.23 55.08 56.33 54.88 53.45 

I can drop the fractional part. I have a huge list of numbers like the example above, so it needs to be compressed.

Can you recommend something?

2 answers

As noted in the comments, your numbers are far from random.

First, I would get rid of the decimal point, since all of your numbers appear to have exactly two digits after it. Multiply every number by 100 when compressing and divide by 100 when decompressing.
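A minimal Python sketch of this step, assuming every input has at most two decimal places; the function names are illustrative. It uses `round` rather than `int` because a product like `224.19 * 100` can be off by a tiny amount in binary floating point, and truncating it could give the wrong integer.

```python
def to_fixed_point(values, scale=100):
    """Convert floats with at most two decimals to integer hundredths."""
    # round() absorbs tiny binary floating-point errors in v * scale.
    return [round(v * scale) for v in values]


def from_fixed_point(ints, scale=100):
    """Undo to_fixed_point by dividing back down to floats."""
    return [i / scale for i in ints]


values = [224.19, 225.57, 226.09, 222.74, 222.20, 222.11, 223.14]
scaled = to_fixed_point(values)
print(scaled)  # [22419, 22557, 22609, 22274, 22220, 22211, 22314]
```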

Second, I would delta-encode the list: subtract the previous number from each number. The first number stays unchanged, and reconstruction is straightforward (a running sum). This gives:

 22419, 138, 52, -335, -54, -9, 103, 31742, -160, 118, 30, -20399, 202, 231, -18405, 329, -146, -233, -10115, 125, -145, -143 

as the values to encode. Now we are getting somewhere: the deltas are usually small, with an occasional large jump. Encode them with variable-length integers; a histogram of the deltas will help you design that code well. A simple example is 7 bits per byte, with the most significant bit marking the end of an integer. A more elaborate bit-level scheme may compress better, depending on the probability distribution of the deltas.
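Here is a minimal Python sketch of the whole pipeline, with one assumption the answer does not spell out: because some deltas are negative, they are first mapped to non-negative integers with zigzag encoding (0, -1, 1, -2, ... become 0, 1, 2, 3, ...). Following the answer, each byte carries 7 payload bits and the high bit is set only on the last byte of an integer. All function names are illustrative, not from any library.

```python
from itertools import accumulate


def delta_encode(nums):
    # Keep the first number; replace each later one with its difference
    # from the previous number.
    return nums[:1] + [b - a for a, b in zip(nums, nums[1:])]


def delta_decode(deltas):
    # A running sum restores the original numbers.
    return list(accumulate(deltas))


def zigzag(n):
    # Map signed to non-negative: 0->0, -1->1, 1->2, -2->3, ...
    return 2 * n if n >= 0 else -2 * n - 1


def unzigzag(z):
    return z // 2 if z % 2 == 0 else -((z + 1) // 2)


def vbyte_encode(values):
    # 7 payload bits per byte, most significant group first;
    # the high bit is set only on the last byte of each integer.
    out = bytearray()
    for v in values:
        groups = [v & 0x7F]
        v >>= 7
        while v:
            groups.append(v & 0x7F)
            v >>= 7
        groups.reverse()
        groups[-1] |= 0x80
        out.extend(groups)
    return bytes(out)


def vbyte_decode(data):
    values, n = [], 0
    for byte in data:
        n = (n << 7) | (byte & 0x7F)
        if byte & 0x80:  # high bit set: this byte ends the integer
            values.append(n)
            n = 0
    return values


def compress(nums):
    return vbyte_encode([zigzag(d) for d in delta_encode(nums)])


def decompress(data):
    return delta_decode([unzigzag(z) for z in vbyte_decode(data)])


# The example list, already multiplied by 100.
nums = [22419, 22557, 22609, 22274, 22220, 22211, 22314, 54056, 53896,
        54014, 54044, 33645, 33847, 34078, 15673, 16002, 15856, 15623,
        5508, 5633, 5488, 5345]
packed = compress(nums)
print(len(packed))  # 45 bytes, versus 88 as 32-bit integers
```

Zigzag doubles each magnitude, so large jumps cost roughly one extra bit; when most deltas are small, that is usually a good trade.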


Don't store floats. Store integers, with some kind of marker for the decimal point if you need it; if you can drop the fractional part, so much the better.

See Variable Byte Encodings. The advantage is that small integers don't need a full 64 bits of memory.
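To make the size argument concrete, here is a small, hypothetical Python helper that computes how many bytes a 7-bits-per-byte encoding needs for a non-negative integer, to compare with a fixed 8 bytes.

```python
def varint_size(n):
    """Bytes needed to store non-negative n at 7 payload bits per byte."""
    if n < 0:
        raise ValueError("map negative values first, e.g. with zigzag")
    # ceil(bit_length / 7), but at least one byte (for n == 0)
    return max(1, (n.bit_length() + 6) // 7)


for n in [5, 300, 22419, 2**40]:
    print(n, varint_size(n), "bytes instead of 8")
```

The flip side is that the largest 64-bit values need 10 bytes, so the scheme pays off only when most values are small.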

If consecutive numbers are related to each other, look into delta encoding: it stores the difference between consecutive numbers rather than the numbers themselves.

Variable-byte encoding and delta encoding are core techniques that Google and other search engine companies use to compress inverted indexes.
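As a sketch of how the two techniques combine in an inverted index: a posting list is a sorted list of document IDs, so its gaps are never negative and need no sign handling. This example assumes the LEB128-style convention used by Protocol Buffers varints (least significant 7-bit group first, high bit meaning "more bytes follow"), which is the reverse of the end-marker convention in the first answer. The document IDs and function names are made up for illustration.

```python
def encode_postings(doc_ids):
    """Gap-encode a sorted posting list, then varint-encode each gap."""
    out = bytearray()
    prev = 0
    for doc_id in doc_ids:
        gap = doc_id - prev  # non-negative because doc_ids are sorted
        prev = doc_id
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)  # high bit: more bytes follow
            gap >>= 7
        out.append(gap)  # final byte has the high bit clear
    return bytes(out)


def decode_postings(data):
    doc_ids, prev, gap, shift = [], 0, 0, 0
    for byte in data:
        gap |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:  # last byte of this gap
            prev += gap
            doc_ids.append(prev)
            gap, shift = 0, 0
    return doc_ids


postings = [3, 7, 150, 152, 10000, 10003]  # made-up document IDs
data = encode_postings(postings)
print(len(data))  # 8
```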


Source: https://habr.com/ru/post/1487260/

