The most efficient lossless compression for random numerical data?

My data is not really random. I want to compress telemetry measurements that will tend to stay in the same range (for example, the temperature will not vary much). However, I am looking for a solution for several applications, so I could be sending temperatures one day, voltages the next, etc.

I want to send measurements over a satellite link with a low data rate. SatCom is quite expensive, so I would like to shave every cent that I can. I am not opposed to spending computing resources on packing and unpacking the data, since nothing is time-critical (it may take up to 30 seconds to transfer 192 bytes).

Can anyone advise a FOSS data compression method that will give me the highest degree of telemetry data compression?

Is it worth trying? What percentage reduction can I expect?

I apologize for not being able to be more precise about the nature of the data - just general telemetry measurements such as temperature, GPS latitude and longitude, fluid flow rate, etc.


Truly random data cannot be compressed.

Since you cannot disclose the details of your data, your best bet is to test several different compression algorithms on some sample data.

A good place to start is the DEFLATE algorithm, which is the industry-standard combination of LZ77 sliding-window compression and Huffman coding. It is implemented by many common compression packages; GZIP is a prime example.
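As a quick sanity check, DEFLATE is available in Python's standard `zlib` module, so you can try it on representative samples before committing to anything. A minimal sketch, using made-up temperature readings as a stand-in for real telemetry:

```python
import zlib

# Hypothetical sample: 16-bit temperature readings hovering around 2150
# (21.50 C in hundredths of a degree) - stands in for real telemetry.
readings = [2150, 2151, 2149, 2150, 2152, 2151, 2150, 2149] * 24
raw = b"".join(v.to_bytes(2, "big") for v in readings)

# DEFLATE at the highest compression level (slowest, smallest output).
packed = zlib.compress(raw, level=9)
print(f"{len(raw)} bytes -> {len(packed)} bytes")

# Verify the round trip is lossless.
assert zlib.decompress(packed) == raw
```

The ratio you see on synthetic data like this will be optimistic; run the same test on captured real measurements to get a trustworthy number.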


Probably the best you can do is use a DEFLATE library, run it on large blocks of data, and use the highest compression settings.

If you want to roll your own stream compression algorithm, you can apply the same approach that works for audio files: transmit the first sample as-is, then encode each subsequent sample as the difference from the previous one (delta coding).

From here, the best encoding depends on how quickly the data changes:

If the data changes quickly, use an adaptive Huffman tree. If the differences are uncorrelated (data plus noise), this will get you to within about one bit per sample of the entropy.

If several consecutive samples can be equal to each other (the data does not change very quickly and there is no high-frequency noise), then encode each nonzero difference with one Huffman tree and the lengths of the zero runs with a second Huffman tree. This will get you to within about two bits per sample.

You can even encode the differences in just one bit (up or down), but then you must also be able to encode runs of zeros.
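The zero-run splitting in the second option above can be sketched as follows; this only produces the (run length, value) pairs, each of which would then be fed to its own Huffman coder as the answer describes (the function name and the trailing-run sentinel are my own choices):

```python
def rle_zero_encode(deltas):
    # Split a delta stream into (zero_run_length, nonzero_value) pairs.
    # Run lengths and values would each get their own Huffman tree.
    pairs = []
    run = 0
    for d in deltas:
        if d == 0:
            run += 1
        else:
            pairs.append((run, d))
            run = 0
    if run:
        # Trailing zeros: use 0 as a sentinel "no value follows" marker.
        pairs.append((run, 0))
    return pairs
```

For example, `[0, 0, 5, 0, -3, 0, 0]` becomes `[(2, 5), (1, -3), (2, 0)]`: two zeros then a 5, one zero then a -3, and two trailing zeros.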


My suggestion: delta-encode once or twice until the samples are uncorrelated, then compress with a DEFLATE library.
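The suggested pipeline can be sketched end to end; this assumes signed 16-bit samples and deltas that fit in 16 bits, which you would need to verify against your real telemetry:

```python
import struct
import zlib

def pack(samples):
    # Delta-encode, serialize as signed 16-bit big-endian, then DEFLATE.
    deltas = [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]
    raw = struct.pack(f">{len(deltas)}h", *deltas)
    return zlib.compress(raw, level=9)

def unpack(blob, n):
    # Inverse: inflate, deserialize, then undo the delta coding.
    deltas = struct.unpack(f">{n}h", zlib.decompress(blob))
    out = [deltas[0]]
    for d in deltas[1:]:
        out.append(out[-1] + d)
    return out
```

Because the deltas of slowly varying measurements are small and repetitive, DEFLATE's LZ77 stage finds long matches in the byte stream that it would miss in the raw samples.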


Source: https://habr.com/ru/post/1446120/
