Compress floating point numbers with a given range and precision

In my application, I am going to use floating point values to store geographical coordinates (latitude and longitude).

I know that the integer parts of these values will lie in the ranges [-90, 90] and [-180, 180] respectively. In addition, I have a requirement to provide some fixed precision for these values (currently 0.00001, but that may change later).

After examining the single-precision floating point type (float), I see that it is a little too small to hold my values: 180 * 10^5 is greater than 2^24 (the significand size of a float), but less than 2^25.

Therefore I need to use double. But the problem is that I am going to store a huge number of these values, so I do not want to waste bytes maintaining unnecessary precision.

So, how can I perform some kind of compression when converting my double value (with a fixed range for the integer part and a specified precision X) to a byte array in Java? For example, with the precision from my example (0.00001), I get 5 bytes for each value. I am looking for a lightweight algorithm or solution that does not involve heavy calculations.

3 answers

To store a number x with a fixed precision of (say) 0.00001, just store the integer closest to 100000 * x. (By the way, this requires 26 bits, not 25, because you need to store negative numbers as well.)
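A minimal sketch of this fixed-point conversion in Java (class and method names are my own, for illustration):

```java
public class FixedPointCodec {
    static final double SCALE = 100_000.0; // precision 0.00001

    // Encode a coordinate as the nearest scaled integer.
    // For the [-180, 180] range this fits in 26 bits including the sign.
    static int encode(double x) {
        return (int) Math.round(x * SCALE);
    }

    // Decode back to a double with the fixed precision.
    static double decode(int v) {
        return v / SCALE;
    }
}
```

Rounding (rather than truncating) keeps the error within half of the precision step in both directions.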


As TonyK says in his answer, use an int to store the numbers.

To compress the numbers further, exploit locality: geo-coordinates often cluster (say, the outline of a city block). Use a fixed anchor point (at full resolution, 2 x 26 bits), and then store each coordinate as an offset from the previous one. A byte offset gives a range of +/-0.00127; alternatively, use short offsets, which give you a much larger range (+/-0.32767).
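A sketch of this anchor-plus-offsets scheme, assuming the coordinates have already been converted to scaled ints as in TonyK's answer (the class name and the 4-byte anchor layout are my own choices):

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

public class DeltaEncoder {
    // Encodes a track of scaled-int coordinates as a 4-byte anchor
    // followed by 1-byte offsets from the previous value. Each delta
    // must fit in [-128, 127], i.e. within ~0.00128 of the previous
    // point at 0.00001 precision.
    static byte[] encode(int[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = values[0];
        out.write(prev >>> 24); out.write(prev >>> 16);
        out.write(prev >>> 8);  out.write(prev);
        for (int i = 1; i < values.length; i++) {
            int delta = values[i] - prev;
            if (delta < Byte.MIN_VALUE || delta > Byte.MAX_VALUE)
                throw new IllegalArgumentException("delta out of range: " + delta);
            out.write(delta);
            prev = values[i];
        }
        return out.toByteArray();
    }

    static int[] decode(byte[] data) {
        int[] values = new int[data.length - 3];
        int prev = ((data[0] & 0xFF) << 24) | ((data[1] & 0xFF) << 16)
                 | ((data[2] & 0xFF) << 8)  |  (data[3] & 0xFF);
        values[0] = prev;
        for (int i = 4; i < data.length; i++) {
            prev += data[i]; // byte is signed in Java, so this restores the delta
            values[i - 3] = prev;
        }
        return values;
    }
}
```

A real implementation would need an escape mechanism (or a new anchor) for the occasional jump larger than the offset range.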

Just remember to hide the compression/decompression in a class that exposes only double in its external API, so you can adjust the precision and the compression algorithm at any time.


Given your use case, I would nonetheless store the values as double and compress them directly.

The reason is that strong compressors such as 7zip are extremely good at handling "structured" data, and a double array is exactly that (one value = 8 bytes, very regular and predictable).

Any "manual" optimization you might attempt is likely to offer only a slight advantage, once you weigh the time and risks involved.

Note that you can still use the "trick" of converting double to int before compression. I am really not sure it will bring you tangible benefits, and on the other hand it will seriously reduce your ability to handle unexpected ranges of numbers in the future.

[Edit] Depending on the source data, if the bits below the precision level are "noisy", it may help the compression ratio to remove them, either by rounding the value or by directly applying a mask to the lowest bits (purists will probably not like this last method, but at least it lets you choose your precision level directly while preserving the full range of possible values).
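The masking variant can be sketched like this (the class name and the choice of how many bits to clear are illustrative; picking n is a trade-off between compression ratio and retained precision):

```java
public class MantissaMask {
    // Zeroes the lowest n bits of a double's bit pattern, discarding
    // precision below the chosen level so that nearby values share
    // identical low bytes and compress better.
    static double maskLowBits(double x, int n) {
        long bits = Double.doubleToLongBits(x);
        long mask = -1L << n; // keeps the top bits, clears the lowest n
        return Double.longBitsToDouble(bits & mask);
    }
}
```

Because the mask touches only mantissa bits, the introduced error is bounded by 2^n times the value's ulp; for coordinates around 100 degrees, clearing 20 bits changes the value by well under 10^-8.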

So, to summarize, I suggest direct LZMA compression of your double array.
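The JDK itself ships no LZMA implementation (for that you would pull in a library such as XZ for Java), but the pipeline the answer describes, pack the doubles into a byte array and feed it to a general-purpose compressor, can be sketched with the standard java.util.zip.Deflater as a stand-in:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.zip.Deflater;

public class DoubleCompressor {
    // Packs the doubles into a byte array and DEFLATE-compresses it.
    // DEFLATE is used here only because it is in the JDK; LZMA would
    // typically compress this kind of regular data better.
    static byte[] compress(double[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Double.BYTES);
        for (double v : values) buf.putDouble(v);
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(buf.array());
        deflater.finish();
        byte[] tmp = new byte[buf.capacity() + 64]; // worst case: barely larger than input
        int len = deflater.deflate(tmp);
        deflater.end();
        return Arrays.copyOf(tmp, len);
    }
}
```

How well this works depends entirely on how regular your coordinates are; runs of nearby points compress far better than scattered ones.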


Source: https://habr.com/ru/post/1384661/
