High compression grid numbers

I have a square grid that contains numbers, and I need to compress it so that it can be easily transported over the network. For example, I need to be able to compress a 40x40 grid by less than 512 bytes, regardless of the numbers in the grid. This is my basic requirement.

Each grid cell contains a number from 0 to 7, so each cell can fit in 3 bits.

Does anyone know of a good algorithm that can achieve what I want?

+4
source share
3 answers

You can encode your information in different ways. You do not need to assign to all numbers from 0 to 7 a code with the same number of bits. You can assign based on the number of times in a sequence.

First read the entire sequence by counting the number of occurrences of each number. Based on this, you can assign a code to each number. If you assign a code that follows, for example, a Huffman code , the codes will be a prefix code, which means that there is no extra character to separate numbers.

There are certain options that you can enter into the algorithm based on the test results to fine-tune the compression ratio.

I used this method in a project (university), and it provides, in general, good results. At the very least, it should approximate your theoretical 3 bits per symbol and could be much better if the probability distribution helps.

+2
source

As others have stated, the problem discussed is not possible, since 600 bytes are required to represent all possible grids. 600 bytes - of 40 rows, 40 columns, 3 bits per cell and 8 bits per byte ( 40 * 40 * 3 / 8 ). As Kerrek SB explained in the comments, you add 8 cells in 3 bytes.

In your own comments, you mentioned that this state of the game is transmitted over the network. Assuming you have a mechanism to ensure reliable data transport, then if there is a reasonable limit to the number of cells that can change between updates, or if you are allowed to send your updates when a certain number of cells change, you can get a 512 byte representation. If you use 1 bit to represent whether a cell has changed, you will need 200 bytes. Then you have 312 bytes remaining to represent the new values โ€‹โ€‹of the changed cells. Thus, you can represent up to 312*8/3 = 832 changed cells.

Aside, this view can represent up to 1064 changed cells in less than 600 bytes.

+1
source

What you want to do is convert the burrowes-wheeler to your data and then compress it. In this case, length coding will suffice.

http://en.wikipedia.org/wiki/Burrows%E2%80%93Wheeler_transform

This is likely to surpass the Huffman in your case.

It is true that in some cases you will need more than 512 bytes. Therefore, in your protocol, just make an exception for "perverted" grids. But in general, you should be less than 512.

+1
source

Source: https://habr.com/ru/post/1433906/


All Articles