Encoding more than 256 characters with arithmetic coding

I am trying to encode signed values from -256 to 255 (i.e. 9-bit data stored in a short) with an arithmetic coder. However, the existing arithmetic coding implementations I found (such as dlib, rANS) usually read the input as a byte string and process the data 8 bits at a time.
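Here are the two views side by side. A signed 9-bit value needs an alphabet of 512 symbols, while a byte-oriented coder only ever sees the two bytes of each short. This is a minimal sketch. The helper names (to_symbol, from_symbol, as_bytes) are hypothetical, and little-endian int16 storage is an assumption about how the shorts are written to the file.

```python
import struct


def to_symbol(v):
    """Map a signed 9-bit value (-256..255) to a symbol index 0..511."""
    if not -256 <= v <= 255:
        raise ValueError(f"{v} does not fit in 9 signed bits")
    return v + 256


def from_symbol(s):
    """Inverse of to_symbol."""
    return s - 256


def as_bytes(values):
    """What a byte-oriented coder sees: each short becomes two bytes (assumed little-endian)."""
    return struct.pack(f"<{len(values)}h", *values)


print(to_symbol(-256), to_symbol(255))  # 0 511
print(as_bytes([-1, 1]))                # b'\xff\xff\x01\x00'
```

With to_symbol, the coder's model keeps one counter per value. With as_bytes, the low and high halves of every value are counted in the same 256-entry table.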

The problem with this approach is that splitting the signed data (see the signed histogram below) into bytes destroys the underlying histogram (see the split histogram below). I believe this splitting can also degrade the compression ratio (but I could be wrong).
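One way to quantify the effect on the histogram is the ideal order-0 code length, i.e. n times the empirical entropy. An arithmetic coder with a single static frequency table gets close to this bound. The sketch below assumes memoryless (order-0) models and uses synthetic, roughly Gaussian data purely for illustration. order0_bits is a hypothetical helper.

```python
import math
import random
import struct
from collections import Counter


def order0_bits(stream):
    """Ideal order-0 code length in bits: n * empirical entropy of the stream."""
    counts = Counter(stream)
    n = len(stream)
    return -sum(c * math.log2(c / n) for c in counts.values())


rng = random.Random(0)
# Synthetic signed 9-bit data, clipped to -256..255 (illustrative assumption).
values = [max(-256, min(255, round(rng.gauss(0, 40)))) for _ in range(10_000)]

symbol_bits = order0_bits(values)                       # one 512-symbol alphabet
byte_stream = struct.pack(f"<{len(values)}h", *values)  # what a byte coder sees
byte_bits = order0_bits(byte_stream)                    # 256-symbol alphabet, 2 bytes per value

print(f"9-bit symbols: {symbol_bits / len(values):.2f} bits/value")
print(f"split bytes:   {byte_bits / len(values):.2f} bits/value")
```

For order-0 models, the byte-split cost can never be lower than the 9-bit cost. Coders that use context (for example, conditioning the high byte on the low byte) are not covered by this comparison.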

I tested this hypothesis by implementing Huffman coding on both 8-bit and 16-bit data and found that I was right: the byte-split version compressed worse. This is possibly because Huffman coding builds its tree from the symbol probabilities.
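The effect is not specific to Huffman coding. Consider any order-0 model. Write X for the 9-bit symbol and B_lo, B_hi for the two bytes of its short. Write M for the byte stream; its distribution is the average of the two byte distributions. X and (B_lo, B_hi) determine each other, so subadditivity and concavity of entropy give:

```latex
H(X) = H(B_{\mathrm{lo}}, B_{\mathrm{hi}})
     \le H(B_{\mathrm{lo}}) + H(B_{\mathrm{hi}})
     \le 2\,H(M),
\qquad
p_M = \tfrac{1}{2}\left(p_{B_{\mathrm{lo}}} + p_{B_{\mathrm{hi}}}\right)
```

So two bytes per value cost at least as many bits as one 9-bit symbol. The first inequality is tight only if the bytes are independent, and the second only if they have identical distributions.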

(EDITED) My question is: how can I encode or emulate symbols that do not fit in a regular 8-bit container, so that the result can still be compressed with traditional arithmetic coder implementations without hurting the compression ratio?
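One way to avoid the byte split is to make the coder's alphabet the 512 values themselves. In an arithmetic coder, the alphabet size is just the length of the frequency table, so nothing ties it to 256. The sketch below is a textbook 32-bit integer arithmetic coder (in the style of Witten, Neal and Cleary) with an adaptive order-0 model. It is not the dlib or rANS API. The names encode, decode, AdaptiveModel and the MAX_TOTAL rescale threshold are hypothetical, and the decoder is assumed to know the number of symbols.

```python
# Sketch: adaptive order-0 arithmetic coder whose alphabet has 512 symbols,
# i.e. the signed 9-bit values -256..255 shifted to 0..511. No byte splitting.
STATE_BITS = 32
FULL = 1 << STATE_BITS
HALF = FULL >> 1
QUARTER = HALF >> 1
MASK = FULL - 1
ALPHABET = 512        # 2**9 symbols
MAX_TOTAL = 1 << 16   # assumed rescale threshold; must stay well below QUARTER


class AdaptiveModel:
    """Frequency table; encoder and decoder update it identically."""

    def __init__(self, size=ALPHABET):
        self.freq = [1] * size  # start at 1 so no symbol has a zero-width interval

    def total(self):
        return sum(self.freq)

    def cum_range(self, s):
        low = sum(self.freq[:s])
        return low, low + self.freq[s]

    def update(self, s):
        self.freq[s] += 1
        if self.total() > MAX_TOTAL:  # halve counts, keeping every count >= 1
            self.freq = [(f + 1) // 2 for f in self.freq]


def encode(symbols):
    model, bits = AdaptiveModel(), []
    low, high, pending = 0, MASK, 0
    for s in symbols:
        total, rng = model.total(), high - low + 1
        cl, ch = model.cum_range(s)
        low, high = low + cl * rng // total, low + ch * rng // total - 1
        while ((low ^ high) & HALF) == 0:  # top bits agree: emit that bit
            bit = low >> (STATE_BITS - 1)
            bits.append(bit)
            bits.extend([bit ^ 1] * pending)  # resolve deferred underflow bits
            pending = 0
            low = (low << 1) & MASK
            high = ((high << 1) & MASK) | 1
        while low & ~high & QUARTER:  # interval straddles the midpoint: defer a bit
            pending += 1
            low = (low << 1) ^ HALF
            high = ((high ^ HALF) << 1) | HALF | 1
        model.update(s)
    bits.append(1)  # a 1 then zeros decodes as HALF (state coordinates), inside [low, high]
    out = bytearray()
    for i in range(0, len(bits), 8):
        chunk = bits[i:i + 8]
        chunk += [0] * (8 - len(chunk))  # zero padding matches the decoder's zero fill
        out.append(int("".join(map(str, chunk)), 2))
    return bytes(out)


def decode(data, count):
    it = iter((byte >> (7 - i)) & 1 for byte in data for i in range(8))

    def read_bit():
        return next(it, 0)  # past the end of the data, read zeros

    model, out = AdaptiveModel(), []
    low, high, code = 0, MASK, 0
    for _ in range(STATE_BITS):
        code = (code << 1) | read_bit()
    for _ in range(count):
        total, rng = model.total(), high - low + 1
        value = ((code - low + 1) * total - 1) // rng  # scaled position of code
        cl = 0
        for s, f in enumerate(model.freq):  # linear search is fine for a sketch
            if value < cl + f:
                break
            cl += f
        ch = cl + model.freq[s]
        low, high = low + cl * rng // total, low + ch * rng // total - 1
        while ((low ^ high) & HALF) == 0:
            code = ((code << 1) & MASK) | read_bit()
            low = (low << 1) & MASK
            high = ((high << 1) & MASK) | 1
        while low & ~high & QUARTER:
            code = (code & HALF) | ((code << 1) & (MASK >> 1)) | read_bit()
            low = (low << 1) ^ HALF
            high = ((high ^ HALF) << 1) | HALF | 1
        model.update(s)
        out.append(s)
    return out


values = [-3, 0, 0, 1, -256, 255, 2, 0]
symbols = [v + 256 for v in values]  # signed 9-bit -> 0..511
packed = encode(symbols)
assert [s - 256 for s in decode(packed, len(symbols))] == values
```

The linear scans keep the code short. For larger alphabets, a Fenwick tree would make the cumulative-frequency lookups logarithmic.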

Signed histogram:

[Figure: signed histogram]

Split histogram:

[Figure: split histogram]


Source: https://habr.com/ru/post/1693637/

