Decoding letters ('a' .. 'z') from a sequence of bits without waste

I am looking for an algorithm that lets me represent an incoming sequence of bits as letters ('a' .. 'z') in a minimal manner, so that the bit stream can be regenerated from the letters without ever holding the entire sequence in memory.

That is, given an external bit source (each read returns a practically random bit) and a user-supplied number of bits, I would like to print the minimum number of characters that can represent those bits.

Ideally there should be a parameterization: how much memory is available versus the maximum number of bits that can be handled before some waste becomes necessary.

Efficiency goal: the same number of characters as the base-26 representation of the bits.

Solutions:

  • If enough storage were available, store the entire sequence and use a big-number MOD 26 operation.

  • Convert every 9 bits to 2 characters. This seems suboptimal: only 512 of the 676 possible two-letter combinations are used, so about a quarter of the output capacity is wasted.
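For illustration, the first option (whole sequence in memory, repeated division by 26) could be sketched as below. The function names are made up for this example, and the leading sentinel bit is an assumption of the sketch so that leading zero bits survive the round trip; it is not part of the question.

```python
import string

ALPHABET = string.ascii_lowercase  # 'a' .. 'z'


def bits_to_letters(bits: str) -> str:
    """Write the whole bit string as one big number in base 26."""
    # Assumption of this sketch: prepend a 1 so leading zeros are not lost.
    n = int("1" + bits, 2)
    digits = []
    while n:
        n, r = divmod(n, 26)  # the "big number MOD 26" step
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))


def letters_to_bits(letters: str) -> str:
    """Inverse of bits_to_letters."""
    n = 0
    for ch in letters:
        n = n * 26 + ALPHABET.index(ch)
    return bin(n)[3:]  # strip the '0b' prefix and the sentinel 1


if __name__ == "__main__":
    sample = "000101"
    encoded = bits_to_letters(sample)
    print(encoded, letters_to_bits(encoded) == sample)
```

This reaches the base-26 length goal (up to the one extra sentinel bit) but needs the entire sequence in memory, which is exactly the constraint the question wants to avoid.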

+3
6 answers

If you assign a different number of bits to each letter, you should be able to encode the bits exactly in the twenty-six allowed letters without wasting any bits. (This is very similar to a Huffman code, only with a pre-built balanced tree.)

To encode bits into letters: accumulate bits until they exactly match one of the bit codes in the lookup table. Print that letter, clear the bit buffer, and continue.

To decode letters into bits: for each letter, output its bit code from the table.


a 0000
b 0001
c 0010
d 0011
e 0100
f 0101
g 01100
h 01101
i 01110
j 01111
k 10000
l 10001
m 10010
n 10011
o 10100
p 10101
q 10110
r 10111
s 11000
t 11001
u 11010
v 11011
w 11100
x 11101
y 11110
z 11111
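
A minimal sketch of how this table could be used, in Python. The function names and the choice to return leftover bits are assumptions of the sketch; the table itself is rebuilt from the pattern above (a..f get the 4-bit codes 0000..0101, g..z get the 5-bit codes 01100..11111).

```python
import string


def build_table() -> dict:
    """Rebuild the letter -> bit code table listed above."""
    table = {}
    for i, ch in enumerate(string.ascii_lowercase):
        if i < 6:
            table[ch] = format(i, "04b")      # a..f: 0000 .. 0101
        else:
            table[ch] = format(i + 6, "05b")  # g..z: 01100 .. 11111
    return table


ENCODE = build_table()                      # letter -> bit code
DECODE = {code: ch for ch, code in ENCODE.items()}  # bit code -> letter


def bits_to_letters(bits):
    """Consume bits one at a time; emit a letter whenever the buffer is a code.

    Returns (letters, leftover). The buffer never holds more than 5 bits.
    Leftover bits at the end of the stream (at most 4) are returned as-is,
    because how to pad them is not specified in the answer.
    """
    letters, buf = [], ""
    for b in bits:
        buf += b
        if buf in DECODE:
            letters.append(DECODE[buf])
            buf = ""
    return "".join(letters), buf


def letters_to_bits(letters):
    """Each letter maps straight back to its bit code."""
    return "".join(ENCODE[ch] for ch in letters)


if __name__ == "__main__":
    print(bits_to_letters("00000110011111"))
    print(letters_to_bits("agz"))
```

For uniformly random input bits, a 4-bit code is hit with probability 6/16 and a 5-bit code with probability 10/16, so this scheme averages 4.625 bits per letter while needing only a few bits of state.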
+8

Convert each block of 47 bits into a 10-digit base-26 number. This gives more than 99.99% efficiency.
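
A rough sketch of the block conversion, assuming the input length is already a multiple of 47 (handling the final partial block is not covered here). The names `encode_block` and `decode_block` are hypothetical.

```python
import string

ALPHABET = string.ascii_lowercase
BLOCK_BITS = 47
BLOCK_LETTERS = 10

# 10 letters are enough for any 47-bit value: 2**47 <= 26**10.
assert 2 ** BLOCK_BITS <= 26 ** BLOCK_LETTERS


def encode_block(bits: str) -> str:
    """Turn exactly 47 bits into exactly 10 letters (base-26 digits)."""
    if len(bits) != BLOCK_BITS:
        raise ValueError("expected exactly 47 bits")
    n = int(bits, 2)
    digits = []
    for _ in range(BLOCK_LETTERS):
        n, r = divmod(n, 26)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))


def decode_block(letters: str) -> str:
    """Turn 10 letters back into the original 47 bits."""
    if len(letters) != BLOCK_LETTERS:
        raise ValueError("expected exactly 10 letters")
    n = 0
    for ch in letters:
        n = n * 26 + ALPHABET.index(ch)
    if n >= 2 ** BLOCK_BITS:
        raise ValueError("letters do not encode a 47-bit block")
    return format(n, "047b")  # zero-padded to 47 bits


if __name__ == "__main__":
    block = "1" * 47
    letters = encode_block(block)
    print(letters, decode_block(letters) == block)
```

Only one 47-bit buffer is needed at a time, so memory use stays constant regardless of stream length.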

, , , . , .

1. , 47. .

"" 47- 2. 1 , .

+6

, ? , .

+3

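The theoretical limit is log_2(26), about 4.70 bits per letter. You can get close to it, 4.7, by encoding 47 bits into 10 letters. A simpler option is 4.67, encoding 14 bits into 3 letters. Three letters give 26^3 = 17,576 combinations, enough for every 14-bit value (2^14 = 16,384). The letters can be extracted with mod and div.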

number of letters    number of bits    bits/letter
 1                    4                4
 2                    9                4.5
 3                   14                4.67
 4                   18                4.5
 5                   23                4.6
 6                   28                4.67
 7                   32                4.57
 8                   37                4.63
 9                   42                4.67
10                   47                4.7
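
The bit counts in this table can be recomputed with exact integer arithmetic: the number of bits that fit in k letters is the largest b with 2^b <= 26^k. A small sketch (the function name `bits_for_letters` is made up for this example):

```python
def bits_for_letters(k: int) -> int:
    """Largest b such that every b-bit value fits into k letters (2**b <= 26**k)."""
    # 26**k is never an exact power of two for k >= 1,
    # so bit_length() - 1 is the floor of k * log2(26).
    return (26 ** k).bit_length() - 1


if __name__ == "__main__":
    print("letters  bits  bits/letter")
    for k in range(1, 11):
        b = bits_for_letters(k)
        print(f"{k:7d}  {b:4d}  {b / k:.3f}")
```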
+3

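Any solution will waste some space, because 26 is not a power of 2. As for the algorithm, I would use a lookup table rather than computing "on the fly" for each group of 9 bits. The lookup table would have 512 entries.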

+1

, , . 4,5 / char. 26 ( ..), 4.7 , , (, Huffman. . Jaegers) .

For example, a 32-bit integer can hold 6 characters (26^6 < 2^32), which is 5.33 bits/char. A 64-bit integer can hold 13 characters (4.92 bits/char).

, , LZW Deflate.

+1

Source: https://habr.com/ru/post/1697493/

