Unexpected results when unpacking a bit string

When I open irb and run puts 'A'.unpack("B8"),
I get 01000001, but when I run puts 'A'.unpack("B4B4"),
why do I get only 0100 and not [0100,0001]?

Is unpack's resolution limited to whole bytes? Nothing smaller?


Justin Apr 24 '13 at 2:27

1 answer

Let's run a few tests to understand the behavior:

 > 'A'.unpack('B8') => ["01000001"]

It returns the 8 bits of the character 'A', most significant bit (MSB) first.
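For a one-byte (ASCII) string, the `B8` result should be the same as the character's code written in binary and left-padded to 8 digits. A minimal check, assuming a single ASCII character:

```ruby
# B8 on a one-byte string yields that byte in binary, MSB first.
code = 'A'.ord                        # 65
manual = code.to_s(2).rjust(8, '0')   # "1000001" padded to 8 digits
from_unpack = 'A'.unpack('B8').first  # bit string produced by unpack

puts manual
puts from_unpack
puts manual == from_unpack
```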

 > 'A'.unpack('B4') => ["0100"]

It returns the 4 most significant bits of 'A'.

 > 'A'.unpack('B16') => ["01000001"]

It asks for 16 bits of 'A', but since 'A' has only 8, we get those 8 (the count is silently capped at the available bits).

 > 'AB'.unpack('B16') => ["0100000101000010"]

It returns the first 16 bits of the string 'AB' (the last 8 bits, 01000010, are 'B').

 > 'AB'.unpack('B10') => ["0100000101"]

It returns the first 10 bits of 'AB', i.e. all 8 bits of 'A' and the 2 most significant bits of 'B'.

 > 'ABC'.unpack('B*') => ["010000010100001001000011"]

It returns all the bits of 'ABC' (the last 8 bits, 01000011, are 'C').
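The single-directive results above can be reproduced by joining each byte's 8-bit string and keeping at most N characters. This is a sketch for illustration only: `b_directive` is a hypothetical helper, not part of Ruby, and `nil` stands in for `B*`.

```ruby
# Hypothetical helper (not a Ruby API): mimics ONE B directive applied
# to the start of a string. count = nil plays the role of B*.
def b_directive(str, count = nil)
  # Every bit of the string, MSB first within each byte.
  bits = str.bytes.map { |byte| byte.to_s(2).rjust(8, '0') }.join
  # B* keeps everything; BN keeps at most N bits (fewer if the string is short).
  count.nil? ? bits : bits[0, count]
end

puts b_directive('A', 4)
puts b_directive('A', 16)   # capped at the 8 available bits
puts b_directive('AB', 10)
puts b_directive('ABC')
```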

 > 'AB'.unpack('B8B8') => ["01000001", "01000010"]

It returns the following array:

the first element is the 8 bits of 'A'
the second element is the 8 bits of 'B'

_

 > 'AB'.unpack('B8B7') => ["01000001", "0100001"]

It returns the following array:

the first element is the 8 bits of 'A'
the second element is the 7 most significant bits of 'B'

_

 > 'AB'.unpack('B4B8') => ["0100", "01000010"]

It returns the following array:

the first element is the 4 most significant bits of 'A'
the second element is the 8 bits of 'B' (the low 4 bits of 'A' are skipped, because B4 consumed the whole byte)

_

 > 'AB'.unpack('B16B8') => ["0100000101000010", ""]

It returns the following array:

the first element is the 16 bits of the sequence 'AB'
the second element is empty because all the characters have already been consumed

_

 > 'AB'.unpack('B*B8') => ["0100000101000010", ""]

It gives the same result, because B* consumes the entire string.

 > 'AB'.unpack('B9B8') => ["010000010", ""]

It returns the following array:

the first element is the first 9 bits of 'AB' (9 bits span two bytes, so both characters are consumed)
the second element is empty because no characters are left

In conclusion:

a BN directive consumes at most the first ((N - 1) / 8) + 1 bytes of the string (N/8 rounded up; one byte per character for ASCII strings like these), even if it returns only some of the bits of the last byte. If bytes remain and there is a second directive BM, it consumes at most the next ((M - 1) / 8) + 1 bytes, and so on for every following directive. The B* directive consumes all remaining bytes and returns all of their bits, MSB first.
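The byte count in this rule is simply N/8 rounded up. A quick sketch, using a hypothetical `bytes_consumed` helper, that checks `((N - 1) / 8) + 1` (Ruby integer division) against the usual ceiling idiom `(N + 7) / 8` for the counts used in the examples, assuming N >= 1:

```ruby
# Hypothetical helper: bytes consumed by a BN directive (n >= 1).
def bytes_consumed(n)
  ((n - 1) / 8) + 1   # integer division, as in the rule above
end

[4, 7, 8, 9, 10, 16, 17].each do |n|
  # (n + 7) / 8 is the common "ceiling of n / 8" idiom for positive n.
  raise "mismatch for #{n}" unless bytes_consumed(n) == (n + 7) / 8
  puts "B#{n} consumes #{bytes_consumed(n)} byte(s)"
end
```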

For instance:

 'ABCDEFG'.unpack('B17B*B8')

It should return:

the 17 most significant bits of the sequence ABC
all the bits of the sequence DEFG
an empty bit string

Check:

 > 'ABCDEFG'.unpack('B17B*B8') => ["01000001010000100", "01000100010001010100011001000111", ""]
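The whole rule can be modeled with a byte cursor: each BN advances it by N/8 rounded up, and B* moves it to the end. The sketch below is a hypothetical re-implementation for templates made only of B directives (`simulate_b_template` is not a Ruby method; `nil` stands in for `B*`), compared against the real `unpack`:

```ruby
# Hypothetical model (illustration only) of a template made purely of
# B directives. counts holds N for each BN, or nil for B*.
def simulate_b_template(str, counts)
  bytes = str.bytes
  pos = 0                                    # index of the next unread byte
  counts.map do |n|
    remaining = bytes.drop(pos)
    # B* takes every remaining byte; BN takes ceil(N / 8) bytes at most.
    take = n.nil? ? remaining.size : [(n + 7) / 8, remaining.size].min
    pos += take                              # whole bytes are consumed
    bits = remaining.first(take).map { |b| b.to_s(2).rjust(8, '0') }.join
    n.nil? ? bits : bits[0, n]               # keep at most N bits
  end
end

p simulate_b_template('ABCDEFG', [17, nil, 8])
p 'ABCDEFG'.unpack('B17B*B8')
```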

And indeed, 'A'.unpack('B4B4') returns the array ["0100", ""], since the first directive consumes the whole character 'A'.
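If the goal is the ["0100", "0001"] from the question, one workaround (a sketch, not the only way) is to unpack all the bits once and split the resulting string into 4-character groups in Ruby:

```ruby
# Unpack every bit of the string once, then cut it into 4-bit groups.
bits = 'A'.unpack('B*').first   # "01000001"
nibbles = bits.scan(/.{4}/)     # groups of exactly 4 characters
p nibbles
```

Since each byte yields exactly 8 bits, the string always splits evenly into 4-bit groups.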


toch Apr 24 '13 at 13:03

Source: https://habr.com/ru/post/943495/
