Unexpected results when unpacking a bit string

When I open irb and run puts 'A'.unpack("B8"), I get 01000001. But when I run puts 'A'.unpack("B4B4"), why do I get only 0100 and not [0100, 0001]?

Can unpack only work on whole bytes? Nothing smaller?

1 answer

Let's run some tests to understand the behavior:

 > 'A'.unpack('B8') => ["01000001"] 

It returns the 8 most significant bits (MSB) of the character 'A'.

 > 'A'.unpack('B4') => ["0100"] 

It returns the 4 MSB of 'A'.

 > 'A'.unpack('B16') => ["01000001"] 

It asks for the 16 MSB of 'A', but since 'A' has only 8 bits, we get just those 8.

 > 'AB'.unpack('B16') => ["0100000101000010"] 

It returns the 16 MSB of the character sequence 'AB' (the last 8 bits, 01000010, correspond to 'B').

 > 'AB'.unpack('B10') => ["0100000101"] 

It returns the 10 MSB of the sequence 'AB', i.e. all 8 bits of 'A' and the 2 MSB of 'B'.

 > 'ABC'.unpack('B*') => ["010000010100001001000011"] 

It returns all the bits of the sequence 'ABC', MSB first (the last 8 bits, 01000011, correspond to 'C').
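The single-directive calls above can be replayed as a small script. This is a minimal sketch that just repeats those irb calls and prints how many bits each one returned, which makes the clamping of B16 on a one-byte string visible.

```ruby
# Each B directive reads bits most-significant-bit first.
# A count larger than the available bits is clamped to what the string holds.
cases = [
  ['A',   'B8'],
  ['A',   'B4'],
  ['A',   'B16'], # only 8 bits exist, so only 8 come back
  ['AB',  'B16'],
  ['AB',  'B10'], # all 8 bits of 'A' plus the top 2 bits of 'B'
  ['ABC', 'B*']   # '*' means "all remaining bits"
]

results = {}
cases.each do |str, fmt|
  bits = str.unpack(fmt).first
  results[[str, fmt]] = bits
  puts "#{str.inspect}.unpack(#{fmt.inspect}) => #{bits.inspect} (#{bits.length} bits)"
end
```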

 > 'AB'.unpack('B8B8') => ["01000001", "01000010"] 

It returns the following array:

  • the first element is the 8 MSB of 'A'
  • the second element is the 8 MSB of 'B'


 > 'AB'.unpack('B8B7') => ["01000001", "0100001"] 

It returns the following array:

  • the first element is the 8 MSB of 'A'
  • the second element is the 7 MSB of 'B'


 > 'AB'.unpack('B4B8') => ["0100", "01000010"] 

It returns the following array:

  • the first element is the 4 MSB of 'A'
  • the second element is the 8 MSB of 'B' (the remaining 4 bits of 'A' are skipped)
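A small script makes the byte boundary visible. It repeats the three calls above and adds one extra case, 'AB'.unpack('B4B4'), which is not in the original tests: if leftover bits were carried over, the second element would be the low half of 'A' (0001), but it is the high half of 'B' instead.

```ruby
# With several B directives, each one starts reading at a fresh byte:
# a directive that needs only part of a byte still consumes the whole byte.
results = {
  'B8B8' => 'AB'.unpack('B8B8'),
  'B8B7' => 'AB'.unpack('B8B7'),
  'B4B8' => 'AB'.unpack('B4B8'),
  # Extra case: the top 4 bits of 'A', then the top 4 bits of 'B',
  # never the low 4 bits of 'A'.
  'B4B4' => 'AB'.unpack('B4B4')
}

results.each { |fmt, arr| puts "'AB'.unpack(#{fmt.inspect}) => #{arr.inspect}" }
```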


 > 'AB'.unpack('B16B8') => ["0100000101000010", ""] 

It returns the following array:

  • the first element is the 16 MSB of the sequence 'AB'
  • the second element is empty because all the characters have already been consumed


 > 'AB'.unpack('B*B8') => ["0100000101000010", ""] 

It gives the same result, since B* consumes the entire string.

 > 'AB'.unpack('B9B8') => ["010000010", ""] 

It returns the following array:

  • the first element is the 9 MSB of the sequence 'AB' (all 8 bits of 'A' plus the first bit of 'B')
  • the second element is empty because both characters have already been consumed: reading even one bit of 'B' uses up all of 'B'
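The three "exhausted string" cases can be checked together. This sketch only repeats the calls shown above; the comments spell out why B9 leaves nothing for the following B8.

```ruby
# After all bytes are consumed, any further B directive returns "".
# B9 needs 9 bits: 8 from 'A' and 1 from 'B'. Reading that one bit consumes
# the whole of 'B', so its remaining 7 bits are dropped, not passed on.
results = %w[B16B8 B*B8 B9B8].map { |fmt| [fmt, 'AB'.unpack(fmt)] }.to_h

results.each { |fmt, arr| puts "'AB'.unpack(#{fmt.inspect}) => #{arr.inspect}" }
```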

In conclusion:

a BN directive applied to a string consumes at most its first ((N-1) / 8) + 1 characters (bytes), i.e. N/8 rounded up, even if it needs only some of the bits of the last one; the unused bits of that byte are discarded. If characters remain and there is a second directive BM, it consumes at most the next ((M-1) / 8) + 1 characters. The same holds for every following directive. The B* directive consumes all remaining characters and returns all of their bits, MSB first.
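To check this rule, here is a small pure-Ruby model of how a sequence of B directives walks over the bytes, compared against the real String#unpack. The helper `simulate_b_unpack` is hypothetical: it only understands "B" followed by an explicit count or `*`, and it is a sketch of the rule stated above, not Ruby's actual implementation.

```ruby
# Hypothetical helper: models how consecutive B directives consume bytes.
# Supports only "B<count>" and "B*" directives, e.g. "B17B*B8".
def simulate_b_unpack(str, fmt)
  bytes = str.bytes
  pos = 0 # index of the next unread byte
  fmt.scan(/B(\d+|\*)/).flatten.map do |count|
    available = (bytes.length - pos) * 8
    # '*' takes everything left; an explicit count is clamped to what is left
    n = count == '*' ? available : [count.to_i, available].min
    used = (n + 7) / 8 # same as ((n - 1) / 8) + 1 for n >= 1; 0 when n == 0
    bits = bytes[pos, used].map { |b| format('%08b', b) }.join[0, n]
    pos += used # whole bytes are consumed, leftover bits are discarded
    bits
  end
end

[['A', 'B4B4'], ['AB', 'B9B8'], ['ABCDEFG', 'B17B*B8']].each do |str, fmt|
  model = simulate_b_unpack(str, fmt)
  verdict = model == str.unpack(fmt) ? 'matches' : 'differs from'
  puts "#{str.inspect} #{fmt}: #{model.inspect} #{verdict} String#unpack"
end
```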

For instance:

 'ABCDEFG'.unpack('B17B*B8') 

It should return:

  • the first 17 MSB of ABC (all of A and B, plus the first bit of C)
  • all the bits of DEFG
  • an empty bit string

Check:

 > 'ABCDEFG'.unpack('B17B*B8') => ["01000001010000100", "01000100010001010100011001000111", ""] 

And indeed, 'A'.unpack('B4B4') returns the array ["0100", ""], since the first directive consumes the whole character A, leaving nothing for the second one.
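If what you want is the two halves of the same byte, as in the question, one workaround (this is not something B4B4 does by itself) is to unpack all 8 bits once and then split the resulting string into groups of 4 characters:

```ruby
# Unpack the full byte once, then split the bit string into 4-bit groups.
bits = 'A'.unpack('B8').first # "01000001"
nibbles = bits.scan(/.{4}/)   # ["0100", "0001"]
p nibbles
```

The same idea works for longer strings, e.g. splitting the result of 'AB'.unpack('B*') into 4-bit groups.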


Source: https://habr.com/ru/post/943495/

