Strange Behavior in Packed Ruby Strings

I am confused by some Ruby behavior. Take a look at the following code:

[127].pack("C") == "\x7f" # => true 

It makes sense. Now:

 [128].pack("C")           # => "\x80"
 "\x80"                    # => "\x80"
 [128].pack("C") == "\x80" # => false

The pack directive "C" stands for an 8-bit unsigned integer (unsigned char), which should be wide enough to hold the value 128. Also, both lines print the same thing, so why are they not equal? Is this related to encoding?

I'm on Ruby 2.0.0p247.

2 answers

The comparison returns false because the encodings are different:

 [128].pack("C").encoding #=> #<Encoding:ASCII-8BIT>
 "\x80".encoding          #=> #<Encoding:UTF-8>

(using ruby 2.0.0p247 (2013-06-27 revision 41674) [x86_64-linux] )

In Ruby 2.0, the default source encoding, and therefore the encoding of string literals, is UTF-8, but pack("C") returns a string tagged ASCII-8BIT (binary).
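
To see the difference directly, the two strings can be inspected side by side. The following is a minimal sketch; the helper name encoding_report is hypothetical, and the UTF-8 string is built with force_encoding so that the result does not depend on the source encoding of the file it runs in.

```ruby
# Hypothetical helper: summarizes how Ruby sees a string.
def encoding_report(str)
  {
    encoding: str.encoding,        # the encoding tag attached to the string
    valid:    str.valid_encoding?, # whether the bytes are well-formed in that encoding
    bytes:    str.bytes            # the raw bytes, independent of the tag
  }
end

packed = [128].pack("C")
# Stands in for the literal "\x80" in a UTF-8 source file.
utf8 = "\x80".dup.force_encoding(Encoding::UTF_8)

p encoding_report(packed) # ASCII-8BIT tag, valid, bytes [128]
p encoding_report(utf8)   # UTF-8 tag, not valid, bytes [128]
```

The bytes are identical, but a lone 0x80 byte is not a well-formed UTF-8 sequence, while any byte sequence is valid as ASCII-8BIT.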

Why is [127].pack('C') == "\x7f" true then?

[127].pack('C') == "\x7f" is true because ASCII-8BIT and UTF-8 agree on bytes 0 to 127, so a string containing only those bytes is compatible with either encoding. Ruby's string comparison checks for this (see the Rubinius source code):

 def ==(other)
   [...]
   return false unless @num_bytes == other.bytesize
   return false unless Encoding.compatible?(self, other)
   return @data.compare_bytes(other.__data__, @num_bytes, other.bytesize) == 0
 end

The MRI C source is similar, but harder to follow.

We observe that the comparison checks for compatible encoding. Try the following:

 Encoding.compatible?([127].pack("C"), "\x7f") #=> #<Encoding:ASCII-8BIT>
 Encoding.compatible?([128].pack("C"), "\x80") #=> nil

We see that for byte values of 128 and above, the comparison returns false, even though both strings consist of the same bytes.
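
The boundary can be checked by sweeping over every byte value. This is only a sketch: the method name first_mismatching_byte is made up for illustration, and the UTF-8 copy is produced with force_encoding rather than a literal so the result does not depend on the source encoding.

```ruby
# Returns the smallest byte value whose ASCII-8BIT and UTF-8 tagged
# single-byte strings compare as different, or nil if there is none.
def first_mismatching_byte
  (0..255).find do |n|
    packed = [n].pack("C")                             # tagged ASCII-8BIT
    utf8   = packed.dup.force_encoding(Encoding::UTF_8) # same byte, tagged UTF-8
    packed != utf8
  end
end

p first_mismatching_byte # => 128
```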


In Ruby 1.9, the default source file encoding is US-ASCII. Starting with Ruby 2.0, the default changed to UTF-8. String literals normally take the encoding of the source file that contains them, so in Ruby 2.0 "\x80" is a UTF-8 string. In a US-ASCII source file, a literal containing a non-ASCII \x escape such as "\x80" is tagged ASCII-8BIT instead.

However, the encoding of [128].pack("C") is ASCII-8BIT.

So [128].pack("C") == "\x80" is false in Ruby 2.0 and true in Ruby 1.9.

Putting a magic comment such as #coding:some_encoding on the first line of the source file (or right after the shebang) changes the source encoding for that file.

 #coding:ascii
 puts([128].pack("C") == "\x80")

This prints true in Ruby 2.0.
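
If changing the source encoding of the whole file is not desirable, another option (not part of the answer above) is to compare the strings as binary data. The sketch below assumes Ruby 2.0 or later for String#b; the helper names binary_equal? and same_bytes? are hypothetical.

```ruby
# Compare two strings as raw binary, ignoring their encoding tags.
# String#b returns a copy tagged ASCII-8BIT and leaves the receiver untouched.
def binary_equal?(a, b)
  a.b == b.b
end

# Alternative: compare the byte arrays directly.
def same_bytes?(a, b)
  a.bytes == b.bytes
end

packed = [128].pack("C")
# Stands in for the literal "\x80" in a UTF-8 source file.
utf8 = "\x80".dup.force_encoding(Encoding::UTF_8)

p packed == utf8               # => false
p binary_equal?(packed, utf8)  # => true
p same_bytes?(packed, utf8)    # => true
```

Both helpers give the same answer regardless of which magic comment the file carries.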


Source: https://habr.com/ru/post/958068/

