This is incorrect because the encodings are different:
    [128].pack("C").encoding #=> #<Encoding:ASCII-8BIT>
    "\x80".encoding          #=> #<Encoding:UTF-8>
(using ruby 2.0.0p247 (2013-06-27 revision 41674) [x86_64-linux])
In Ruby 2.0, the default encoding for string literals is UTF-8, but pack("C") returns an ASCII-8BIT (binary) encoded string.
Why is [127].pack('C') == "\x7F" true then?
However, [127].pack('C') == "\x7F" is true, because for byte values 0 to 127 ASCII and UTF-8 are identical. String comparison takes this into account (see the Rubinius source code):
    def ==(other)
      [...]
      return false unless @num_bytes == other.bytesize
      return false unless Encoding.compatible?(self, other)
      return @data.compare_bytes(other.__data__, @num_bytes, other.bytesize) == 0
    end
The MRI C source is similar, but harder to read.
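The three checks can be reproduced in plain Ruby. This is only a rough sketch: `same_string?` is a hypothetical helper name, not part of Ruby or Rubinius, and it assumes Ruby 2.0 or later, where `String#bytes` returns an array.

```ruby
# Hypothetical helper mirroring the three checks of the Rubinius method:
# equal byte length, compatible encodings, identical bytes.
def same_string?(a, b)
  return false unless a.bytesize == b.bytesize
  return false unless Encoding.compatible?(a, b)
  a.bytes == b.bytes
end

puts same_string?([127].pack("C"), "\x7F")  # both strings are ASCII-only
puts same_string?([128].pack("C"), "\x80")  # same byte, incompatible encodings
```

The first call should print true and the second false, which matches what `==` returns for the same pairs.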
We observe that the comparison checks for compatible encoding. Try the following:
    Encoding.compatible?([127].pack("C"), "\x7F") #=> #<Encoding:ASCII-8BIT>
    Encoding.compatible?([128].pack("C"), "\x80") #=> nil
We see that starting at byte value 128, the encodings are no longer compatible and the comparison returns false, even when both strings consist of the same bytes.
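If the goal is to compare the raw bytes, one option is to bring both strings to the same encoding first. A minimal sketch, assuming Ruby 2.0 or later (`String#b` was added in 2.0); the variable names are only for illustration:

```ruby
packed  = [128].pack("C")  # ASCII-8BIT
literal = "\x80"           # UTF-8, and not a valid UTF-8 sequence

same_as_is     = packed == literal                                        # encodings differ
same_binary    = packed == literal.b                                      # binary copy of the literal
same_forced    = packed == literal.dup.force_encoding(Encoding::ASCII_8BIT)
same_byte_list = packed.bytes == literal.bytes                            # compares the byte arrays directly

puts same_as_is, same_binary, same_forced, same_byte_list
```

Only the first comparison should come out false. `force_encoding` changes the encoding label of the string without touching its bytes, hence the `dup` to leave the literal itself alone.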
tessi