Ruby: convert encoded character to a valid UTF-8 character

Ruby will not play nicely with UTF-8 strings. I am passing data in an XML file, and although the XML document is declared as UTF-8, Ruby treats the encoded bytes (two bytes per character) as separate characters.

I have resorted to encoding the input strings in '\uXXXX' format, but I can't figure out how to convert that to the actual UTF-8 character. I have searched this site and Google to no avail, and my frustration is pretty high. I am using Ruby 1.8.6.

Basically, I want to convert the string '\u03a3' → "Σ".

What I have is:

data.gsub /\\u([a-zA-Z0-9]{4})/,  $1.hex.to_i.chr

Which of course gives the error "931 out of char range".

Thanks. Tim

+3

Try this:

[0x50].pack("U")

where 0x50 is the hex code point of the character you want as UTF-8.
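Applied to the '\uXXXX' strings from the question, the pack("U") idea might look like the sketch below. The helper name decode_unicode_escapes is made up for illustration, and the sketch assumes each escape stands for one complete code point (surrogate pairs are not handled). It uses the block form of gsub so that $1 is read after each match rather than before gsub runs.

```ruby
# Hypothetical helper (the name is illustrative, not from any library):
# replaces each literal \uXXXX escape with the UTF-8 character it names.
def decode_unicode_escapes(text)
  # Block form of gsub: $1 refers to the current match inside the block.
  text.gsub(/\\u([0-9a-fA-F]{4})/) { [$1.hex].pack("U") }
end

escaped = 'Sum: \u03a3'   # single quotes keep the backslash literal
puts decode_unicode_escapes(escaped)
```

The character class is narrowed to hex digits here, since [a-zA-Z0-9] would also match non-hex letters such as 'z'.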

+5

Is Ruby really mishandling the UTF-8 here?

If the bytes need converting between encodings, Iconv is the usual tool in Ruby 1.8.

Σ is \u03a3. The \uXXXX notation is JSON escape syntax, not XML. To decode \uXXXX escapes, a JSON parser is the natural tool.
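As a rough sketch of the JSON route: wrapping the escaped text in a JSON array and parsing it lets the parser decode the escape. This assumes a json library is available (it ships with Ruby from 1.9; on 1.8 it would have to be installed as a gem) and that the input contains no double quotes or other backslashes that would break the wrapper.

```ruby
require 'json'

escaped = '\u03a3'   # literal backslash-u text, as in the question

# Wrap the text in a one-element JSON array so the parser decodes the escape.
decoded = JSON.parse(%(["#{escaped}"])).first
puts decoded
```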

+2

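Ruby (at least, 1.8.6) does not have full Unicode support. Integer#chr only supports ASCII characters, and otherwise only values up to 255, shown in octal notation ('\377').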

To demonstrate:

irb(main):001:0> 255.chr
=> "\377"
irb(main):002:0> 256.chr
RangeError: 256 out of char range
        from (irb):2:in `chr'
        from (irb):2

You could try upgrading to Ruby 1.9. The chr docs there do not explicitly say ASCII, so values above 255 may be supported.
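For reference, in Ruby 1.9 and later Integer#chr takes an optional encoding argument, which covers the case from the question. A minimal sketch (this does not apply to 1.8.6, where chr takes no argument):

```ruby
# With an explicit encoding, code points above 255 are accepted.
sigma = 0x03a3.chr('UTF-8')
puts sigma

# Without an argument, 931 still raises RangeError unless
# Encoding.default_internal has been set (it is unset by default).
begin
  931.chr
rescue RangeError => e
  puts "RangeError: #{e.message}"
end
```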

Or you could look into ruby-unicode. I have never tried it myself, so I don't know how much it will help.

Otherwise, I don’t think you can do what you want in Ruby at this time.

+1

Source: https://habr.com/ru/post/1723044/

