Convert utf-8 to unicode in ruby

Question

In UTF-8, "龅" is encoded as E9 BE 85, and its Unicode code point is U+9F85. The following code did not return the code point:

irb(main):004:0> "龅"
=> "\351\276\205"
irb(main):005:0> Iconv.iconv("unicode","utf-8","龅").to_s
=> "\377\376\205\237"

PS: I am using Ruby 1.8.7.

+3

ruby unicode utf-8 iconv

pierrotlefou Feb 11 '11 at 5:05

3 answers

Ruby 1.9+ handles Unicode much better than 1.8.7, so I highly recommend running under 1.9.2 if at all possible.

Part of the problem is that 1.8 did not understand that a UTF-8 or Unicode character could be more than one byte long. 1.9 does understand this and introduces things like String#each_char.

# encoding: UTF-8

require 'iconv'

RUBY_VERSION # => "1.9.2"
"龅".encoding # => #<Encoding:UTF-8>
"龅".each_char.entries # => ["龅"]
Iconv.iconv("unicode","utf-8","龅").to_s # => 

# ~> -:8:in `iconv': invalid encoding ("unicode", "utf-8") (Iconv::InvalidEncoding)
# ~>    from -:8:in `<main>'

To get a list of available encodings with Iconv, do:

require 'iconv'
puts Iconv.list

This is a long list, so I will not add it here.
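
For readers on newer Rubies, where Iconv is no longer part of the standard library (it was removed in 2.0), the same conversion can be sketched with the built-in String#encode. This is a minimal illustration, assuming Ruby 1.9 or later; the helper name hex_bytes is made up for this example and is not part of any library.

```ruby
# Sketch for Ruby 1.9 or later, using String#encode in place of Iconv.

# Hypothetical helper: returns the bytes of +str+ in the given
# encoding as lowercase two-digit hex strings.
def hex_bytes(str, encoding)
  str.encode(encoding).bytes.map { |b| "%02x" % b }
end

char = "\u9F85" # the character 龅, written as an escape

p hex_bytes(char, "UTF-8")     # ["e9", "be", "85"]
p hex_bytes(char, "UTF-16BE")  # ["9f", "85"]
p hex_bytes(char, "UTF-16LE")  # ["85", "9f"]

# Built-in counterpart to Iconv.list: the encoding names this Ruby knows.
p Encoding.name_list.include?("UTF-16BE")
```

The big-endian UTF-16 bytes of a character in the Basic Multilingual Plane are the same as its code point, which is why UTF-16BE gives 9F 85 here.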

+4

the tin man 11 . '11 5:16

You can also get the code point with unpack:

"%04x" % "龅".unpack("U*")[0]
=> "9f85"
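
The unpack call above can be extended to strings with more than one character. The following is a small sketch, assuming Ruby 1.9 or later (for the "\u" escapes); code_points is a hypothetical helper name chosen for this example.

```ruby
# Hypothetical helper: returns the code points of a UTF-8 string
# in "U+XXXX" notation. The "U*" directive decodes every UTF-8
# character in the string into its integer code point.
def code_points(str)
  str.unpack("U*").map { |cp| "U+%04X" % cp }
end

p code_points("\u9F85")    # ["U+9F85"]
p code_points("A\u9F85")   # ["U+0041", "U+9F85"]
```

No conversion to another encoding is needed here, since unpack reads the code point straight from the UTF-8 bytes.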

+4

Edson Lima 01 . '11 17:17

pierrotlefou · Accepted Answer · 2011-02-11T05:27:23+0000

You must use UNICODEBIG// as the target encoding:

irb(main):014:0> Iconv.iconv("UNICODEBIG//","utf-8","龅")[0].each_byte {|b| puts b.to_s(16)}
9f
85
=> "\237\205"
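
The output in the question, "\377\376\205\237", reads as a byte order mark (FF FE) followed by the code point in little-endian order (85 9F), which is why it did not look like 9F85. The sketch below decodes such a string by inspecting the mark; it assumes Ruby 1.9 or later, and decode_utf16_with_bom is a hypothetical helper written for this example.

```ruby
# Hypothetical helper: decodes UTF-16 data that starts with a byte
# order mark and returns the text as a UTF-8 string.
def decode_utf16_with_bom(raw)
  bytes = raw.bytes.to_a
  # The first two bytes tell us the byte order of the rest.
  encoding = case bytes[0, 2]
             when [0xFF, 0xFE] then "UTF-16LE"
             when [0xFE, 0xFF] then "UTF-16BE"
             else raise ArgumentError, "no UTF-16 byte order mark found"
             end
  # Drop the mark, relabel the remaining bytes, and transcode.
  bytes[2..-1].pack("C*").force_encoding(encoding).encode("UTF-8")
end

text = decode_utf16_with_bom("\377\376\205\237")
p text.unpack("U*").map { |cp| "U+%04X" % cp }   # ["U+9F85"]
```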
