Convert UTF-8 to Unicode in Ruby
Ruby 1.9+ is much better equipped to handle Unicode than 1.8.7, so I highly recommend running under 1.9.2 if at all possible.
Part of the problem is that 1.8 did not realize that a UTF-8 or Unicode character could be more than one byte long. 1.9 understands this and introduces methods such as String#each_char.
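To make the byte/character distinction concrete, here is a minimal sketch, assuming Ruby 1.9 or later. The variable names are only illustrative.

```ruby
# encoding: UTF-8
# Assumes Ruby 1.9 or later; variable names are illustrative.
text = "龅"

byte_count = text.bytesize        # bytes used by the UTF-8 encoding
char_count = text.length          # characters, as 1.9+ counts them
chars      = text.each_char.to_a  # one-character strings
bytes      = text.bytes.to_a      # raw byte values

puts "#{char_count} character, #{byte_count} bytes"
puts bytes.map { |b| format("%02X", b) }.join(" ")
```

The single character occupies three bytes in UTF-8, which is the case 1.8 handled poorly.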
# encoding: UTF-8
require 'iconv'
RUBY_VERSION # => "1.9.2"
"龅".encoding # => #<Encoding:UTF-8>
"龅".each_char.entries # => ["龅"]
Iconv.iconv("unicode","utf-8","龅").to_s # =>
# ~> -:8:in `iconv': invalid encoding ("unicode", "utf-8") (Iconv::InvalidEncoding)
# ~> from -:8:in `<main>'
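The error suggests that this Iconv build does not accept "unicode" as an encoding name. Unicode is a character set, while UTF-8, UTF-16 and UTF-32 are encodings of it, so a concrete target has to be chosen. As a rough sketch, assuming that "unicode" here means either the code points or UTF-16, Ruby 1.9's built-in String#codepoints and String#encode can do this without Iconv:

```ruby
# encoding: UTF-8
# Assumes Ruby 1.9 or later, and that "unicode" means code points or UTF-16.
utf8 = "龅"

# Unicode code points of each character.
code_points = utf8.codepoints.to_a

# Transcode to big-endian UTF-16 and look at the raw bytes.
utf16       = utf8.encode("UTF-16BE")
utf16_bytes = utf16.bytes.to_a

# Transcoding back should give the original string.
round_trip = utf16.encode("UTF-8")

puts code_points.map { |cp| format("U+%04X", cp) }.join(" ")
puts utf16_bytes.map { |b| format("%02X", b) }.join(" ")
```

If a different target is needed, such as UTF-16LE or UTF-32BE, only the name passed to encode should need to change.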
To get a list of available encodings with Iconv, do:
require 'iconv'
puts Iconv.list
This is a long list, so I will not add it here.
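If you use Ruby 1.9's own String#encode rather than Iconv, the names it accepts come from the Encoding class, not from Iconv.list. A small sketch, assuming Ruby 1.9 or later, that filters that list down to the Unicode-related names:

```ruby
# Assumes Ruby 1.9 or later; Encoding.name_list returns every name and alias Ruby knows.
all_names     = Encoding.name_list
unicode_names = all_names.grep(/\AUTF-/).sort

puts unicode_names
```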