Convert escaped XML entities to UTF-8

So, I have this UTF-8 line in an XML file:

Horrible place. ☠☠☠

And when I pass it to an external application, the funny characters come back as XML entities:

Horrible place. &#9760;&#9760;&#9760;

In Ruby, how do I convert this string back to UTF-8? There is probably a really simple solution, but I can't find anything in the standard library; e.g. CGI.unescapeHTML (which works well for things like &gt;) seems to ignore them completely:

ree-1.8.7-2010.02 > CGI.unescapeHTML('&gt;')
 => ">"
ree-1.8.7-2010.02 > CGI.unescapeHTML('&#9760;')
 => "&#9760;"
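These are XML numeric character references: &#9760; names Unicode code point 9760 (U+2620, SKULL AND CROSSBONES) in decimal, and &#x2620; is the same character in hexadecimal. To show what decoding them amounts to, here is a minimal hand-rolled sketch for Ruby 1.9 or later. decode_char_refs is a hypothetical helper, not part of any library, and it deliberately leaves named entities such as &gt; alone.

```ruby
# Hypothetical helper (not a library API): turns XML numeric character
# references, decimal &#NNNN; or hexadecimal &#xHHHH;, into UTF-8 characters.
# Named entities such as &gt; or &amp; are left as they are.
def decode_char_refs(str)
  str.gsub(/&#(?:x([0-9a-fA-F]+)|([0-9]+));/) do
    # $1 holds the hex digits, $2 the decimal digits
    code_point = $1 ? $1.to_i(16) : $2.to_i
    # Integer#chr raises RangeError for values that are not valid code points
    code_point.chr(Encoding::UTF_8)
  end
end

puts decode_char_refs('Horrible place. &#9760;&#9760;&#9760;')
# prints: Horrible place. ☠☠☠
```

A real XML parser also handles named entities and rejects malformed input, so this sketch is mainly useful for seeing what the conversion involves.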
2 answers

Well, since it is encoded in XML, I would go for an XML parser:

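require 'nokogiri'

# The string as returned by the external application
frag = 'Horrible place. &#9760;&#9760;&#9760;'
# Parsing it as an XML fragment makes the parser decode the character references
doc = Nokogiri::XML.fragment(frag)
puts doc.text
# >> Horrible place. ☠☠☠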
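If pulling in the Nokogiri gem is not desirable, the same parse-it-as-XML idea can be sketched with REXML, which ships with Ruby (as a standard library in older releases and as a bundled gem since Ruby 3.0). This is a minimal sketch assuming the input is plain text containing references; the xml_text helper and the throwaway <root> wrapper element are illustrative choices, not part of REXML.

```ruby
require 'rexml/document'

# Hypothetical helper: wraps the fragment in a throwaway root element so it
# forms a well-formed document, then lets REXML expand the references.
def xml_text(fragment)
  doc = REXML::Document.new("<root>#{fragment}</root>")
  # Element#text returns the first text node's value with references expanded
  # (nil if there is no text); text inside nested elements is not collected.
  doc.root.text
end

puts xml_text('Horrible place. &#9760;&#9760;&#9760;')
```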

CGI.unescapeHTML does decode the reference; for example, write its result to a file:

require 'cgi'
File.open("d:\\11.txt", 'w') {|f| f.write(CGI.unescapeHTML('&#9760;')) } # file contains: ☠
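Whether CGI.unescapeHTML decodes &#9760; seems to depend on the Ruby version. As far as I can tell from Ruby 1.8's cgi.rb, numeric references above 255 were only converted when $KCODE was set to UTF-8 (for example with ruby -Ku), which may be why it appeared to ignore them in the question's ree-1.8.7 session. On Ruby 1.9 and later, where strings carry their own encoding, the method decodes these references in a UTF-8 string directly. A quick check, assuming a modern Ruby in which require 'cgi' still provides CGI.unescapeHTML:

```ruby
require 'cgi'

# The literal below is UTF-8 (the default source encoding since Ruby 2.0),
# so unescapeHTML can map each numeric reference to a UTF-8 character.
decoded = CGI.unescapeHTML('Horrible place. &#9760;&#9760;&#9760;')
puts decoded
```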

Source: https://habr.com/ru/post/1782701/

