How do I understand what kind of character this is?

Update: Evidently, these are control characters, not printable Unicode characters.

I am trying to parse an XML file with an odd character in it, which makes it invalid and makes my tools (Firefox, Nokogiri) complain.

This is what the character looks like in Firefox, and what it looks like when I copy and paste it into TextMate (obviously I'm on OS X).

crazy characters http://img.skitch.com/20090811-ghu43k5u9nhpcjmh443dpq76jp.preview.jpg

Instead of just cryptic icons and small gray diamonds, I would really like to know what these symbols actually are (for example, their hexadecimal codes or encoding), but I'm not sure how to find out.

+1
ruby character-encoding
Aug 11 '09 at 15:48
10 answers

I would save the page from Firefox to a file and pipe it through hexdump -C. Find the HTML fragment around the character in the ASCII column, then read off the corresponding hexadecimal bytes. The file is most likely UTF-8, so expect a multibyte sequence.
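The same hunt can be sketched in Ruby, which fits the question's tags. This is only a rough illustration, not a replacement for hexdump: the helper name `suspicious_bytes` and the sample string are made up for the example, and the choice to treat tab, LF, and CR as harmless is an assumption.

```ruby
# Report every byte that is a control character (other than tab, LF, CR)
# or lies outside printable 7-bit ASCII, with its offset and hex value.
def suspicious_bytes(data)
  hits = []
  data.each_byte.with_index do |byte, offset|
    harmless_control = [0x09, 0x0A, 0x0D].include?(byte)
    next if (0x20..0x7E).cover?(byte) || harmless_control
    hits << [offset, format("0x%02X", byte)]
  end
  hits
end

# Hypothetical input: an XML fragment with a stray ESC (0x1B) byte in it.
sample = "<a>ok\x1Bbad</a>".b
suspicious_bytes(sample).each do |offset, hex|
  puts "offset #{offset}: #{hex}"   # prints "offset 5: 0x1B"
end
```

For a saved page, `File.binread(path)` would supply the raw bytes to pass in. Bytes of a multibyte UTF-8 sequence are reported one by one, which is exactly what the ASCII column of hexdump -C would show as dots.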

+4
Aug 11 '09 at 15:55

Your screenshot is tiny, but does the Firefox sample show a glyph containing four hexadecimal digits? If so, those digits are the character's Unicode code point. You can also hunt for the diamond glyph in the Unicode code charts, or simply paste the diamond into a Google search; the character's name should appear near the top of the results.

But the real question is how to handle Unicode input in your program. This must be done correctly if you are processing XML. Is Nokogiri a Ruby library? I am surprised to hear that it does not handle Unicode automatically.
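Given the question's update that the offenders are control characters, one possible way to make the input parseable is to drop every code point that falls outside the `Char` production of XML 1.0 before handing the text to the parser. The following is a minimal sketch under two assumptions: the input is valid UTF-8 (otherwise `ord` raises), and silently discarding the characters is acceptable. The helper names are hypothetical and are not part of Nokogiri.

```ruby
# True if the code point is allowed by the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
def xml10_char?(codepoint)
  [0x9, 0xA, 0xD].include?(codepoint) ||
    (0x20..0xD7FF).cover?(codepoint) ||
    (0xE000..0xFFFD).cover?(codepoint) ||
    (0x10000..0x10FFFF).cover?(codepoint)
end

# Return a copy of the text without characters that XML 1.0 forbids.
def strip_invalid_xml_chars(text)
  text.each_char.select { |ch| xml10_char?(ch.ord) }.join
end

# Hypothetical input containing U+0003, a control character XML 1.0 rejects.
puts strip_invalid_xml_chars("<a>ok\u0003</a>")   # prints "<a>ok</a>"
```

Whether dropping the characters is the right fix depends on where they came from; if they carry meaning, repairing the producer of the file is the better route.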

+4
Aug 11 '09 at 15:53

The search query you are looking for is U+2603 or U2603, substituting the digits shown in your deplorably blurred "unknown glyph". The first few results will be about that Unicode character.
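If the glyph is too blurry to read, the U+XXXX form can also be produced from the character itself. A small Ruby sketch, assuming the text is valid UTF-8; the helper name `codepoint_labels` is invented for this example:

```ruby
# Turn each character of a string into its "U+XXXX" label,
# the form that can be pasted straight into a search engine.
def codepoint_labels(text)
  text.each_char.map { |ch| format("U+%04X", ch.ord) }
end

puts codepoint_labels("a\u2603").join(" ")   # prints "U+0061 U+2603"
```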

+2
Aug 12 '09 at 18:20

Paste it into Emacs and run M-x hexl-mode.

+1
Aug 11 '09 at 19:58

Just open the file in a hex editor, for example XVI32.

0
Aug 11 '09 at 15:54

Open the file in a hex editor and read off the hexadecimal representation of the character. Then look up that code at http://unicode.org to find the name of the character.

0
Aug 11 '09 at 15:56

Run hexdump -c on the file from a terminal and you will see the character code.

0
Aug 11 '09 at 18:40

Save the file and then use the terminal:

od (octal dump)

0
Aug 11 '09 at 18:40

If you are using Vim, move the cursor onto the character and type ga; its value is shown in decimal, hex, and octal in the status area.

0
Aug 12 '09 at 10:10

You can download the Ruby hexdump extension for the String class and print the hexdump directly from Ruby:

 require 'hexdump'
 # ... whatever you do in your program
 puts your_string.hexdump

The result looks like what you get from hexdump -C in the shell.

See:

http://www.unixgods.org/~tilo/Ruby/hexdump.html
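If installing an extension is not an option, similar output can be approximated with the standard library alone. The sketch below is not the code of the linked extension; `simple_hexdump` is a hypothetical helper, and its layout only roughly resembles hexdump -C.

```ruby
# Format data as rows of: offset, hex bytes, and a printable-ASCII column.
# Non-printable bytes are shown as "." in the ASCII column.
def simple_hexdump(data, width = 16)
  lines = []
  data.bytes.each_slice(width).with_index do |chunk, row|
    hex = chunk.map { |b| format("%02x", b) }.join(" ")
    ascii = chunk.map { |b| (0x20..0x7E).cover?(b) ? b.chr : "." }.join
    # Pad the hex column so the ASCII column lines up on short rows.
    lines << format("%08x  %-#{width * 3 - 1}s  |%s|", row * width, hex, ascii)
  end
  lines.join("\n")
end

# Hypothetical input with an ESC (0x1B) byte between "ok" and "!".
puts simple_hexdump("ok\x1B!")
```

In the output, the stray byte appears as 1b in the hex column and as a dot in the ASCII column, which makes it easy to spot next to the surrounding text.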

0
Oct 19 '11 at 6:25


