I would like to write a Clojure function that takes a string in one encoding and converts it to another, the way the iconv library does.
For example, let's look at the "è" character. In ISO-8859-1 ( http://www.ascii-code.com/ ) it is the single byte e8 in hex. In UTF-8 ( http://www.ltg.ed.ac.uk/~richard/utf-8.cgi?input=%C3%A8&mode=char ) it is the two bytes c3 a8.
So, let's say we have iso.txt that contains our letter and EOL:
$ hexdump iso.txt
0000000 e8 0a
0000002
Now we can convert it to UTF-8 as follows:
$ iconv -f ISO-8859-1 -t UTF-8 iso.txt | hexdump
0000000 c3 a8 0a
0000003
How do I write something equivalent in Clojure? I am happy to use any external libraries, but I don't know where to look for them. Looking around, I couldn't figure out how to use libiconv on the JVM, but there is probably an alternative?
Edit
After reading the link Alex posted in the comments, it turns out to be so simple and so cool:
user> (new String (byte-array 2 (map unchecked-byte [0xc3 0xa8])) "UTF-8")
"è"
user> (new String (byte-array 1 [(unchecked-byte 0xe8)]) "ISO-8859-1")
"è"
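The REPL lines above only decode bytes into a string. To get the full iconv-style conversion, the decoded string still has to be re-encoded in the target charset. Here is a minimal sketch of that round trip using plain Java interop; the names convert-bytes, convert-file and hex are hypothetical helpers made up for illustration, not part of any library, and convert-file assumes the input is small enough to hold in memory.

```clojure
;; Hypothetical helper: decode the bytes with charset `from`,
;; then re-encode the resulting string with charset `to`.
(defn convert-bytes
  ^bytes [^bytes bs ^String from ^String to]
  (.getBytes (String. bs from) to))

;; Hypothetical helper: file-level variant, roughly
;; `iconv -f from -t to in > out`. Reads the whole file into memory.
(defn convert-file
  [in out from to]
  (spit out (slurp in :encoding from) :encoding to))

;; Hypothetical helper: show a byte array as two-digit hex strings.
(defn hex
  [^bytes bs]
  (mapv #(format "%02x" (bit-and % 0xff)) bs))

;; "è" followed by a newline, as in iso.txt
(def iso-bytes
  (byte-array [(unchecked-byte 0xe8) (unchecked-byte 0x0a)]))

(hex (convert-bytes iso-bytes "ISO-8859-1" "UTF-8"))
;; => ["c3" "a8" "0a"]
```

For the file from the question, something like (convert-file "iso.txt" "utf8.txt" "ISO-8859-1" "UTF-8") should do the same job as the iconv command; utf8.txt is just an assumed output name.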