Convert Erlang UTF-8 encoded string to java.lang.String

Question

A Java node receives an Erlang string encoded in UTF-8. Its type is OtpErlangString. If I just call .toString() or .stringValue(), the result is a java.lang.String with invalid code points (basically, every byte of the Erlang string is treated as a separate character).

Now I want to use new String(bytes, "UTF-8") to create the Java string, but how do I get the bytes from an OtpErlangString?

java erlang unicode utf-8

Martin Dimitrov Jan 16 '12 at 10:01

1 answer

Wacław borowiec · Answer 1 · 2012-01-19T07:51:23+0000

It is strange that you get an OtpErlangString on the Java side when the string contains non-ASCII characters. I get an object of that type only when I use ASCII characters alone. If I add at least one non-ASCII character, the received type is OtpErlangList (which is logical, since strings in Erlang are just lists of integers), and then I can use its stringValue() method. So after sending a string from the Erlang shell like this:

 (waco@host)8> {proc, java1@host} ! "ąćśźżęółńa".
 [261,263,347,378,380,281,243,322,324,97]

On the Java node, I receive and print it with:

 OtpErlangList l = (OtpErlangList) mbox.receive();
 System.out.println(l.stringValue());

The output is correct:

 ąćśźżęółńa
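
The integers printed by the Erlang shell above are Unicode code points, which is why converting the list to a string gives the right text. As a rough illustration only (a JDK-only sketch, not how Jinterface implements stringValue(); the class and method names are made up for this example), the same conversion on a plain int array looks like this:

```java
final class CodePointsDemo {
    // Builds a Java string from an array of Unicode code points,
    // e.g. the integers an Erlang string (a list of ints) is made of.
    static String fromCodePoints(int[] codePoints) {
        return new String(codePoints, 0, codePoints.length);
    }

    public static void main(String[] args) {
        // The list shown by the Erlang shell in the example above.
        int[] cps = {261, 263, 347, 378, 380, 281, 243, 322, 324, 97};
        System.out.println(fromCodePoints(cps));
    }
}
```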

However, if this is not what happens in your situation, you can try to force the OtpErlangList representation, for example by adding an empty tuple as the very first element of the string list:

 (waco@wborowiec)11> {proc, java1@wborowiec} ! [{}] ++ "ąćśźżęółńa".
 [{},261,263,347,378,380,281,243,322,324,97]

And on the Java side, something like:

 // requires: import java.util.Arrays;
 OtpErlangList l = (OtpErlangList) mbox.receive();
 // get rid of the extra tuple
 OtpErlangObject[] strArr = Arrays.copyOfRange(l.elements(), 1, l.elements().length);
 OtpErlangList l2 = new OtpErlangList(strArr);
 System.out.println(l2.stringValue());
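
If the value really does arrive as an OtpErlangString in which every character holds one UTF-8 byte, as the question describes, another option may be to recover the bytes on the Java side. The sketch below uses only the JDK and assumes every char in the input is in the range 0..255; anything above that would be replaced with '?' by the ISO-8859-1 encoder. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class Utf8Recode {
    // Treats each char of the input as one raw byte (ISO-8859-1 maps
    // chars 0..255 to bytes 1:1) and decodes those bytes as UTF-8.
    static String fromByteChars(String byteChars) {
        byte[] raw = byteChars.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String expected = "\u0105\u0107a"; // the text "ąća"
        // Simulate the symptom from the question: one char per UTF-8 byte.
        String mangled = new String(
                expected.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);
        System.out.println(fromByteChars(mangled).equals(expected));
    }
}
```

With a received term this would be called as Utf8Recode.fromByteChars(erlString.stringValue()), where erlString is the OtpErlangString from the question.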
