DataOutputStream outStream;
You probably don't want a DataOutputStream for writing an RTF file. DataOutputStream is for writing binary structures to a file, but RTF is text-based. Typically an OutputStreamWriter, with the appropriate charset set in the constructor, is the way to write text files.
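As a minimal sketch of the OutputStreamWriter approach: the class and method names below (`TextEncodingDemo`, `encodeText`) are made up for illustration, and a ByteArrayOutputStream stands in for a real file stream so the result can be inspected. UTF-8 is used only to show the mechanism for ordinary text files. For RTF itself, US-ASCII plus escapes is the safer choice, for the reasons given further down.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class TextEncodingDemo {
    // Hypothetical helper: pushes text through an OutputStreamWriter whose
    // charset is chosen explicitly in the constructor.
    static byte[] encodeText(String text, Charset charset) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stand-in for a FileOutputStream
        try (Writer out = new OutputStreamWriter(sink, charset)) {
            out.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String japanese = "\u65E5\u672C\u8A9E"; // three Japanese characters
        byte[] utf8 = encodeText(japanese, StandardCharsets.UTF_8);
        System.out.println(utf8.length); // 9: three bytes per character in UTF-8
    }
}
```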
outStream.writeBytes (strJapanese);
In particular, this fails because writeBytes really does write bytes, even though you pass it a String. A much more appropriate parameter type would have been byte[], but that is just one of the places where Java's handling of bytes versus chars is confusing. The way it converts your string to bytes is simply to take the lower eight bits of each UTF-16 code unit and throw the rest away. This results in ISO-8859-1 encoding, with mangled nonsense for all characters that don't exist in ISO-8859-1.
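The truncation is easy to see by capturing what writeBytes emits. This is only a small demonstration, with a hypothetical helper name (`viaWriteBytes`) and an in-memory stream instead of a file:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class WriteBytesDemo {
    // Hypothetical helper: returns exactly the bytes that writeBytes produces.
    static byte[] viaWriteBytes(String s) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(sink)) {
            out.writeBytes(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        // U+00E9 fits in eight bits, so it survives as the ISO-8859-1 byte 0xE9.
        System.out.println(Integer.toHexString(viaWriteBytes("\u00E9")[0] & 0xFF)); // e9
        // U+65E5 loses its high byte 0x65; only 0xE5 reaches the stream.
        System.out.println(Integer.toHexString(viaWriteBytes("\u65E5")[0] & 0xFF)); // e5
    }
}
```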
byte[] b = strJapanese.getBytes("UTF-8"); String output = new String(b);
This doesn't do anything useful. You encode to UTF-8 bytes and then decode them back to a String using the default charset. It is almost always a mistake to rely on the default charset, as it is unpredictable across machines.
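To illustrate why the charset should be named on both sides, here is a small sketch (class and method names are made up). Decoding with the same explicit charset gives the original string back, while decoding the same bytes as ISO-8859-1 imitates what an unlucky default charset would do:

```java
import java.nio.charset.StandardCharsets;

class RoundTripDemo {
    // Encode and decode with the same explicitly named charset.
    static String roundTrip(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // Decode UTF-8 bytes with the wrong charset, as a mismatched default might.
    static String misdecode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String japanese = "\u65E5\u672C\u8A9E";
        System.out.println(roundTrip(japanese).equals(japanese)); // true
        System.out.println(misdecode(japanese).length());         // 9: one junk char per UTF-8 byte
    }
}
```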
outStream.writeUTF(strJapanese);
This is a better stab at writing UTF-8, but it's still not quite right: it uses Java's "modified UTF-8" encoding and prefixes the data with a two-byte length. More importantly, RTF files don't actually support UTF-8 and shouldn't really contain any non-ASCII characters directly.
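A quick way to see how writeUTF differs from plain UTF-8 is to dump its output. The helper name `viaWriteUTF` is hypothetical and the stream is in-memory. The NUL character is a convenient test case because modified UTF-8 encodes it as two bytes, and every writeUTF call also starts with a two-byte length:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class WriteUtfDemo {
    // Hypothetical helper: returns exactly the bytes that writeUTF produces.
    static byte[] viaWriteUTF(String s) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(sink)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        // Standard UTF-8 encodes NUL as a single zero byte.
        System.out.println(Arrays.toString("\0".getBytes(StandardCharsets.UTF_8))); // [0]
        // writeUTF emits a length prefix (0, 2) and then the two-byte form 0xC0 0x80.
        System.out.println(Arrays.toString(viaWriteUTF("\0"))); // [0, 2, -64, -128]
    }
}
```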
Traditionally, non-ASCII characters from 128 upwards should be written as hex byte escapes such as \'80, with the encoding for them specified, if at all, in the font's \fcharset and \cpg escapes, which are very, very annoying to deal with and don't offer UTF-8 as an option.
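For a rough idea of what the traditional hex escapes look like in practice, here is a sketch that assumes a single-byte code page (windows-1252 in the example) and assumes the RTF header declares that same code page. The names are invented for illustration. Multi-byte code pages such as Shift_JIS need more care and are not handled, and characters the code page cannot represent come out as `?`.

```java
import java.nio.charset.Charset;

class RtfHexEscapeDemo {
    // Sketch for single-byte code pages only: ASCII passes through,
    // bytes from 0x80 upwards become \'xx hex escapes.
    static String hexEscape(String text, Charset codePage) {
        StringBuilder out = new StringBuilder();
        for (byte b : text.getBytes(codePage)) { // unmappable chars become '?'
            int value = b & 0xFF;
            if (value == '\\' || value == '{' || value == '}') {
                out.append('\\').append((char) value); // RTF syntax characters
            } else if (value < 0x80) {
                out.append((char) value);
            } else {
                out.append(String.format("\\'%02x", value));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        // The euro sign is byte 0x80 in windows-1252, e acute is 0xE9.
        System.out.println(hexEscape("caf\u00E9 \u20AC", cp1252)); // caf\'e9 \'80
    }
}
```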
In more modern RTF, you get \u1234x escapes, as in Dubbler's answer (+1). Each escape encodes one UTF-16 code unit, which corresponds to a Java char, so it's not too difficult to regex-replace all non-ASCII characters with their escaped variants.
This is supported by Word 97 and later, but some other tools may ignore the Unicode and fall back to the x replacement character.
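A minimal sketch of the Unicode escaping could look like the following. It uses a plain loop over chars rather than a regex, so that each half of a surrogate pair is escaped separately. The class and method names are invented, `?` is an arbitrary choice of fallback character, and the code assumes the document keeps the default of one fallback character per escape. RTF writes the number as a signed 16-bit decimal, so code units above 32767 come out negative.

```java
class RtfUnicodeEscapeDemo {
    // Sketch: replaces every non-ASCII UTF-16 code unit with an RTF Unicode
    // escape followed by '?' as the fallback for readers that ignore Unicode.
    static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\\' || c == '{' || c == '}') {
                out.append('\\').append(c); // RTF syntax characters
            } else if (c < 0x80) {
                out.append(c);
            } else {
                // Cast through short: RTF expects a signed 16-bit decimal value.
                out.append("\\u").append((int) (short) c).append('?');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+65E5 = 26085, U+672C = 26412, U+8A9E = 35486, written as -30050.
        System.out.println(escape("\u65E5\u672C\u8A9E"));
    }
}
```

The escaped result is pure ASCII, so it can be written with an OutputStreamWriter set to US-ASCII without losing anything.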
RTF is not a very nice format.