Rendering an XML Document with Multiple Languages

I have an XML page with elements in several languages: Arabic, English, Chinese, and Japanese. What encoding should I choose for it? If I try to display the XML using XSL (with utf-8, ISO-8859-6, or ISO-2022-JP), I get this error:

An invalid character was found in the text content.

How can I fix this, and how do I decide which encoding to use?

Thanks.

+4
3 answers

Of the encodings you list, UTF-8 is the only one that can handle all of these scripts. It is also the default encoding for XML and the only encoding that makes sense for a modern application. (For storage and transmission, that is; for internal processing, your language's string type will most likely be UTF-16 or UTF-32.)
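As a rough illustration of the point above, here is a minimal Python sketch that writes Arabic, English, Chinese, and Japanese text into one UTF-8 document with an explicit XML declaration. The element names (`greetings`, `greeting`), the `lang` attribute, and the helper `build_multilingual_xml` are made up for this example and are not from the question; `xml_declaration=True` on `ET.tostring` assumes Python 3.8 or later.

```python
import xml.etree.ElementTree as ET


def build_multilingual_xml(greetings):
    """Serialize a {language: text} mapping as UTF-8 XML bytes.

    Hypothetical helper: element and attribute names are illustrative only.
    """
    root = ET.Element("greetings")
    for lang, text in greetings.items():
        item = ET.SubElement(root, "greeting", {"lang": lang})
        item.text = text
    # Ask for UTF-8 bytes and an explicit <?xml ... encoding=...?> declaration.
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


data = build_multilingual_xml({
    "ar": "مرحبا",
    "en": "Hello",
    "zh": "你好",
    "ja": "こんにちは",
})

# Round-trip: the bytes parse back to the same strings.
parsed = {g.get("lang"): g.text for g in ET.fromstring(data)}
```

All four scripts live in the same file under a single encoding; nothing has to be switched per element.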

The problem seems to come from an error in the input file rather than from the choice of output encoding. The file may have been saved in something other than UTF-8 without the encoding being named in the <?xml ... encoding="..."?> declaration. Or maybe it contains an invalid ISO-2022-JP escape sequence? (That one is a horror of an encoding.)

You should try loading the input file into something that parses XML (like Firefox or IE) and see what errors, if any, it reports.
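The same check can be done programmatically. The sketch below, in Python, feeds raw bytes to the standard-library parser and reports where it stops; `check_well_formed` is a hypothetical helper name, and the second sample deliberately declares UTF-8 while actually being encoded as ISO-8859-1, which is one way to provoke an "invalid character" style error.

```python
import xml.etree.ElementTree as ET


def check_well_formed(data: bytes):
    """Return None if data parses as XML, else (line, column, message)."""
    try:
        ET.fromstring(data)
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None


# Declared as UTF-8 and really encoded as UTF-8: parses cleanly.
good = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<doc><ar>مرحبا</ar><ja>こんにちは</ja></doc>"
).encode("utf-8")

# Declared as UTF-8 but actually encoded as ISO-8859-1: the byte for "é"
# is not valid UTF-8, so the parser rejects the document.
bad = '<?xml version="1.0" encoding="UTF-8"?><doc>café</doc>'.encode("iso-8859-1")

good_result = check_well_formed(good)
bad_result = check_well_formed(bad)
```

The reported line and column usually point straight at the offending byte, which tells you whether the culprit is a mislabelled encoding or a stray character.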

(You cannot mix encodings in one XML file. If you are pasting byte strings from differently encoded sources straight into the XML, you have already lost. How is this XML generated?)

+2

Where exactly is the error found? It looks like the XML itself may contain an invalid character (for example, a control character between U+0000 and U+001F other than \r, \t and \n, IIRC). You will probably see this when loading the XML into any decent XML editor (or programmatically).
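A quick way to look for such characters programmatically is sketched below in Python. It only covers the range mentioned above (C0 controls other than tab, line feed, and carriage return); XML 1.0 excludes a few other code points as well, which this sketch does not check. The name `find_illegal_controls` is hypothetical.

```python
import re

# C0 control characters that XML 1.0 does not allow:
# everything in U+0000..U+001F except tab (09), LF (0A) and CR (0D).
_ILLEGAL_CONTROLS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def find_illegal_controls(text):
    """Return (offset, 'U+XXXX') for each disallowed control character."""
    return [
        (m.start(), f"U+{ord(m.group()):04X}")
        for m in _ILLEGAL_CONTROLS.finditer(text)
    ]


clean = find_illegal_controls("ok\tline\r\n")
dirty = find_illegal_controls("a\x0bb\x00")
```

Run it on the decoded text of the input file; any hits are characters the XML parser will refuse regardless of which encoding is declared.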

Also, UTF-8 is generally a good choice of encoding, although it is less space-efficient than UTF-16 for Far Eastern characters, mind you. Both UTF-16 and UTF-8 can represent every Unicode character (UTF-16 uses surrogate pairs for characters outside the Basic Multilingual Plane).
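The size trade-off is easy to measure. In this Python sketch the sample strings and the helper name `encoded_sizes` are made up for illustration; `utf-16-le` is used so that no byte-order mark is counted.

```python
def encoded_sizes(text):
    """Return (UTF-8 byte count, UTF-16 byte count) for a string."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


samples = {
    "English": "Hello",
    "Arabic": "مرحبا",
    "Chinese": "你好",
    "Emoji (outside the BMP)": "😀",
}

sizes = {name: encoded_sizes(text) for name, text in samples.items()}
for name, (utf8_len, utf16_len) in sizes.items():
    print(f"{name}: UTF-8 {utf8_len} bytes, UTF-16 {utf16_len} bytes")
```

ASCII text is half the size in UTF-8, Arabic comes out the same in both, and Chinese or Japanese text takes three bytes per character in UTF-8 versus two in UTF-16. For XML, the ASCII markup around the text often narrows that gap.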

+1

UTF-8 covers all of UCS-2 (which is what most people mean by "Unicode characters"), and in fact every Unicode code point, so it should be appropriate. You still need to make sure that there are no embedded characters that cannot appear literally in XML text, such as unescaped <, > or &, or non-printable control characters.
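For the markup characters, escaping is a one-liner in most languages. A minimal Python sketch using the standard library's `xml.sax.saxutils.escape`, which replaces `&`, `<` and `>` with entity references (the sample string is invented):

```python
from xml.sax.saxutils import escape

raw = "a < b & c > d"
safe = escape(raw)

# Safe to embed as element text now.
fragment = f"<note>{safe}</note>"
```

Note that `escape` does nothing about control characters; those have to be removed or replaced separately before the text goes into the document.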

0

Source: https://habr.com/ru/post/1306350/

