
What makes my XML break?

I have the following XML code.

<firstname> <default length="6">Örwin</default> <short>Örwin</short> <shorter>Örwin</shorter> <shortest> .</shortest> </firstname> 

Why does the content of the "shortest" node break? It should be just "Ö." instead of gibberish. The XML is UTF-8 encoded, and the function that processes the output of this node also writes the contents of the "short" and "shorter" nodes, where the "Ö" is clearly visible.

+4
2 answers

My guess is that the XML is not really UTF-8 encoded. Please show the bytes inside the <shortest> element in the raw file; I suspect you will find that they are not a validly encoded character. If you could show a short but complete program that generates this XML from valid input, that would be very helpful. (It would also help to say which platform this is on :)
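One way to look at those raw bytes is a few lines of Python. This is only a sketch: the helper name element_bytes_hex and the sample string are made up for illustration, and a regular expression is used on purpose so that the bytes are read before any XML parser or decoder touches them.

```python
import re

def element_bytes_hex(raw: bytes, tag: str) -> str:
    """Return the raw bytes between <tag ...> and </tag> as spaced uppercase hex."""
    name = re.escape(tag.encode("ascii"))
    pattern = re.compile(rb"<" + name + rb"(?:\s[^>]*)?>(.*?)</" + name + rb">", re.DOTALL)
    match = pattern.search(raw)
    if match is None:
        raise ValueError(f"no <{tag}> element found")
    return match.group(1).hex(" ").upper()

# Hypothetical stand-in for the file contents; with a real file you would
# read it in binary mode instead, e.g. open(path, "rb").read().
sample = (
    "<firstname><short>Örwin</short>"
    "<shortest>\ufffd.</shortest></firstname>"
).encode("utf-8")

print(element_bytes_hex(sample, "shortest"))
```

Reading the file in binary mode matters here: opening it as text would already apply a decoder and could hide the problem.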

EDIT: Something very strange is happening in this file. Here are the hexadecimal values for the "shorter" and "shortest" values:

Shorter: C3 96 72 77 69 6E

Shortest: EF BF BD 2E

Now, "C3 96" is the valid UTF-8 encoding of U+00D6, which is "Latin capital letter O with diaeresis", as you want.

However, EF BF BD is the UTF-8 encoding of U+FFFD, which is the "replacement character" - definitely not what you want. (2E is just an ASCII period.)
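The two byte sequences can be checked by decoding them as UTF-8 and asking for the Unicode names. A small sketch, where describe is a helper name invented for this example:

```python
import unicodedata

def describe(data: bytes) -> list:
    """Decode UTF-8 bytes and list each character as 'U+XXXX NAME'."""
    text = data.decode("utf-8")  # strict: raises if the bytes are not valid UTF-8
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]

print(describe(bytes.fromhex("C3 96")))        # the first character of "shorter"
print(describe(bytes.fromhex("EF BF BD 2E")))  # the whole "shortest" value
```

Both calls succeed under strict decoding, which is the point: the bytes are well-formed UTF-8, they just encode the wrong character.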

So this really is valid UTF-8, but it does not contain the characters you want. Again, you should look at whatever created the file ...
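The thread does not say what generated the file, so the following is only a guess at one common way U+FFFD ends up in otherwise valid UTF-8: shortening a string by bytes rather than by characters, then decoding leniently. Both function names are hypothetical.

```python
def shorten_by_bytes(name: str) -> str:
    """Buggy sketch: cut the UTF-8 encoding to one byte, then decode leniently."""
    first_byte = name.encode("utf-8")[:1]  # for "Örwin" this is b'\xc3', half of "Ö"
    # errors="replace" turns the incomplete sequence into U+FFFD instead of raising
    return first_byte.decode("utf-8", errors="replace") + "."

def shorten_by_chars(name: str) -> str:
    """Cut on character boundaries, so a multi-byte character stays intact."""
    return name[:1] + "."

broken = shorten_by_bytes("Örwin")
print(broken.encode("utf-8").hex(" ").upper())
print(shorten_by_chars("Örwin"))
```

If the generator does something like the first function, the fix is to truncate after decoding (or on character boundaries) rather than on the raw bytes. A double conversion through the wrong encoding would give the same symptom, so this is a place to start looking rather than a diagnosis.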

+17

XML parses the content inside tags, since any element can contain nested elements. So your “ö” may be breaking the parsing.

Put your data in a CDATA section, for example: http://www.w3schools.com/XML/xml_cdata.asp

-3

Source: https://habr.com/ru/post/1286682/

