What makes my XML break?
I have the following XML code.
<firstname> <default length="6">Örwin</default> <short>Örwin</short> <shorter>Örwin</shorter> <shortest> .</shortest> </firstname> Why does the content of the "shortest" node break? It should just be "..." instead of tiring. The XML is encoded in UTF-8 encoding, and the function that processes the output of this node also writes the contents of the “short” and “short” ones. Where "Ö" is clearly visible.
My guess is that the XML is not actually UTF-8 encoded. Please show the bytes inside the <shortest> element in the raw file; I suspect you will find that they are not a validly encoded character. If you could show a short but complete program that generates this XML from valid input, that would be very helpful. (It would also help to say which platform this is on. :)
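If you are not sure how to get at those bytes, here is a minimal Python sketch. It is only an illustration: the helper name element_bytes_hex is made up for this post, and it assumes the tag appears exactly once, without attributes, in the raw file.

```python
def element_bytes_hex(raw: bytes, tag: str) -> str:
    """Return the raw bytes between <tag> and </tag> as spaced uppercase hex.

    Hypothetical helper: works on the undecoded file contents, so no
    encoding guess is involved. Assumes the tag has no attributes.
    """
    open_tag = f"<{tag}>".encode("ascii")
    close_tag = f"</{tag}>".encode("ascii")
    start = raw.index(open_tag) + len(open_tag)
    end = raw.index(close_tag, start)
    return raw[start:end].hex(" ").upper()


# Sample input built in memory; a correctly encoded "Örwin" looks like this.
sample = "<shorter>\u00d6rwin</shorter>".encode("utf-8")
print(element_bytes_hex(sample, "shorter"))  # C3 96 72 77 69 6E
```

For a real file you would read it in binary mode, for example open(path, "rb").read(), and pass the result in as raw, so that nothing gets decoded before you look at it.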
EDIT: Something very strange is happening in this file. Here are the hexadecimal values for the “shorter” and “shortest” values:
Shorter: C3 96 72 77 69 6E
Shortest: EF BF BD 2E
Now, "C3 96" is the valid UTF-8 encoding of U+00D6, which is "Latin capital letter O with diaeresis", as you want.
However, EF BF BD is the UTF-8 encoding of U+FFFD, which is the "replacement character": definitely not what you want. (2E is just an ASCII period.)
So the file really is valid UTF-8, but it does not contain the characters you want. The damage happened before or while the file was written, so again, you should look into whatever created the file ...
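You can check both byte sequences yourself with a few lines of Python. The last part is only a guess at how the U+FFFD could have got there: it assumes the "Ö" reached the writer as a single non-UTF-8 byte (0xD6, which is "Ö" in Latin-1) and was decoded as UTF-8 with replacement before being written out. Your actual pipeline may do something different.

```python
import unicodedata


def describe(raw: bytes) -> list:
    """Decode raw as strict UTF-8 and name each resulting code point."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in raw.decode("utf-8")]


print(describe(bytes.fromhex("C3 96")))        # the bytes behind the visible Ö
print(describe(bytes.fromhex("EF BF BD 2E")))  # the bytes found in "shortest"

# Assumption (not confirmed by the question): a lone Latin-1 byte 0xD6 was
# decoded as UTF-8 with errors replaced, then re-encoded as UTF-8.
damaged = b"\xd6.".decode("utf-8", errors="replace").encode("utf-8")
print(damaged.hex(" ").upper())
```

If that last line prints the same bytes as your "shortest" element, a mismatched decode somewhere upstream is a plausible suspect, but it is not proof.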
An XML parser processes the text inside tags, since any element can contain nested elements. So your "Ö" may be breaking the parsing.
Put your data in a CDATA section, for example: http://www.w3schools.com/XML/xml_cdata.asp