To produce valid XML, you need to escape the characters XML reserves for markup (&, <, > and quotes) and write the text in the same encoding that the XML declaration states (the "encoding" attribute of the <?xml ?> declaration). Accented characters do not need to be escaped, as long as they are encoded in the document's encoding.
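A minimal sketch of that point, assuming the input string is already UTF-8 and using PHP's ENT_XML1 flag (available since PHP 5.4). Only the markup characters become entities. Under a UTF-8 declaration, é and è are written as literal UTF-8 bytes. The sample title is made up for illustration.

```php
<?php
// Illustrative input, assumed to be UTF-8 already.
$title = "Caf\u{e9} & cr\u{e8}me <br>";

// ENT_XML1 selects XML-style entities; ENT_QUOTES also escapes ' and ".
// Accented letters are left untouched because the document is UTF-8.
$escaped = htmlspecialchars($title, ENT_XML1 | ENT_QUOTES, 'UTF-8');

// The declaration must name the encoding the bytes are actually in.
$xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
     . '<title>' . $escaped . '</title>';

echo $xml, "\n";
```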
However, in many situations simply escaping the input with htmlspecialchars() produces double-encoded entities (for example, &eacute; becomes &amp;eacute;), so I suggest decoding HTML entities first:
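```php
function xml_escape($s)
{
    // Decode existing HTML entities (e.g. &eacute;) so they are not escaped twice.
    $s = html_entity_decode($s, ENT_QUOTES, 'UTF-8');
    // Escape &, <, >, " and '; double_encode = false leaves existing entities intact.
    $s = htmlspecialchars($s, ENT_QUOTES, 'UTF-8', false);
    return $s;
}
```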
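To see why the decoding step matters, here is a short comparison on a made-up input string. Escaping directly turns the existing &eacute; into &amp;eacute;. Decoding first, using the same two calls xml_escape() makes, yields a literal é while still escaping the bare & and the tag.

```php
<?php
// Made-up input: one HTML entity, one bare ampersand, one tag.
$in = 'Caf&eacute; & <b>';

// Escaping directly double-encodes the existing entity.
$naive = htmlspecialchars($in, ENT_QUOTES, 'UTF-8');

// Decoding first (as xml_escape() does) avoids the double encoding.
$decoded = html_entity_decode($in, ENT_QUOTES, 'UTF-8');
$safe = htmlspecialchars($decoded, ENT_QUOTES, 'UTF-8', false);

echo $naive, "\n"; // Caf&amp;eacute; &amp; &lt;b&gt;
echo $safe, "\n";  // Café &amp; &lt;b&gt;
```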
Next, make sure every accented character is valid in the XML document's encoding. I strongly recommend always encoding XML output as UTF-8, because not all XML parsers respect the encoding stated in the document's XML declaration. If your input may arrive in a different character set, try utf8_encode(), which converts ISO-8859-1 to UTF-8 (it is deprecated as of PHP 8.2; mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1') performs the same conversion).
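A minimal sketch of that conversion step. It assumes the mbstring extension is available and uses mb_convert_encoding() for the same ISO-8859-1 conversion that utf8_encode() performs. The helper name to_utf8() is hypothetical. It uses a simple heuristic: input that is already valid UTF-8 is passed through, and anything else is assumed to be ISO-8859-1. Some ISO-8859-1 byte sequences happen to be valid UTF-8, so treat this as a guess, not a detector.

```php
<?php
// Hypothetical helper: keep valid UTF-8 as-is, otherwise assume ISO-8859-1
// and convert it (requires ext-mbstring).
function to_utf8(string $s): string
{
    if (mb_check_encoding($s, 'UTF-8')) {
        return $s;
    }
    // Each ISO-8859-1 byte maps to the code point U+0000..U+00FF.
    return mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1');
}

$latin1 = "Caf\xe9";     // "Café" as ISO-8859-1 bytes
$utf8   = "Caf\xc3\xa9"; // "Café" as UTF-8 bytes

var_dump(to_utf8($latin1) === $utf8); // bool(true)
var_dump(to_utf8($utf8) === $utf8);   // bool(true)
```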
I once had a special case where the input could come from any of these encodings: ISO-8859-1, ISO-8859-15, UTF-8, cp866, cp1251, cp1252 and KOI8-R. PHP sees them all the same, but there are slight differences between them, and some of those even iconv() could not handle. I could only solve this encoding puzzle by supplementing the behavior of utf8_encode():
```php
function encode_utf8($s)
{
    // utf8_encode() treats its input as ISO-8859-1, so cp1252 bytes 0x80-0x9F
    // come out as C1 control characters (U+0080-U+009F, "\xc2\x80".."\xc2\x9f").
    // This map replaces them with the characters cp1252 actually assigns there.
    // 0x81, 0x8D, 0x8F, 0x90 and 0x9D are undefined in cp1252 and are left as-is.
    $cp1252_map = array(
        "\xc2\x80" => "\xe2\x82\xac", "\xc2\x82" => "\xe2\x80\x9a", "\xc2\x83" => "\xc6\x92",
        "\xc2\x84" => "\xe2\x80\x9e", "\xc2\x85" => "\xe2\x80\xa6", "\xc2\x86" => "\xe2\x80\xa0",
        "\xc2\x87" => "\xe2\x80\xa1", "\xc2\x88" => "\xcb\x86",     "\xc2\x89" => "\xe2\x80\xb0",
        "\xc2\x8a" => "\xc5\xa0",     "\xc2\x8b" => "\xe2\x80\xb9", "\xc2\x8c" => "\xc5\x92",
        "\xc2\x8e" => "\xc5\xbd",     "\xc2\x91" => "\xe2\x80\x98", "\xc2\x92" => "\xe2\x80\x99",
        "\xc2\x93" => "\xe2\x80\x9c", "\xc2\x94" => "\xe2\x80\x9d", "\xc2\x95" => "\xe2\x80\xa2",
        "\xc2\x96" => "\xe2\x80\x93", "\xc2\x97" => "\xe2\x80\x94", "\xc2\x98" => "\xcb\x9c",
        "\xc2\x99" => "\xe2\x84\xa2", "\xc2\x9a" => "\xc5\xa1",     "\xc2\x9b" => "\xe2\x80\xba",
        "\xc2\x9c" => "\xc5\x93",     "\xc2\x9e" => "\xc5\xbe",     "\xc2\x9f" => "\xc5\xb8"
    );
    // Convert as ISO-8859-1, then fix up the cp1252-specific range.
    $s = strtr(utf8_encode($s), $cp1252_map);
    return $s;
}
```
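The two-step mechanism can be traced on a short example. The sketch uses a hypothetical $subset holding just three entries of the map above. For the ISO-8859-1 step it uses mb_convert_encoding(), which requires ext-mbstring, in place of utf8_encode(). The example shows the intermediate C1 control characters and how strtr() replaces them.

```php
<?php
// Hypothetical, reduced subset of $cp1252_map from encode_utf8().
$subset = [
    "\xc2\x80" => "\xe2\x82\xac", // U+0080 -> U+20AC EURO SIGN
    "\xc2\x93" => "\xe2\x80\x9c", // U+0093 -> U+201C LEFT DOUBLE QUOTATION MARK
    "\xc2\x94" => "\xe2\x80\x9d", // U+0094 -> U+201D RIGHT DOUBLE QUOTATION MARK
];

$cp1252 = "\x93100\x80\x94"; // “100€” encoded as Windows-1252

// Step 1: read the bytes as ISO-8859-1 (what utf8_encode() does);
// 0x93, 0x80, 0x94 become C1 controls "\xc2\x93", "\xc2\x80", "\xc2\x94".
$step1 = mb_convert_encoding($cp1252, 'UTF-8', 'ISO-8859-1');

// Step 2: swap the C1 controls for the real cp1252 characters.
$step2 = strtr($step1, $subset);

echo bin2hex($step2), "\n"; // e2809c313030e282ace2809d
```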