How do I sanitize a form field for an XML attribute that will contain valid UTF-8 characters?

I'm struggling a bit with this. I have a multilingual web application that outputs XML at some point. This XML can contain any language, so my approach to sanitization is to reject, on insert, only the characters that break the XML. I wrap as much as possible in CDATA, but I have a ton of content in attributes. I don't want to ban special characters wholesale, because valid characters such as brackets, periods, dashes, backticks and apostrophes are used all the time and they work.

What is the best way to strip out all characters that break an XML attribute while leaving the languages intact?

UPDATE:
I found http://en.wikipedia.org/wiki/CDATA#CDATA-type_attribute_value , which states that I can declare the attribute as CDATA type using a DTD; however, it does not behave the way it seems.

 <?xml version="1.0" ?>
 <!DOCTYPE foo [
   <!ELEMENT foo EMPTY>
   <!ATTLIST foo a CDATA #REQUIRED>
 ]>
 <foo a="&bull;"><![CDATA[ &bull; ]]> </foo>

Any validator will complain that bull is not a defined entity in the attribute. If you remove the attribute, the document is accepted. Also, I have heard that schemas are the way to go, so an equivalent of the above that uses an XML Schema instead would be awesome.

Thanks!

+6
2 answers

It should really be:

 <?xml version="1.0" ?>
 <!DOCTYPE foo [
   <!ELEMENT foo EMPTY>
   <!ATTLIST foo a CDATA #REQUIRED>
 ]>
 <foo a="&amp;bull;"><![CDATA[ &bull; ]]> </foo>
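
The difference between the two attribute values can be checked with a parser. XML predefines only five entities (amp, lt, gt, quot, apos), so &bull; has to be declared in the DTD, written as a numeric character reference, or replaced by the literal character. The following is a minimal sketch, assuming PHP's DOM extension is available; parsed_attr is a hypothetical helper written for this illustration, and the DTD is left out to keep the inputs short:

```php
<?php
// Hypothetical helper: returns the parsed value of attribute "a" on the
// root element, or null when the XML is not well-formed.
function parsed_attr(string $xml): ?string
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $ok = $doc->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $ok ? $doc->documentElement->getAttribute('a') : null;
}

var_dump(parsed_attr('<foo a="&bull;"/>'));     // rejected: entity "bull" is not defined
var_dump(parsed_attr('<foo a="&amp;bull;"/>')); // accepted: the value is the literal text &bull;
var_dump(parsed_attr('<foo a="&#8226;"/>'));    // accepted: the value is the bullet character
```

Note that the escaped form stores the six characters &bull; as text. If the goal is an actual bullet in the attribute, the numeric reference or the raw UTF-8 character is what carries it.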

You can convert special characters to HTML entities with

 htmlentities($str); 

and reverse the conversion with

 html_entity_decode($str); 

See http://www.php.net/manual/en/function.htmlentities.php

See also "HTML metacharacters".
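
One caveat when the target is XML rather than HTML: htmlentities() emits HTML named entities such as &bull;, which is the reference the validator in the question rejects. A possible alternative, sketched below under stated assumptions, is to escape only the XML metacharacters with htmlspecialchars() and the ENT_XML1 flag, and to drop the few code points that XML 1.0 does not allow at all. The function name xml_attr_escape is hypothetical, and the input is assumed to be UTF-8:

```php
<?php
// Hypothetical helper (not from the original answer): prepare a UTF-8
// string for use inside a quoted XML attribute value.
function xml_attr_escape(string $value): string
{
    // Remove code points outside the XML 1.0 Char production
    // (most C0 control characters, U+FFFE, U+FFFF).
    // Tab, LF and CR are legal and are kept.
    $clean = preg_replace(
        '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u',
        '',
        $value
    );

    // With the /u modifier, preg_replace() returns null on malformed UTF-8.
    if ($clean === null) {
        throw new InvalidArgumentException('Value is not valid UTF-8');
    }

    // Escape & < > " ' using only the five entities XML predefines.
    // All other characters, in any language, pass through unchanged.
    return htmlspecialchars($clean, ENT_XML1 | ENT_QUOTES, 'UTF-8');
}

echo '<foo a="' . xml_attr_escape("• Tom & Jerry's") . '"/>', PHP_EOL;
// prints: <foo a="• Tom &amp; Jerry&apos;s"/>
```

Building the document with an XML library (for example DOMDocument::setAttribute) would handle the escaping step without a helper, although the forbidden control characters would still need to be removed beforehand.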

+2

All you have to do is wrap them in <![CDATA[ ]]> sections. You can also apply htmlentities().

 attr="<![CDATA[' . htmlentities($value) . ']]>" 
-1

Source: https://habr.com/ru/post/916594/

