Best way to decode all but the 5 predefined HTML entities with PHP, for XHTML5 output

I am currently experimenting with delivering XHTML5. At the moment I deliver XHTML 1.1 Strict on the page I'm working on, at least to capable browsers. For those that do not accept XML-encoded data, I fall back to HTML 4.01 Strict.

In an experiment using HTML5 for both, everything works more or less as expected when delivered as HTML5. The first problem when delivering XHTML5, however, concerns HTML entities: FF4 says &uuml; is an undefined entity, because there is no HTML5 DTD.

I read that the HTML5 wiki currently recommends:

Do not use entity references in XHTML (except for the 5 predefined entities: &amp;, &lt;, &gt;, &quot; and &apos;)

I need &lt; and &gt; in certain places. So my question is: what is the best way in PHP to decode all entities except the five mentioned above? html_entity_decode() decodes all of them, so is there a reasonable way to exclude some?

UPDATE:

For the moment I went with a simple replace / replace back. Unless there really is a more elegant way, this solves the issue for my immediate needs.

    function non_html5_entity_decode($string)
    {
        // The five entities predefined by XML, and temporary placeholders for them.
        $entities     = array('&amp;', '&apos;', '&lt;', '&gt;', '&quot;');
        $placeholders = array('@@@AMP', '@@@APOS', '@@@LT', '@@@GT', '@@@QUOT');

        // Hide the predefined entities, decode everything else, then restore them.
        $string = str_replace($entities, $placeholders, $string);
        $string = html_entity_decode($string);
        $string = str_replace($placeholders, $entities, $string);
        return $string;
    }
2 answers

A note on general-purpose conversion: html_entity_decode with default parameters does not decode all named entities, only those defined by the older HTML 4.01 standard. So entities such as &copy; (©) are converted, but others such as &plus; (+) are not. To convert ALL named entities, pass ENT_HTML5 in the second parameter (!).
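The difference between the two entity tables can be seen with a short sketch. It assumes PHP 5.4 or later (for the ENT_HTML401 and ENT_HTML5 constants) and a UTF-8 source file; the sample string and variable names are only illustrative.

```php
<?php
$s = 'Copyright &copy; 2 &plus; 2';

// HTML 4.01 table: &copy; is known, &plus; is not and is left as-is.
$html401 = html_entity_decode($s, ENT_COMPAT | ENT_HTML401, 'UTF-8');

// HTML5 table: both named entities are known.
$html5 = html_entity_decode($s, ENT_COMPAT | ENT_HTML5, 'UTF-8');

echo $html401, "\n"; // Copyright © 2 &plus; 2
echo $html5, "\n";   // Copyright © 2 + 2
```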

Also, if the target encoding is not UTF-8, entities whose code point is above 255 cannot be converted, for example &Ascr; (𝒜), which is code point 119964 > 255.

So, to convert ALL possible named entities, you MUST use html_entity_decode($s,ENT_HTML5,'UTF-8'), but this only works with PHP 5.4+, where the ENT_HTML5 flag was introduced.
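As a rough illustration of the encoding point, the sketch below decodes the same string to a single-byte charset and to UTF-8. It assumes PHP 5.4+; the variable names are made up for the example.

```php
<?php
$s = '&uuml; &Ascr;';

// Single-byte target: &uuml; (U+00FC) fits into ISO-8859-1,
// but &Ascr; (U+1D49C) has no representation there and stays undecoded.
$latin1 = html_entity_decode($s, ENT_HTML5, 'ISO-8859-1');

// UTF-8 target: both characters can be represented, so both are decoded.
$utf8 = html_entity_decode($s, ENT_HTML5, 'UTF-8');

// Check whether the entity survived in each result.
var_dump(strpos($latin1, '&Ascr;') !== false); // bool(true)
var_dump(strpos($utf8, '&Ascr;') !== false);   // bool(false)
```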

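For the specific case of this question, the ENT_NOQUOTES flag should also be used instead of ENT_COMPAT (the default before PHP 8.1), so use html_entity_decode($s,ENT_HTML5|ENT_NOQUOTES,'UTF-8'). Note that this keeps only &quot; and &apos; encoded; &amp;, &lt; and &gt; are still decoded.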
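A small sketch of what that flag combination does in practice, assuming PHP 5.4+ and an illustrative input string:

```php
<?php
$s = '&quot;caf&eacute;&quot; &apos;x&apos; &lt;b&gt; &amp;';

// ENT_NOQUOTES leaves both quote entities untouched;
// every other known entity, including &lt;, &gt; and &amp;, is decoded.
$out = html_entity_decode($s, ENT_HTML5 | ENT_NOQUOTES, 'UTF-8');

echo $out, "\n"; // &quot;café&quot; &apos;x&apos; <b> &
```

So for XHTML output the result would still need &amp;, &lt; and &gt; handled separately, for example with the placeholder approach from the question.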


PS (edited): thanks to @BoltClock for the reminder about the minimum PHP version.


I think html_entity_decode() followed by htmlspecialchars() is the easiest way.

It will not convert ' though; for that you need to run htmlspecialchars() first and then convert ' to &apos;.
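A minimal sketch of this decode-then-re-encode idea follows. The function name to_xml_safe_text is hypothetical, and passing ENT_QUOTES | ENT_XML1 to htmlspecialchars() is one possible way (an assumption, PHP 5.4+) to get &apos; directly instead of replacing ' in a separate step.

```php
<?php
// Hypothetical helper name, not a PHP built-in.
function to_xml_safe_text($s)
{
    // 1. Decode every known named and numeric reference to a real character.
    $decoded = html_entity_decode($s, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // 2. Re-escape only the XML-significant characters (& < > " ').
    //    With ENT_XML1 and ENT_QUOTES, a single quote becomes &apos;.
    return htmlspecialchars($decoded, ENT_QUOTES | ENT_XML1, 'UTF-8');
}

echo to_xml_safe_text('M&uuml;ller &amp; S&ouml;hne &lt;&apos;x&apos;&gt;'), "\n";
// Müller &amp; Söhne &lt;&apos;x&apos;&gt;
```

This treats the input as plain text: any literal markup in the string (real tags, for instance) would be escaped as well, so it only suits text content, not whole HTML fragments.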


Source: https://habr.com/ru/post/1495163/

