Differentiating between XHTML and HTML with PHP DOMDocument

I want to manipulate HTML and XHTML documents with PHP's DOM implementation. I am using the DOMDocument->loadHTML() method to load content.

I want to find out whether the loaded content is XHTML or HTML. DOMDocument has a doctype object that contains the DOCTYPE declaration from the document itself. So far I have been thinking of comparing $dom->doctype->publicId, which contains strings like "-//W3C//DTD HTML 4.01//EN".

Is there a better way that anyone can think of?

Edit:

Sorry if my question was a bit unclear. I have updated it, as it may have been confusing. To make this clear: the question is not about whether to handle HTML with the PHP DOM at all, or about whether XHTML is good or bad.

1 answer

If you are loading from an external source, you can check the MIME type of the response and see whether it is application/xhtml+xml; if so, then it is definitely XHTML (of course, a server can lie and serve terribly malformed markup with this type). Otherwise, if it is text/html, it will be parsed as HTML tag soup. The validity of the actual markup aside, the doctype declaration is your best bet for finding out whether the content is (or claims to be) HTML or XHTML.
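The MIME-type check could be sketched roughly as follows. The function name classifyContentType and the 'xhtml' / 'html' / 'unknown' labels are made up for illustration, and the sketch assumes you already have the raw Content-Type header value from whatever HTTP client you use.

```php
<?php
// Hypothetical helper: classify a raw Content-Type header value.
// Returns 'xhtml', 'html', or 'unknown' (labels chosen for this sketch).
function classifyContentType(string $contentType): string
{
    // Drop parameters such as "; charset=utf-8" and normalise case.
    $mime = strtolower(trim(explode(';', $contentType, 2)[0]));

    if ($mime === 'application/xhtml+xml') {
        return 'xhtml';
    }
    if ($mime === 'text/html') {
        // Tells you how a browser would parse it, not what the markup claims to be.
        return 'html';
    }
    return 'unknown';
}
```

A text/html result says nothing about the markup itself, since XHTML is frequently served as text/html, so the doctype is still worth checking in that case.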

As you say, you can check the public identifier and/or the DTD URI and determine the type from there.
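A minimal sketch of that doctype check, assuming the markup is already in a non-empty string. The function name classifyByDoctype and the return labels are hypothetical; the LIBXML_HTML_NODEFDTD flag is passed so that loadHTML() does not add a default doctype when the document has none.

```php
<?php
// Hypothetical helper: guess 'xhtml', 'html', or 'unknown' from the DOCTYPE.
function classifyByDoctype(string $markup): string
{
    $dom = new DOMDocument();

    // Collect parser warnings internally instead of emitting them for tag soup.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($markup, LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $doctype = $dom->doctype;
    if ($doctype === null) {
        return 'unknown'; // no DOCTYPE declaration was found
    }

    // Test for "XHTML" first, because "HTML" is a substring of it.
    if (stripos($doctype->publicId, 'XHTML') !== false) {
        return 'xhtml';
    }
    if (stripos($doctype->publicId, 'HTML') !== false) {
        return 'html';
    }
    return 'unknown';
}
```

A bare <!DOCTYPE html> carries no public identifier, so in this sketch it should fall through to 'unknown'; how to treat that case is a design choice left open here.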


Source: https://habr.com/ru/post/1783755/
