You may not be aware of this, but DOMDocument can help you fix the HTML.
$html = "<div><h2>Hello world<h2><p>It 7Am where I live<p><div>";
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML('<root>' . $html . '</root>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//*[not(node())]') as $node) {
    $node->parentNode->removeChild($node);
}
echo substr($dom->saveHTML(), 6, -8);
See the IDEONE demo.
Result: <div><h2>Hello world</h2><p>It 7Am where I live</p></div>
Note that the XPath-based cleanup is necessary because, after the HTML is loaded, the DOM contains empty <h2></h2>, <p></p> and <div></div> elements.
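One thing to watch for: `//*[not(node())]` matches every element that has no child nodes, and that also covers void elements such as <br> or <img>. Here is a minimal sketch of a narrower query, assuming you want to keep those. The sample `$html` and the `$keep` list are illustrative (and not exhaustive); they are not part of the code above.

```php
<?php
libxml_use_internal_errors(true);

// Sample input chosen for illustration: one empty <p> and one void <br>.
$html = '<div><p></p><br><p>text</p></div>';

$dom = new DOMDocument();
$dom->loadHTML('<root>' . $html . '</root>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);

// The original query: every element without child nodes, <br> included.
$allCount = $xpath->query('//*[not(node())]')->length;

// Narrower query: skip an illustrative, non-exhaustive set of void elements.
$keep = ['br', 'hr', 'img', 'input'];
$skip = implode(' or ', array_map(fn($name) => "self::$name", $keep));
$emptyOnly = $xpath->query("//*[not(node()) and not($skip)]");
$emptyCount = $emptyOnly->length;

// Copy to an array first so removal cannot disturb the iteration.
foreach (iterator_to_array($emptyOnly) as $node) {
    $node->parentNode->removeChild($node);
}

$result = $dom->saveHTML();
```

With this query the empty <p> is removed while the <br> stays in place. Extend `$keep` with whatever childless elements your markup legitimately uses.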
The input is wrapped in a <root> element so that the parsed document is guaranteed to have a single root element. The wrapper is stripped from the output afterwards with substr.
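The offsets 6 and -8 are the length of `<root>` and the length of `</root>` plus the trailing newline that saveHTML() appends. If you would rather not hard-code them, something along these lines should work. `repairFragment` is a hypothetical helper name, not part of DOMDocument, and the XPath is slightly changed so that the wrapper itself can never be matched.

```php
<?php
// Hypothetical helper: repair an HTML fragment and return it without the wrapper.
function repairFragment(string $html): string
{
    $open  = '<root>';
    $close = '</root>';

    $previous = libxml_use_internal_errors(true); // collect parse warnings silently
    $dom = new DOMDocument();
    $dom->loadHTML($open . $html . $close, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);        // restore the caller's setting

    // Remove childless elements below the wrapper; "/*//*" excludes the wrapper itself.
    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('/*//*[not(node())]') as $node) {
        $node->parentNode->removeChild($node);
    }

    // trim() drops the trailing newline from saveHTML(); strlen() replaces the magic numbers.
    $out = trim($dom->saveHTML());
    return (string) substr($out, strlen($open), -strlen($close));
}

echo repairFragment('<div><h2>Hello world<h2><p>It 7Am where I live<p><div>');
```

The cast to string covers older PHP versions where substr() can return false for an empty result.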
The flags LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD are required so that no default DOCTYPE and no implied <html> and <body> elements are added to the DOM.
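To see what the flags change, you can load the same fragment with and without them and compare the serialized output. This is only a small sketch; the `<p>Hi</p>` input and the variable names are made up for the comparison.

```php
<?php
libxml_use_internal_errors(true);

// Without flags: libxml adds a default DOCTYPE plus <html> and <body>.
$plain = new DOMDocument();
$plain->loadHTML('<p>Hi</p>');
$plainOut = $plain->saveHTML();

// With both flags: only the markup that was actually passed in.
$bare = new DOMDocument();
$bare->loadHTML('<p>Hi</p>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$bareOut = $bare->saveHTML();

echo $plainOut, "\n", $bareOut;
```

Without the flags, the substr offsets in the answer would no longer line up with the output, because the serialized document would start with the DOCTYPE rather than with the <root> wrapper.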