Scroll the DOM tree recursively and remove unnecessary tags?

Question

$tags = array(
    "applet" => 1,  
    "script" => 1
);

$html = file_get_contents("test.html");
$dom = new DOMDocument();
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$body = $xpath->query("//body")->item(0);

I want to walk through the body of the web page and delete all the unnecessary tags listed in the $tags array, but I cannot find a way to do it. How can I do this?

+3

dom html php

Teiv Dec 30 '10 at 12:59

2 answers

Would you consider using an existing HTML cleaner? Writing your own HTML cleanup from scratch is reinventing the wheel, and it is not easy to get right.

In addition, the blacklist approach is a poor choice: any dangerous tag you forget to list gets through. See the Stack Overflow question "Why use a whitelist for HTML sanitizing?".
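To illustrate the difference, here is a minimal sketch of the whitelist idea using the same DOM classes as the question. The $allowed array and the inline sample markup are hypothetical, and this is not a complete sanitizer: it ignores attributes, and removing a disallowed element also drops everything nested inside it.

```php
<?php
// Hypothetical whitelist: every element under <body> that is NOT listed is removed.
$allowed = array("p" => 1, "b" => 1, "a" => 1);

// Sample markup for illustration only
$html = '<html><body><p>Hi <b>there</b><script>alert(1)</script></p><applet></applet></body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// query() returns a snapshot of matching nodes, so removing nodes inside the loop is safe
foreach ($xpath->query("//body//*") as $node) {
    if (!isset($allowed[strtolower($node->nodeName)]) && $node->parentNode !== null) {
        $node->parentNode->removeChild($node);
    }
}

$result = $dom->saveHTML();
```

With a whitelist, an unknown or newly introduced tag is removed by default, whereas a blacklist lets it through until someone adds it to the list.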

You may also be interested in reading how to configure the allowed tags and attributes, or in trying the HTML cleaner's test demo.

+6

dvb Dec 30 '10 at 13:53

Epharion · Accepted Answer · 2010-12-30T13:50:35+0000

// Tag names to remove, stored as array keys
$tags = array(
    "applet" => 1,
    "script" => 1
);

$html = file_get_contents("test.html");
$dom = new DOMDocument();
@$dom->loadHTML($html); // @ suppresses warnings about malformed markup
$xpath = new DOMXPath($dom);

// The tag names are the keys of $tags, so iterate over the keys
foreach (array_keys($tags) as $tag) {
    // Restrict the search to descendants of <body>
    foreach ($xpath->query("//body//" . $tag) as $node) {
        $node->parentNode->removeChild($node);
    }
}

$string = $dom->saveHTML(); // serialize as HTML rather than XML

Something like that.
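If you really want to walk the tree recursively, as the title asks, a sketch along these lines might work. The removeTags() helper and the inline sample markup are hypothetical names made up for this example; the lookup relies on the tag names being the keys of $tags, as in the question.

```php
<?php
// Hypothetical helper: walks the subtree under $node depth-first and
// removes every element whose tag name is a key of $tags.
function removeTags(DOMNode $node, array $tags)
{
    // childNodes is a live list, so copy it before modifying the tree
    foreach (iterator_to_array($node->childNodes) as $child) {
        if ($child->nodeType !== XML_ELEMENT_NODE) {
            continue; // skip text nodes, comments, etc.
        }
        if (isset($tags[strtolower($child->nodeName)])) {
            $node->removeChild($child);   // drops the element and its subtree
        } else {
            removeTags($child, $tags);    // descend into elements that are kept
        }
    }
}

$tags = array("applet" => 1, "script" => 1);

// Sample markup for illustration only
$html = '<html><body><div><p>Text</p><script>alert(1)</script></div><applet></applet></body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$body = $dom->getElementsByTagName("body")->item(0);
removeTags($body, $tags);

$cleaned = $dom->saveHTML();
```

Because the tag names are array keys, each check is a single isset() call instead of a search through the list.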
