PHP: strip_tags - remove only certain tags (and their contents)?

I use the strip_tags() function, but I need to remove some tags (and all of their contents).

For example:

 <div> <p class="test"> Test A </p> <span> Test B </span> <div> Test C </div> </div> 

Let's say I need to get rid of the P and SPAN tags and keep only:

 <div> <div> Test C </div> </div> 

strip_tags expects the tags you want to keep as its second parameter.

In this particular example, I could use strip_tags($html, "<div>"); but the HTML I am cleaning and the tags that need to be removed are different every time.

I looked around for a function that fits my needs, but could not find anything useful.

Any ideas?

2 answers

Use a regex. Something like this should work:

 $tags = array('p', 'span');
 $text = preg_replace('#<(' . implode('|', $tags) . ')>.*?</\1>#s', '', $text);

The demo shows the listed tags (and their contents) being replaced with nothing.

Note that you may need to tweak it further to handle whitespace inside tags or other cases that your example does not show.

This version of the regular expression also matches tags that have attributes:

 '#<(' . implode('|', $tags) . ')\b[^>]*>.*?</\1>#s'
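To make the attribute-tolerant pattern easier to try out, here is a minimal sketch that wraps it in a function. The name strip_tags_with_content is made up for illustration and is not a built-in. It assumes PHP 7.4+ (arrow functions), uses \1 as the backreference to the captured tag name, adds \b so that "p" does not also match "<pre>", and adds the i flag so tag case does not matter. Like any lazy regex, it will not cope with the same tag nested inside itself.

```php
<?php
// Hypothetical helper (not a PHP built-in): removes the listed tags
// together with everything between the opening and closing tag.
function strip_tags_with_content(string $html, array $tags): string
{
    if ($tags === []) {
        return $html; // nothing to remove
    }

    // Escape each tag name so it is safe to embed in the pattern.
    $names = implode('|', array_map(
        fn ($tag) => preg_quote($tag, '#'),
        $tags
    ));

    // \b     : the tag name must end here ("p" must not match "<pre>")
    // [^>]*  : optional attributes
    // .*?    : contents, as short as possible (s flag lets . match newlines)
    // </\1>  : closing tag with the same name as the one captured
    $pattern = '#<(' . $names . ')\b[^>]*>.*?</\1>#si';

    // preg_replace returns null on a PCRE error; fall back to the input.
    return preg_replace($pattern, '', $html) ?? $html;
}

$html = '<div> <p class="test"> Test A </p> <span> Test B </span> <div> Test C </div> </div>';
echo strip_tags_with_content($html, ['p', 'span']), "\n";
```

The whitespace that sat between the removed tags stays in the result, so run it through trim() or a whitespace-collapsing preg_replace if that matters.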

You say you are using Simple HTML DOM (good, that is the right way to parse HTML). When I need to remove a tag and its contents, I do this:

 $rows = $html->find("span");
 foreach ($rows as $row) {
     $row->outertext = ""; // blank out the tag together with its contents
 }
 $html->load($html->save()); // serialize and re-parse so the change sticks

The last line is needed because the DOM gets confused after the changes, so the whole document has to be serialized and parsed again for the changes to stick (IMO, a bug in Simple HTML DOM).

The Simple HTML DOM approach is safer and more robust than regex.
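If pulling in Simple HTML DOM is not an option, roughly the same idea can be sketched with PHP's bundled DOM extension (DOMDocument). This is a swapped-in technique, not the library used above. The function name remove_tags_dom and the wrapper div are invented for illustration, and the sketch assumes ext-dom is enabled and that the input is plain ASCII (loadHTML needs extra care with other encodings).

```php
<?php
// Hypothetical helper: removes the listed tags and their contents
// using DOMDocument instead of Simple HTML DOM.
function remove_tags_dom(string $html, array $tags): string
{
    $doc = new DOMDocument();

    // Sloppy real-world markup triggers libxml warnings; collect them quietly.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in a div so there is one known container to read back.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The parser adds <html><body>; our wrapper is the first child of <body>.
    $wrapper = $doc->getElementsByTagName('body')->item(0)->firstChild;

    foreach ($tags as $tag) {
        // The node list is live, so copy it to an array before removing nodes.
        foreach (iterator_to_array($doc->getElementsByTagName($tag)) as $node) {
            if ($node->isSameNode($wrapper) || $node->parentNode === null) {
                continue; // never remove the artificial wrapper itself
            }
            $node->parentNode->removeChild($node);
        }
    }

    // Serialize only what is inside the wrapper, not the wrapper itself.
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$html = '<div> <p class="test"> Test A </p> <span> Test B </span> <div> Test C </div> </div>';
echo remove_tags_dom($html, ['p', 'span']), "\n";
```

Unlike the regex, this handles nested tags and odd attribute quoting, because removing a node takes its whole subtree with it.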


Source: https://habr.com/ru/post/918808/

