Is there an alternative to PHP strip_tags()?

The strip_tags() documentation says that all tags except those listed in the second parameter are removed. The function's behavior is the exact opposite of what its name suggests; it should be called strip_all_tags_except().

Setting the name aside, here is my actual question. I want a function that removes only the tags I list in the second parameter. That is, I want the following call to strip the <iframe>, <script>, <style>, <embed> and <object> tags and allow all others.

 my_strip_tags($data,'<iframe><script><style><embed><object>'); 

This is quite the opposite of what strip_tags() does.
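For comparison, a minimal illustration of the built-in behavior described above (the sample markup is made up for the demonstration): the second parameter works as an allow-list, so everything not named in it is stripped.

```php
<?php
// The second parameter of strip_tags() lists the tags to KEEP.
$html = '<p>Hello <b>world</b></p>';

// Only <b> is allowed, so the <p> tags are stripped and <b> survives.
$result = strip_tags($html, '<b>');

echo $result, "\n"; // Hello <b>world</b>
```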

How to do it?

+6
4 answers

Updated 2012-06-23; major security flaw.

Here is a class from another project that should do what you are looking for:

    final class Filter {
        private function __construct() {}

        const SafeTags = 'a abbr acronym address b bdo big blockquote br caption center cite code col colgroup dd del dfn dir div dl dt em font h1 h2 h3 h4 h5 h6 hr i img ins kbd legend li ol p pre q s samp small span strike strong sub sup table tbody td tfoot th thead tr tt u ul var article aside figure footer header nav section rp rt ruby dialog hgroup mark time';
        const SafeAttributes = 'href src title alt type rowspan colspan lang';
        const URLAttributes = 'href src';

        public static function HTML($html) {
            # Get array representations of the safe tags and attributes:
            $safeTags = explode(' ', self::SafeTags);
            $safeAttributes = explode(' ', self::SafeAttributes);
            $urlAttributes = explode(' ', self::URLAttributes);

            # Parse the HTML into a document object:
            $dom = new DOMDocument();
            $dom->loadHTML('<div>' . $html . '</div>');

            # Loop through all of the nodes:
            $stack = new SplStack();
            $stack->push($dom->documentElement);

            while($stack->count() > 0) {
                # Get the next element for processing:
                $element = $stack->pop();

                # Add all the element child nodes to the stack:
                foreach($element->childNodes as $child) {
                    if($child instanceof DOMElement) {
                        $stack->push($child);
                    }
                }

                # And now, we do the filtering:
                if(!in_array(strtolower($element->nodeName), $safeTags)) {
                    # It's not a safe tag; unwrap it:
                    while($element->hasChildNodes()) {
                        $element->parentNode->insertBefore($element->firstChild, $element);
                    }

                    # Finally, delete the offending element:
                    $element->parentNode->removeChild($element);
                } else {
                    # The tag is safe; now filter its attributes:
                    for($i = 0; $i < $element->attributes->length; $i++) {
                        $attribute = $element->attributes->item($i);
                        $name = strtolower($attribute->name);

                        # Note: URL attributes pass only if they start with 'http://'.
                        if(!in_array($name, $safeAttributes) || (in_array($name, $urlAttributes) && substr($attribute->value, 0, 7) !== 'http://')) {
                            # Found an unsafe attribute; remove it:
                            $element->removeAttribute($attribute->name);
                            $i--;
                        }
                    }
                }
            }

            # Finally, return the safe HTML, minus the DOCTYPE, <html> and <body>:
            $html = $dom->saveHTML();
            $start = strpos($html, '<div>');
            $end = strrpos($html, '</div>');

            return substr($html, $start + 5, $end - $start - 5);
        }
    }
+3

You should not do this at all.

strip_tags() is only safe to use without the second parameter. Otherwise, any permitted tag can carry XSS.

In fact, your concern should be not only tags but also attributes. So use an HTML sanitizing library instead.
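A minimal illustration of why allowed tags are a problem (the sample markup is made up for the demonstration): strip_tags() leaves a permitted tag exactly as written, event-handler attributes included, and it keeps the text that sat inside the tags it removes.

```php
<?php
// Illustration only: an allowed tag passes through with all its attributes.
$input = '<a href="#" onclick="alert(1)">click</a><script>alert(2)</script>';

// Allow only <a>. The <script> tags are removed, but their text content stays.
$output = strip_tags($input, '<a>');

echo $output, "\n";
// <a href="#" onclick="alert(1)">click</a>alert(2)
```

The onclick handler survives untouched, which is why filtering by tag name alone is not enough.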

+3

I usually work with the htmLawed library; you can use it to filter, secure and sanitize HTML.

http://www.bioinformatics.org/phplabware/internal_utilities/htmLawed/more.htm

+1

I think the strip_tags() function matches its name. It's all a matter of perspective. :-) Without the second parameter, it strips all tags. The second parameter provides exceptions to that basic behavior.

It seems that you want strip_some_tags() .

How about doing this with a regex?

    function strip_some_tags($input, $taglist) {
        $output = $input;
        foreach ($taglist as $thistag) {
            if (preg_match('/^[a-z]+$/i', $thistag)) {
                // Bare tag name, e.g. "script": match <script>, <script/> and </script>.
                $patterns = array(
                    '/' . "<" . $thistag . "\/?>" . '/',
                    '/' . "<\/" . $thistag . ">" . '/'
                );
            } else if (preg_match('/^<[a-z]+>$/i', $thistag)) {
                // Bracketed form, e.g. "<script>": build the same patterns from it.
                $patterns = array(
                    '/' . str_replace('>', "\/?>", $thistag) . '/',
                    '/' . str_replace('<', "<\/?", $thistag) . '/'
                );
            } else {
                // Anything else is ignored.
                $patterns = array();
            }
            // Note: these patterns match only tags written without attributes.
            $output = preg_replace($patterns, "", $output);
        }
        return $output;
    }

    $to_strip = array("iframe", "script", "style", "embed", "object");

    $sampletext = "Testing. <object>Am I an object?</object>\n";

    print strip_some_tags($sampletext, $to_strip);

Output:

 Testing. Am I an object? 

Of course, this only strips the tags themselves, not the content between them. Is that what you want? You didn't say in your question.
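If the content between the tags should go as well, one possible approach is to parse the markup and delete the whole element. The following is only a sketch under some assumptions: remove_elements_with_content() is a hypothetical helper name, the DOM extension is available, the input is a reasonably well-formed ASCII fragment, and attributes on the remaining tags are not filtered at all.

```php
<?php
// Hypothetical helper (not part of PHP): delete the listed elements
// together with everything inside them.
function remove_elements_with_content(string $html, array $tags): string
{
    // Silence warnings that libxml raises for sloppy or unknown markup.
    libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    // Wrap the fragment in a <div> so it can be found again after parsing.
    $dom->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();

    foreach ($tags as $tag) {
        // getElementsByTagName() returns a live list, so copy it before deleting.
        $nodes = iterator_to_array($dom->getElementsByTagName($tag));
        foreach ($nodes as $node) {
            if ($node->parentNode !== null) {
                $node->parentNode->removeChild($node);
            }
        }
    }

    // loadHTML() adds <html><body>; the wrapper is the first child of <body>.
    $wrapper = $dom->getElementsByTagName('body')->item(0)->firstChild;

    // Serialize only the wrapper's children, dropping the wrapper itself.
    $result = '';
    foreach ($wrapper->childNodes as $child) {
        $result .= $dom->saveHTML($child);
    }
    return $result;
}

$to_strip = array('iframe', 'script', 'style', 'embed', 'object');
$sampletext = "Testing. <object>Am I an object?</object>\n";

print remove_elements_with_content($sampletext, $to_strip);
```

With the sample text from above, the question inside <object> disappears along with the tag, leaving only "Testing." and the surrounding whitespace.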

0

Source: https://habr.com/ru/post/911204/

