According to the HTML Purifier smoketest, invalid URIs are sometimes discarded, leaving the anchor tag without any attributes, e.g.
<a href="javascript:document.location='http://www.google.com/'">XSS</a> becomes <a>XSS</a>
... and sometimes they are stripped down to just the protocol, e.g.
<a href="http://1113982867/">XSS</a> becomes <a href="http:/">XSS</a>
While this is harmless in itself, it's a little ugly. Instead of trying to strip these out with regular expressions, I was hoping to use HTML Purifier's own library capabilities / injectors / plugins / whathaveyou.
Reference point: attribute handling
Conditionally removing an attribute in HTML Purifier is very simple. The library offers the HTMLPurifier_AttrTransform class with its confiscateAttr() method.
Although I don't personally use the confiscateAttr() functionality, I do use an HTMLPurifier_AttrTransform, as per this thread, to add target="_blank" to all anchors.
    // more configuration stuff up here
    $htmlDef = $htmlPurifierConfiguration->getHTMLDefinition(true);
    $anchor = $htmlDef->addBlankElement('a');
    $anchor->attr_transform_post[] = new HTMLPurifier_AttrTransform_Target();
    // purify down here
HTMLPurifier_AttrTransform_Target is of course a very simple class.
    class HTMLPurifier_AttrTransform_Target extends HTMLPurifier_AttrTransform
    {
        public function transform($attr, $config, $context)
        {
            // add target="_blank" to every anchor
            $attr['target'] = '_blank';
            return $attr;
        }
    }
This piece works like a charm, naturally.
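For comparison, here is a rough sketch of how the confiscateAttr() route might look for the `http:/` case. The class name HTMLPurifier_AttrTransform_RemoveLoneHttp is made up for illustration, the autoloader path is an assumption about the install, and the literal `'http:/'` is simply the value the smoketest shows. I haven't verified that this is the exact value a post-transform sees.

```php
<?php
// Assumption: HTML Purifier is installed and its autoloader is reachable at this path.
require_once 'HTMLPurifier.auto.php';

/**
 * Hypothetical transform: drops the href attribute when only the
 * protocol stub "http:/" is left after URI validation.
 */
class HTMLPurifier_AttrTransform_RemoveLoneHttp extends HTMLPurifier_AttrTransform
{
    public function transform($attr, $config, $context)
    {
        if (isset($attr['href']) && $attr['href'] === 'http:/') {
            // confiscateAttr() unsets the key in $attr and returns its old value
            $this->confiscateAttr($attr, 'href');
        }
        return $attr;
    }
}
```

It would be registered the same way as the target transform, via `$anchor->attr_transform_post[]`. Note that this only removes the attribute: `<a href="http:/">XSS</a>` would still come out as `<a>XSS</a>`, so the element itself survives.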
Element handling
Maybe I'm not squinting hard enough at HTMLPurifier_TagTransform, or I'm looking in the wrong place(s), or I just don't understand it at all, but I can't figure out how to conditionally remove elements.
Say, something to the effect of this (a wished-for API, not one I have found in the library):
    // more configuration stuff up here
    $htmlDef = $htmlPurifierConfiguration->getHTMLDefinition(true);
    $anchor = $htmlDef->addElementHandler('a');
    $anchor->elem_transform_post[] = new HTMLPurifier_ElementTransform_Cull();
    // add target as per 'point of reference' here
    // purify down here
The Cull class would extend something that has a confiscateElement() ability, or comparable, in which I could check for a missing href attribute or an href attribute whose content is http:/.
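To make the condition concrete, this is roughly the check I'd want such a Cull class to run, written as a standalone sketch. The function name shouldCullAnchor() is made up for illustration, and `$attr` is assumed to be an attribute array of the same shape that transform() receives (name => value).

```php
<?php
/**
 * Hypothetical helper: decides whether an anchor should be removed.
 * $attr is assumed to be an associative array of attribute name => value.
 */
function shouldCullAnchor(array $attr): bool
{
    // No href at all, e.g. <a>XSS</a>
    if (!isset($attr['href'])) {
        return true;
    }
    // href reduced to a bare protocol, e.g. <a href="http:/">XSS</a>
    return $attr['href'] === 'http:/';
}

var_dump(shouldCullAnchor([]));                                // bool(true)
var_dump(shouldCullAnchor(['href' => 'http:/']));              // bool(true)
var_dump(shouldCullAnchor(['href' => 'http://example.com/'])); // bool(false)
```

The check itself is trivial. What I'm missing is the hook where HTML Purifier would let me act on its result.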
HTMLPurifier_Filter
I understand that I could create a filter, but the examples (Youtube.php and ExtractStyleBlocks.php) suggest that I would be using regular expressions in it, which I would really rather avoid, if at all possible. I am hoping for a built-in or quasi-built-in solution that takes advantage of HTML Purifier's excellent parsing capabilities.
Returning null in a child class of HTMLPurifier_AttrTransform unfortunately doesn't cut it.
Does anyone have any clever ideas, or am I stuck with regexes? :)