HTML Purifier: converting <body> to <div>
Premise
I would like to use HTML Purifier to convert <body> tags to <div> tags in order to preserve the inline styling on the <body> element; for example, <body style="background:color#000000;">Hi there.</body> should turn into <div style="background:color#000000;">Hi there.</div>. I am looking at a combination of a custom tag and a TagTransform class.
Current setting
In my configuration section, I am doing this now:
$htmlDef = $this->configuration->getHTMLDefinition(true);
// defining the element to avoid triggering 'Element 'body' is not supported'
$bodyElem = $htmlDef->addElement('body', 'Block', 'Flow', 'Core');
$bodyElem->excludes = array('body' => true);
// add the transformation rule
$htmlDef->info_tag_transform['body'] = new HTMLPurifier_TagTransform_Simple('div');

... as well as allowing <body> and its style attribute (along with class and id) via configuration directives (they are part of a working, large list that is parsed into HTML.AllowedElements and HTML.AllowedAttributes).
I have turned off definition caching:

$config->set('Cache.DefinitionImpl', null);

Unfortunately, with this setup, it seems that the transform() method of HTMLPurifier_TagTransform_Simple is never called.
HTML.Parent?
I assume that the culprit is my HTML.Parent, which is set to 'div', since <div> naturally does not allow a <body> child element. However, setting HTML.Parent to 'html' nets me:
ErrorException: Cannot use unrecognized element as parent
Adding ...

$htmlElem = $htmlDef->addElement('html', 'Block', 'Flow', 'Core');
$htmlElem->excludes = array('html' => true);

... gets rid of that error message, but still does not transform the tag; it is removed instead.
Adding ...

$htmlElem = $htmlDef->addElement('html', 'Block', 'Custom: head?, body', 'Core');
$htmlElem->excludes = array('html' => true);

... does not help either, because it gives me this error:
ErrorException: Trying to get property of non-object
[...]/library/HTMLPurifier/Strategy/FixNesting.php:237
[...]/library/HTMLPurifier/Strategy/Composite.php:18
[...]/library/HTMLPurifier.php:181
[...]

I am still tweaking the latter option, trying to figure out the exact syntax I need to supply, but if someone knows how to help me based on their own past experience, I would appreciate any pointers in the right direction.
HTML.TidyLevel?
The only other culprit I can imagine is my HTML.TidyLevel, which is set to 'heavy'. I have not yet tried every possible combination, but so far it has made no difference.
(Since I have only been working on this on the side, I struggle to recall which combinations I have already tried. I would otherwise list them here, but I am not confident that I would not miss something I did or misreport something. I may edit this section later once I have done some dedicated testing!)
Full configuration
My configuration data is stored in JSON and then parsed into HTML Purifier. Here's the file:
{
    "CSS" : {
        "MaxImgLength" : "800px"
    },
    "Core" : {
        "CollectErrors" : true,
        "HiddenElements" : {
            "script" : true,
            "style" : true,
            "iframe" : true,
            "noframes" : true
        },
        "RemoveInvalidImg" : false
    },
    "Filter" : {
        "ExtractStyleBlocks" : true
    },
    "HTML" : {
        "MaxImgLength" : 800,
        "TidyLevel" : "heavy",
        "Doctype" : "XHTML 1.0 Transitional",
        "Parent" : "html"
    },
    "Output" : {
        "TidyFormat" : true
    },
    "Test" : {
        "ForceNoIconv" : true
    },
    "URI" : {
        "AllowedSchemes" : {
            "http" : true,
            "https" : true,
            "mailto" : true,
            "ftp" : true
        },
        "DisableExternalResources" : true
    }
}

(URI.Base, URI.Munge and Cache.SerializerPath are also set, but I have removed them from this paste. Also, a caveat about HTML.Parent: as mentioned above, it is usually set to 'div'.)
This code is the reason that what you are doing is not working:
/**
 * Takes a string of HTML (fragment or document) and returns the content
 * @todo Consider making protected
 */
public function extractBody($html) {
    $matches = array();
    $result = preg_match('!<body[^>]*>(.*)</body>!is', $html, $matches);
    if ($result) {
        return $matches[1];
    } else {
        return $html;
    }
}
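To see what this does to the example from the question, here is a minimal standalone sketch that applies the same regular expression outside the library. The function name extractBodyDemo is made up for illustration; only the pattern is taken from the method quoted above.

```php
<?php
// Hypothetical standalone copy of the extraction logic quoted above.
// It does not need HTML Purifier; only the regular expression is reused.
function extractBodyDemo(string $html): string
{
    $matches = array();
    // Capture everything between the opening <body ...> tag and </body>.
    $result = preg_match('!<body[^>]*>(.*)</body>!is', $html, $matches);
    // If a <body> wrapper was found, return only its inner content.
    return $result ? $matches[1] : $html;
}

$input = '<body style="background:color#000000;">Hi there.</body>';
echo extractBodyDemo($input), PHP_EOL; // prints: Hi there.
```

The <body> tag and its style attribute are already gone at this point, so a tag transform registered for 'body' has nothing left to act on.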
You can turn it off by setting %Core.ConvertDocumentToFragment to false; if the rest of your code is bug-free, it should work straight from there. I do not think your bodyElem definition is necessary.
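As a rough sketch of how that could look when combined with the transform rule from the question: the include path is an assumption about where the library lives, and whether the transform then fires still depends on the rest of the configuration (HTML.Parent, allowed elements and so on).

```php
<?php
// Assumption: HTML Purifier's library directory is on the include path.
require_once 'HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
// Keep the <body> wrapper instead of extracting its contents up front.
$config->set('Core.ConvertDocumentToFragment', false);
// Definition caching turned off, as in the question.
$config->set('Cache.DefinitionImpl', null);

// Transformation rule taken from the question.
$htmlDef = $config->getHTMLDefinition(true);
$htmlDef->info_tag_transform['body'] = new HTMLPurifier_TagTransform_Simple('div');

$purifier = new HTMLPurifier($config);
echo $purifier->purify('<body style="background:color#000000;">Hi there.</body>');
```

All directives are set before getHTMLDefinition() is called, since the configuration may no longer accept changes afterwards.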