HTML Purifier: converting <body> to <div>

Premise

I would like to use HTML Purifier to transform <body> tags into <div> tags so that inline styling on the <body> element is preserved. For example, <body style="background:color#000000;">Hi there.</body> should become <div style="background:color#000000;">Hi there.</div>. I am looking at a combination of a custom tag and a TagTransform class.

Current setup

In my configuration section, I am doing this now:

    $htmlDef = $this->configuration->getHTMLDefinition(true);

    // defining the element to avoid triggering 'Element 'body' is not supported'
    $bodyElem = $htmlDef->addElement('body', 'Block', 'Flow', 'Core');
    $bodyElem->excludes = array('body' => true);

    // add the transformation rule
    $htmlDef->info_tag_transform['body'] = new HTMLPurifier_TagTransform_Simple('div');

... as well as allowing <body> and the style attribute (and class and id) via the configuration directives (they are part of a large, working list that is parsed into HTML.AllowedElements and HTML.AllowedAttributes).

I have turned definition caching off:

 $config->set('Cache.DefinitionImpl', null); 

Unfortunately, with this setup, the transform() method of HTMLPurifier_TagTransform_Simple never seems to be called.

HTML.Parent?

I assume the culprit is my HTML.Parent, which is set to 'div', since a <div> naturally does not allow a <body> child element. However, setting HTML.Parent to 'html' gets me:

ErrorException: Cannot use unrecognized element as parent

Adding ...

    $htmlElem = $htmlDef->addElement('html', 'Block', 'Flow', 'Core');
    $htmlElem->excludes = array('html' => true);

... gets rid of that error message, but still does not transform the tag: it is removed instead.

Adding ...

    $htmlElem = $htmlDef->addElement('html', 'Block', 'Custom: head?, body', 'Core');
    $htmlElem->excludes = array('html' => true);

... does not help either, because it produces this error:

    ErrorException: Trying to get property of non-object
    [...]/library/HTMLPurifier/Strategy/FixNesting.php:237
    [...]/library/HTMLPurifier/Strategy/Composite.php:18
    [...]/library/HTMLPurifier.php:181
    [...]

I am still experimenting with the latter option, trying to figure out the exact syntax I need to provide. If someone can help based on their own experience, I would appreciate any pointers in the right direction.

HTML.TidyLevel?

The only other culprit I can think of is my HTML.TidyLevel, which is set to 'heavy'. I have not yet tried every possible combination, but so far it has made no difference.

(Since this has only been a side task, I struggle to recall exactly which combinations I have already tried, so I am not listing them here: I am not confident that I would not leave something out or misreport it. I may edit this section later once I have done some dedicated testing.)

Full configuration

My configuration data is stored as JSON and then parsed into HTML Purifier. Here is the file:

    {
        "CSS" : {
            "MaxImgLength" : "800px"
        },
        "Core" : {
            "CollectErrors" : true,
            "HiddenElements" : {
                "script" : true,
                "style" : true,
                "iframe" : true,
                "noframes" : true
            },
            "RemoveInvalidImg" : false
        },
        "Filter" : {
            "ExtractStyleBlocks" : true
        },
        "HTML" : {
            "MaxImgLength" : 800,
            "TidyLevel" : "heavy",
            "Doctype" : "XHTML 1.0 Transitional",
            "Parent" : "html"
        },
        "Output" : {
            "TidyFormat" : true
        },
        "Test" : {
            "ForceNoIconv" : true
        },
        "URI" : {
            "AllowedSchemes" : {
                "http" : true,
                "https" : true,
                "mailto" : true,
                "ftp" : true
            },
            "DisableExternalResources" : true
        }
    }

(URI.Base, URI.Munge and Cache.SerializerPath are also set, but I removed them from this paste. Also, a caveat on HTML.Parent: as mentioned above, it is usually set to 'div'.)
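How the JSON is turned into directives is not shown in the question. As a minimal sketch, assuming each top-level key is a directive namespace and each nested key a directive name, the file could be flattened into `Namespace.Directive` pairs roughly like this (the helper name `flattenDirectives` is hypothetical, and the JSON is a shortened excerpt of the file above):

```php
<?php
// Hypothetical helper: turns {"HTML": {"Parent": "html"}} into
// ['HTML.Parent' => 'html']. Only the first two levels are flattened, so
// lookup-style values such as Core.HiddenElements stay arrays.
function flattenDirectives(array $settings): array
{
    $directives = array();
    foreach ($settings as $namespace => $group) {
        foreach ($group as $name => $value) {
            $directives[$namespace . '.' . $name] = $value;
        }
    }
    return $directives;
}

$json = '{"Core": {"HiddenElements": {"script": true, "style": true}},
          "HTML": {"TidyLevel": "heavy", "Parent": "html"}}';

$directives = flattenDirectives(json_decode($json, true));

foreach ($directives as $key => $value) {
    // With HTML Purifier loaded, this is where $config->set($key, $value)
    // would be called; here the pairs are only printed.
    echo $key, ' = ', json_encode($value), "\n";
}
```

Printing the pairs first makes it easy to confirm which value of HTML.Parent actually reaches the purifier before debugging the transform itself.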

2 answers

This code is the reason that what you are doing is not working:

    /**
     * Takes a string of HTML (fragment or document) and returns the content
     * @todo Consider making protected
     */
    public function extractBody($html)
    {
        $matches = array();
        $result = preg_match('!<body[^>]*>(.*)</body>!is', $html, $matches);
        if ($result) {
            return $matches[1];
        } else {
            return $html;
        }
    }

You can turn it off by setting %Core.ConvertDocumentToFragment to false; if the rest of your code is bug-free, it should work straight from there. I do not believe your bodyElem definition is necessary.
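To see why the transform never fires, here is a standalone sketch that copies the regular expression from extractBody above into a free function (the name `extractBodySketch` is made up for illustration and is not part of the library). By the time any later stage runs, the <body> tag and its style attribute are already gone:

```php
<?php
// Standalone copy of the extractBody logic quoted above, for illustration only.
function extractBodySketch(string $html): string
{
    $matches = array();
    // Capture everything between the opening and closing body tags.
    if (preg_match('!<body[^>]*>(.*)</body>!is', $html, $matches)) {
        return $matches[1];
    }
    // No body element found: the input is treated as a fragment already.
    return $html;
}

$input = '<body style="background:color#000000;">Hi there.</body>';
$inner = extractBodySketch($input);

echo $inner, "\n"; // prints: Hi there.
```

Setting the directive is a single call on the config object, in the same style as the Cache.DefinitionImpl line in the question: `$config->set('Core.ConvertDocumentToFragment', false);`.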


It would be much easier to do:

    $search  = array('<body', 'body>');
    $replace = array('<div', 'div>');
    $html    = '<body style="background:color#000000;">Hi there.</body>';

    echo str_replace($search, $replace, $html);
    // prints: <div style="background:color#000000;">Hi there.</div>
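One caveat with plain str_replace: the search string 'body>' also matches the tail of <tbody> and </tbody>, so any table in the input would be mangled. A slightly stricter sketch, assuming a regular expression is acceptable for this job, renames only whole body tags (the function name `bodyToDiv` is hypothetical):

```php
<?php
// Hypothetical helper: rename <body ...> and </body> to <div ...> and </div>
// while leaving other tags such as <tbody> untouched.
function bodyToDiv(string $html): string
{
    // (/?) keeps the slash of a closing tag; (?=[\s>]) requires whitespace
    // or ">" right after "body", so "<bodyguard>" would not match.
    return preg_replace('~<(/?)body(?=[\s>])~i', '<${1}div', $html);
}

$html = '<body style="background:color#000000;"><table><tbody></tbody></table></body>';
$converted = bodyToDiv($html);

echo $converted, "\n";
```

Like the str_replace version, this works on raw text and does no sanitizing, so the result would still need to go through the purifier afterwards.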

Source: https://habr.com/ru/post/1308782/

