PHP: How to preserve line breaks with nl2br() when using HTML Purifier?

Problem: When using HTML Purifier to process user-entered content, line breaks are not translated into <br /> tags.

Consider the following user-entered content:

    Lorem ipsum dolor sit amet.
    This is another line.

    <pre>
    .my-css-class {
        color: blue;
    }
    </pre>

    Lorem ipsum:

    <ul>
    <li>Lorem</li>
    <li>Ipsum</li>
    <li>Dolor</li>
    </ul>

    Dolor sit amet,
    MyName

After HTML Purifier processes it, the content renders as follows:

Lorem ipsum dolor sit amet. This is another line.

 .my-css-class { color: blue; } 

Lorem ipsum:

  • Lorem
  • Ipsum
  • Dolor
Dolor sit amet, MyName

As you can see, "MyName", which the user intended to be on its own line, ends up on the same line as the preceding text.

How to fix this?

By using the PHP function nl2br(), of course. However, new problems arise whether we apply it before or after purifying the content.

Here is the result of applying nl2br() before HTML Purifier:

Lorem ipsum dolor sit amet.
This is another line.

 .my-css-class { color: blue; } 

Lorem ipsum:

  • Lorem
  • Ipsum
  • Dolor

Dolor sit amet,
MyName

What happens is that nl2br() adds a <br /> for every line break, so the line breaks inside the <pre> block are converted too, as are the line breaks after each <li>.
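To make the effect concrete, here is a minimal sketch that runs plain nl2br() over a shortened version of the sample content. The `$input` string is an assumption made for illustration, not the exact content from the question.

```php
<?php
// Shortened, hypothetical version of the sample content, with real line breaks.
$input = "First line\nSecond line\n<pre>\n.my-css-class { color: blue; }\n</pre>\n"
       . "<ul>\n<li>Lorem</li>\n<li>Ipsum</li>\n</ul>";

// nl2br() inserts "<br />" before every newline, regardless of the surrounding markup.
$output = nl2br($input);

echo $output, PHP_EOL;
```

In the output, `<pre>` is immediately followed by `<br />`, and so is every `</li>`, which is exactly the unwanted spacing described above.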

What I tried

I tried a custom nl2br() function that replaces line breaks with <br /> tags and then removes all <br /> tags from <pre> blocks. It works fine for <pre>, but the problem remains for <li>.

Using the same approach for <ul> blocks would also remove all <br /> tags from the child <li> elements, unless we use a more complex regular expression that removes only the <br /> tags that are inside <ul> elements but outside <li> elements. But what about a <ul> nested inside an <li>? Handling all of these cases would require an even more complex regex!

  • If this is the right approach, could you please help me with the regex?
  • If this is not the right approach, how can I solve this problem? I am also open to alternatives to HTML Purifier.
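The asker's function is not shown, so the following is only a guess at what such an attempt might look like. The name `nl2br_keep_pre` is hypothetical, and the pattern assumes plain `<pre>` tags without attributes.

```php
<?php
// Hypothetical reconstruction: convert every line break, then undo it inside <pre>.
function nl2br_keep_pre(string $text): string
{
    // nl2br() keeps the original newline and puts "<br />" in front of it.
    $html = nl2br($text);

    // Strip the inserted tags again, but only inside <pre>...</pre> blocks.
    return preg_replace_callback(
        '/<pre>.*?<\/pre>/is',
        function (array $match): string {
            return str_replace('<br />', '', $match[0]);
        },
        $html
    );
}

echo nl2br_keep_pre("a\nb\n<pre>\nx\ny\n</pre>\n<ul>\n<li>A</li>\n</ul>"), PHP_EOL;
```

The `<pre>` block comes back unchanged, but the list still carries a `<br />` after `<ul>` and after `</li>`.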

Other resources that I have already looked at:

2 answers

This problem can be partially (if not completely) solved with a custom nl2br() function:

    function nl2br_special($string){

        // Step 1: Add <br /> tags for each line-break
        $string = nl2br($string);

        // Step 2: Remove the actual line-breaks
        $string = str_replace("\n", "", $string);
        $string = str_replace("\r", "", $string);

        // Step 3: Restore the line-breaks that are inside <pre></pre> tags
        if(preg_match_all('/\<pre\>(.*?)\<\/pre\>/', $string, $match)){
            foreach($match as $a){
                foreach($a as $b){
                    $string = str_replace('<pre>'.$b.'</pre>', "<pre>".str_replace("<br />", PHP_EOL, $b)."</pre>", $string);
                }
            }
        }

        // Step 4: Remove extra <br /> tags

        // Before <pre> tags
        $string = str_replace("<br /><br /><br /><pre>", '<br /><br /><pre>', $string);
        // After </pre> tags
        $string = str_replace("</pre><br /><br />", '</pre><br />', $string);

        // Around <ul></ul> tags
        $string = str_replace("<br /><br /><ul>", '<br /><ul>', $string);
        $string = str_replace("</ul><br /><br />", '</ul><br />', $string);
        // Inside <ul></ul> tags
        $string = str_replace("<ul><br />", '<ul>', $string);
        $string = str_replace("<br /></ul>", '</ul>', $string);

        // Around <ol></ol> tags
        $string = str_replace("<br /><br /><ol>", '<br /><ol>', $string);
        $string = str_replace("</ol><br /><br />", '</ol><br />', $string);
        // Inside <ol></ol> tags
        $string = str_replace("<ol><br />", '<ol>', $string);
        $string = str_replace("<br /></ol>", '</ol>', $string);

        // Around <li></li> tags
        $string = str_replace("<br /><li>", '<li>', $string);
        $string = str_replace("</li><br />", '</li>', $string);

        return $string;
    }

This must be applied to the content before it is passed to HTML Purifier. Never re-process purified content unless you know what you are doing.

Note that since single and double line breaks are already preserved, you should not enable HTML Purifier's AutoFormat.AutoParagraph option:

    // Process line-breaks
    $string = nl2br_special($string);

    // Initiate HTML Purifier config
    $purifier_config = HTMLPurifier_Config::createDefault();
    $purifier_config->set('HTML.Allowed', 'p,ul,ol,li,strong,b,em,i,u,a[href],code,pre,blockquote,cite,img[src|alt],br,hr,h3,h4');
    //$purifier_config->set('AutoFormat.AutoParagraph', true); // Make sure to NOT use this

    // Initiate HTML Purifier
    $purifier = new HTMLPurifier($purifier_config);

    // Purify the content!
    $string = $purifier->purify($string);

That's it!


In addition, since allowing basic HTML tags was originally meant to improve the user experience without introducing another markup syntax, you may also want to let users post code, especially HTML code, that is neither interpreted nor removed by HTML Purifier.

HTML Purifier currently allows posting code, but it requires cumbersome CDATA markers:

 <![CDATA[ Place code here ]]> 

That is hard to remember and to type. To keep things as simple as possible for the user, I believe it is best to let users add code by wrapping it in simple <code> tags (for inline code) or <pre> tags (for code blocks). Here's how to do it:

    function custom_code_tag_callback($code) {
        return '<code>'.trim(htmlspecialchars($code[1])).'</code>';
    }

    function custom_pre_tag_callback($code) {
        return '<pre><code>'.trim(htmlspecialchars($code[1])).'</code></pre>';
    }

    // Don't require HTMLPurifier CDATA enclosing, instead allow simple <code> or <pre> tags
    $string = preg_replace_callback("/\<code\>(.*?)\<\/code\>/is", 'custom_code_tag_callback', $string);
    $string = preg_replace_callback("/\<pre\>(.*?)\<\/pre\>/is", 'custom_pre_tag_callback', $string);

Note that, like the nl2br processing, this must be done before the content is passed to HTML Purifier. Also keep in mind that if the user places <code> or <pre> tags inside the code they post, the inner closing tag will close the parent <code> or <pre> that contains their code. This cannot be solved, just as it cannot with the original CDATA markers or with any other markup, even the one used on StackOverflow (for example, using a backtick character in a code sample closes the code tag).

Finally, for a great user experience, there are other things we might want to automate, such as making links clickable. Fortunately, this can be done with the HTML Purifier option AutoFormat.Linkify.

Here is the final code, which combines everything above:
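A small sketch of both the normal case and the nesting limitation. The helper name `escape_code_blocks` is hypothetical; it wraps the same non-greedy pattern and htmlspecialchars() call used by the callbacks in this answer.

```php
<?php
// Hypothetical wrapper around the <code> callback idea: escape whatever sits between the tags.
function escape_code_blocks(string $html): string
{
    return preg_replace_callback(
        '/<code>(.*?)<\/code>/is',
        function (array $match): string {
            return '<code>' . trim(htmlspecialchars($match[1])) . '</code>';
        },
        $html
    );
}

// Normal case: markup inside <code> is escaped, so it is displayed rather than interpreted.
echo escape_code_blocks('Use <code><b>bold</b></code> here'), PHP_EOL;
// Use <code>&lt;b&gt;bold&lt;/b&gt;</code> here

// Limitation: the first </code> ends the match, so the tail stays outside the escaped block.
echo escape_code_blocks('<code>a <code>b</code> c</code>'), PHP_EOL;
// <code>a &lt;code&gt;b</code> c</code>
```

In the second case the stray ` c</code>` is left for HTML Purifier to deal with, which is the behavior the note above warns about.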

    // === Declare functions ===

    function nl2br_special($string){

        // Step 1: Add <br /> tags for each line-break
        $string = nl2br($string);

        // Step 2: Remove the actual line-breaks
        $string = str_replace("\n", "", $string);
        $string = str_replace("\r", "", $string);

        // Step 3: Restore the line-breaks that are inside <pre></pre> tags
        if(preg_match_all('/\<pre\>(.*?)\<\/pre\>/', $string, $match)){
            foreach($match as $a){
                foreach($a as $b){
                    $string = str_replace('<pre>'.$b.'</pre>', "<pre>".str_replace("<br />", PHP_EOL, $b)."</pre>", $string);
                }
            }
        }

        // Step 4: Remove extra <br /> tags

        // Before <pre> tags
        $string = str_replace("<br /><br /><br /><pre>", '<br /><br /><pre>', $string);
        // After </pre> tags
        $string = str_replace("</pre><br /><br />", '</pre><br />', $string);

        // Around <ul></ul> tags
        $string = str_replace("<br /><br /><ul>", '<br /><ul>', $string);
        $string = str_replace("</ul><br /><br />", '</ul><br />', $string);
        // Inside <ul></ul> tags
        $string = str_replace("<ul><br />", '<ul>', $string);
        $string = str_replace("<br /></ul>", '</ul>', $string);

        // Around <ol></ol> tags
        $string = str_replace("<br /><br /><ol>", '<br /><ol>', $string);
        $string = str_replace("</ol><br /><br />", '</ol><br />', $string);
        // Inside <ol></ol> tags
        $string = str_replace("<ol><br />", '<ol>', $string);
        $string = str_replace("<br /></ol>", '</ol>', $string);

        // Around <li></li> tags
        $string = str_replace("<br /><li>", '<li>', $string);
        $string = str_replace("</li><br />", '</li>', $string);

        return $string;
    }

    function custom_code_tag_callback($code) {
        return '<code>'.trim(htmlspecialchars($code[1])).'</code>';
    }

    function custom_pre_tag_callback($code) {
        return '<pre><code>'.trim(htmlspecialchars($code[1])).'</code></pre>';
    }

    // === Process user input ===

    // Process line-breaks
    $string = nl2br_special($string);

    // Allow simple <code> or <pre> tags for posting code
    $string = preg_replace_callback("/\<code\>(.*?)\<\/code\>/is", 'custom_code_tag_callback', $string);
    $string = preg_replace_callback("/\<pre\>(.*?)\<\/pre\>/is", 'custom_pre_tag_callback', $string);

    // Initiate HTML Purifier config
    $purifier_config = HTMLPurifier_Config::createDefault();
    $purifier_config->set('HTML.Allowed', 'p,ul,ol,li,strong,b,em,i,u,a[href],code,pre,blockquote,cite,img[src|alt],br,hr,h3,h4');
    $purifier_config->set('AutoFormat.Linkify', true); // Make links clickable
    //$purifier_config->set('HTML.TargetBlank', true); // Uncomment if you want links to open new tabs
    //$purifier_config->set('AutoFormat.AutoParagraph', true); // Leave this commented as it conflicts with nl2br

    // Initiate HTML Purifier
    $purifier = new HTMLPurifier($purifier_config);

    // Purify the content!
    $string = $purifier->purify($string);

Hooray!


Maybe this will help.

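    function custom_nl2br($html) {
        // Find the first <ul>...</ul> block; if there is none, plain nl2br() is enough
        $pattern = "/<ul>(.*?)<\/ul>/s";
        if (!preg_match($pattern, $html, $matches)) {
            return nl2br($html);
        }

        // Swap the list for a placeholder so nl2br() does not touch it, then restore it
        $html = nl2br(str_replace($matches[0], '[placeholder]', $html));
        $html = str_replace('[placeholder]', $matches[0], $html);
        return $html;
    }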
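The same placeholder idea can be stretched to cover several blocks and several tag names. The sketch below is only one possible way to do that: the function name `nl2br_outside_blocks`, the `$tags` parameter, and the NUL-delimited placeholder format are all assumptions made for illustration. It must run on the raw input, before HTML Purifier, and it relies on the input containing no NUL bytes.

```php
<?php
// Hypothetical helper: apply nl2br() everywhere except inside the listed block tags.
function nl2br_outside_blocks(string $html, array $tags = ['pre', 'ul', 'ol']): string
{
    $stash = [];

    // Builds e.g. /<(pre|ul|ol)\b[^>]*>.*?<\/\1>/is; \1 ties the closing tag to the opening one.
    $names   = implode('|', array_map('preg_quote', $tags));
    $pattern = '/<(' . $names . ')\b[^>]*>.*?<\/\1>/is';

    // Replace each protected block with a unique placeholder and remember the original.
    $html = preg_replace_callback($pattern, function (array $match) use (&$stash): string {
        $key = "\x00BLOCK" . count($stash) . "\x00";
        $stash[$key] = $match[0];
        return $key;
    }, $html);

    // Only the text outside the protected blocks is converted.
    $html = nl2br($html);

    // Put the untouched blocks back.
    return $stash ? strtr($html, $stash) : $html;
}

echo nl2br_outside_blocks("a\nb\n<pre>\nx\ny\n</pre>\n<ul>\n<li>A</li>\n</ul>\nend\nlast"), PHP_EOL;
```

Two caveats apply. Line breaks typed inside a list item are not converted, because the whole list is protected. A list nested inside a list of the same type is cut off at the first closing tag, since the pattern is non-greedy, so the nested case raised in the question is still not handled.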

Source: https://habr.com/ru/post/1491355/

