This problem can be largely (if not completely) solved with a custom nl2br() function:

    function nl2br_special($string){
        // Step 1: Add <br /> tags for each line-break
        $string = nl2br($string);

        // Step 2: Remove the actual line-breaks
        $string = str_replace("\n", "", $string);
        $string = str_replace("\r", "", $string);

        // Step 3: Restore the line-breaks that are inside <pre></pre> tags
        if(preg_match_all('/\<pre\>(.*?)\<\/pre\>/', $string, $match)){
            foreach($match as $a){
                foreach($a as $b){
                    $string = str_replace('<pre>'.$b.'</pre>', "<pre>".str_replace("<br />", PHP_EOL, $b)."</pre>", $string);
                }
            }
        }

        // Step 4: Remove extra <br /> tags

        // Before <pre> tags
        $string = str_replace("<br /><br /><br /><pre>", '<br /><br /><pre>', $string);
        // After </pre> tags
        $string = str_replace("</pre><br /><br />", '</pre><br />', $string);

        // Around <ul></ul> tags
        $string = str_replace("<br /><br /><ul>", '<br /><ul>', $string);
        $string = str_replace("</ul><br /><br />", '</ul><br />', $string);
        // Inside <ul></ul> tags
        $string = str_replace("<ul><br />", '<ul>', $string);
        $string = str_replace("<br /></ul>", '</ul>', $string);

        // Around <ol></ol> tags
        $string = str_replace("<br /><br /><ol>", '<br /><ol>', $string);
        $string = str_replace("</ol><br /><br />", '</ol><br />', $string);
        // Inside <ol></ol> tags
        $string = str_replace("<ol><br />", '<ol>', $string);
        $string = str_replace("<br /></ol>", '</ol>', $string);

        // Around <li></li> tags
        $string = str_replace("<br /><li>", '<li>', $string);
        $string = str_replace("</li><br />", '</li>', $string);

        return $string;
    }

This must be applied to the content before it is purified by HTML Purifier. Never re-process purified content unless you know what you are doing.
Note that because every line-break and double line-break is already preserved, you should not enable HTML Purifier's AutoFormat.AutoParagraph feature:
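To see why line-breaks inside <pre> blocks need special handling, here is a minimal, self-contained sketch. It is not the function above: restore_pre_line_breaks() is a hypothetical helper that only illustrates the idea behind Step 3, and unlike nl2br_special() it keeps the original newlines instead of stripping them first.

```php
<?php
// Hypothetical helper (not part of the answer's code): undo the effect of
// nl2br() inside <pre>...</pre> blocks only.
function restore_pre_line_breaks(string $html): string
{
    return preg_replace_callback(
        '/<pre>(.*?)<\/pre>/s', // "s" lets "." match the newlines kept by nl2br()
        function (array $m): string {
            // Remove the <br /> tags nl2br() inserted; the newlines are still there.
            return '<pre>' . str_replace('<br />', '', $m[1]) . '</pre>';
        },
        $html
    );
}

$input  = "Intro\n<pre>line 1\nline 2</pre>";
$output = restore_pre_line_breaks(nl2br($input));

echo $output;
// Intro<br />
// <pre>line 1
// line 2</pre>
```

Without this step, the browser would render both the <br /> tag and the newline inside the <pre> block, so every line of code would appear double-spaced.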

    // Process line-breaks
    $string = nl2br_special($string);

    // Initiate HTML Purifier config
    $purifier_config = HTMLPurifier_Config::createDefault();
    $purifier_config->set('HTML.Allowed', 'p,ul,ol,li,strong,b,em,i,u,a[href],code,pre,blockquote,cite,img[src|alt],br,hr,h3,h4');
    //$purifier_config->set('AutoFormat.AutoParagraph', true); // Make sure to NOT use this

    // Initiate HTML Purifier
    $purifier = new HTMLPurifier($purifier_config);

    // Purify the content!
    $string = $purifier->purify($string);

That's it!
Also, since allowing basic HTML tags was originally intended to improve the user experience without adding another markup syntax, you may also want to let users post code, especially HTML code, that will not be interpreted or removed by HTML Purifier.
HTML Purifier currently allows posting code, but it requires complex CDATA markers:
<![CDATA[ Place code here ]]>
That is hard to remember and to write. To simplify the user experience as much as possible, I believe it is best to let users add code by wrapping it in simple <code> tags (for inline code) and <pre> tags (for code blocks). Here is how to do that:

    function custom_code_tag_callback($code) {
        return '<code>'.trim(htmlspecialchars($code[1])).'</code>';
    }

    function custom_pre_tag_callback($code) {
        return '<pre><code>'.trim(htmlspecialchars($code[1])).'</code></pre>';
    }

Note that, like the nl2br processing, this must be done before the content is purified by HTML Purifier. Also keep in mind that if users put <code> or <pre> tags inside the code they post, those tags will close the parent <code> or <pre> tag that contains their code. This cannot be solved, just as it cannot be solved with the original CDATA markers or with any other markup, even the one used on StackOverflow (for example, using a backtick character in a code sample closes the code tag).
Finally, for a great user experience there are other things we may want to automate, such as making links clickable. Fortunately, HTML Purifier can do this with its AutoFormat.Linkify feature.
Here is the final code, which includes everything for the complete setup:
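As a rough illustration of how such a callback behaves once it is wired to preg_replace_callback(), here is a small self-contained sketch. The sample strings and the variable names $pattern, $input, $output, $nested and $result are made up for the demonstration; it also shows the limitation described above, where a literal </code> inside the sample ends the match early.

```php
<?php
// Same callback as above, repeated so that the sketch runs on its own.
function custom_code_tag_callback($code) {
    return '<code>' . trim(htmlspecialchars($code[1])) . '</code>';
}

$pattern = '/<code>(.*?)<\/code>/is';

// HTML inside <code> is escaped, so it is displayed instead of interpreted.
$input  = 'Use <code><b>bold</b></code> for emphasis.';
$output = preg_replace_callback($pattern, 'custom_code_tag_callback', $input);
echo $output, PHP_EOL;
// Use <code>&lt;b&gt;bold&lt;/b&gt;</code> for emphasis.

// Limitation: the first </code> closes the tag, the rest is left outside.
$nested = '<code>a </code> b</code>';
$result = preg_replace_callback($pattern, 'custom_code_tag_callback', $nested);
echo $result, PHP_EOL;
// <code>a</code> b</code>
```

The stray closing tag left over in the second case is then up to the purifier to deal with.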

    // === Declare functions ===

    function nl2br_special($string){
        // Step 1: Add <br /> tags for each line-break
        $string = nl2br($string);

        // Step 2: Remove the actual line-breaks
        $string = str_replace("\n", "", $string);
        $string = str_replace("\r", "", $string);

        // Step 3: Restore the line-breaks that are inside <pre></pre> tags
        if(preg_match_all('/\<pre\>(.*?)\<\/pre\>/', $string, $match)){
            foreach($match as $a){
                foreach($a as $b){
                    $string = str_replace('<pre>'.$b.'</pre>', "<pre>".str_replace("<br />", PHP_EOL, $b)."</pre>", $string);
                }
            }
        }

        // Step 4: Remove extra <br /> tags

        // Before <pre> tags
        $string = str_replace("<br /><br /><br /><pre>", '<br /><br /><pre>', $string);
        // After </pre> tags
        $string = str_replace("</pre><br /><br />", '</pre><br />', $string);

        // Around <ul></ul> tags
        $string = str_replace("<br /><br /><ul>", '<br /><ul>', $string);
        $string = str_replace("</ul><br /><br />", '</ul><br />', $string);
        // Inside <ul></ul> tags
        $string = str_replace("<ul><br />", '<ul>', $string);
        $string = str_replace("<br /></ul>", '</ul>', $string);

        // Around <ol></ol> tags
        $string = str_replace("<br /><br /><ol>", '<br /><ol>', $string);
        $string = str_replace("</ol><br /><br />", '</ol><br />', $string);
        // Inside <ol></ol> tags
        $string = str_replace("<ol><br />", '<ol>', $string);
        $string = str_replace("<br /></ol>", '</ol>', $string);

        // Around <li></li> tags
        $string = str_replace("<br /><li>", '<li>', $string);
        $string = str_replace("</li><br />", '</li>', $string);

        return $string;
    }

    function custom_code_tag_callback($code) {
        return '<code>'.trim(htmlspecialchars($code[1])).'</code>';
    }

    function custom_pre_tag_callback($code) {
        return '<pre><code>'.trim(htmlspecialchars($code[1])).'</code></pre>';
    }

    // === Process user input ===

    // Process line-breaks
    $string = nl2br_special($string);

    // Allow simple <code> or <pre> tags for posting code
    $string = preg_replace_callback("/\<code\>(.*?)\<\/code\>/is", 'custom_code_tag_callback', $string);
    $string = preg_replace_callback("/\<pre\>(.*?)\<\/pre\>/is", 'custom_pre_tag_callback', $string);

    // Initiate HTML Purifier config
    $purifier_config = HTMLPurifier_Config::createDefault();
    $purifier_config->set('HTML.Allowed', 'p,ul,ol,li,strong,b,em,i,u,a[href],code,pre,blockquote,cite,img[src|alt],br,hr,h3,h4');
    $purifier_config->set('AutoFormat.Linkify', true); // Make links clickable
    //$purifier_config->set('HTML.TargetBlank', true); // Uncomment if you want links to open new tabs
    //$purifier_config->set('AutoFormat.AutoParagraph', true); // Leave this commented as it conflicts with nl2br

    // Initiate HTML Purifier
    $purifier = new HTMLPurifier($purifier_config);

    // Purify the content!
    $string = $purifier->purify($string);

Hooray!