Trim a string with HTML tags in it

I have a string containing HTML tags. I am looking for a piece of code that will trim this string so that the result:

  • is 100 characters long
  • does not contain image tags ( <img /> )
  • keeps the other HTML tags (everything except the image tag)
  • does not count spaces or HTML tag characters toward the 100 characters.

For example, given the string:

 <img>Something</img><b>Just an Example</b> Plain Text <br><a href="#">stackoverflow</a> 

Thus, the result should be:

Just an Example Plain Text stackoverflow (as a link).

The result then has about 35 characters (not counting whitespace).
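The figure of 35 can be checked by counting only the non-whitespace characters outside the tags. Below is a minimal sketch of that count; the helper name visibleLength is hypothetical and not part of the question, and it assumes the mbstring extension is available.

```php
<?php
// Hypothetical helper (not from the question): counts the characters a
// reader would see, ignoring HTML tags and all whitespace.
function visibleLength(string $html): int
{
    // Drop every tag, keeping only the text between them.
    $text = strip_tags($html);
    // Decode entities so that "&amp;" counts as one character.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Drop all whitespace.
    $text = preg_replace('/\s+/u', '', $text);
    return mb_strlen($text, 'UTF-8');
}

// The expected result from the example, with its tags kept.
$expected = '<b>Just an Example</b> Plain Text <br><a href="#">stackoverflow</a>';
echo visibleLength($expected), "\n"; // 13 + 9 + 13 = 35
```

The same helper could be run on the output of any trimming function to see how close it comes to the 100-character target.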

I tried the solution from this question, but it did not give the required result. Any help would be appreciated.

1 answer

How about a function? Mine is called AbstractHTMLContents. It takes two parameters:

  • the HTML content to trim
  • the character limit (100 by default).

Here is the code:

 function AbstractHTMLContents($html, $maxLength = 100) {
     mb_internal_encoding("UTF-8");
     $printedLength = 0;  // characters of text emitted so far
     $position = 0;       // current byte offset into $html
     $tags = array();     // stack of currently open tags
     $newContent = '';
     // Void elements have no closing tag, so they never go on the stack.
     $voidTags = array('area', 'base', 'br', 'col', 'embed', 'hr', 'input', 'link', 'meta', 'source', 'track', 'wbr');

     // Remove image tags.
     $html = preg_replace('~</?img\b[^>]*>~i', '', $html);

     while ($printedLength < $maxLength && preg_match('{</?([a-z][a-z0-9]*)[^>]*>|&#?[a-zA-Z0-9]+;}i', $html, $match, PREG_OFFSET_CAPTURE, $position)) {
         list($tag, $tagPosition) = $match[0];

         // Print text leading up to the tag. preg_match() reports byte
         // offsets, so slice with substr() and count with mb_strlen().
         $str = substr($html, $position, $tagPosition - $position);
         if ($printedLength + mb_strlen($str) > $maxLength) {
             $newstr = mb_substr($str, 0, $maxLength - $printedLength);
             // Drop the last word, which was probably cut in half.
             $newstr = preg_replace('~\s+\S+$~', '', $newstr);
             $newContent .= $newstr;
             $printedLength = $maxLength;
             break;
         }
         $newContent .= $str;
         $printedLength += mb_strlen($str);

         if ($tag[0] == '&') {
             // Handle the entity: it counts as one character.
             $newContent .= $tag;
             $printedLength++;
         } else {
             // Handle the tag.
             $tagName = strtolower($match[1][0]);
             if ($tag[1] == '/') {
                 // This is a closing tag.
                 $openingTag = array_pop($tags);
                 assert($openingTag == $tagName); // check that tags are properly nested.
                 $newContent .= $tag;
             } else if ($tag[strlen($tag) - 2] == '/' || in_array($tagName, $voidTags)) {
                 // Self-closing or void tag.
                 $newContent .= $tag;
             } else {
                 // Opening tag.
                 $newContent .= $tag;
                 $tags[] = $tagName;
             }
         }

         // Continue after the tag.
         $position = $tagPosition + strlen($tag);
     }

     // Print any remaining text, trimming it only if it exceeds the limit.
     if ($printedLength < $maxLength && $position < strlen($html)) {
         $rest = substr($html, $position);
         if (mb_strlen($rest) > $maxLength - $printedLength) {
             $rest = mb_substr($rest, 0, $maxLength - $printedLength);
             $rest = preg_replace('~\s+\S+$~', '', $rest);
         }
         $newContent .= $rest;
     }

     // Close any open tags.
     while (!empty($tags)) {
         $newContent .= sprintf('</%s>', array_pop($tags));
     }

     return $newContent;
 }

This seems to give the expected result.
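The function rests on one idea: scan the input as alternating runs of markup (tags and entities) and text, and count only the text toward the limit. Here is a minimal sketch of that split on its own, using preg_split() instead of the preg_match() loop; the helper name splitMarkup is hypothetical and the pattern is a simplification that assumes well-formed tags.

```php
<?php
// Hypothetical helper (not part of the answer): separates markup tokens
// (tags and entities) from text tokens so they can be counted differently.
function splitMarkup(string $html): array
{
    $pattern = '{(</?[a-z][a-z0-9]*[^>]*>|&#?[a-zA-Z0-9]+;)}i';
    // With a single capture group, preg_split() returns pieces that
    // alternate: text, markup, text, markup, ...
    $parts = preg_split($pattern, $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    $tokens = array();
    foreach ($parts as $i => $part) {
        if ($part === '') {
            continue; // no text between two adjacent tags
        }
        $tokens[] = array($i % 2 === 1 ? 'markup' : 'text', $part);
    }
    return $tokens;
}

// Print each token with its kind.
foreach (splitMarkup('<b>Just an Example</b> Plain Text <br>') as [$kind, $value]) {
    printf("%-6s %s\n", $kind, json_encode($value));
}
```

Only the 'text' tokens would be measured against the limit; 'markup' tokens are copied through unchanged.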


Source: https://habr.com/ru/post/1386312/

