Cut text without destroying HTML tags

Is there a way to do this without writing my own function?

For instance:

    $text = 'Test <span><a>something</a> something else</span>.';
    $text = cutText($text, 2, null, 20, true);
    //result: Test <span><a>something</a></span>

I need to make this function bulletproof.

My problem is similar to this thread but I need a better solution. I would like to leave the nested tags intact.

So far my algorithm is:

    function cutText($content, $max_words, $max_chars, $max_word_len, $html = false) {
        $len = strlen($content);
        $res = '';
        $word_count = 0;
        $word_started = false;
        $current_word = '';
        $current_word_len = 0;
        if ($max_chars == null) {
            $max_chars = $len;
        }
        $inHtml = false;
        $openedTags = array();
        for ($i = 0; $i < $max_chars; $i++) {
            if ($content[$i] == '<' && $html) {
                $inHtml = true;
            }
            if ($inHtml) {
                $max_chars++;
            }
            if ($html && !$inHtml) {
                if ($content[$i] != ' ' && !$word_started) {
                    $word_started = true;
                    $word_count++;
                }
                $current_word .= $content[$i];
                $current_word_len++;
                if ($current_word_len == $max_word_len) {
                    $current_word .= '- ';
                }
                if (($content[$i] == ' ') && $word_started) {
                    $word_started = false;
                    $res .= $current_word;
                    $current_word = '';
                    $current_word_len = 0;
                    if ($word_count == $max_words) {
                        return $res;
                    }
                }
            }
            if ($content[$i] == '<' && $html) {
                $inHtml = true;
            }
        }
        return $res;
    }

But of course this will not work. I thought about remembering open tags and closing them if they were not closed, but maybe there is a better way?

3 answers

OK, I solved this.

I split it into two parts. First, cutting the HTML text without breaking any tags:

    function cutHtml($content, $max_words, $max_chars, $max_word_len) {
        $len = strlen($content);
        $res = '';
        $word_count = 0;
        $letter_count = 0;
        if ($max_chars === null) {
            $max_chars = $len;
        }
        $i = 0;
        while ($i < $len && $i < $max_chars) {
            // copy any html tag through unchanged
            if ($content[$i] == '<') {
                while ($i < $len) {
                    $res .= $content[$i];
                    $i++;
                    while ($i < $len && $content[$i] == ' ') {
                        $res .= $content[$i];
                        $i++;
                    }
                    // copy quoted attribute values verbatim, so a '>' inside them does not end the tag
                    foreach (array("'", '"') as $quote) {
                        if ($i < $len && $content[$i] == $quote) {
                            $res .= $content[$i];
                            $i++;
                            while ($i < $len && !($content[$i] == $quote && $content[$i - 1] != "\\")) {
                                $res .= $content[$i];
                                $i++;
                            }
                        }
                    }
                    if ($i < $len && $content[$i] == '>') {
                        $res .= $content[$i];
                        $i++;
                        break;
                    }
                }
            }
            // copy spaces between words
            while ($i < $len && $content[$i] == ' ') {
                $res .= $content[$i];
                $letter_count++;
                $i++;
            }
            // read one word, adding a hyphen after every $max_word_len characters
            $word_started = false;
            $current_word = '';
            $current_word_len = 0;
            while ($i < $len && !in_array($content[$i], array(' ', '<', '.', ','))) {
                if (!$word_started) {
                    $word_started = true;
                    $word_count++;
                }
                $current_word .= $content[$i];
                $current_word_len++;
                if ($current_word_len == $max_word_len) {
                    $current_word .= '-';
                    $current_word_len = 0;
                }
                $i++;
            }
            // keep punctuation that directly follows the word, so the scan always advances
            while ($i < $len && ($content[$i] == '.' || $content[$i] == ',')) {
                $current_word .= $content[$i];
                $i++;
            }
            if ($letter_count > $max_chars) {
                return $res;
            }
            if ($word_count < $max_words) {
                $res .= $current_word;
                $letter_count += strlen($current_word);
            }
            if ($word_count == $max_words) {
                $res .= $current_word;
                $letter_count += strlen($current_word);
                return $res;
            }
        }
        return $res;
    }

And the following function closes any tags that were left open:

    function cleanTags(&$html) {
        $count = strlen($html);
        $i = -1;
        $openedTags = array();
        while (true) {
            $i++;
            if ($i >= $count) {
                break;
            }
            if ($html[$i] == '<') {
                $tag = '';
                $closeTag = '';
                $reading = false;
                // read the whole tag
                while ($i < $count && $html[$i] != '>') {
                    $i++;
                    while ($i < $count && $html[$i] == ' ') {
                        $i++; // skip any spaces (needs to be idiot-proof)
                    }
                    if ($i >= $count) {
                        break; // input ended inside a tag
                    }
                    if (!$reading && $html[$i] == '/') { // closing tag
                        $i++;
                        while ($i < $count && $html[$i] == ' ') {
                            $i++; // skip any spaces
                        }
                        $closeTag = '';
                        while ($i < $count && $html[$i] != ' ' && $html[$i] != '>') {
                            $reading = true;
                            $html[$i] = strtolower($html[$i]); // tag names to lowercase
                            $closeTag .= $html[$i];
                            $i++;
                        }
                        $c = count($openedTags);
                        if ($c > 0 && $openedTags[$c - 1] == $closeTag) {
                            array_pop($openedTags);
                        }
                    }
                    if (!$reading) { // opening tag: read only the tag name
                        while ($i < $count && !in_array($html[$i], array(' ', '>', '/'))) {
                            $reading = true;
                            $html[$i] = strtolower($html[$i]); // tag names to lowercase
                            $tag .= $html[$i];
                            $i++;
                        }
                    }
                    // skip quoted attribute values
                    foreach (array("'", '"') as $quote) {
                        if ($i < $count && $html[$i] == $quote) {
                            $i++;
                            while ($i < $count && !($html[$i] == $quote && $html[$i - 1] != "\\")) {
                                $i++;
                            }
                        }
                    }
                    if ($reading && $i < $count && $html[$i] == '/') { // self-closed tag
                        $tag = '';
                        break;
                    }
                }
                if (!empty($tag)) {
                    $openedTags[] = $tag;
                }
            }
        }
        // close whatever is still open, innermost tag first
        while (count($openedTags) > 0) {
            $tag = array_pop($openedTags);
            $html .= "</$tag>";
        }
    }

This is not idiot-proof, but TinyMCE will clean up the result, so no further cleaning is needed.

It may be a little long, but I don't think it will use many resources, and it should be faster than a regular expression.
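For comparison, here is a minimal sketch of the same two-step idea (copy tags through, stop once the word budget is used up, then close whatever is still open) built on preg_split instead of a character loop. The function name truncateHtmlWords and the list of void elements are assumptions made for illustration, not part of the answer above. It assumes reasonably well-formed tags and ignores HTML comments and a '>' inside attribute values.

```php
<?php
// Hypothetical helper (not from the answer above): keep at most $maxWords words
// of text, copy tags through untouched, then close the tags that are still open.
function truncateHtmlWords(string $html, int $maxWords): string
{
    // Void elements never get a closing tag.
    $void = ['area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
             'link', 'meta', 'source', 'track', 'wbr'];

    // Split into tags and text runs; the capture group keeps the tags as tokens.
    $tokens = preg_split('/(<[^>]*>)/', $html, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    $out = '';
    $open = [];      // stack of currently open tag names
    $words = 0;
    $pending = '';   // whitespace seen but not yet written

    foreach ($tokens as $token) {
        if ($token[0] === '<') {
            $out .= $pending . $token;
            $pending = '';
            if (preg_match('#^<\s*(/?)\s*([a-zA-Z][a-zA-Z0-9]*)#', $token, $m)) {
                $name = strtolower($m[2]);
                if ($m[1] === '/') {
                    // Closing tag: pop it if it matches the innermost open tag.
                    if ($open && end($open) === $name) {
                        array_pop($open);
                    }
                } elseif (substr($token, -2) !== '/>' && !in_array($name, $void, true)) {
                    $open[] = $name;
                }
            }
            continue;
        }

        // Text run: split into words and whitespace, keeping both.
        $pieces = preg_split('/(\s+)/', $token, -1,
            PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
        foreach ($pieces as $piece) {
            if (trim($piece) === '') {
                $pending .= $piece; // written only if another word follows
                continue;
            }
            if ($words === $maxWords) {
                break 2; // word budget is used up: stop reading the input
            }
            $words++;
            $out .= $pending . $piece;
            $pending = '';
        }
    }

    // Close the tags that are still open, innermost first.
    while ($open) {
        $out .= '</' . array_pop($open) . '>';
    }
    return $out;
}

echo truncateHtmlWords('Test <span><a>something</a> something else</span>.', 2), "\n";
// prints: Test <span><a>something</a></span>
```

Unlike the character loop, this version does not hyphenate long words or enforce a character limit; it only shows the truncate-then-close structure.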


This works fine for me:

    function trimContent($str, $trimAtIndex) {
        $beginTags = array();
        $endTags = array();
        $len = strlen($str);
        for ($i = 0; $i < $len; $i++) {
            if ($str[$i] == '<') {
                $beginTags[] = $i;
            } else if ($str[$i] == '>') {
                $endTags[] = $i;
            }
        }
        foreach ($beginTags as $k => $index) {
            // Trying to trim inside a tag: trim right after that tag instead
            if (isset($endTags[$k]) && ($trimAtIndex > $index) && ($trimAtIndex <= $endTags[$k])) {
                // +1 so that the closing '>' itself is kept
                $trimAtIndex = $endTags[$k] + 1;
            }
        }
        return substr($str, 0, $trimAtIndex);
    }
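To make the behaviour concrete, here is a small sketch of the same idea (a cut index that lands inside a tag is moved just past that tag's closing '>') written with strrpos and strpos. The name cutOutsideTags is hypothetical, and the sketch assumes every '<' in the input starts a tag.

```php
<?php
// Hypothetical variant of the idea above: if the cut index falls inside a tag,
// move the cut just past that tag's closing '>'.
function cutOutsideTags(string $str, int $cutAt): string
{
    $head = substr($str, 0, $cutAt);
    $lastOpen = strrpos($head, '<');
    $lastClose = strrpos($head, '>');

    // A '<' with no '>' after it in the kept part means the cut is inside a tag.
    if ($lastOpen !== false && ($lastClose === false || $lastClose < $lastOpen)) {
        $end = strpos($str, '>', $lastOpen);
        if ($end !== false) {
            return substr($str, 0, $end + 1);
        }
    }
    return $head;
}

var_dump(cutOutsideTags('ab<b>cd</b>', 3)); // string(5) "ab<b>"
var_dump(cutOutsideTags('ab<b>cd</b>', 6)); // string(6) "ab<b>c"
```

Note that this only keeps individual tags whole. As the second call shows, an element can still be left open, so a tag-closing step is needed afterwards if the result must be balanced.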

Try something like this

    function cutText($inputText, $start, $length) {
        // split the input into text parts and tag parts
        $temp = $inputText;
        $res = array();
        while (strpos($temp, '>') !== false) {
            $ts = strpos($temp, '<');
            $te = strpos($temp, '>');
            if ($ts > 0) {
                $res[] = substr($temp, 0, $ts);
            }
            $res[] = substr($temp, $ts, $te - $ts + 1);
            $temp = substr($temp, $te + 1, strlen($temp) - $te);
        }
        if ($temp != '') {
            $res[] = $temp;
        }

        // $pointer counts text characters only; tags do not move it
        $pointer = 0;
        $end = $start + $length - 1;
        foreach ($res as &$part) {
            if (substr($part, 0, 1) != '<') {
                $l = strlen($part);
                $p1 = $pointer;
                $p2 = $pointer + $l - 1;
                $partx = "";
                if ($start <= $p1 && $end >= $p2) {
                    // the whole part lies inside the removed range
                    $partx = "";
                } else {
                    if ($start > $p1 && $start <= $p2) {
                        // keep the text before the removed range
                        $partx .= substr($part, 0, $start - $pointer);
                    }
                    if ($end >= $p1 && $end < $p2) {
                        // keep the text after the removed range
                        $partx .= substr($part, $end - $pointer + 1, $l - $end + $pointer);
                    }
                    if ($partx == "") {
                        $partx = $part;
                    }
                }
                $part = $partx;
                $pointer += $l;
            }
        }
        unset($part); // drop the reference left over from the foreach
        return join('', $res);
    }

Parameters:

  • $inputText - the input text
  • $start - position of the first character to remove (tags are not counted)
  • $length - how many characters to remove


Each call below starts from the original $text.

Example #1: deleting the first three characters

    $text = 'Test <span><a>something</a> something else</span>.';
    $text = cutText($text, 0, 3);
    var_dump($text);

Output ("Tes" removed):

    string(47) "t <span><a>something</a> something else</span>."

Deleting the first 10 characters:

    $text = cutText($text, 0, 10);

Output ("Test somet" removed):

    string(40) "<span><a>hing</a> something else</span>."

Example #2: removing inner characters ("es" from "Test")

    $text = cutText($text, 1, 2);

Output:

    string(48) "Tt <span><a>something</a> something else</span>."

Removing "thing something el":

    $text = cutText($text, 9, 18);

Output:

    string(32) "Test <span><a>some</a>se</span>."

Hope this helps.

Well, maybe this is not the best solution, but it is all I can do at the moment.


Source: https://habr.com/ru/post/885540/