PHP: display the first 500 characters of HTML

I have a large block of HTML in a PHP variable, for example:

$html_code = '<div class="contianer" style="text-align:center;">The Sameple text.</div><br><span>Another sample text.</span>....'; 

I want to display only the first 500 characters of this code. The character count should include only the text inside the HTML tags and must exclude the tags and their attributes. However, cropping must not break the DOM structure of the HTML: any tag left open by the cut still has to be closed.

Are there any working examples?

4 answers

If you only need the text, you can do it like this:

 substr(strip_tags($html_code),0,500); 
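A slightly more careful version of the same one-liner, as a minimal sketch assuming UTF-8 input and the mbstring extension (`plain_text_excerpt` is a hypothetical helper name). It still throws away all of the markup, so it only fits the case where plain text is enough. Decoding entities first means `&amp;` counts as one character, and `mb_substr()` avoids cutting a multi-byte character in half, which plain `substr()` can do.

```php
<?php
// Hypothetical helper: a text-only excerpt of an HTML string (markup is discarded).
function plain_text_excerpt(string $html, int $limit = 500): string
{
    // strip_tags() removes the tags; html_entity_decode() turns "&amp;" into "&"
    // so that every entity counts as a single character.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // mb_substr() counts characters rather than bytes, so UTF-8 text is never split mid-character.
    return mb_substr($text, 0, $limit, 'UTF-8');
}
```

Usage: `echo plain_text_excerpt($html_code);`. Note that `strip_tags()` joins adjacent blocks without a space, so `...text.</div><span>Another...` becomes `text.Another`.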

Try DOMDocument:

http://www.php.net/manual/en/class.domdocument.php

Then take the text of the whole document node (see DOMNode: http://www.php.net/manual/en/class.domnode.php).

This won't be entirely correct, but hopefully it will put you on the right track. Try something like:

    $html_code = '<div class="contianer" style="text-align:center;">The Sameple text.</div><br><span>Another sample text.</span>....';
    $dom = new DOMDocument();
    $dom->loadHTML($html_code);
    $text_to_strip = $dom->textContent;
    $stripped = mb_substr($text_to_strip, 0, 500);
    echo $stripped; // The Sameple text.Another sample text.....

Edit: OK, this should work. I just verified it locally.
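One caveat with `loadHTML()`: without a declared charset it reads the input as ISO-8859-1, so UTF-8 text comes back garbled, and it prints warnings for tags it does not recognize (for example HTML5 elements). A minimal sketch that works around both, assuming UTF-8 input and the mbstring extension (`dom_text` is a hypothetical helper name):

```php
<?php
// Hypothetical helper: the text of an HTML fragment via DOMDocument, safe for UTF-8 input.
function dom_text(string $html): string
{
    if ($html === '') {
        return ''; // loadHTML() rejects empty input
    }
    $dom = new DOMDocument();
    // Collect parser warnings (e.g. unknown HTML5 tags) instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // Encode every non-ASCII character as a numeric entity so the parser's
    // ISO-8859-1 default cannot misread multi-byte UTF-8 sequences.
    $dom->loadHTML(mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8'));
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    // loadHTML() wraps a fragment in implied <html><body> elements; read the body only.
    $body = $dom->getElementsByTagName('body')->item(0);
    return $body === null ? '' : $body->textContent;
}
```

The result can then be cut with `mb_substr(dom_text($html_code), 0, 500)` as above.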

Edit 2:

Now that I understand that you want to keep the tags but limit the text, let's see. You will want to add up the text content node by node until you reach 500 characters. It will probably take me a few edits and passes to get this right, but I hope it helps. (Sorry, I can't give this my full attention right now.)

The first case is when the text is shorter than 500 characters: nothing to worry about. Starting from the code above, we can do the following:

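    if (mb_strlen($stripped) > 500) {
        // This is where we do our work.
        $characters_so_far = 0;
        // loadHTML() wraps the fragment in <html><body>, so walk the body's children.
        $body = $dom->getElementsByTagName('body')->item(0);
        // Copy the live NodeList first so that removing nodes doesn't skip any.
        foreach (iterator_to_array($body->childNodes) as $ChildNode) {
            // Should check $ChildNode->hasChildNodes() and recurse;
            // probably put some of this into a function.
            $characters_in_next_node = mb_strlen($ChildNode->textContent);
            if ($characters_so_far + $characters_in_next_node > 500) {
                // Remove the node that would push us over the limit (and every node after it).
                $ChildNode->parentNode->removeChild($ChildNode);
            }
            $characters_so_far += $characters_in_next_node;
        }
        $final_out = $dom->saveHTML();
    } else {
        $final_out = $html_code;
    }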
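Filling in the "check `hasChildNodes()`" and "put this into a function" comments, a recursive version could look like this. It is only a sketch, assuming UTF-8 input; `truncate_html` and `truncate_node` are hypothetical names. Instead of dropping the whole node that crosses the limit, it shortens the text node where the limit falls and removes everything after it, so the remaining markup stays well-formed. Whitespace-only text nodes count toward the limit.

```php
<?php
// Hypothetical helper: keep the first $limit characters of text, preserving the markup.
function truncate_html(string $html, int $limit = 500): string
{
    if ($html === '') {
        return '';
    }
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings about unknown tags
    // Numeric entities keep loadHTML() from misreading UTF-8 as ISO-8859-1.
    $dom->loadHTML(mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8'));
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    $remaining = $limit;
    truncate_node($body, $remaining);

    // Serialize only the body's children, not the implied <html><body> wrapper.
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

// Depth-first walk: shorten the text node that crosses the limit, remove everything after it.
function truncate_node(DOMNode $node, int &$remaining): void
{
    // Copy the live NodeList so removing children does not skip any.
    foreach (iterator_to_array($node->childNodes) as $child) {
        if ($remaining <= 0) {
            $node->removeChild($child);
        } elseif ($child->nodeType === XML_TEXT_NODE) {
            $length = mb_strlen($child->nodeValue, 'UTF-8');
            if ($length > $remaining) {
                $child->nodeValue = mb_substr($child->nodeValue, 0, $remaining, 'UTF-8');
            }
            $remaining -= min($length, $remaining);
        } elseif ($child->hasChildNodes()) {
            truncate_node($child, $remaining);
        }
    }
}
```

Because `saveHTML()` serializes the nodes that survive, any element that was open at the cut point is closed automatically; for the sample input, `truncate_html($html_code, 10)` keeps the `<div>` with its attributes and the text `The Samepl`.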

I'm pasting a PHP class below that I wrote a long time ago, but I know it works. It's not quite what you need, since it counts words instead of characters, but I think it's pretty close, and someone might find it useful.

    class HtmlWordManipulator
    {
        public $stack = array();

        function truncate($text, $num = 50)
        {
            if (preg_match_all('/\s+/', $text, $junk) <= $num) return $text;
            // Protect whitespace inside tags so that a tag is never split into several tokens.
            $text = preg_replace_callback('/(<\/?[^>]+\s+[^>]*>)/', array($this, '_truncateProtect'), $text);
            $words = 0;
            $out = array();
            $text = str_replace('<', ' <', str_replace('>', '> ', $text));
            $toks = preg_split('/\s+/', $text);
            foreach ($toks as $tok) {
                if (preg_match_all('/<(\/?[^\x01>]+)([^>]*)>/', $tok, $matches, PREG_SET_ORDER))
                    foreach ($matches as $tag) $this->_recordTag($tag[1], $tag[2]);
                $out[] = trim($tok);
                // Only tokens that are not pure markup count as words.
                if (!preg_match('/^(<[^>]+>)+$/', $tok)) {
                    if (!strpos($tok, '=') && !strpos($tok, '<') && strlen(trim(strip_tags($tok))) > 0) {
                        ++$words;
                    }
                }
                if ($words > $num) break;
            }
            $truncate = $this->_truncateRestore(implode(' ', $out));
            return $truncate;
        }

        function restoreTags($text)
        {
            // Close the still-open tags innermost first.
            foreach (array_reverse($this->stack) as $tag) $text .= "</$tag>";
            return $text;
        }

        private function _truncateProtect($match)
        {
            return preg_replace('/\s/', "\x01", $match[0]);
        }

        private function _truncateRestore($strings)
        {
            return preg_replace('/\x01/', ' ', $strings);
        }

        private function _recordTag($tag, $args)
        {
            // XHTML self-closing tag: nothing to record.
            if (strlen($args) and $args[strlen($args) - 1] == '/') return;
            else if ($tag[0] == '/') {
                // Closing tag: remove the most recent matching opening tag from the stack.
                $tag = substr($tag, 1);
                for ($i = count($this->stack) - 1; $i >= 0; $i--) {
                    if ($this->stack[$i] == $tag) {
                        array_splice($this->stack, $i, 1);
                        return;
                    }
                }
                return;
            }
            else if (in_array($tag, array('p', 'li', 'ul', 'ol', 'div', 'span', 'a'))) $this->stack[] = $tag;
            else return;
        }
    }

`truncate` is what you want: you pass it the HTML and the number of words to truncate to. It ignores the HTML when counting words, but it keeps all of the markup in the output and tracks the tags left open by the truncation so that they can be closed.

Please don't judge me for the complete lack of OOP principles. I was young and stupid.

Edit:

It turns out the usage is actually more like this:

    $manipulator = new HtmlWordManipulator();
    $content = $manipulator->restoreTags($manipulator->truncate($myHtml, $numOfWords));

A stupid design decision, though it did let me inject HTML before the closing tags were added.
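The same tag-stack idea can count characters instead of words, which is closer to what the question asks. A minimal sketch, assuming reasonably clean markup: it splits on `<...>` with a regex, so a `>` inside an attribute value, comment, or script would confuse it. `truncate_html_chars` is a hypothetical name. Entities are decoded before counting so `&amp;` counts as one character, and the tags still open at the cut are closed innermost first.

```php
<?php
// Hypothetical helper: truncate HTML to $limit visible characters using a stack of open tags.
// Not a real HTML parser: assumes no '>' inside attribute values, comments or scripts.
function truncate_html_chars(string $html, int $limit = 500): string
{
    // Void elements never get a closing tag.
    $void = ['area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
             'link', 'meta', 'source', 'track', 'wbr'];
    $stack = []; // names of currently open tags, innermost last
    $out = '';
    $count = 0;  // visible characters emitted so far

    // Split into tags (captured as separate parts) and the text runs between them.
    $parts = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    foreach ($parts as $part) {
        if ($count >= $limit) {
            break;
        }
        if ($part[0] === '<') {
            $out .= $part; // tags do not count toward the limit
            if (preg_match('#^<\s*(/?)\s*([a-zA-Z][a-zA-Z0-9-]*)#', $part, $m)) {
                $name = strtolower($m[2]);
                if ($m[1] === '/') {
                    // Closing tag: drop the most recent matching opener and anything opened after it.
                    $idx = array_search($name, array_reverse($stack, true), true);
                    if ($idx !== false) {
                        array_splice($stack, $idx);
                    }
                } elseif (!in_array($name, $void, true) && substr($part, -2) !== '/>') {
                    $stack[] = $name;
                }
            }
        } else {
            // Count decoded characters, then re-escape what is kept.
            $text = html_entity_decode($part, ENT_QUOTES | ENT_HTML5, 'UTF-8');
            $take = mb_substr($text, 0, $limit - $count, 'UTF-8');
            $out .= htmlspecialchars($take, ENT_NOQUOTES | ENT_HTML5, 'UTF-8');
            $count += mb_strlen($take, 'UTF-8');
        }
    }
    // Close whatever is still open, innermost first.
    foreach (array_reverse($stack) as $name) {
        $out .= "</$name>";
    }
    return $out;
}
```

Unlike the DOMDocument route, this copies the untouched parts of the input through verbatim, so short input comes back unchanged.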


I'm not up for coding the real solution, but if someone wants to, here is roughly what I would do:

    $html_code = '<div class="contianer" style="text-align:center;">The Sameple text.</div><br><span>Another sample text.</span>....';
    $aggregate = '';
    $document = new DOMDocument();
    $document->loadHTML($html_code);
    $xpath = new DOMXPath($document);
    // Each text node sits directly inside exactly one tag, so this collects the text
    // of every tag without its children's text, in document order.
    foreach ($xpath->query('//body//text()') as $textNode) {
        $aggregate .= $textNode->nodeValue;
    }

Source: https://habr.com/ru/post/886206/

