How to substr() text containing HTML entities?

I have something like the following:

    $mytext = "that&#039;s really &quot;confusing&quot; and &lt;absolutly&gt; silly";
    echo substr($mytext, 0, 6);

The output in this case would be: that&# instead of that's

What I want is to count each HTML entity as 1 character and then substr, because I always end up with broken HTML or some obscure characters at the end of the text.

Please don't suggest HTML-decoding it, then substr, then encoding it again; I want a clean method :)

Thanks

6 answers

There are two ways to do this:

  • You can decode the HTML entities, substr() and then encode; or

  • You can use regex.

(1) uses html_entity_decode() and htmlentities():

    $s = html_entity_decode($mytext, ENT_QUOTES, 'UTF-8'); // ENT_QUOTES so &#039; is decoded too
    $sub = substr($s, 0, 6);
    echo htmlentities($sub, ENT_QUOTES, 'UTF-8');

(2) might look something like this:

    if (preg_match('!^([^&]|&(?:.*?;)){0,6}!s', $mytext, $match)) {
        echo $match[0];
    }

What this says is: match up to 6 occurrences of the preceding expression, starting at the beginning of the string. The preceding expression is either:

  • any character that is not an ampersand; or

  • an ampersand followed by anything up to and including a semicolon (i.e. an HTML entity).

This is not ideal, so I would prefer (1).
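One reason approach (2) is not ideal: `&(?:.*?;)` treats everything from any ampersand up to the next semicolon as a single character, even when it is not an entity. Below is a sketch of a stricter variant. `entity_substr_regex()` is a hypothetical name, and the entity pattern (named entities plus decimal/hex numeric ones) and UTF-8 input are assumptions.

```php
<?php
// Variant of approach (2) with a stricter entity pattern. Assumption: only
// named entities (&quot;) and decimal/hex numeric ones (&#039;, &#x27;)
// count as one unit; the /u modifier makes "." match one UTF-8 character.
function entity_substr_regex(string $text, int $length): string
{
    $entity = '&(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#x[0-9a-fA-F]+);';
    // Each unit is a complete entity or, failing that, any single character.
    $pattern = '/^(?:' . $entity . '|.){0,' . $length . '}/su';
    // preg_match() returns false on invalid UTF-8 input; return '' then.
    return preg_match($pattern, $text, $m) === 1 ? $m[0] : '';
}

$mytext = "that&#039;s really &quot;confusing&quot; and &lt;absolutly&gt; silly";
var_dump(entity_substr_regex($mytext, 6));             // string(11) "that&#039;s"

// The original '&(?:.*?;)' swallows "& Jerry;" as a single unit...
preg_match('!^([^&]|&(?:.*?;)){0,6}!s', "Tom & Jerry; cats", $m);
var_dump($m[0]);                                       // string(13) "Tom & Jerry; "
// ...while the stricter pattern treats the bare "&" as an ordinary character.
var_dump(entity_substr_regex("Tom & Jerry; cats", 6)); // string(6) "Tom & "
```

Trying the entity alternative before `.` is what keeps `&quot;` from being split at its ampersand.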

    function encoded_substr($string, $param, $param2) {
        $s = html_entity_decode($string, ENT_QUOTES, 'UTF-8');
        $sub = substr($s, $param, $param2);
        return htmlentities($sub, ENT_QUOTES, 'UTF-8');
    }

There, I copied cletus' code into a function for you. Now you can call a very simple 3-line function with 1 line of code. If that is not “clean”, I am confused about what “clean” means.


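Note that some characters break the proposed decode + encode approach if you use substr(), because substr() counts bytes rather than characters and can cut a multibyte UTF-8 character in half.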

Example

    $string = html_entity_decode("Workin’ on my Fitness…In the Backyard.");
    echo $string, "\n";
    echo substr($string, 0, 25), "\n"; // 25 bytes, not 25 characters
    echo htmlentities(substr($string, 0, 25)), "\n";

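Output:

  • Workin’ on my Fitness…In the Backyard.
  • Workin’ on my Fitness, followed by a broken character (substr() kept only 2 of the 3 bytes of “…”)
  • (empty line: htmlentities() returns an empty string for the invalid UTF-8, at least on PHP versions before 8.1)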

Solution

Use mb_substr(), which counts characters instead of bytes:

    echo mb_substr($string, 0, 25), "\n";
    echo htmlentities(mb_substr($string, 0, 25)), "\n";

Output:

  • Workin’ on my Fitness…In
  • Workin&rsquo; on my Fitness&hellip;In (the HTML source; a browser renders it like the line above)
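A quick way to see what goes wrong in the example above: this sketch, which assumes the PHP file itself is saved as UTF-8, compares byte and character counts and checks whether each cut is still valid UTF-8.

```php
<?php
// ’ and … are 3 bytes each in UTF-8. strlen()/substr() count bytes,
// mb_strlen()/mb_substr() count characters.
$string = "Workin’ on my Fitness…In the Backyard.";

var_dump(strlen($string));              // int(42): bytes
var_dump(mb_strlen($string, 'UTF-8'));  // int(38): characters

$bytes = substr($string, 0, 25);             // keeps only 2 of the 3 bytes of "…"
$chars = mb_substr($string, 0, 25, 'UTF-8'); // stops on a character boundary

var_dump(mb_check_encoding($bytes, 'UTF-8')); // bool(false): invalid UTF-8
var_dump(mb_check_encoding($chars, 'UTF-8')); // bool(true)

// ENT_SUBSTITUTE replaces the broken tail with U+FFFD instead of making
// htmlentities() return an empty string.
echo htmlentities($bytes, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8'), "\n";
echo htmlentities($chars, ENT_QUOTES, 'UTF-8'), "\n"; // Workin&rsquo; on my Fitness&hellip;In
```

Passing the encoding explicitly avoids depending on `mb_internal_encoding()` or `default_charset`.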

Try the following function, which works on the encoded string directly:

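    <?php
    $mytext = "that&#039;s really &quot;confusing&quot; and &lt;absolutly&gt; silly";
    echo limit_text($mytext, 6);

    function limit_text($text, $limit) {
        // Find every "&...;" sequence (ungreedy, so one entity per match).
        preg_match_all("/&(.*)\;/U", $text, $pat_array);
        // Widen the byte limit by the extra length of the first $limit
        // entities (note: even ones that lie beyond the cut are counted).
        $additional = 0;
        foreach ($pat_array[0] as $key => $value) {
            if ($key < $limit) {
                $additional += strlen($value) - 1;
            }
        }
        $limit += $additional;
        if (strlen($text) > $limit) {
            $text = substr($text, 0, $limit);
            // Back off to the last space (if there is one) so no word is cut.
            $space = strrpos($text, ' ');
            if ($space !== false) {
                $text = substr($text, 0, $space);
            }
        }
        return $text;
    }
    ?>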
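The idea in limit_text() (widen the byte limit by the extra length of each entity) can be restricted to entities that actually start before the cut. Below is a sketch with a hypothetical `entity_cut()` helper, assuming the text is single-byte apart from its entities and that only well-formed named or numeric entities count as one character; it leaves out the word-boundary trimming.

```php
<?php
// Entities are walked in order with their byte offsets; each one that starts
// before the current cut pushes the cut forward by its extra bytes.
function entity_cut(string $text, int $limit): string
{
    preg_match_all(
        '/&(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#x[0-9a-fA-F]+);/',
        $text,
        $matches,
        PREG_OFFSET_CAPTURE
    );

    $cut = $limit; // byte position of the cut, as if every character were 1 byte
    foreach ($matches[0] as [$entity, $offset]) {
        if ($offset >= $cut) {
            break; // this and all later entities lie beyond the cut
        }
        // Shown as one character, but occupies strlen($entity) bytes.
        $cut += strlen($entity) - 1;
    }
    return substr($text, 0, $cut);
}

$mytext = "that&#039;s really &quot;confusing&quot; and &lt;absolutly&gt; silly";
echo entity_cut($mytext, 6), "\n"; // that&#039;s
```

Because an entity that starts before the cut is always included whole, the result never ends in the middle of an entity.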

Well, there is only one clean method: do not use entities at all.
There is no reason to keep entities in the string you are going to substr(); they are only needed for output.
So, substr first, then encode.
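A minimal sketch of this advice, assuming the text is stored unencoded and in UTF-8 (the sample string is the question's text before HTML encoding):

```php
<?php
// Keep plain text in variables/storage, cut it with a character-aware
// function, and escape only at the point of output.
$raw = "that's really \"confusing\" and <absolutly> silly";

$short = mb_substr($raw, 0, 6, 'UTF-8');                  // "that's"
echo htmlspecialchars($short, ENT_QUOTES, 'UTF-8'), "\n"; // that&#039;s
```

htmlspecialchars() is used here instead of htmlentities() because, with UTF-8 output, only &, <, > and quotes need escaping; htmlentities() would work as well.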


Here is a fixed version of the function above. Use mb_substr() to avoid surprises such as an entity decoding to a multibyte character, or character counting not working as it should; in my case, Sábado became Sá:

    function encoded_substr($string, $param, $param2) {
        $s = html_entity_decode($string, ENT_QUOTES, 'UTF-8');
        $sub = mb_substr($s, $param, $param2, 'UTF-8'); // counts characters, not bytes
        return htmlentities($sub, ENT_QUOTES, 'UTF-8');
    }

Source: https://habr.com/ru/post/1307210/

