Split an HTML string into two parts without cutting words and keeping the HTML valid in PHP

I am looking for a way to split a string containing HTML into two halves. Requirements:

  • Split the string at a given number of characters
  • Do not split in the middle of a word
  • Do not count HTML tags when calculating where to split the string
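The third requirement amounts to measuring length on the text a reader sees rather than on the raw markup. A minimal sketch, assuming UTF-8 input and that an entity such as &amp; should count as one character (the question does not say); visibleLength is a hypothetical name:

```php
<?php
// Sketch: length of the text a reader actually sees, ignoring tags.
// Assumptions: UTF-8 input; each entity such as &amp; counts as one character.
function visibleLength(string $html): int
{
    // strip_tags() removes the markup; html_entity_decode() turns &amp; into "&".
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Count characters, not bytes: '/./su' matches one UTF-8 character at a time.
    return preg_match_all('/./su', $text);
}

echo visibleLength('<p>Fish &amp; <em>chips</em></p>'), "\n"; // 12
```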

For example, take the following string:

<p>This is a test string that contains <strong>HTML</strong> tags and text content. This string needs to be split without slicing through the <em>middle</em> of a word and must preserve the validity of the HTML, i.e. not split in the middle of a tag, and make sure closing tags are respected correctly.</p>

Say I want to split at character position 39, which falls in the middle of the word "HTML" (not counting the tags). I would like the function to split the string into the following two parts:

<p>This is a test string that contains <strong>HTML</strong></p>

and

<p>tags and text content. This string needs to be split without slicing through the <em>middle</em> of a word and must preserve the validity of the HTML, i.e. not split in the middle of a tag, and make sure closing tags are respected correctly.</p>

Note that both parts must remain valid HTML, so the first part ends with </strong></p>, where the </p> had to be added. The second part also gets an opening <p> tag at the start, which is closed at the end of the string.
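One possible sketch of such a function, under stated assumptions rather than a definitive implementation: it tokenises the string into tags, entities and single characters, moves the cut forward to the end of the word it lands in (which is what the example expects, since "HTML" stays in the first part), keeps a stack of open elements, closes them at the end of the first part and reopens them, with their original attributes, at the start of the second. splitHtml is a hypothetical name; it assumes well-formed, properly nested UTF-8 input with no ">" inside attribute values or comments.

```php
<?php
// Sketch: split $html into two well-formed fragments after about $at visible characters,
// moving the cut forward to the end of the word it lands in.
// Assumptions: well-formed, properly nested UTF-8 input; no '>' inside attribute
// values or comments; an entity such as &amp; counts as one character.
function splitHtml(string $html, int $at): array
{
    $void = ['area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input', 'link', 'meta', 'source', 'track', 'wbr'];

    // 1. Tokenise: whole tags, whole entities, or single UTF-8 characters.
    preg_match_all('/<[^>]*>|&#?[a-zA-Z0-9]+;|./su', $html, $m);
    $tokens = $m[0];
    $isTag = fn(string $t): bool => strlen($t) > 1 && $t[0] === '<';
    $isSpace = fn(string $t): bool => preg_match('/^\s$/u', $t) === 1;

    // 2. If the cut falls inside a word, move it to the end of that word.
    $visible = array_values(array_filter($tokens, fn(string $t): bool => !$isTag($t)));
    $n = count($visible);
    $at = max(0, min($at, $n));
    while ($at > 0 && $at < $n && !$isSpace($visible[$at - 1]) && !$isSpace($visible[$at])) {
        $at++;
    }

    // Stack of open elements as [lower-case name, original opening tag].
    $stack = [];
    $track = function (string $t) use (&$stack, $void): void {
        if (!preg_match('#^<(/?)([a-zA-Z][a-zA-Z0-9-]*)#', $t, $tm)) {
            return; // comments, doctypes, etc. are neither opened nor closed
        }
        if ($tm[1] === '/') {
            array_pop($stack); // relies on proper nesting
        } elseif (!in_array(strtolower($tm[2]), $void, true) && substr($t, -2) !== '/>') {
            $stack[] = [strtolower($tm[2]), $t];
        }
    };

    // 3. First part: copy tokens until $at visible characters are out, then also
    //    take closing tags that immediately follow (e.g. </strong> after "HTML").
    $first = '';
    $count = 0;
    $total = count($tokens);
    for ($i = 0; $i < $total; $i++) {
        $t = $tokens[$i];
        if ($isTag($t)) {
            if ($count >= $at && $t[1] !== '/') {
                break; // opening tags after the cut belong to the second part
            }
            $track($t);
        } elseif ($count >= $at) {
            break;
        } else {
            $count++;
        }
        $first .= $t;
    }

    // 4. Close what is still open in the first part; reopen it in the second.
    $second = '';
    foreach (array_reverse($stack) as [$name]) {
        $first .= "</$name>";
    }
    foreach ($stack as [, $openTag]) {
        $second .= $openTag;
    }

    // 5. Skip whitespace at the cut, then copy the remainder unchanged.
    while ($i < $total && !$isTag($tokens[$i]) && $isSpace($tokens[$i])) {
        $i++;
    }
    for (; $i < $total; $i++) {
        $second .= $tokens[$i];
    }

    return [$first, $second];
}

[$a, $b] = splitHtml('<div><p>a <b>bc</b> d</p></div>', 3);
echo $a, "\n", $b, "\n";
```

With the example string from the question and a position of 39, this returns exactly the two parts shown above. A regex tokenizer like this is fragile on arbitrary real-world markup; for untrusted input, building on a real parser such as PHP's DOMDocument would be more robust.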

I found the following function on StackOverflow for truncating HTML:

function printTruncated($maxLength, $html)
{
    $printedLength = 0;
    $position = 0;
    $tags = array();

    while ($printedLength < $maxLength && preg_match('{</?([a-z]+)[^>]*>|&#?[a-zA-Z0-9]+;}', $html, $match, PREG_OFFSET_CAPTURE, $position))
    {
        list($tag, $tagPosition) = $match[0];

        // Print text leading up to the tag.
        $str = substr($html, $position, $tagPosition - $position);
        if ($printedLength + strlen($str) > $maxLength)
        {
            print(substr($str, 0, $maxLength - $printedLength));
            $printedLength = $maxLength;
            break;
        }

        print($str);
        $printedLength += strlen($str);

        if ($tag[0] == '&')
        {
            // Handle the entity.
            print($tag);
            $printedLength++;
        }
        else
        {
            // Handle the tag.
            $tagName = $match[1][0];
            if ($tag[1] == '/')
            {
                // This is a closing tag.

                $openingTag = array_pop($tags);
                assert($openingTag == $tagName); // check that tags are properly nested.

                print($tag);
            }
            else if ($tag[strlen($tag) - 2] == '/')
            {
                // Self-closing tag.
                print($tag);
            }
            else
            {
                // Opening tag.
                print($tag);
                $tags[] = $tagName;
            }
        }

        // Continue after the tag.
        $position = $tagPosition + strlen($tag);
    }

    // Print any remaining text.
    if ($printedLength < $maxLength && $position < strlen($html))
        print(substr($html, $position, $maxLength - $printedLength));

    // Close any open tags.
    while (!empty($tags))
        printf('</%s>', array_pop($tags));
}
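As posted, printTruncated does not meet the second requirement: with $maxLength = 39 and the example string, it prints <p>This is a test string that contains <strong>HTM</strong></p>, cutting "HTML" after "HTM". It also counts bytes with strlen and substr, so on non-ASCII text the count is off and a cut can land inside a multibyte character. A small, hypothetical helper for the truncation step could compute a word-safe length in characters instead, assuming UTF-8 input:

```php
<?php
// Hypothetical helper, not part of printTruncated: given a text segment $str and a
// desired cut after $len characters, return a cut length that does not end inside
// a word. Counts UTF-8 characters, not bytes (assumes valid UTF-8).
function wordSafeLength(string $str, int $len): int
{
    // Split into individual UTF-8 characters.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    $n = count($chars);
    $isSpace = fn(string $c): bool => preg_match('/\s/u', $c) === 1;

    $len = max(0, min($len, $n));
    // While the cut separates two non-whitespace characters, move it one character right.
    while ($len > 0 && $len < $n && !$isSpace($chars[$len - 1]) && !$isSpace($chars[$len])) {
        $len++;
    }
    return $len;
}

var_dump(wordSafeLength('HTML tags', 3)); // int(4): keeps "HTML"
```

This only looks at the current text segment, so a word that continues after a tag (for example H<b>TML</b>) would still be cut, and the byte-based $printedLength in printTruncated would have to be switched to characters for the two counts to agree.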



Source: https://habr.com/ru/post/1739466/

