I am looking for a way to split a string containing HTML into two halves. Requirements:
- Split the string after a given number of characters
- Do not split in the middle of a word
- Do not count HTML markup when calculating where to split the string
For example, take the following string:
<p>This is a test string that contains <strong>HTML</strong> tags and text content. This string needs to be split without slicing through the <em>middle</em> of a word and must preserve the validity of the HTML, i.e. not split in the middle of a tag, and make sure closing tags are respected correctly.</p>
Say I want to split at character position 39, which falls in the middle of the word "HTML" (not counting markup). I would like the function to split the string into the following two parts:
<p>This is a test string that contains <strong>HTML</strong></p>
and
<p>tags and text content. This string needs to be split without slicing through the <em>middle</em> of a word and must preserve the validity of the HTML, i.e. not split in the middle of a tag, and make sure closing tags are respected correctly.</p>
Please note that both halves in the example above must be valid HTML, so the closing tags </strong> and </p> were added to the first half. In the second half, an opening <p> tag was added, which is closed at the end of the string.
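To make the word-boundary requirement concrete, here is a minimal sketch that works on the visible text only. The helper name `snapToWordEnd` is hypothetical, and the sketch assumes single-byte text; multibyte strings would need the `mb_*` functions, and `strip_tags` leaves entities such as `&amp;` in place, so they are counted as several characters here.

```php
<?php
// Hypothetical helper: if $offset falls inside a word, move it forward
// to the end of that word; otherwise leave it where it is.
function snapToWordEnd(string $text, int $offset): int
{
    $length = strlen($text);
    if ($offset <= 0) {
        return 0;
    }
    if ($offset >= $length) {
        return $length;
    }
    // Already at the start of a word: nothing to adjust.
    if (ctype_space($text[$offset - 1])) {
        return $offset;
    }
    // Advance until whitespace or the end of the text.
    while ($offset < $length && !ctype_space($text[$offset])) {
        $offset++;
    }
    return $offset;
}

$html = '<p>This is a test string that contains <strong>HTML</strong> tags and text content.</p>';
$text = strip_tags($html);          // visible text only
$cut  = snapToWordEnd($text, 39);   // 39 falls inside "HTML"
echo substr($text, 0, $cut), "\n";  // This is a test string that contains HTML
```

This only finds where the split belongs in the visible text; mapping that offset back onto the markup and repairing the tags is the harder part.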
I found this function on StackOverflow, which truncates HTML:
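function printTruncated($maxLength, $html)
{
    $printedLength = 0; // visible characters printed so far
    $position = 0;      // current byte offset in $html
    $tags = array();    // stack of currently open tag names

    // Scan for the next tag or character entity until the limit is reached.
    while ($printedLength < $maxLength && preg_match('{</?([a-zA-Z][a-zA-Z0-9]*)[^>]*>|&#?[a-zA-Z0-9]+;}', $html, $match, PREG_OFFSET_CAPTURE, $position))
    {
        list($tag, $tagPosition) = $match[0];

        // Print the text leading up to the tag or entity.
        $str = substr($html, $position, $tagPosition - $position);
        if ($printedLength + strlen($str) > $maxLength)
        {
            print(substr($str, 0, $maxLength - $printedLength));
            $printedLength = $maxLength;
            break;
        }

        print($str);
        $printedLength += strlen($str);

        if ($tag[0] == '&')
        {
            // Character entity: counts as one visible character.
            print($tag);
            $printedLength++;
        }
        else
        {
            $tagName = $match[1][0];
            if ($tag[1] == '/')
            {
                // Closing tag: must match the most recently opened tag.
                $openingTag = array_pop($tags);
                assert($openingTag == $tagName);
                print($tag);
            }
            else if ($tag[strlen($tag) - 2] == '/')
            {
                // Self-closing tag: nothing to track.
                print($tag);
            }
            else
            {
                // Opening tag: remember it so it can be closed later.
                print($tag);
                $tags[] = $tagName;
            }
        }

        // Continue after the tag or entity.
        $position = $tagPosition + strlen($tag);
    }

    // Print any remaining text up to the limit.
    if ($printedLength < $maxLength && $position < strlen($html))
    {
        print(substr($html, $position, $maxLength - $printedLength));
    }

    // Close any tags that are still open.
    while (!empty($tags))
    {
        printf('</%s>', array_pop($tags));
    }
}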
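As a rough sketch of how the same tokenizing loop might be adapted to split instead of truncate: the function name `splitHtml`, the `$void` list, and the trimming of whitespace around the cut are my own assumptions, not part of the function above. It assumes well-formed, single-byte HTML and treats a tag boundary as a word boundary, so a word interrupted by a tag (such as `mid<em>dle</em>`) can still be split.

```php
<?php
// Hypothetical sketch: returns [firstHalf, secondHalf], both with balanced tags.
function splitHtml(string $html, int $splitAt): array
{
    $pattern  = '{</?([a-zA-Z][a-zA-Z0-9]*)[^>]*>|&#?[a-zA-Z0-9]+;}';
    $void     = ['br', 'hr', 'img', 'input', 'meta', 'link']; // never closed
    $length   = strlen($html);
    $visible  = 0;   // visible characters seen so far
    $position = 0;   // byte offset into $html
    $open     = [];  // stack of [tagName, fullOpeningTag]

    while ($position < $length) {
        $found    = preg_match($pattern, $html, $m, PREG_OFFSET_CAPTURE, $position) === 1;
        $tokenPos = $found ? $m[0][1] : $length;
        $textLen  = $tokenPos - $position;

        if ($visible + $textLen >= $splitAt) {
            // The split point lies in this text run; snap it to a word boundary.
            $cut = $position + max(0, $splitAt - $visible);
            $atWordStart = $cut > $position && ctype_space($html[$cut - 1]);
            if (!$atWordStart) {
                while ($cut < $tokenPos && !ctype_space($html[$cut])) {
                    $cut++;
                }
            }
            // If the word ends at a tag, keep directly following closing
            // tags in the first half instead of reopening empty elements.
            if ($cut === $tokenPos) {
                while (preg_match('{\G</[a-zA-Z][a-zA-Z0-9]*\s*>}', $html, $c, 0, $cut)) {
                    array_pop($open);
                    $cut += strlen($c[0]);
                }
            }
            $closers = '';
            foreach (array_reverse($open) as $tag) {
                $closers .= '</' . $tag[0] . '>';
            }
            $openers = implode('', array_column($open, 1));
            return [
                rtrim(substr($html, 0, $cut)) . $closers,
                $openers . ltrim(substr($html, $cut)),
            ];
        }

        $visible += $textLen;
        if (!$found) {
            break;
        }
        $token = $m[0][0];
        if ($token[0] === '&') {
            $visible++;                      // an entity is one visible character
        } elseif ($token[1] === '/') {
            array_pop($open);                // closing tag
        } elseif (substr($token, -2) !== '/>' && !in_array(strtolower($m[1][0]), $void, true)) {
            $open[] = [$m[1][0], $token];    // opening tag, attributes preserved
        }
        $position = $tokenPos + strlen($token);
    }

    // The split point is beyond the visible text: nothing to split off.
    return [$html, ''];
}

[$first, $second] = splitHtml(
    '<p>This is a test string that contains <strong>HTML</strong> tags and text content.</p>',
    39
);
echo $first, "\n";   // <p>This is a test string that contains <strong>HTML</strong></p>
echo $second, "\n";  // <p>tags and text content.</p>
```

Keeping the full opening tag on the stack, rather than just the tag name, means attributes such as `class` or `href` carry over when the tags are reopened in the second half.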