I wrote a script that sends chunks of text to Google for translation, but sometimes the text (which is HTML source code) gets split in the middle of an HTML tag, and Google returns the code incorrectly.
I already know how to split a string into an array, but is there a better way to do this while ensuring that each output string does not exceed 5000 characters and is never split inside a tag?
UPDATE: Thanks to the answer, this is the code I used in my project, and it works great:
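function handleTextHtmlSplit($text, $maxSize) {
    // Chunks of HTML, each kept at or under $maxSize where possible.
    $niceHtml = array('');
    // Split into tags and the text between them; the capture group keeps the tags.
    $pieces = preg_split('/(<[^>]*>)/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    $currentPiece = 0;
    foreach ($pieces as $piece) {
        // Start a new chunk when adding this piece would exceed the limit.
        // The non-empty check avoids leaving an empty chunk behind when a
        // single piece is itself longer than $maxSize.
        if ($niceHtml[$currentPiece] !== '' && strlen($niceHtml[$currentPiece] . $piece) > $maxSize) {
            $currentPiece += 1;
            $niceHtml[$currentPiece] = '';
        }
        $niceHtml[$currentPiece] .= $piece;
    }
    return $niceHtml;
}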
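One thing to watch: strlen() counts bytes, while a 5000-character limit is presumably counted in characters, so text with multibyte UTF-8 characters may be split earlier than necessary. Below is a minimal sketch of a character-aware variant, assuming the input is UTF-8 and the mbstring extension is available. The function name splitHtmlByCharacters and the sample input are hypothetical, made up for illustration only.

```php
<?php
// Hypothetical variant of the function above. The name, the use of
// mb_strlen(), and the UTF-8 assumption are not part of the original code.
function splitHtmlByCharacters(string $text, int $maxChars): array
{
    // Split into tags and the text between them, keeping the tags and
    // dropping the empty strings that appear between adjacent tags.
    $pieces = preg_split(
        '/(<[^>]*>)/',
        $text,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );

    $chunks = [''];
    $current = 0;
    foreach ($pieces as $piece) {
        // Count characters, not bytes.
        $wouldOverflow = mb_strlen($chunks[$current] . $piece, 'UTF-8') > $maxChars;
        // Only open a new chunk if the current one already has content.
        if ($wouldOverflow && $chunks[$current] !== '') {
            $chunks[++$current] = '';
        }
        $chunks[$current] .= $piece;
    }
    return $chunks;
}

// Sample input, invented for this example.
$html = '<p>Hello <b>world</b></p><p>Second paragraph</p>';
$chunks = splitHtmlByCharacters($html, 20);

// Joining the chunks back together reproduces the original markup.
$rejoined = implode('', $chunks);
```

Note that a single tag or text run longer than the limit still ends up in a chunk of its own that exceeds it; splitting long text runs further (for example at whitespace) would need extra handling.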
james