If it contains HTML (note that this is a pretty robust solution):
$string = '<p>foo<b>bar</b></p>';
$keyword = 'foo';

$dom = new DomDocument();
$dom->loadHtml($string);
$xpath = new DomXpath($dom);
// Note: XPath contains() is case-sensitive, while stripos() below is not.
$elements = $xpath->query('//*[contains(.,"'.$keyword.'")]');
foreach ($elements as $element) {
    // Copy the live node list first, since children get replaced inside the loop.
    foreach (iterator_to_array($element->childNodes) as $child) {
        if (!$child instanceof DomText) continue;
        $fragment = $dom->createDocumentFragment();
        $text = $child->textContent;
        while (($pos = stripos($text, $keyword)) !== false) {
            // Text before the match stays plain text.
            $fragment->appendChild(new DomText(substr($text, 0, $pos)));
            // The match itself (original casing preserved) goes into a span.
            $word = substr($text, $pos, strlen($keyword));
            $highlight = $dom->createElement('span');
            $highlight->appendChild(new DomText($word));
            $highlight->setAttribute('class', 'highlight');
            $fragment->appendChild($highlight);
            $text = substr($text, $pos + strlen($keyword));
        }
        // strlen() instead of empty(), so a trailing "0" is not dropped.
        if (strlen($text) > 0) $fragment->appendChild(new DomText($text));
        $element->replaceChild($fragment, $child);
    }
}
$string = $dom->saveXml($dom->getElementsByTagName('body')->item(0)->firstChild);
Results in:
<p><span class="highlight">foo</span><b>bar</b></p>
And with:
$string = '<body><p>foobarbaz<b>bar</b></p></body>';
$keyword = 'bar';
You get (split into several lines for readability):
<p>foo
    <span class="highlight">bar</span>
    baz
    <b>
        <span class="highlight">bar</span>
    </b>
</p>
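One caveat with the code above: the keyword is concatenated straight into the XPath expression, so a keyword that contains a double quote would break the query. A possible workaround is to build the string literal with XPath's `concat()`. This is only a sketch; the helper name `xpathLiteral` is made up for illustration and is not part of PHP's DOM extension:

```php
<?php
// Hypothetical helper: turn an arbitrary string into a valid XPath 1.0 string literal.
function xpathLiteral($value)
{
    // No double quotes: wrap in double quotes.
    if (strpos($value, '"') === false) {
        return '"' . $value . '"';
    }
    // Double quotes but no single quotes: wrap in single quotes.
    if (strpos($value, "'") === false) {
        return "'" . $value . "'";
    }
    // Both quote types: split on double quotes and glue the pieces with concat().
    $parts = explode('"', $value);
    return 'concat("' . implode('", \'"\', "', $parts) . '")';
}

echo xpathLiteral('foo'), "\n";     // "foo"
echo xpathLiteral("a\"b'c"), "\n";  // concat("a", '"', "b'c")

// Intended use in the query (instead of '"'.$keyword.'"'):
// $xpath->query('//*[contains(., ' . xpathLiteral($keyword) . ')]');
```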
Beware of naive solutions (like a regex or str_replace), since highlighting something like "div" tends to completely destroy your HTML... The DOM-based code above will only ever "highlight" text in the body of the document, never inside a tag...
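To make that warning concrete, here is a minimal sketch of what a plain string replacement does when the keyword is also a tag name. The sample markup and variable names are made up for illustration:

```php
<?php
// Hypothetical input: the keyword "div" also happens to be a tag name.
$html    = '<div>a div here</div>';
$keyword = 'div';

// Naive approach: replace every occurrence, including the ones inside tags.
$broken = str_ireplace(
    $keyword,
    '<span class="highlight">' . $keyword . '</span>',
    $html
);

echo $broken, "\n";
// <<span class="highlight">div</span>>a <span class="highlight">div</span> here</<span class="highlight">div</span>>
```

The opening and closing tags themselves get wrapped in a span, so the result is no longer valid markup.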
Edit: Since you want Google-style results, here is one way to do it:
function getKeywordStubs($string, array $keywords, $maxStubSize = 10) {
    $dom = new DomDocument();
    $dom->loadHtml($string);
    $xpath = new DomXpath($dom);
    $results = array();
    // Number of \w*\W* repetitions to keep on each side of the keyword.
    $maxStubHalf = (int) ceil($maxStubSize / 2);
    foreach ($keywords as $keyword) {
        $elements = $xpath->query('//*[contains(.,"'.$keyword.'")]');
        $replace = '<span class="highlight">'.$keyword.'</span>';
        foreach ($elements as $element) {
            $stub = $element->textContent;
            $regex = '#^.*?((\w*\W*){'.
                $maxStubHalf.'})('.
                preg_quote($keyword, '#').
                ')((\w*\W*){'.
                $maxStubHalf.'}).*?$#ims';
            // Keep only the keyword and the words around it.
            $stub = preg_replace($regex, '\\1\\3\\4', $stub);
            $stub = str_ireplace($keyword, $replace, $stub);
            $results[] = $stub;
        }
    }
    // Drop duplicates (array_unique() preserves the original keys).
    $results = array_unique($results);
    return $results;
}
OK, so this returns an array of matches with up to $maxStubSize words around each keyword (namely, up to half that number before, and half after)...
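To see what the regex does on its own, here is a small sketch that applies the same pattern shape as in getKeywordStubs to plain text, without any DOM involved. The sample text and the value of `$half` are assumptions chosen to keep the output short:

```php
<?php
// Same pattern shape as in getKeywordStubs(), applied to a plain string.
$text    = 'a whole bunch of text here for us to foo bar baz replace out from this string';
$keyword = 'bar';
$half    = 2; // hypothetical value: up to 2 repetitions of \w*\W* on each side

$regex = '#^.*?((\w*\W*){' . $half . '})('
    . preg_quote($keyword, '#')
    . ')((\w*\W*){' . $half . '}).*?$#ims';

// Groups 1, 3 and 4 are the words before, the keyword, and the words after.
$stub = preg_replace($regex, '\\1\\3\\4', $text);

echo '[', $stub, "]\n"; // [to foo bar baz ]
```

Note that the space right after the keyword uses up one of the trailing repetitions, so you get one word fewer after the keyword than before it.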
So, given the string:
<p>a whole <b>bunch of</b> text <a>here for</a> us to foo bar baz replace out from this string <b>bar</b> </p>
Calling getKeywordStubs($string, array('bar', 'bunch')) will result in:
array(4) {
  [0]=>
  string(75) "here for us to foo <span class="highlight">bar</span> baz replace out from "
  [3]=>
  string(34) "<span class="highlight">bar</span>"
  [4]=>
  string(62) "a whole <span class="highlight">bunch</span> of text here for "
  [7]=>
  string(39) "<span class="highlight">bunch</span> of"
}
So then you can build your result blurb by sorting the array by strlen and picking the two longest matches... (assuming PHP 5.3+):
usort($results, function($str1, $str2) {
    return strlen($str2) - strlen($str1);
});
$description = implode('...', array_slice($results, 0, 2));
Result:
here for us to foo <span class="highlight">bar</span> baz replace out from ...a whole <span class="highlight">bunch</span> of text here for
I hope this helps... (I do feel it's a little bloated, and I'm sure there are better ways to do this, but here is one way.)