I have a small search engine that does its job, and I want to highlight the keywords in the results. I thought it all worked until the set of keywords I used today blew it out of the water.
The problem is that preg_replace() loops through the replacements one after another, and later replacements replace text that was inserted by earlier ones. Confused? Here is my pseudo-function:
    public function highlightKeywords ($data, $keywords = array()) {
        $find = array();
        $replace = array();
        $begin = "<span class=\"keywordHighlight\">";
        $end = "</span>";
        foreach ($keywords as $kw) {
            $find[] = '/' . str_replace("/", "\/", $kw) . '/iu';
            $replace[] = $begin . "\$0" . $end;
        }
        return preg_replace($find, $replace, $data);
    }
OK, so it works when searching for "fred" and "dagg", but unfortunately when searching for "class", "lass" and "as" it runs into a real problem when it highlights "Joseph Class Group":
Joseph <span class="keywordHighlight">Cl</span><span <span c<span <span class="keywordHighlight">cl</span>ass="keywordHighlight">lass</span>="keywordHighlight">c<span <span class="keywordHighlight">cl</span>ass="keywordHighlight">lass</span></span>="keywordHighlight">ass</span> Group
How do I get the later replacements to work only on the non-HTML parts, while still allowing the whole match to be marked? For example, if I searched for "cla" and "lass", I would like all of "class" to be highlighted, because the search terms cover it even though they overlap. The highlight markup inserted for the first match has "class" in it, but that should not be highlighted.
Sigh.
I would rather use a PHP solution than jQuery (or anything client-side).
Note: I tried sorting the keywords by length, longest first, but then overlapping queries are not fully highlighted: with "cla" and "lass", only part of the word "class" is highlighted, and it still mangled the replacement tags :(
EDIT: I messed about, starting with pencil and paper and some wild scribbling, and came up with some very unglamorous code to solve this problem. It is not great, so suggestions for trimming / speeding this up would still be highly appreciated :)
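For reference, the longest-first idea can also be written as one combined pattern instead of an array of patterns. This is only a minimal sketch, assuming a hypothetical standalone helper named highlightSinglePass (not part of the code above). Because the text is scanned once, the inserted tags are never matched again, but overlapping terms such as "cla" and "lass" would still be only partly highlighted.

```php
<?php
// Sketch only: highlightSinglePass() is a hypothetical helper for illustration.
function highlightSinglePass(string $data, array $keywords): string
{
    // Drop empty keywords; an empty alternative would match at every position.
    $keywords = array_filter($keywords, 'strlen');
    if (!$keywords) {
        return $data;
    }

    // Longest first: at any position the regex tries the alternatives in
    // order, so the longest keyword that matches there wins.
    usort($keywords, function ($a, $b) {
        return strlen($b) - strlen($a);
    });

    // preg_quote() escapes regex metacharacters as well as the "/" delimiter.
    $alternatives = array_map(function ($kw) {
        return preg_quote($kw, '/');
    }, $keywords);
    $pattern = '/' . implode('|', $alternatives) . '/iu';

    // One pass over the text: the inserted <span> markup is never scanned again.
    $result = preg_replace($pattern, '<span class="keywordHighlight">$0</span>', $data);

    // preg_replace() returns null on error (e.g. invalid UTF-8 with /u).
    return $result === null ? $data : $result;
}

echo highlightSinglePass('Joseph Class Group', array('class', 'lass', 'as'));
// Joseph <span class="keywordHighlight">Class</span> Group
```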
    public function highlightKeywords ($data, $keywords = array()) {
        $begin = "<span class=\"keywordHighlight\">";
        $end = "</span>";
        $hits = array();
        // Record the [start, end) offsets of every occurrence of every keyword.
        foreach ($keywords as $kw) {
            if ($kw === '') {
                continue; // skip empty keywords, which stripos() cannot meaningfully search for
            }
            $offset = 0;
            while (($pos = stripos($data, $kw, $offset)) !== false) {
                $hits[] = array($pos, $pos + strlen($kw));
                $offset = $pos + 1;
            }
        }
        if ($hits) {
            // Sort the hits by start position.
            usort($hits, function($a, $b) {
                if ($a[0] == $b[0]) {
                    return 0;
                }
                return ($a[0] < $b[0]) ? -1 : 1;
            });
            $thisthat = array(0 => $begin, 1 => $end);
            for ($i = 0; $i < count($hits); $i++) {
                // Insert the opening tag at the hit's start, then the closing tag at its end.
                foreach ($thisthat as $key => $val) {
                    $pos = $hits[$i][$key];
                    $data = substr($data, 0, $pos) . $val . substr($data, $pos);
                    // Shift every recorded offset at or after the insertion point.
                    for ($j = 0; $j < count($hits); $j++) {
                        if ($hits[$j][0] >= $pos) {
                            $hits[$j][0] += strlen($val);
                        }
                        if ($hits[$j][1] >= $pos) {
                            $hits[$j][1] += strlen($val);
                        }
                    }
                }
            }
        }
        return $data;
    }
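One possible way to trim the function above, offered as a sketch rather than a benchmarked replacement: merge overlapping hits into single ranges first, then rebuild the string once from left to right, so no offsets need shifting after each insert. The name highlightMerged is hypothetical, the function is written standalone rather than as a method, and like the code above it works on byte offsets (stripos/strlen/substr).

```php
<?php
// Sketch only: highlightMerged() is a hypothetical name for illustration.
function highlightMerged(string $data, array $keywords): string
{
    $begin = '<span class="keywordHighlight">';
    $end = '</span>';

    // Collect [start, end) byte ranges for every occurrence of every keyword.
    $hits = array();
    foreach ($keywords as $kw) {
        if ((string) $kw === '') {
            continue; // an empty needle is not a useful search term
        }
        $offset = 0;
        while (($pos = stripos($data, $kw, $offset)) !== false) {
            $hits[] = array($pos, $pos + strlen($kw));
            $offset = $pos + 1;
        }
    }
    if (!$hits) {
        return $data;
    }

    // Sort by start, then fold overlapping or touching ranges together.
    usort($hits, function ($a, $b) {
        return $a[0] - $b[0];
    });
    $merged = array(array_shift($hits));
    foreach ($hits as $hit) {
        $last = count($merged) - 1;
        if ($hit[0] <= $merged[$last][1]) {
            $merged[$last][1] = max($merged[$last][1], $hit[1]);
        } else {
            $merged[] = $hit;
        }
    }

    // Rebuild the string once, left to right, wrapping each merged range.
    $out = '';
    $cursor = 0;
    foreach ($merged as $range) {
        $out .= substr($data, $cursor, $range[0] - $cursor)
            . $begin . substr($data, $range[0], $range[1] - $range[0]) . $end;
        $cursor = $range[1];
    }
    return $out . substr($data, $cursor);
}

echo highlightMerged('Joseph Class Group', array('cla', 'lass'));
// Joseph <span class="keywordHighlight">Class</span> Group
```

Since the tags are only written while rebuilding the output, the search never sees them, and overlapping terms end up inside one span instead of nested ones.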