Highlighting keywords in a paragraph

I need to highlight a keyword in a paragraph, as Google does in its search results. Suppose I have a MySQL database with blog posts. When a user searches for a specific keyword, I want to return the posts containing that keyword, but show only part of each post (the paragraph containing the keyword) and highlight the keyword.

My plan is this:

  • find the ID of each post that contains the search keyword;
  • read the content of that post again and put each word into a fixed-size buffer array (50 words) until I find the keyword.

Can you help me with the logic, or at least tell me whether my approach is OK? I am still learning PHP.
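A minimal sketch of that buffer plan, assuming the post text is plain (no HTML) and that "words" are runs of non-whitespace. excerptWithBuffer is a hypothetical name, and the 50-word default (25 words before and 25 after the keyword) is just the figure from the plan:

```php
<?php
// Sketch of the buffer plan: keep only the most recent words in a fixed-size
// buffer while scanning, stop at the first word matching the keyword, then
// take up to the same number of words after it. Names/defaults are assumptions.
function excerptWithBuffer(string $text, string $keyword, int $size = 50): ?string
{
    $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    $half = intdiv($size, 2);
    $buffer = [];                               // words seen before the match

    foreach ($words as $i => $word) {
        // Case-insensitive comparison, ignoring trailing punctuation like "," or "."
        if (strcasecmp(rtrim($word, '.,;:!?'), $keyword) === 0) {
            $after = array_slice($words, $i + 1, $half);
            $parts = array_map('htmlspecialchars', $buffer);   // escape plain words
            $parts[] = '<strong>' . htmlspecialchars($word) . '</strong>';
            foreach ($after as $w) {
                $parts[] = htmlspecialchars($w);
            }
            return implode(' ', $parts);
        }
        $buffer[] = $word;
        if (count($buffer) > $half) {
            array_shift($buffer);               // drop the oldest word
        }
    }
    return null;                                // keyword not in this post
}

echo excerptWithBuffer('one two three four five six seven', 'four', 4), "\n";
// two three <strong>four</strong> five six
```

Scanning once and stopping at the first match avoids loading every word into memory before deciding what to show.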

+2
7 answers

If the content contains HTML, use a DOM-based approach (note that this is a pretty reliable solution):

    $string = '<p>foo<b>bar</b></p>';
    $keyword = 'foo';

    $dom = new DOMDocument();
    $dom->loadHTML($string);
    $xpath = new DOMXPath($dom);
    // Elements whose text contains the keyword (the keyword must not contain a double quote)
    $elements = $xpath->query('//*[contains(.,"' . $keyword . '")]');
    foreach ($elements as $element) {
        // Copy the child list first: replaceChild() modifies the live NodeList
        foreach (iterator_to_array($element->childNodes) as $child) {
            if (!$child instanceof DOMText) continue; // only touch text, never tags
            $fragment = $dom->createDocumentFragment();
            $text = $child->textContent;
            while (($pos = stripos($text, $keyword)) !== false) {
                $fragment->appendChild(new DOMText(substr($text, 0, $pos)));
                $word = substr($text, $pos, strlen($keyword));
                $highlight = $dom->createElement('span');
                $highlight->appendChild(new DOMText($word));
                $highlight->setAttribute('class', 'highlight');
                $fragment->appendChild($highlight);
                $text = substr($text, $pos + strlen($keyword));
            }
            if ($text !== '') $fragment->appendChild(new DOMText($text)); // keep "0"
            $element->replaceChild($fragment, $child);
        }
    }
    $string = $dom->saveXML($dom->getElementsByTagName('body')->item(0)->firstChild);

Results in:

 <p><span class="highlight">foo</span><b>bar</b></p> 

And with this input:

 $string = '<body><p>foobarbaz<b>bar</b></p></body>'; $keyword = 'bar'; 

You get (split into several lines for readability):

    <p>foo
        <span class="highlight">bar</span>
        baz
        <b>
            <span class="highlight">bar</span>
        </b>
    </p>

Beware of naive solutions (like a regex or str_replace), since highlighting something like "div" tends to completely destroy your HTML. This approach only highlights text in the body, never anything inside a tag.
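To see why, here is a small sketch. The first part shows str_replace() rewriting the tag names themselves; the second is a lighter-weight alternative to the DOM approach (a different technique, not this answer's) that splits the HTML into tags and text with preg_split() and only highlights the text parts. highlightTextOnly is a hypothetical helper, and it assumes simple markup with no ">" inside attribute values, comments or scripts:

```php
<?php
// Naive replacement: "div" is also replaced inside the tags themselves.
$html = '<div>a div here</div>';
$naive = str_replace('div', '<span class="highlight">div</span>', $html);
echo $naive, "\n";
// <<span class="highlight">div</span>>a <span class="highlight">div</span> here</<span class="highlight">div</span>>

// Hypothetical helper: only highlight the text between tags.
// Assumes no ">" inside attribute values, comments or <script> blocks, and
// does not protect HTML entities such as &amp; from matching.
function highlightTextOnly(string $html, string $keyword): string
{
    // With PREG_SPLIT_DELIM_CAPTURE, even indices are text, odd indices are tags.
    $parts = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    $pattern = '/' . preg_quote($keyword, '/') . '/i';
    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            $parts[$i] = preg_replace($pattern, '<span class="highlight">$0</span>', $part);
        }
    }
    return implode('', $parts);
}

echo highlightTextOnly($html, 'div'), "\n";
// <div>a <span class="highlight">div</span> here</div>
```

For arbitrary real-world HTML the DOM approach is the safer choice; the split is only reasonable when you control the markup.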


Edit: since you want Google-style results, here is one way to do it:

    function getKeywordStubs($string, array $keywords, $maxStubSize = 10) {
        $dom = new DOMDocument();
        $dom->loadHTML($string);
        $xpath = new DOMXPath($dom);
        $results = array();
        $maxStubHalf = ceil($maxStubSize / 2);
        foreach ($keywords as $keyword) {
            $elements = $xpath->query('//*[contains(.,"' . $keyword . '")]');
            $replace = '<span class="highlight">' . $keyword . '</span>';
            foreach ($elements as $element) {
                $stub = $element->textContent;
                // Up to $maxStubHalf words before the keyword, the keyword, then up to $maxStubHalf after it
                $regex = '#^.*?((\w*\W*){' . $maxStubHalf . '})(' . preg_quote($keyword, '#')
                    . ')((\w*\W*){' . $maxStubHalf . '}).*?$#ims';
                $stub = preg_replace($regex, '\\1\\3\\4', $stub);
                $stub = str_ireplace($keyword, $replace, $stub);
                $results[] = $stub;
            }
        }
        $results = array_unique($results);
        return $results;
    }

This returns an array of snippets, each containing the keyword with up to $maxStubSize words around it (up to half that number before it and half after it).

So, given the string:

 <p>a whole <b>bunch of</b> text <a>here for</a> us to foo bar baz replace out from this string <b>bar</b> </p> 

Calling getKeywordStubs($string, array('bar', 'bunch')) will result in:

    array(4) {
      [0]=>
      string(75) "here for us to foo <span class="highlight">bar</span> baz replace out from "
      [3]=>
      string(34) "<span class="highlight">bar</span>"
      [4]=>
      string(62) "a whole <span class="highlight">bunch</span> of text here for "
      [7]=>
      string(39) "<span class="highlight">bunch</span> of"
    }

You can then build the description by sorting the snippets by length and taking the two longest (this assumes PHP 5.3+, for the anonymous function):

    usort($results, function ($str1, $str2) {
        return strlen($str2) - strlen($str1);
    });
    $description = implode('...', array_slice($results, 0, 2));

Result:

 here for us to foo <span class="highlight">bar</span> baz replace out from ...a whole <span class="highlight">bunch</span> of text here for 

I hope this helps. (It feels a little bloated, and I'm sure there are better ways to do this, but it is one way.)

+9

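Perhaps you could do something like this when querying the database (this assumes an open PDO connection in $pdo; the old mysql_* functions were removed in PHP 7):

    $keyword = $_REQUEST["keyword"]; // fetch the keyword from the request
    // ask the database for the post texts, with a placeholder instead of string concatenation
    $stmt = $pdo->prepare("SELECT `content` FROM `posts` WHERE `content` LIKE ?");
    $stmt->execute(array('%' . $keyword . '%'));
    while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) { // do the following for each result:
        $text = $row["content"]; // we're only interested in the content at the moment
        $pos = stripos($text, $keyword); // case-insensitive, as LIKE usually is
        if ($pos === false) continue;
        $text = substr($text, max(0, $pos - 150), 300); // cut out ~300 characters around the match
        $text = htmlspecialchars($text); // escape first, so the <strong> tags below survive
        $text = preg_replace('/' . preg_quote(htmlspecialchars($keyword), '/') . '/i',
            '<strong>$0</strong>', $text); // highlight, keeping the original capitalization
        echo $text; // print it
        echo "<hr>"; // draw a line under it
    }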
+2

If you want to cut out the relevant parts of the paragraph, after wrapping the search terms in <strong> tags with str_replace(), you can use stripos() to find the position of each <strong> section and pass that offset to substr() to cut out part of the paragraph, for example:

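    // $searchterms holds the search terms, $paragraph the post text
    foreach ($searchterms as $search) {
        $paragraph = str_replace($search, "<strong>$search</strong>", $paragraph);
    }

    $pos = 0;
    $section = array();
    for ($i = 0; $i < 4; $i++) {
        $pos = stripos($paragraph, "<strong>", $pos);
        if ($pos === false) break; // fewer than 4 highlighted terms
        $section[$i] = substr($paragraph, max(0, $pos - 100), 200);
        $pos += strlen("<strong>"); // continue searching after this match
    }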

This gives you an array of short snippets (up to 200 characters each) to use as you wish. It may also be useful to find the nearest space and cut there, to avoid half-words. Oh, and you also need to check for errors, but I will leave that to you.
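A rough sketch of the "cut at the nearest space" idea, assuming plain single-byte text and a single-word match at byte offset $pos. snippetAround is a hypothetical helper; multibyte text would need the mb_* equivalents:

```php
<?php
// Take a window of about $width bytes around $pos, then pull each end in to a
// word boundary so no word is cut in half (the matched word itself is kept).
function snippetAround(string $text, int $pos, int $width = 200): string
{
    $start = max(0, $pos - intdiv($width, 2));
    $end = min(strlen($text), $start + $width);

    // If the window starts mid-word, move forward to the next word start,
    // but never past the match itself.
    if ($start > 0 && $text[$start - 1] !== ' ') {
        $space = strpos($text, ' ', $start);
        if ($space !== false && $space < $pos) {
            $start = $space + 1;
        }
    }
    // If the window ends mid-word, move back to the last space inside it,
    // but keep at least the matched word.
    if ($end < strlen($text) && $text[$end] !== ' ') {
        $space = strrpos(substr($text, 0, $end), ' ');
        if ($space !== false && $space > $pos) {
            $end = $space;
        }
    }
    return substr($text, $start, $end - $start);
}

echo snippetAround('alpha beta gamma delta epsilon', 11, 12), "\n";
// beta gamma
```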

+2

You can try to split a database search result into an array of words using explode(), and then use array_search() on each result. In the example below, set the $distance variable to how many words you want to appear on either side of the first match of $keyword.

In this example, I used lorem ipsum text as a stand-in for a database result and set $keyword to 'scelerisque'. You would obviously replace both in your code.

    // example paragraph text
    $lorum = 'Nunc nec magna at nibh imperdiet dignissim quis eu velit. vel mattis odio rutrum nec. Etiam sit amet tortor nibh, molestie vestibulum tortor. Integer condimentum magna dictum purus vehicula et scelerisque mauris viverra. Nullam in lorem erat. Ut dolor libero, tristique et pellentesque sed, mattis eget dui. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. .';
    // turn paragraph into array
    $ipsum = explode(' ', $lorum);
    // set keyword
    $keyword = 'scelerisque';
    // set excerpt distance
    $distance = 10;
    // look for keyword in paragraph array, return array key of first match
    $match_key = array_search($keyword, $ipsum);
    $excerpt = '';
    if ($match_key !== false) { // a match at key 0 is valid, so don't use empty()
        $words = array();
        foreach ($ipsum as $key => $value) {
            // if paragraph array key is inside the excerpt distance
            if ($key > $match_key - $distance && $key < $match_key + $distance) {
                // if array key matches keyword key, bold the word
                $words[] = ($key == $match_key) ? '<b>' . $value . '</b>' : $value;
            }
        }
        // turn excerpt array into a string
        $excerpt = implode(' ', $words);
    }
    // print the string
    echo $excerpt;

$excerpt then contains: "vestibulum tortor. Integer condimentum magna dictum purus vehicula et <b>scelerisque</b> mauris viverra. Nullam in lorem erat. Ut dolor libero,"

+1

Here is a simple plain-text solution:

    $str = 'Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.';
    $keywords = array('co');
    $wordspan = 5;
    $keywordsPattern = implode('|', array_map(function ($val) {
        return preg_quote($val, '/');
    }, $keywords));
    $matches = preg_split("/($keywordsPattern)/ui", $str, -1, PREG_SPLIT_DELIM_CAPTURE);
    for ($i = 0, $n = count($matches); $i < $n; ++$i) {
        if ($i % 2 == 0) {
            $words = preg_split('/(\s+)/u', $matches[$i], -1, PREG_SPLIT_DELIM_CAPTURE);
            if (count($words) > ($wordspan + 1) * 2) {
                $matches[$i] = '…';
                if ($i > 0) {
                    $matches[$i] = implode('', array_slice($words, 0, ($wordspan + 1) * 2)) . $matches[$i];
                }
                if ($i < $n - 1) {
                    $matches[$i] .= implode('', array_slice($words, -($wordspan + 1) * 2));
                }
            }
        } else {
            $matches[$i] = '<b>' . $matches[$i] . '</b>';
        }
    }
    echo implode('', $matches);

With the current pattern "/($keywordsPattern)/ui", partial words (substrings) are matched and highlighted. But you can change this if you want:

  • If you want to match only whole words and not substrings, use \b word boundaries:

     "/\b($keywordsPattern)\b/ui" 
  • If you want to match substrings but highlight the whole word, put \w* before and after the keywords:

     "/(\w*?(?:$keywordsPattern)\w*)/ui" 
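A quick way to compare the three variants is to run each through preg_replace() on a short sample. This is a shortcut (the answer's code uses preg_split()), and the sample sentence is made up for illustration:

```php
<?php
// Build the same escaped alternation as the answer, then apply each pattern.
$keywords = array('co');
$keywordsPattern = implode('|', array_map(function ($val) {
    return preg_quote($val, '/');
}, $keywords));
$str = 'Code the co-op';

$patterns = array(
    'substrings'  => '/(' . $keywordsPattern . ')/ui',
    'whole words' => '/\b(' . $keywordsPattern . ')\b/ui',
    'whole word around a substring' => '/(\w*?(?:' . $keywordsPattern . ')\w*)/ui',
);
$results = array();
foreach ($patterns as $name => $pattern) {
    $results[$name] = preg_replace($pattern, '<b>$1</b>', $str);
    echo $name, ': ', $results[$name], "\n";
}
// substrings: <b>Co</b>de the <b>co</b>-op
// whole words: Code the <b>co</b>-op
// whole word around a substring: <b>Code</b> the <b>co</b>-op
```

Note that "co-op" counts as a whole-word match for "co", because the hyphen is not a word character.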
+1

I found this post when searching for how to highlight keywords in search results. My requirements:

  • It must match whole words only
  • It must work with more than one keyword
  • It must be PHP only

I retrieve my data from a MySQL database that contains no HTML elements (the data is stored through a form I created).

Here is the code I found most useful:

    $keywords = array("fox", "jump", "quick");
    $string = "The quick brown fox jumps over the lazy dog";
    $test = $string; // used to compare values at the end
    if (!empty($keywords)) { // for a keyword search, highlight all keywords in the results
        // One combined pattern, so a later keyword can't match inside markup inserted for an earlier one;
        // preg_quote() escapes regex characters and '$0' keeps the original capitalization
        $quoted = array_map(function ($word) { return preg_quote($word, '/'); }, $keywords);
        $pattern = '/\b(' . implode('|', $quoted) . ')\b/i';
        $string = preg_replace($pattern, '<span class="highlight">$0</span>', $string);
    }
    // Compare the original string to the altered one to avoid printing a string with no matches
    if ($string === $test) {
        echo "No match";
    } else {
        echo $string;
    }

Output:

 The <span class="highlight">quick</span> brown <span class="highlight">fox</span> jumps over the lazy dog

I hope this helps someone.

+1

If you are a beginner, this will not be as easy as it might seem.

I think you should take the following steps:

  • build a query based on what the user searched for (beware of SQL injection);
  • fetch the results and store them in an array (keep the array well organized);
  • build the HTML from that array.
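For the first step, a minimal sketch of keeping user input out of the SQL text: return the query with a placeholder plus the bound parameter. buildSearchQuery and the posts/content names are assumptions; addcslashes() escapes \, % and _ so LIKE matches them literally (backslash is MySQL's default LIKE escape character):

```php
<?php
// Hypothetical helper: SQL with a placeholder, plus the parameter to bind.
function buildSearchQuery(string $keyword): array
{
    $pattern = '%' . addcslashes($keyword, '\\%_') . '%';
    return array('SELECT id, content FROM posts WHERE content LIKE ?', array($pattern));
}

list($sql, $params) = buildSearchQuery('50%_off');
echo $sql, "\n", $params[0], "\n";
// With an open PDO connection in $pdo (not shown), you would then run:
// $stmt = $pdo->prepare($sql);
// $stmt->execute($params);
```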

In the third step, you can use a regular expression to wrap the search keywords in bold tags; str_replace may work too.
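A sketch of that third step, assuming the stored text is plain (not HTML): escape it first, then wrap every keyword in <strong> with a single case-insensitive regex. highlightKeywords is a hypothetical helper; '$0' reuses the matched text so the original capitalization is kept. One limitation: a keyword like "amp" would also match inside an escaped "&amp;".

```php
<?php
// Escape for HTML first, then highlight, so the inserted tags are not escaped.
function highlightKeywords(string $text, array $keywords): string
{
    $escaped = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    $keywords = array_filter($keywords, 'strlen');  // ignore empty keywords
    if (!$keywords) {
        return $escaped;
    }
    $quoted = array_map(function ($k) {
        return preg_quote(htmlspecialchars($k, ENT_QUOTES, 'UTF-8'), '/');
    }, $keywords);
    $pattern = '/' . implode('|', $quoted) . '/iu';
    return preg_replace($pattern, '<strong>$0</strong>', $escaped);
}

echo highlightKeywords('PHP & MySQL: php is fun', array('php')), "\n";
// <strong>PHP</strong> &amp; MySQL: <strong>php</strong> is fun
```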

Hope this helps. If you post your database structure, I may be able to give you more specific advice.

0

Source: https://habr.com/ru/post/1397120/

