PHP - Matching keywords in text strings - How to improve the accuracy of returned keywords?

I have a PHP code snippet as follows:

$words = array(
    'Art' => '1',
    'Sport' => '2',
    'Big Animals' => '3',
    'World Cup' => '4',
    'David Fincher' => '5',
    'Torrentino' => '6',
    'Shakes' => '7',
    'William Shakespeare' => '8'
    );
$text = "I like artists, and I like sports. Can you call the name of a big animal? Brazil World Cup matchers are very good. William Shakespeare is very famous in the world.";
$all_keywords = $all_keys = array();
foreach ($words as $word => $key) {
    if (strpos(strtolower($text), strtolower($word)) !== false) {
        $all_keywords[] = $word;
        $all_keys[] = $key;
    }
}
echo $keywords_list = implode(',', $all_keywords) ."<br>";
echo $keys_list = implode(',', $all_keys) . "<br>";

This code echoes Art,Sport,World Cup,Shakes,William Shakespeare and 1,2,4,7,8. However, the code is very simple and not accurate enough to return the correct keywords. For example, it returns 'Shakes' => '7' because of the word Shakespeare in $text, but, as you can see, Shakes cannot represent Shakespeare as the correct keyword. Basically, I want to return Art,Sport,World Cup,William Shakespeare and 1,2,4,8 instead of Art,Sport,World Cup,Shakes,William Shakespeare and 1,2,4,7,8. So, could you help me develop better code that extracts keywords without this kind of problem? Thank you for your help.

+4

Use a regular expression with word boundaries instead, like this:

// create regular expression by using alternation
// of all given words
$re = '/\b(?:' . join('|', array_map(function($keyword) {
    return preg_quote($keyword, '/');
}, array_keys($words))) . ')\b/i';

preg_match_all($re, $text, $matches);
foreach ($matches[0] as $keyword) {
    echo $keyword, " ", $words[$keyword], "\n";
}

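\b matches a word boundary, i.e. the position between a word character and a non-word character, so each keyword only matches as a whole word. Output: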

World Cup 4
William Shakespeare 8
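
One caveat with the lookup `$words[$keyword]` above: because of the `i` flag, the matched text can differ in case from the array key (for example "world cup" versus 'World Cup'), and the lookup then fails. A minimal sketch of one way around this, assuming a hypothetical helper named `find_keywords()` that maps each lowercased keyword back to its original spelling:

```php
<?php
// Hypothetical helper (not part of the answer above): returns keyword => id
// for every keyword found as a whole word, regardless of letter case.
function find_keywords(array $words, string $text): array
{
    // Lowercased keyword => original keyword, for case-insensitive lookup.
    $canonical = array();
    foreach (array_keys($words) as $word) {
        $canonical[strtolower($word)] = $word;
    }

    // Same alternation pattern as in the answer: \b(?:kw1|kw2|...)\b with /i.
    $re = '/\b(?:' . implode('|', array_map(function ($w) {
        return preg_quote($w, '/');
    }, array_keys($words))) . ')\b/i';

    $found = array();
    preg_match_all($re, $text, $matches);
    foreach ($matches[0] as $hit) {
        $word = $canonical[strtolower($hit)];
        $found[$word] = $words[$word];
    }
    return $found;
}

$words = array('World Cup' => '4', 'Shakes' => '7', 'William Shakespeare' => '8');
$found = find_keywords($words, 'The world cup final; william SHAKESPEARE wrote plays.');

echo implode(',', array_keys($found)), "\n"; // World Cup,William Shakespeare
echo implode(',', $found), "\n";             // 4,8
```

Using the keyword as the key of `$found` also removes duplicates when a keyword occurs more than once in the text.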
+4

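Regular expressions are a better fit here than strpos(), because a pattern can require word boundaries and allow for variations such as case or plural forms.

Here is a quick demo.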

Save it as a script, demo.php, and run it with chmod +x demo.php && ./demo.php


`#!/usr/bin/php
<?php

//array of regular expressions to match your words/phrases
$words = array(
    '/\b[Aa]rt\b/',
    '/\bI\b/',
    '/\bSport\b/',
    '/\bBig Animals\b/' ,
    '/\bWorld Cup\b/' ,
    '/\bDavid Fincher\b/',
    '/\bTorrentino\b/' ,
    '/\bShakes\b/' ,
    '/\b[sS]port[s]{0,1}\b/' ,
    '/\bWilliam Shakespeare\b/',
);

$text = "I like artists and art, and I like sports. Can you call the name of a big animal? Brazil World Cup matchers are very good. William Shakespeare is very famous in the world.";

$all_keywords = array();  //changed formatting for clarity
$all_keys     = array();
foreach ($words as $regex) {
  $m = array();
  // PREG_OFFSET_CAPTURE turns each match into array(matched text, byte offset)
  if (preg_match_all($regex, $text, $m, PREG_OFFSET_CAPTURE) >= 1) {
    foreach ($m[0] as $mm) {
      $all_keywords[] = $mm[0];   //the matched word/phrase
      $all_keys[]     = $mm[1];   //key is the offset in $text where the match begins
    }
  }
}

echo "\$text = \"$text\"\n";
echo $keywords_list = implode(',', $all_keywords) ."<br>\n";
echo $keys_list = implode(',', $all_keys) . "<br>\n";

`

+2

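Replace

strpos(strtolower($text), strtolower($word))

with

preg_match('/\b'.$word.'\b/',$text)

Or, since you do not seem to care about letter case, use:

preg_match('/\b'.strtolower($word).'\b/', strtolower($text))

In that case, call strtolower($text) once beforehand, for example before the foreach loop starts.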
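
For reference, here is a sketch of the question's loop with that replacement applied. Two details are assumptions beyond the answer itself: the keyword is passed through `preg_quote()` so that any regex metacharacters in it are treated literally, and the `i` flag is used in place of the `strtolower()` calls.

```php
<?php
$words = array(
    'Art' => '1',
    'Sport' => '2',
    'Big Animals' => '3',
    'World Cup' => '4',
    'David Fincher' => '5',
    'Torrentino' => '6',
    'Shakes' => '7',
    'William Shakespeare' => '8'
);
$text = "I like artists, and I like sports. Can you call the name of a big animal? Brazil World Cup matchers are very good. William Shakespeare is very famous in the world.";

$all_keywords = $all_keys = array();
foreach ($words as $word => $key) {
    // \b on both sides: the keyword must appear as a whole word or phrase.
    $pattern = '/\b' . preg_quote($word, '/') . '\b/i';
    if (preg_match($pattern, $text) === 1) {
        $all_keywords[] = $word;
        $all_keys[] = $key;
    }
}

echo implode(',', $all_keywords), "\n"; // World Cup,William Shakespeare
echo implode(',', $all_keys), "\n";     // 4,8
```

Note that Shakes is no longer returned, but neither are Art and Sport, because "artists" and "sports" are not whole-word matches. Catching inflected forms like these would need dedicated patterns per keyword.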

0

A couple of ideas, without code:

  • If we sort the $words array by strlen (descending, with longer keywords first and shorter ones last), the longer, more specific keyword gets the first chance to match.
  • In the loop, once a word matches, we can remove the matched text from the string to avoid spurious matches. (For example, Shakes will always match wherever William Shakespeare matches.)
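
The two ideas above could be sketched roughly as follows. This is only an illustration under stated assumptions: `match_longest_first()` is a hypothetical helper name, matching is done with plain case-insensitive substring search (`stripos`), and matched text is replaced with a space rather than deleted so that neighbouring words do not get joined together.

```php
<?php
// Hypothetical helper: longest-first substring matching with removal.
function match_longest_first(array $words, string $text): array
{
    // Sort keywords by length, longest first, so that "William Shakespeare"
    // is tried before "Shakes".
    uksort($words, function ($a, $b) {
        return strlen($b) - strlen($a);
    });

    $found = array();
    foreach ($words as $word => $key) {
        if (stripos($text, $word) !== false) {
            $found[$word] = $key;
            // Blank out the matched text so shorter keywords cannot match inside it.
            $text = str_ireplace($word, ' ', $text);
        }
    }
    return $found;
}

$words = array(
    'Art' => '1',
    'Sport' => '2',
    'Big Animals' => '3',
    'World Cup' => '4',
    'David Fincher' => '5',
    'Torrentino' => '6',
    'Shakes' => '7',
    'William Shakespeare' => '8'
);
$text = "I like artists, and I like sports. Can you call the name of a big animal? Brazil World Cup matchers are very good. William Shakespeare is very famous in the world.";

$found = match_longest_first($words, $text);
echo implode(',', array_keys($found)), "\n"; // William Shakespeare,World Cup,Sport,Art
echo implode(',', $found), "\n";             // 8,4,2,1
```

The results come back in length order rather than in the original array order. Because this is still substring matching, Art keeps matching "artists" and Sport keeps matching "sports", which is what the question asks for, but a short keyword can also match inside an unrelated longer word.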

PS: The SO iOS app rocks! But it is still not easy to write code in it (damn auto-correct!)

0

Source: https://habr.com/ru/post/1547223/

