I have a PHP code snippet as follows:
```php
$Keywords = array(
    ', JOE.' => '1',
    ', JOE'  => '2',
    'JOE'    => '3',
    'JOE.'   => '4',
    '/JOE'   => '5',
    '/JOE/'  => '6',
    'JOE/.'  => '7',
    ',JOE.'  => '8'
);
$Text = "JOE is JOE is JOE is JOE is JOE is JOE is JOE. Hello , JOE. Hey ,JOE. Come on , JOE. Dude,JOE/. Shut up ,JOE. What is the meaning of /JOE/? Of course, JOE";
extract_keyword($Keywords, $Text);

function extract_keyword($Keywords, $Text) {
    mb_internal_encoding('UTF-8');

    // Sort keywords by length, longest first, so longer alternatives are tried first.
    uksort($Keywords, function ($a, $b) {
        return mb_strlen($b) <=> mb_strlen($a);
    });

    $Keywords_ci = array();
    foreach ($Keywords as $k => $v) {
        $Keywords_ci[$k] = $v;
    }

    // One alternation of all quoted keywords, wrapped in \b word boundaries.
    $re = '/\b(?:' . join('|', array_map(function ($keyword) {
        return preg_quote($keyword, '/');
    }, array_keys($Keywords))) . ')\b/i';

    $KeywordArrayKey = array();
    $KeywordArrayValue = array();
    $NewArray = array();
    preg_match_all($re, $Text, $matches);
    foreach ($matches[0] as $keyword) {
        $KeywordArrayKey[] = $keyword;
        $KeywordArrayValue[] = $Keywords_ci[$keyword];
        if (!empty($keyword) && !empty($Keywords_ci[$keyword])) {
            $NewArray[] = array($keyword => $Keywords_ci[$keyword]);
        }
    }
    // print_r() returns true, so the "<br><br>" has to be echoed separately.
    print_r($NewArray);
    echo "<br><br>";
}
```
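A side issue in the code above: the /i modifier lets the pattern match "joe" or "Joe", but the value is looked up with the matched text as-is, and $Keywords_ci is a plain copy of $Keywords, so a differently-cased match would have no entry. If $Keywords_ci was meant to be a case-insensitive map, a minimal sketch could look like this (assuming lower-cased keys are the intent; $lookup, $found and the sample sentence are illustrative only):

```php
<?php
// Hypothetical sketch of a case-insensitive lookup map: keys are lower-cased
// once, and each match is lower-cased the same way before the lookup.
mb_internal_encoding('UTF-8');

$Keywords = ['JOE' => '3', 'JOE.' => '4'];

$lookup = [];
foreach ($Keywords as $k => $v) {
    $lookup[mb_strtolower($k)] = $v;
}

// Longer alternative first, with the /i modifier as in the question.
preg_match_all('/\bJOE\.|\bJOE\b/iu', 'Joe met joe. Then JOE left.', $m);

$found = [];
foreach ($m[0] as $match) {
    // Fall back to null instead of triggering an undefined-key warning.
    $found[] = [$match => $lookup[mb_strtolower($match)] ?? null];
}
print_r($found);
```

Each match keeps its original spelling ("Joe", "joe.", "JOE") while the value comes from the lower-cased map.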
It echoes the following:
```
Array (
    [0] => Array ( [JOE] => 3 )
    [1] => Array ( [JOE] => 3 )
    [2] => Array ( [JOE] => 3 )
    [3] => Array ( [JOE] => 3 )
    [4] => Array ( [JOE] => 3 )
    [5] => Array ( [JOE] => 3 )
    [6] => Array ( [JOE] => 3 )
    [7] => Array ( [JOE] => 3 )
    [8] => Array ( [JOE] => 3 )
    [9] => Array ( [JOE] => 3 )
    [10] => Array ( [JOE] => 3 )
    [11] => Array ( [JOE] => 3 )
    [12] => Array ( [JOE] => 3 )
    [13] => Array ( [, JOE] => 2 ) )
```
As you can see, the code is not accurate enough: it finds only the bare 'JOE' => '3' (plus one ', JOE' => '2' at the very end) and misses keywords such as ', JOE.' => '1' or 'JOE/.' => '7'. My goal is to separate '/JOE' => '5' precisely from '/JOE/' => '6', 'JOE.' => '4', and so on. Could you take a look at the code and tell me how to improve the accuracy of the extracted keywords? Thank you for your help.
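A quick way to see where the matches are lost is to test a few keywords on their own with the same \b wrapping that $re uses. \b only matches between a word character (letter, digit, underscore) and a non-word character or the edge of the string, so a keyword that starts with ',' or ends with '.' needs a letter directly before or after it. The sample string is a cut-down piece of $Text chosen for illustration:

```php
<?php
// Diagnostic: wrap single keywords in \b exactly as $re does and test them
// against a short sample cut from $Text.
$sample = 'Hello , JOE. Hey ,JOE. Dude,JOE/.';

$results = [];
foreach ([', JOE.', ',JOE.', 'JOE/.', 'JOE'] as $keyword) {
    $pattern = '/\b' . preg_quote($keyword, '/') . '\b/';
    $results[$keyword] = preg_match($pattern, $sample);
    echo str_pad($keyword, 8), ' => ', $results[$keyword], PHP_EOL;
}
```

In this sample the three punctuation-edged keywords report 0 and only JOE reports 1. This also explains why ', JOE' matched once: in "Of course, JOE" the comma follows the letter "e", so \b is satisfied there.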
Note 1: print_r($Keywords_ci); prints Array ( [, JOE.] => 1 [JOE/.] => 7 [,JOE.] => 8 [, JOE] => 2 [/JOE/] => 6 [JOE.] => 4 [/JOE] => 5 [JOE] => 3 ), but that is only the sorted keyword list. What I'm looking for is every instance of those keywords in $Text, such as '/JOE/' => '6' or ',JOE.' => '8'.
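To list every occurrence rather than the keyword map itself, preg_match_all with the PREG_OFFSET_CAPTURE flag returns each match together with its byte offset in the subject. The sketch below assumes one possible fix for the boundary problem, adding \b only on a side where the keyword starts or ends with a word character; build_keyword_regex is a hypothetical helper, and the shortened $Text is only for illustration:

```php
<?php
// Hypothetical helper: longest keyword first, \b only on word-character edges.
function build_keyword_regex(array $keys): string
{
    usort($keys, fn($a, $b) => mb_strlen($b) <=> mb_strlen($a));
    $parts = array_map(fn($k) =>
        (preg_match('/^\w/', $k) ? '\b' : '') . preg_quote($k, '/') .
        (preg_match('/\w$/', $k) ? '\b' : ''), $keys);
    return '/(?:' . implode('|', $parts) . ')/';
}

$Keywords = [', JOE.' => '1', ', JOE' => '2', 'JOE' => '3', 'JOE.' => '4',
             '/JOE' => '5', '/JOE/' => '6', 'JOE/.' => '7', ',JOE.' => '8'];
$Text = 'Hello , JOE. Dude,JOE/. the meaning of /JOE/? Of course, JOE';

// PREG_OFFSET_CAPTURE turns each match into [matched text, byte offset].
preg_match_all(build_keyword_regex(array_keys($Keywords)), $Text, $m, PREG_OFFSET_CAPTURE);

$instances = [];
foreach ($m[0] as [$keyword, $offset]) {
    $instances[] = [$offset, $keyword, $Keywords[$keyword]];
    echo "offset $offset: '$keyword' => {$Keywords[$keyword]}", PHP_EOL;
}
```

Offsets are in bytes, not characters, which only matters if $Text contains multibyte UTF-8 characters.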
Note 2: Below is the output I expect from print_r($NewArray):
```
Array (
    [0] => Array ( [JOE] => 3 )
    [1] => Array ( [JOE] => 3 )
    [2] => Array ( [JOE] => 3 )
    [3] => Array ( [JOE] => 3 )
    [4] => Array ( [JOE] => 3 )
    [5] => Array ( [JOE] => 3 )
    [6] => Array ( [JOE.] => 4 )
    [7] => Array ( [, JOE.] => 1 )
    [8] => Array ( [,JOE.] => 8 )
    [9] => Array ( [, JOE.] => 1 )
    [10] => Array ( [JOE/.] => 7 )
    [11] => Array ( [,JOE.] => 8 )
    [12] => Array ( [/JOE/] => 6 )
    [13] => Array ( [, JOE] => 2 ) )
```
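One way to get exactly this output, sketched under the assumption that case-sensitive matching is acceptable, is to keep the longest-first alternation but apply \b only on a side of a keyword whose edge character is a word character; punctuation edges are then matched literally. extract_keywords_bounded is a hypothetical name, not part of the original code:

```php
<?php
// Hypothetical sketch: longest-first alternation, with \b added only on a side
// where the keyword's first or last character is a word character (\w).
function extract_keywords_bounded(array $keywords, string $text): array
{
    // Sort keys longest first so ', JOE.' is tried before ', JOE' and 'JOE'.
    $keys = array_keys($keywords);
    usort($keys, fn($a, $b) => mb_strlen($b) <=> mb_strlen($a));

    $alternatives = array_map(function ($keyword) {
        $start = preg_match('/^\w/u', $keyword) ? '\b' : '';
        $end   = preg_match('/\w$/u', $keyword) ? '\b' : '';
        return $start . preg_quote($keyword, '/') . $end;
    }, $keys);

    // No /i modifier: every match is then an exact key of $keywords.
    $re = '/(?:' . implode('|', $alternatives) . ')/u';
    preg_match_all($re, $text, $matches);

    $result = [];
    foreach ($matches[0] as $keyword) {
        $result[] = [$keyword => $keywords[$keyword]];
    }
    return $result;
}

$Keywords = [', JOE.' => '1', ', JOE' => '2', 'JOE' => '3', 'JOE.' => '4',
             '/JOE' => '5', '/JOE/' => '6', 'JOE/.' => '7', ',JOE.' => '8'];
$Text = "JOE is JOE is JOE is JOE is JOE is JOE is JOE. Hello , JOE. Hey ,JOE. Come on , JOE. Dude,JOE/. Shut up ,JOE. What is the meaning of /JOE/? Of course, JOE";

$NewArray = extract_keywords_bounded($Keywords, $Text);
print_r($NewArray);
```

With the question's $Keywords and $Text this returns 14 entries in the same order as the expected print, including [JOE.] => 4, [JOE/.] => 7 and [/JOE/] => 6. Because '/JOE/' is longer than '/JOE', the sorted alternation tries it first, so '/JOE' => '5' cannot claim the start of '/JOE/'.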