Remove some quotes for str_word_count

I am using this function: http://www.seoreviewtools.com/multi-keyword-density-checker-php-script/ , but I had trouble getting it to work with unusual French words (see my modified version here: http://pastebin.com/m6PjsizX ).

As you know, str_word_count() does not work with UTF-8 characters, although you can use the third argument (the character list) to make it accept them. However, I did not find a way to make it work with quotation marks, which are very common in French.
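For readers unfamiliar with that third argument, here is a minimal sketch of how it behaves. The sample sentence and the list of accented letters are illustrative assumptions, not taken from the linked script, and the file is assumed to be saved as UTF-8. Since str_word_count() works on bytes, the result without the character list depends on the active locale.

```php
<?php
// Illustrative sample text (not from the original script).
$text = "été à la plage";

// Without a character list, accented letters are usually not recognised
// as word characters, so words get split or dropped (locale dependent).
$plain = str_word_count($text, 1);

// Third argument: extra characters to accept as part of a word.
// This list of accented letters is an assumed, incomplete set.
$withCharlist = str_word_count($text, 1, 'àâçéèêëîïôùûü');

print_r($plain);
print_r($withCharlist);
```

With the character list, the accented words should come back intact; the list has to be extended for any other accented letters the text may contain.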

There are three cases of single quotes (apostrophes) in French words:

  • One letter, a quote, then a word (for example: j'aime, d'habitude, l'avion, s'intégrer)
  • A quote inside a word (for example: aujourd'hui, prud'homme, quelqu'un)
  • A quote at the end of a word, mostly in brand names (e.g. Super', Vendu')

I want to remove some of these quotes so that str_word_count() works without errors (possibly with a regex and preg_replace()) and gives this result:

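$str = "J'aime la plage d'aujourd'hui, quelqu'un aimerait-il aller chez Super' pour voir l'avion bleue ?";
$str = MagicFunction($str);    // placeholder: the function I am looking for
echo $str;                     // the cleaned string
$count = str_word_count($str); // the cleaned string is then passed to str_word_count()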

aime la plage aujourd'hui, quelqu'un aimerait-il aller chez Super' pour voir avion bleue

In addition, there are several kinds of quote characters (', `, ’, etc.), and I would like this to work with all of them.

Do you have a solution to make it work this way?

Thanks!

1 answer

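It seems you want to

  • remove the apostrophe, together with the single letter before it, when it separates two words (j'aime, l'huile) and that first letter stands for the abbreviated word
  • keep it in all other cases, i.e. when two or more letters precede it (aujourd'hui, quelqu'un, Super').

For case 1, match a single letter that starts at a word boundary, followed by a ' or a similar quote character, followed by another word boundary, and replace the match with an empty string: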

'~\b\p{L}[\'`‘’]\b~u'
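To see which parts of a string this pattern actually picks out, the following sketch lists the matches instead of replacing them. The sample string is an assumption made up for illustration; only the pattern comes from the answer.

```php
<?php
// Pattern from the answer: a word boundary, one letter, one quote character,
// then another word boundary (so a word must follow the quote).
$re = '~\b\p{L}[\'`‘’]\b~u';

// Illustrative sample covering all three cases from the question.
$sample = "J'aime l`avion d'aujourd'hui chez Super' L'";

preg_match_all($re, $sample, $m);
print_r($m[0]);
// Matches: J'  l`  d'
// Not matched: the quote in aujourd'hui (more than one letter precedes it),
// Super' and the final L' (no word follows the quote).
```

To support further quote characters, they can be added to the character class `[\'`‘’]`.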

IDEONE:

$re = '~\b\p{L}[\'`‘’]\b~u'; 
$str = "J'aime la plage d'aujourd'hui, quelqu‘un aimerait-il aller chez Super’ pour voir l`avion bleue ? l'école L'"; 
$result = preg_replace($re, "", $str);
echo $result;
// => aime la plage aujourd'hui, quelqu‘un aimerait-il aller chez Super’ pour voir avion bleue ? école L'

Note that the /u modifier is required so that preg_replace treats both the pattern and the input as Unicode (UTF-8) strings.
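Putting the pieces together, one possible shape for the `MagicFunction` asked about in the question is sketched below. The name `strip_elisions` and the list of accented letters are hypothetical; the pattern is the one from this answer, and the input is the question's own example string.

```php
<?php
// Hypothetical helper: removes a single letter plus quote that precedes a word.
function strip_elisions(string $text): string
{
    return preg_replace('~\b\p{L}[\'`‘’]\b~u', '', $text);
}

$str = "J'aime la plage d'aujourd'hui, quelqu'un aimerait-il aller chez Super' pour voir l'avion bleue ?";
$clean = strip_elisions($str);

// Format 1 returns the words as an array; the character list (an assumed,
// incomplete set) lets str_word_count() accept accented letters.
$words = str_word_count($clean, 1, 'àâçéèêëîïôùûü');

echo $clean, "\n";
print_r($words);
```

Here str_word_count() keeps the quotes that remain inside or at the end of a word, so `aujourd'hui`, `quelqu'un` and `Super'` each count as one word.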

Source: https://habr.com/ru/post/1628444/