Splitting a string into words when it contains Swedish characters

I am trying to split a string of text into words using the PHP function preg_split.

$words = preg_split('/\W/u',$text);

It works well, except for the Swedish characters å, ä and ö. Running utf8_encode or utf8_decode on the text does not help either. I assume that preg_split only works with single-byte characters and that the Swedish characters are multi-byte. Is there another way to do this?

+3
2 answers

Why do you need to care about specific characters at all? Splitting on spaces works:

$text = "Jag har hört så mycket om dig.";
$words = explode(" ", $text);
/*
Array
(
    [0] => Jag
    [1] => har
    [2] => hört
    [3] => så
    [4] => mycket
    [5] => om
    [6] => dig.
)
*/
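
Note that explode on a single space leaves punctuation attached ("dig.") and produces empty strings when two spaces follow each other. A minimal sketch of one way around that, assuming the words are separated by whitespace and that the punctuation to strip is the small ASCII set listed in the code (that list is an assumption, adjust it to your text):

```php
<?php
// Sample text with a double space to show that empty pieces are dropped.
$text = "Jag har hört  så mycket om dig.";

// \s+ matches a run of whitespace; PREG_SPLIT_NO_EMPTY discards empty pieces.
$words = preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY);

// Strip surrounding punctuation. Every character in the list is ASCII, so the
// byte-wise trim() cannot cut into a multi-byte UTF-8 character such as ö.
$words = array_map(function ($w) {
    return trim($w, '.,;:!?');
}, $words);

print_r($words);
```

The Swedish letters pass through untouched because nothing here treats them as separators.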
+3

mb_split to the rescue (I had the same problem a while ago and found this answer :)

mb_regex_encoding('UTF-8');          // tell the mb_ereg functions the text is UTF-8
$words = mb_split('\W', $text);      // split on every non-word character
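
If you would rather stay with preg_split, a possible alternative is to describe the separators with Unicode properties instead of \W. This is only a sketch: it assumes $text is valid UTF-8 and that a "word" means a run of letters and digits. If the input is not valid UTF-8, preg_split with the u modifier returns false, so the result is checked before use.

```php
<?php
$text = "Jag har hört så mycket om dig.";

// \p{L} is any Unicode letter, \p{N} any Unicode digit. The character class
// is negated, so every run of other characters acts as a separator.
// The u modifier makes PCRE treat both pattern and subject as UTF-8.
$words = preg_split('/[^\p{L}\p{N}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);

if ($words === false) {
    // Typically means the input was not valid UTF-8.
    throw new RuntimeException('preg_split failed, error code ' . preg_last_error());
}

print_r($words);
```

PREG_SPLIT_NO_EMPTY removes the empty string that would otherwise appear after the final full stop.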

HTH

+1

Source: https://habr.com/ru/post/1728985/

