I am trying to extract all the words from a string into an array, but I am having problems with spaces ( ).
This is what I do:
$data = strip_tags($data);
$data = htmlentities($data, ENT_QUOTES, 'UTF-8');
$data = html_entity_decode($data, ENT_QUOTES, 'UTF-8');
$data = htmlspecialchars_decode($data);
$data = mb_strtolower($data, 'UTF-8');
$data = str_replace(',', '', $data);
$data = str_replace('.', '', $data);
$data = str_replace(':', '', $data);
$data = str_replace(';', '', $data);
$data = str_replace('*', '', $data);
$data = str_replace('?', '', $data);
$data = str_replace('!', '', $data);
$data = str_replace('-', ' ', $data);
$data = str_replace("\n", ' ', $data);
$data = str_replace("\r", ' ', $data);
$data = str_replace("\t", ' ', $data);
$data = str_replace("\0", ' ', $data);
$data = str_replace("\x0B", ' ', $data);
$data = str_replace(" ", ' ', $data);
do {
$data = str_replace(' ', ' ', $data);
} while(strpos($data, ' ') !== false);
$clean_data = explode(' ', $data);
echo "<pre>";
var_dump($clean_data);
echo "</pre>";
Conclusion:
array(58) {
[0]=>
string(5) " "
[1]=>
string(5) " "
[2]=>
string(11) "anläggning"
[3]=>
string(3) "med"
[4]=>
string(3) "den"
[5]=>
string(10) "erfarenhet"
[6]=>
string(3) "som"
}
If I check the source for output, I see that the first 2 values of the array .
No matter how I try, I cannot remove this from the string. Any ideas?
UPDATE:
After some configuration with the code, I manage to get the following output:
array(56) {
[0]=>
string(1) " "
[1]=>
string(1) " "
[2]=>
string(11) "anläggning"
[3]=>
string(3) "med"
[4]=>
string(3) "den"
[5]=>
string(10) "erfarenhet"
[6]=>
string(3) "som"
[7]=>
string(5) "finns"
[8]=>
string(4) "inom"
Thank!
ANSWER (for lazy people):
Even you, this is a slightly different approach to the problem, and it never answers why I had problems that I had higher (for example, the remaining and other extra strange spaces), I like it and it is much better than my original code.
, !
$data = strip_tags($data);
$data = html_entity_decode($data, ENT_QUOTES, 'UTF-8');
$data = htmlspecialchars_decode($data);
$data = mb_strtolower($data, 'UTF-8');
$data = str_replace(array("-"), ' ', $data);
$clean_data = str_word_count($data, 1, 'äöå');
echo "<pre>";
var_dump($clean_data);
echo "</pre>";