Does preg_replace() change my character set?

I have the following code snippet that seems to change my character set.

    $html = "à";
    echo $html; // result: à
    $html = preg_replace("/\s/", "", $html);
    echo $html; // result: ?

However, when I use [\t\n\r\f\v] as my pattern instead of the \s escape sequence, it works fine:

    $html = "à";
    echo $html; // result: à
    $html = preg_replace("/[\t\n\r\f\v]/", "", $html);
    echo $html; // result: à

Why is this?

1 answer

I had the same problem. It is caused by UTF-8 encoding.

à is encoded as the two bytes 0xc3 0xa0 in UTF-8. In PHP, you can write it as "\xc3\xa0".

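Without the u modifier, PCRE treats the string as a sequence of single bytes, and \s can match the byte 0xa0. That byte is the non-breaking space in Latin-1 (ISO-8859-1), not in ASCII, and whether it counts as whitespace depends on the locale tables PCRE uses. Stripping it leaves a lone 0xc3, which is not valid UTF-8 and is displayed as ?. The explicit class [\t\n\r\f\v] never matches 0xa0, so à survives.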
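To make the byte-level effect visible, here is a minimal sketch. It does not rely on \s matching 0xa0, since that depends on the platform; instead it removes the byte explicitly with str_replace() to simulate what happens, and the variable name $broken is only for illustration.

```php
<?php
// "à" in UTF-8 is the two bytes 0xC3 0xA0.
$html = "\xc3\xa0";
echo bin2hex($html), "\n";   // c3a0

// Simulate a byte-oriented \s that treats 0xA0 as whitespace:
// only the second byte of the character is removed.
$broken = str_replace("\xa0", "", $html);
echo bin2hex($broken), "\n"; // c3

// A lone 0xC3 is an incomplete UTF-8 sequence, so a /u match
// on it fails with an error (preg_match returns false).
var_dump(preg_match('//u', $broken)); // bool(false)
```

The leftover 0xc3 byte is what the browser or terminal renders as ?.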

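You can use the u modifier to solve the problem. It tells PCRE to treat both the pattern and the subject as UTF-8, so à is handled as one character instead of two bytes: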

 $html = preg_replace("/\s/u", "", $html); 
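A slightly longer sketch of the same fix, assuming the input is valid UTF-8. The sample string and the null check are my additions, not part of the question; preg_replace() returns null when the subject is not valid UTF-8 under the u modifier, so it seems worth handling.

```php
<?php
// "à café" followed by a newline, written as explicit UTF-8 bytes.
$html = "\xc3\xa0 caf\xc3\xa9\n";

// With /u the subject is decoded as UTF-8, so the 0xA0 byte inside "à"
// is never seen as a separate character.
$clean = preg_replace('/\s/u', '', $html);

if ($clean === null) {
    // Happens on a PCRE error, for example an invalid UTF-8 subject.
    $clean = $html;
}

echo bin2hex($clean), "\n"; // c3a0636166c3a9 ("àcafé")
```

The space and the newline are removed, while both multibyte characters keep all of their bytes.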

Source: https://habr.com/ru/post/956851/