How to remove duplicate whitespace characters from UTF8 string in PHP correctly with regular expression?

Question

I am trying to collapse repeated whitespace characters in a UTF-8 string in PHP using a regex. This regex

$txt = preg_replace( '/\s+/i' , ' ', $txt );

usually works fine, but some of the lines contain the Cyrillic letter "Р", which is corrupted by the replacement. After a little research, I realized that the letter is encoded in UTF-8 as the two bytes \xD0\xA0, and since \xA0 is a non-breaking space in single-byte encodings such as ISO-8859-1, the regular expression replaces that byte with \x20, and the character is no longer valid.
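The byte sequence described above can be checked directly. This is a minimal sketch, assuming PHP 7 or later; the variable names are only illustrative, and whether a pattern without the u modifier actually matches \xA0 can depend on the platform and locale.

```php
<?php
// Cyrillic capital "Р" (U+0420), written out as its two UTF-8 bytes.
$letter = "\xD0\xA0";

// Prints "d0a0": the second byte is 0xA0, the same value as a
// non-breaking space in single-byte encodings such as ISO-8859-1.
echo bin2hex($letter), "\n";

// If a byte-oriented pattern replaces that 0xA0 with a space, the
// result is the byte pair D0 20, which is not valid UTF-8.
$broken = "\xD0\x20";

// An empty pattern with the u modifier fails on malformed UTF-8,
// so preg_match() returns false here.
var_dump(preg_match('//u', $broken));
```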

Any ideas how to do this correctly in PHP with regex?

+4

php regex utf-8 whitespace

anandr Nov 19 '12 at 8:32


2 answers

Try the u modifier:

 $txt = "UTF 字符串 with 空格符號";
 var_dump(preg_replace("/\\s+/iu", "", $txt));

Outputs:

 string(28) "UTF字符串with空格符號"
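Note that the example above uses an empty replacement, so it removes whitespace entirely. To collapse each run into a single space, as the question asks, something like the following sketch might work. It assumes PHP 7 or later, and `collapse_whitespace` is a hypothetical helper name, not a built-in function.

```php
<?php
// Hypothetical helper: collapse each run of whitespace into one space.
function collapse_whitespace(string $txt): string
{
    // The u modifier makes PCRE treat pattern and subject as UTF-8,
    // so multi-byte characters are matched whole, never byte by byte.
    $result = preg_replace('/\s+/u', ' ', $txt);

    // preg_replace() returns null on error, for example when the
    // subject is not valid UTF-8; fall back to the input in that case.
    return $result ?? $txt;
}

echo collapse_whitespace("Р  тест \t\n строка"), "\n"; // Р тест строка
```

Falling back to the unchanged input on error is only one possible choice; throwing an exception or inspecting `preg_last_error()` may suit stricter code better.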

+5

Passerby Nov 19 '12 at 8:36


asciimoo · Accepted Answer · 2012-11-19T08:37:21+0000

As described at http://www.php.net/manual/en/function.preg-replace.php#106981:

If you want to match characters from any script (European, Russian, Chinese, Japanese, Korean, or anything else), it is simple:
- use mb_internal_encoding('UTF-8');
- use preg_replace('...u', '...', $string) with the u (Unicode) modifier.

For more information, a complete list of preg_* modifiers can be found at: http://php.net/manual/en/reference.pcre.pattern.modifiers.php
