Note: For those dealing with CJK text (Chinese, Japanese, and Korean), the double-byte space (the ideographic space, Unicode \u3000 ) is not included in \s for any implementation I have tried so far (Perl, .NET, PCRE, Python). This may depend on whether Unicode matching is enabled in your engine, so test it in your own environment. You will need to either normalize your strings first (for example, by replacing all \u3000 with \u0020 ), or use a character class that includes this code point in addition to whatever other whitespace you are targeting, for example [ \t\u3000] .
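Both options can be sketched in Python. This is only a minimal illustration: the function names `normalize_spaces` and `collapse_spaces` and the sample string are made up for the example. Neither approach relies on how \s behaves in a given engine.

```python
import re

IDEOGRAPHIC_SPACE = "\u3000"  # the double-byte (full-width) space

def normalize_spaces(text: str) -> str:
    """Option 1: replace every U+3000 with an ordinary U+0020 space."""
    return text.replace(IDEOGRAPHIC_SPACE, "\u0020")

# Option 2: an explicit character class covering ASCII space, tab, and U+3000.
SPACE_CLASS = re.compile("[ \t\u3000]+")

def collapse_spaces(text: str) -> str:
    """Collapse each run of space, tab, or U+3000 into a single ASCII space."""
    return SPACE_CLASS.sub(" ", text)

sample = "東京\u3000大阪\t名古屋"  # hypothetical input mixing U+3000 and a tab
normalized = normalize_spaces(sample)
collapsed = collapse_spaces(sample)
```

Normalizing first keeps the later patterns simple, while the explicit class leaves the input untouched, which may matter if the original spacing has to be preserved.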
If you use Perl or PCRE, you have the option of using the \h shorthand for horizontal whitespace, which appears to include the single-byte space, the double-byte space, and the tab, among others. For more information, see Match whitespace but not newlines (Perl) .
However, this \h shorthand is not implemented in .NET and C#, as far as I can tell.
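Where \h is missing (Python's built-in re module does not accept it either), an explicit character class can stand in for it. The sketch below is an approximation, not an official equivalent: the code point list is my reading of what PCRE documents for \h, and the names `HORIZONTAL_WS` and `split_on_horizontal_ws` are invented for the example.

```python
import re

# Assumed approximation of \h (horizontal whitespace), based on the code
# points PCRE documents for it. Check it against your engine's documentation.
HORIZONTAL_WS = (
    "\u0009"         # tab
    "\u0020"         # space
    "\u00a0"         # no-break space
    "\u1680"         # Ogham space mark
    "\u180e"         # Mongolian vowel separator
    "\u2000-\u200a"  # en quad through hair space (a range inside the class)
    "\u202f"         # narrow no-break space
    "\u205f"         # medium mathematical space
    "\u3000"         # ideographic (double-byte) space
)

HORIZONTAL_WS_RE = re.compile(f"[{HORIZONTAL_WS}]+")

def split_on_horizontal_ws(line: str) -> list[str]:
    """Split on runs of horizontal whitespace, leaving newlines untouched."""
    return [part for part in HORIZONTAL_WS_RE.split(line) if part]
```

Because the class lists only horizontal characters, newlines are not matched, which is the main reason to reach for \h over \s in the first place.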
Eiríkr Útlendi Apr 19 '16 at 21:17