How to check for this odd space character - " " in Objective-C?

I wrote some RegEx to play with spaces in strings, and it works fine except when I come across this character: " " instead of " ". You probably think I'm crazy, but apparently they are different. Check out this RegEx app (oddly enough, it often resets it):

When I use the weird space:

[screenshot of the RegEx app]

When I use a normal space:

[screenshot of the RegEx app]

As you can see, there are still many spaces here, but the weird spaces are not detected.

What is this space? How can I get rid of it?

+4
4 answers

Unicode has many different space characters. The space that you posted in your question, both in the title and in the body, is the usual ASCII space, good old U+0020.

If you want to verify exactly what you copied to your clipboard, you can run the pbpaste(1) command on Mac OS X. For example, if you copied a non-breaking space (U+00A0), you can identify it like this:

    # Write pasteboard contents to stdout, convert from UTF-8 to UTF-32 for easy
    # code point identification, then hex dump the contents
    $ pbpaste | iconv -f utf-8 -t utf-32be | hexdump -C
    00000000  00 00 00 a0                                       |....|
    00000004
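If you would rather do the same check from inside your app, something along these lines might work. It is only a sketch: `CodeUnitDump` is a made-up helper name, not a Foundation API, and it lists UTF-16 code units, so a character outside the Basic Multilingual Plane would show up as a surrogate pair rather than a single code point.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of Foundation): lists every UTF-16 code unit
// of the input as "U+XXXX", separated by single spaces.
static NSString *CodeUnitDump(NSString *input) {
    NSMutableArray *parts = [NSMutableArray array];
    for (NSUInteger i = 0; i < input.length; i++) {
        unichar unit = [input characterAtIndex:i];
        // Cast so the argument matches the %X format specifier exactly.
        [parts addObject:[NSString stringWithFormat:@"U+%04X", (unsigned int)unit]];
    }
    return [parts componentsJoinedByString:@" "];
}
```

Logging `CodeUnitDump(suspectString)` with `NSLog` should make a U+00A0 stand out next to ordinary U+0020 spaces.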

Depending on the regular expression engine you use, it may not support all of them, especially if you rely on the \s character class. If you want to be sure a particular space character is matched, specify it explicitly in your character class, for example [\s<YOURSPACEHERE>], where <YOURSPACEHERE> is copied and pasted from the character you want to match.
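As a rough illustration of the explicit character class, here is a sketch using NSRegularExpression. It assumes the odd character is U+00A0, and it writes it as the escape `\u00A0` instead of pasting an invisible character into the source. `NormalizeSpaces` is a hypothetical helper name.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: collapses each run of whitespace, with U+00A0 listed
// explicitly in the character class, into a single ASCII space.
static NSString *NormalizeSpaces(NSString *input) {
    NSError *error = nil;
    // "\\s" and "\\u00A0" reach the regex engine as \s and \u00A0.
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"[\\s\\u00A0]+"
                                                  options:0
                                                    error:&error];
    if (regex == nil) {
        // Pattern failed to compile; return the input unchanged.
        return input;
    }
    return [regex stringByReplacingMatchesInString:input
                                           options:0
                                             range:NSMakeRange(0, input.length)
                                      withTemplate:@" "];
}
```

Listing the character explicitly keeps the pattern working even with an engine whose \s does not cover it.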

+2

Try "\p{Z}" for your regular expression. This is the Unicode property that matches any kind of space or invisible separator.

See: NSRegularExpression and Unicode Regular Expressions.


As a test of my answer, I built the following unit test.

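    - (void)testPattern
    {
        // U+00A0 is a non-breaking space between "xxx" and "yyy".
        NSString *string = @"xxx\u00A0yyy";
        // \p{Z} matches any Unicode separator character.
        NSString *pattern = @"\\p{Z}";
        NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:NULL];
        NSUInteger number = [regex numberOfMatchesInString:string options:0 range:NSMakeRange(0, [string length])];
        // Cast so both arguments have the same type, which STAssertEquals requires.
        STAssertEquals(number, (NSUInteger)1, @"");
    }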
+1

They are probably non-breaking spaces, since the lines all end with spaces that match \s, unlike these mystery spaces. Try matching \xA0.

0

You can match Unicode characters with \x{NNNN}, where NNNN is the hexadecimal character code. See the ICU User Guide.
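To see the hex escape in action, a small sketch like the following could be used. It assumes the character in question is U+00A0, and `CountMatches` is a made-up helper, not something Foundation provides.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: counts how many times `pattern` matches in `input`.
// Returns 0 if the pattern does not compile.
static NSUInteger CountMatches(NSString *pattern, NSString *input) {
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:0
                                                    error:NULL];
    if (regex == nil) {
        return 0;
    }
    return [regex numberOfMatchesInString:input
                                  options:0
                                    range:NSMakeRange(0, input.length)];
}
```

With the input @"xxx\u00A0yyy zzz", the pattern @"\\x{00A0}" should count only the non-breaking space, while @"\\p{Z}" should count both it and the ordinary space.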

0

Source: https://habr.com/ru/post/1494594/

