Creating an NSCharacterSet from Unicode SMP Characters and Testing Membership - Is This a Bug?

Given the following code:

NSString *a    = @"a";
NSString *clef = @"𝄞";
UTF32Char utf32char = 0x1D11E;  // 𝄞

NSCharacterSet *cs1 = [NSCharacterSet characterSetWithCharactersInString:@"𝄞"];
NSCharacterSet *cs2 = [NSCharacterSet characterSetWithCharactersInString:@"𝄞a"];
NSCharacterSet *cs3 = [NSCharacterSet characterSetWithCharactersInString:@"a𝄞"];
NSMutableCharacterSet *mcs1 = [NSMutableCharacterSet characterSetWithCharactersInString:@""];
NSMutableCharacterSet *mcs2 = [NSMutableCharacterSet characterSetWithCharactersInString:@""];

[mcs1 addCharactersInString:clef];
[mcs1 addCharactersInString:a];

[mcs2 addCharactersInString:a];
[mcs2 addCharactersInString:clef];

NSLog(@"cs1 - %@",  [cs1  longCharacterIsMember:utf32char] ? @"YES" : @"NO");
NSLog(@"cs2 - %@",  [cs2  longCharacterIsMember:utf32char] ? @"YES" : @"NO");
NSLog(@"cs3 - %@",  [cs3  longCharacterIsMember:utf32char] ? @"YES" : @"NO");

NSLog(@"mcs1 - %@", [mcs1 longCharacterIsMember:utf32char] ? @"YES" : @"NO");
NSLog(@"mcs2 - %@", [mcs2 longCharacterIsMember:utf32char] ? @"YES" : @"NO");

I get the following output:

cs1 - YES
cs2 - NO
cs3 - NO
mcs1 - YES
mcs2 - NO

  • Why does only cs1 work correctly among the immutable character sets?
  • Why does the order of insertion matter for the mutable character sets?

Is this a bug? Or is it a known issue with Objective-C's internal UTF-16 string representation (assuming that is in fact how strings are stored)?
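
For context on the UTF-16 part of the question: NSString exposes its contents as 16-bit UTF-16 code units, so a character outside the BMP such as U+1D11E occupies two code units (a surrogate pair). The following is a minimal sketch of that mapping in plain C, which also compiles as Objective-C. The helper names are hypothetical and are not part of Foundation, and the sketch does not claim to show what NSCharacterSet does internally.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers (not Foundation API): split a supplementary-plane
   code point (U+10000..U+10FFFF) into its UTF-16 surrogate pair. */

/* High (leading) surrogate: 0xD800 plus the top 10 bits of (cp - 0x10000). */
static uint16_t utf16_high_surrogate(uint32_t cp) {
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    return (uint16_t)(0xD800 + ((cp - 0x10000) >> 10));
}

/* Low (trailing) surrogate: 0xDC00 plus the bottom 10 bits of (cp - 0x10000). */
static uint16_t utf16_low_surrogate(uint32_t cp) {
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    return (uint16_t)(0xDC00 + ((cp - 0x10000) & 0x3FF));
}
```

For the clef, utf16_high_surrogate(0x1D11E) is 0xD834 and utf16_low_surrogate(0x1D11E) is 0xDD1E, so @"𝄞" has a length of 2 code units while @"a" has a length of 1.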

Source: https://habr.com/ru/post/1536014/

