Fastest way to get an array of NSRange objects for all uppercase letters in NSString?

I need an NSRange for the position of each uppercase letter in a given NSString, to pass to a method of a custom attributed string class.

There are, of course, several ways to do this, for example rangeOfString:options: with NSRegularExpressionSearch, or RegexKitLite, collecting each match while walking the string.

What would be the fastest approach to accomplish this task?

+4
4 answers

The easiest way is to use -rangeOfCharacterFromSet:options:range: with [NSCharacterSet uppercaseLetterCharacterSet]. By moving the search range forward after each call, you can find every uppercase letter. Something like the following will give you an NSArray of all the ranges (wrapped in NSValues):

    - (NSArray *)rangesOfUppercaseLettersInString:(NSString *)str {
        NSCharacterSet *cs = [NSCharacterSet uppercaseLetterCharacterSet];
        NSMutableArray *results = [NSMutableArray array];
        NSRange searchRange = NSMakeRange(0, [str length]);
        NSRange range;
        while ((range = [str rangeOfCharacterFromSet:cs options:0 range:searchRange]).location != NSNotFound) {
            [results addObject:[NSValue valueWithRange:range]];
            // continue searching just past the match
            searchRange = NSMakeRange(NSMaxRange(range), [str length] - NSMaxRange(range));
        }
        return results;
    }

Please note: this will not merge adjacent ranges into a single range, but it is easy enough to add.
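As a rough illustration of that merging step, here is a minimal sketch in plain C (which also compiles inside Objective-C source). The `SketchRange` struct and the `merge_adjacent_ranges` function are hypothetical names made up for this example. `SketchRange` only mirrors the location/length layout of NSRange so the snippet builds without Foundation. It assumes the ranges are already sorted by location, which is how the loop above produces them.

```c
#include <stddef.h>

/* Hypothetical stand-in for NSRange so the sketch compiles as plain C. */
typedef struct {
    size_t location;
    size_t length;
} SketchRange;

/* Merges ranges that touch end-to-start, in place.
   Assumes the input is sorted by location and non-overlapping.
   Returns the number of ranges left at the front of the array. */
static size_t merge_adjacent_ranges(SketchRange *ranges, size_t count)
{
    if (count == 0) {
        return 0;
    }
    size_t out = 0; /* index of the range currently being grown */
    for (size_t i = 1; i < count; i++) {
        size_t end = ranges[out].location + ranges[out].length;
        if (ranges[i].location == end) {
            /* next range starts where the current one ends: absorb it */
            ranges[out].length += ranges[i].length;
        } else {
            /* gap found: start a new output range */
            ranges[++out] = ranges[i];
        }
    }
    return out + 1;
}
```

With NSValue-wrapped ranges the same idea applies: compare NSMaxRange of the last stored range against the location of the new one before adding it.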

Here's an alternative solution based on NSScanner:

    - (NSArray *)rangesOfUppercaseLettersInString:(NSString *)str {
        NSCharacterSet *cs = [NSCharacterSet uppercaseLetterCharacterSet];
        NSMutableArray *results = [NSMutableArray array];
        NSScanner *scanner = [NSScanner scannerWithString:str];
        // NSScanner skips whitespace by default, which would throw off the
        // recorded locations; turn that off
        [scanner setCharactersToBeSkipped:nil];
        while (![scanner isAtEnd]) {
            [scanner scanUpToCharactersFromSet:cs intoString:NULL]; // skip non-uppercase characters
            NSString *temp = nil;
            NSUInteger location = [scanner scanLocation];
            if ([scanner scanCharactersFromSet:cs intoString:&temp]) {
                // found one (or more) uppercase characters
                NSRange range = NSMakeRange(location, [temp length]);
                [results addObject:[NSValue valueWithRange:range]];
            }
        }
        return results;
    }

Unlike the first version, this one combines adjacent uppercase characters into a single range.

Edit: if you are looking for absolute speed, this one is likely to be the fastest of the three presented here while still maintaining proper Unicode support (note: I have not tried to compile this):

    // Returns a pointer to an array of NSRanges and fills in count with the number of ranges.
    // The returned memory belongs to an autoreleased NSMutableData; copy it if it must outlive the pool.
    - (NSRange *)rangesOfUppercaseLettersInString:(NSString *)string count:(NSUInteger *)count {
        NSMutableData *data = [NSMutableData data];
        NSUInteger numRanges = 0;
        NSUInteger length = [string length];
        unichar *buffer = malloc(sizeof(unichar) * length);
        [string getCharacters:buffer range:NSMakeRange(0, length)];
        NSCharacterSet *cs = [NSCharacterSet uppercaseLetterCharacterSet];
        NSRange range = {NSNotFound, 0};
        for (NSUInteger i = 0; i < length; i++) {
            if ([cs characterIsMember:buffer[i]]) {
                if (range.location == NSNotFound) {
                    range = (NSRange){i, 0}; // start a new run
                }
                range.length++;
            } else if (range.location != NSNotFound) {
                // the run ended: store it
                [data appendBytes:&range length:sizeof(range)];
                numRanges++;
                range = (NSRange){NSNotFound, 0};
            }
        }
        if (range.location != NSNotFound) {
            // store a run that reaches the end of the string
            [data appendBytes:&range length:sizeof(range)];
            numRanges++;
        }
        free(buffer); // the character buffer is no longer needed
        if (count) *count = numRanges;
        return (NSRange *)[data mutableBytes];
    }
+13

Using RegexKitLite 4.0+ with a runtime environment that supports Blocks can be quite zippy:

    NSString *string = @"A simple String to TEST for Upper Case Letters.";
    NSString *regex = @"\\p{Lu}";

    [string enumerateStringsMatchedByRegex:regex
                                   options:RKLNoOptions
                                   inRange:NSMakeRange(0UL, [string length])
                                     error:NULL
                        enumerationOptions:RKLRegexEnumerationCapturedStringsNotRequired
                                usingBlock:^(NSInteger captureCount,
                                             NSString * const capturedStrings[captureCount],
                                             const NSRange capturedRanges[captureCount],
                                             volatile BOOL * const stop) {
        NSLog(@"Range: %@", NSStringFromRange(capturedRanges[0]));
    }];

The regular expression \p{Lu} says: "Match all characters that have the Unicode property 'Letter' and are also 'uppercase'."

The RKLRegexEnumerationCapturedStringsNotRequired option tells RegexKitLite that it should not create NSString objects and pass them via capturedStrings[]. This saves a lot of time and memory. The only thing passed to the block is the NSRange values of the match, via capturedRanges[].

There are two main parts to this. The first is the RegexKitLite method:

    [string enumerateStringsMatchedByRegex:regex
                                   options:RKLNoOptions
                                   inRange:NSMakeRange(0UL, [string length])
                                     error:NULL
                        enumerationOptions:RKLRegexEnumerationCapturedStringsNotRequired
                                usingBlock:/* ... */
    ];

... and the second is a block that is passed as an argument to this method:

    ^(NSInteger captureCount,
      NSString * const capturedStrings[captureCount],
      const NSRange capturedRanges[captureCount],
      volatile BOOL * const stop) {
        /* ... */
    }
+2

It depends on the size of the string, but here is the absolute fastest way I can think of (note: internationalization safety is not guaranteed or even expected! Does the concept of uppercase even apply to, for example, Japanese?):

1) Get a pointer to the raw C string of the string, preferably in a stack buffer if it is small enough. CFString has functions for this; read the comments in CFString.h.

2) malloc() a buffer large enough to hold one NSRange per character in the string.

3) Do something like this (completely untested, written directly in this text box, so pardon any typos):

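    NSRange *bufferCursor = rangeBuffer;
    NSRange range = {NSNotFound, 0};
    for (NSUInteger idx = 0; idx < numBytes; ++idx) {
        // cast: isupper() must not be given a negative char value
        if (isupper((unsigned char)buffer[idx])) {
            if (range.length > 0) {
                // extend a range: we found more than one uppercase letter in a row
                range.length++;
            } else {
                // begin a range
                range.location = idx;
                range.length = 1;
            }
        } else if (range.location != NSNotFound) {
            // end a range: we hit a character that is not uppercase
            *bufferCursor++ = range;
            // reset the length as well, or the next letter would extend a stale range
            range = (NSRange){NSNotFound, 0};
        }
    }
    if (range.location != NSNotFound) {
        // store a range that runs to the end of the buffer
        *bufferCursor++ = range;
    }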

4) realloc() the range buffer down to the size you actually used (you will need to keep a count of the ranges for this).
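Steps 2 to 4 can be put together roughly as follows. This is only a sketch under stated assumptions: it is plain C, it treats the input as an ASCII byte buffer that has already been obtained (step 1 is left out), and `ByteRange` and `uppercase_byte_ranges` are hypothetical names invented for this example, with `ByteRange` standing in for NSRange. Like the loop above, it relies on isupper(), so it is not Unicode-aware.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical stand-in for NSRange so the sketch compiles as plain C. */
typedef struct {
    size_t location;
    size_t length;
} ByteRange;

/* Scans an ASCII byte buffer and returns a malloc'd array of uppercase runs.
   The caller must free() the result. Returns NULL (with *count == 0) when
   nothing is found or the allocation fails. */
static ByteRange *uppercase_byte_ranges(const char *bytes, size_t numBytes,
                                        size_t *count)
{
    *count = 0;
    if (numBytes == 0) {
        return NULL;
    }
    /* step 2: worst case is one range per byte */
    ByteRange *ranges = malloc(numBytes * sizeof *ranges);
    if (ranges == NULL) {
        return NULL;
    }
    size_t used = 0, start = 0, length = 0;
    /* step 3: walk the buffer, growing or closing the current run */
    for (size_t i = 0; i < numBytes; i++) {
        if (isupper((unsigned char)bytes[i])) {
            if (length == 0) {
                start = i; /* begin a run */
            }
            length++;
        } else if (length > 0) {
            ranges[used++] = (ByteRange){start, length}; /* end a run */
            length = 0;
        }
    }
    if (length > 0) {
        ranges[used++] = (ByteRange){start, length}; /* run reaches the end */
    }
    if (used == 0) {
        free(ranges);
        return NULL;
    }
    /* step 4: shrink to the size actually used */
    ByteRange *shrunk = realloc(ranges, used * sizeof *ranges);
    *count = used;
    return shrunk != NULL ? shrunk : ranges;
}
```

Keeping the count in a separate `used` variable is what makes the final realloc() possible, which is the bookkeeping step 4 refers to.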

+1

A function such as isupper* combined with -[NSString characterAtIndex:] will be pretty fast.

* isupper is just an example; it may or may not be suitable for your input.
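To make the suitability caveat concrete: -[NSString characterAtIndex:] returns a 16-bit unichar, while isupper() is only defined for values that fit in an unsigned char (plus EOF). A guarded helper might look like the sketch below. The name `is_ascii_upper` is hypothetical, and the helper deliberately reports only ASCII A to Z, so accented or non-Latin uppercase letters are not detected.

```c
#include <ctype.h>
#include <stdint.h>

/* Hypothetical helper: ASCII-only uppercase test for one UTF-16 code unit.
   Values outside the ASCII range are rejected before calling isupper(),
   because passing them would be undefined behavior. */
static int is_ascii_upper(uint16_t unit)
{
    return unit < 0x80 && isupper((int)unit) != 0;
}
```

If the input can contain non-ASCII text, testing each character against [NSCharacterSet uppercaseLetterCharacterSet] is the safer choice.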

0

Source: https://habr.com/ru/post/910497/