Objective-C Find Commonly Used Words in NSString

I am trying to write a method:

- (NSDictionary *)wordFrequencyFromString:(NSString *)string {} 

in which the returned dictionary contains each word and how often it occurs in the provided string. Unfortunately, I cannot find a way to iterate over the words in a string to analyze each of them, only over individual characters, which seems more difficult than necessary. Any suggestions?

+6
4 answers

NSString has the -enumerateSubstringsInRange:options:usingBlock: method, which lets you enumerate all words directly and leaves it to the standard API to determine word boundaries and so on:

    [s enumerateSubstringsInRange:NSMakeRange(0, [s length])
                          options:NSStringEnumerationByWords
                       usingBlock:^(NSString *substring, NSRange substringRange,
                                    NSRange enclosingRange, BOOL *stop) {
        NSLog(@"%@", substring);
    }];

In the enumeration block, you can either use an NSMutableDictionary with the words as keys and NSNumber values as their counts, or use NSCountedSet, which provides the counting functionality directly.
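A minimal sketch of the method from the question built on this approach. It is written as a C function (the name WordFrequencyFromString is hypothetical) so that it is self-contained; the body could be moved into -wordFrequencyFromString: unchanged. Lowercasing each word is an assumption on my part, so drop that call if case should be preserved:

```objc
#import <Foundation/Foundation.h>

// Sketch: maps each word in `string` to the number of times it occurs.
// Assumption: words are lowercased so "The" and "the" count as one word.
static NSDictionary<NSString *, NSNumber *> *WordFrequencyFromString(NSString *string) {
    NSCountedSet *counts = [NSCountedSet set];

    // Let Foundation find the word boundaries and count each word as it is reported.
    [string enumerateSubstringsInRange:NSMakeRange(0, string.length)
                               options:NSStringEnumerationByWords
                            usingBlock:^(NSString *substring, NSRange substringRange,
                                         NSRange enclosingRange, BOOL *stop) {
        [counts addObject:substring.lowercaseString];
    }];

    // Convert the counted set into the dictionary the question asks for.
    NSMutableDictionary<NSString *, NSNumber *> *result = [NSMutableDictionary dictionary];
    for (NSString *word in counts) {
        result[word] = @([counts countForObject:word]);
    }
    return [result copy];
}
```

For example, WordFrequencyFromString(@"The cat saw the other cat.") should map @"the" and @"cat" to 2, and @"saw" and @"other" to 1.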

+8

You can use componentsSeparatedByCharactersInSet: to split the string, and NSCountedSet will count the words for you.

1) Split the string into words, using punctuation, whitespace and newline characters as separators:

    NSMutableCharacterSet *separators = [NSMutableCharacterSet punctuationCharacterSet];
    [separators formUnionWithCharacterSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
    NSArray *words = [myString componentsSeparatedByCharactersInSet:separators];

2) Count the occurrences of each word (if you want to ignore case, you can do NSString *myString = [originalString lowercaseString]; before splitting the string into components):

    NSCountedSet *frequencies = [NSCountedSet setWithArray:words];
    NSUInteger aWordCount = [frequencies countForObject:@"word"];

If you are willing to change your method signature, you can simply return the counted set.

+3

First, split the string into an array of words using -[NSString componentsSeparatedByCharactersInSet:] . (Use [[NSCharacterSet letterCharacterSet] invertedSet] as an argument to split on all non-letter characters.)
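A minimal sketch of how that split could feed a count, with one caveat worth handling: adjacent separators (such as ", " or "...") make componentsSeparatedByCharactersInSet: return empty strings, so they are filtered out here. The function name CountedWordsInString is hypothetical, and NSCountedSet is borrowed from the other answers rather than from this one:

```objc
#import <Foundation/Foundation.h>

// Sketch: splits on every non-letter character and counts the non-empty pieces.
static NSCountedSet *CountedWordsInString(NSString *string) {
    NSCharacterSet *nonLetters = [[NSCharacterSet letterCharacterSet] invertedSet];
    NSArray<NSString *> *parts = [string componentsSeparatedByCharactersInSet:nonLetters];

    NSCountedSet *counts = [NSCountedSet set];
    for (NSString *part in parts) {
        // Runs of separators produce empty components; skip them.
        if (part.length > 0) {
            [counts addObject:part];
        }
    }
    return counts;
}
```

Because apostrophes and digits are not letters, this splits a word like "don't" into "don" and "t", which may or may not be acceptable for a given text.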

+2

I used the following approach to get the most frequent word in an NSString.

    - (void)countMostFrequentWordInSpeech:(NSString *)speechString {
        NSString *string = speechString;
        NSCountedSet *countedSet = [NSCountedSet new];

        [string enumerateSubstringsInRange:NSMakeRange(0, [string length])
                                   options:NSStringEnumerationByWords | NSStringEnumerationLocalized
                                usingBlock:^(NSString *substring, NSRange substringRange,
                                             NSRange enclosingRange, BOOL *stop) {
            [countedSet addObject:substring];
        }];
        // NSLog(@"%@", countedSet);

        // Sort the counted set; the most frequent word ends up at index 0 of the resulting array.
        NSMutableArray *dictArray = [NSMutableArray array];
        [countedSet enumerateObjectsUsingBlock:^(id obj, BOOL *stop) {
            [dictArray addObject:@{@"object": obj, @"count": @([countedSet countForObject:obj])}];
        }];
        NSArray *sortedArrayOfWord = [dictArray sortedArrayUsingDescriptors:@[[NSSortDescriptor sortDescriptorWithKey:@"count" ascending:NO]]];

        if (sortedArrayOfWord.count > 0) {
            self.mostFrequentWordLabel.text = [NSString stringWithFormat:@"Frequent Word: %@", [[sortedArrayOfWord[0] valueForKey:@"object"] capitalizedString]];
        }
    }

"speechString" is the string from which I need the most frequent word. The object at index 0 of the sortedArrayOfWord array is the most frequent word.

0

Source: https://habr.com/ru/post/897103/

