Separate full sentences in an NSString text block

I am trying to use a regex to split a large block of text into complete sentences. I cannot use componentsSeparatedByCharactersInSet: because it obviously fails on sentences ending with ?!, !!, or ... I have seen external classes that offer something like componentSeparateByRegEx, but I would prefer to do this without adding an external library.

Here is an example input:

Hi, I'm testing. How are you? Wow !! this is the best and I am happy.

The output must be an array:

First element: Hi, I'm testing.

Second element: How are you?

Third element: Wow !!

Fourth element: this is the best and I am happy.

This is what I have so far, but, as I said, it does not do what I intend. A regular expression would probably work much better here.

- (NSArray *)getArrayOfFullSentencesFromBlockOfText:(NSString *)textBlock {
    NSMutableCharacterSet *characterSet = [[NSMutableCharacterSet alloc] init];
    [characterSet addCharactersInString:@".?!"];
    NSArray *sentenceArray = [textBlock componentsSeparatedByCharactersInSet:characterSet];
    return sentenceArray;
}

Thank you for your help.

3 answers

You want to use -[NSString enumerateSubstringsInRange:options:usingBlock:] with the option NSStringEnumerationBySentences. This gives you each sentence in turn, and the splitting is language-aware.

NSArray *fullSentencesFromText(NSString *text) {
    NSMutableArray *results = [NSMutableArray array];
    [text enumerateSubstringsInRange:NSMakeRange(0, [text length]) options:NSStringEnumerationBySentences usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
        [results addObject:substring];
    }];
    return results;
}

Note that in testing, each substring appears to include the trailing whitespace after the punctuation. You can trim it off.
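
As a rough sketch of that trimming step, the function above could be adapted as follows. The name trimmedSentencesFromText is hypothetical (it is not part of the answer or of Foundation), and skipping empty results is an added assumption; only standard Foundation calls are used.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper, not from the original answer: collects each sentence
// and trims the whitespace that sentence enumeration leaves on the substring.
static NSArray *trimmedSentencesFromText(NSString *text) {
    NSMutableArray *results = [NSMutableArray array];
    NSCharacterSet *whitespace = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    [text enumerateSubstringsInRange:NSMakeRange(0, [text length])
                             options:NSStringEnumerationBySentences
                          usingBlock:^(NSString *substring, NSRange substringRange,
                                       NSRange enclosingRange, BOOL *stop) {
        // Remove leading and trailing spaces and newlines from this sentence.
        NSString *trimmed = [substring stringByTrimmingCharactersInSet:whitespace];
        // Assumption: whitespace-only pieces are not wanted in the result.
        if ([trimmed length] > 0) {
            [results addObject:trimmed];
        }
    }];
    return results;
}
```

Calling trimmedSentencesFromText(textBlock) should then return the sentences without the trailing spaces. Where exactly the enumerator places sentence boundaries around input such as "Wow !!" is up to the system's language rules, so it is worth checking against your own text.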


Something like this could accomplish this task:

NSString *msg = @"Hi, I am testing. How are you? Wow!! this is the best, and I am happy.";
[msg enumerateSubstringsInRange:NSMakeRange(0, [msg length])
                        options:NSStringEnumerationBySentences | NSStringEnumerationLocalized
                     usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop)
{
    NSLog(@"Sentence:%@", substring);
    // Add each sentence into an array
}];

Or use:

    // mutstri is assumed to be the string that holds the text block
    [mutstri enumerateSubstringsInRange:NSMakeRange(0, [mutstri length])
                                options:NSStringEnumerationBySentences
                             usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
                                 NSLog(@"%@", substring);
                             }];

Source: https://habr.com/ru/post/1730320/

