Basic profanity filter in Objective-C for iPhone

As like-minded people who have dealt with the basic problem of filtering profanity will know, it is obviously impossible to cover every scenario, but it would be nice to have a filter at the most basic level as a first line of defense.

In Obj-c, I have

NSArray *tokens = [text componentsSeparatedByString:@" "];

I then loop through each token to see whether it contains any of the keywords (I have about 400 in the list).

False positives are also a concern. If a word is a perfect match for a keyword, the text is flagged as profanity; otherwise, if more than three words contain a keyword without being a perfect match, the text is also flagged as profanity.
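The rule described above could be sketched roughly as follows. This is only a minimal sketch: the function name `TextIsProfane`, the `keywords` parameter, and the lowercase normalization are assumptions made for illustration and are not part of the original code.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns YES if `text` should be flagged.
// `keywords` is assumed to contain lowercase entries (about 400 in the question).
static BOOL TextIsProfane(NSString *text, NSArray *keywords) {
    NSSet *exactWords = [NSSet setWithArray:keywords];
    NSArray *tokens = [[text lowercaseString] componentsSeparatedByString:@" "];
    NSUInteger partialMatches = 0;

    for (NSString *token in tokens) {
        // Perfect match: flag immediately.
        if ([exactWords containsObject:token]) {
            return YES;
        }
        // Otherwise check whether any keyword occurs inside the token.
        for (NSString *keyword in keywords) {
            if ([token rangeOfString:keyword].location != NSNotFound) {
                partialMatches++;
                break; // count each token at most once
            }
        }
    }
    // No perfect match: flag only if more than three tokens contained a keyword.
    return partialMatches > 3;
}
```

With this rule a single word that merely contains a keyword does not flag the text on its own; only an exact match or more than three such words does.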

Later I will use a web service that solves the problem more accurately, but for now I really just need something basic. So if you wrote the word "penis", the filter would simply report that a bad word was written.

+3
4 answers

I have a suggestion for the string tokenization. Your method works well if all words are separated by spaces, but that is rarely the case in practice, since you usually also have to deal with newline characters, punctuation, and so on. Try this if you are interested:

NSMutableCharacterSet *separators = [NSMutableCharacterSet punctuationCharacterSet];

[separators formUnionWithCharacterSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];

NSArray *words = [bigString componentsSeparatedByCharactersInSet:separators];

Source: http://www.tech-recipes.com/rx/3418/cocoa-explode-break-nsstring-into-individual-words/
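One caveat worth noting: `componentsSeparatedByCharactersInSet:` returns an empty string wherever two separators are adjacent (for example a comma followed by a space), so the resulting array usually needs filtering before it is matched against the keyword list. A minimal sketch, where the helper name `WordsInString` is made up for illustration:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: splits on punctuation and whitespace, dropping empty pieces.
static NSArray *WordsInString(NSString *bigString) {
    NSMutableCharacterSet *separators = [NSMutableCharacterSet punctuationCharacterSet];
    [separators formUnionWithCharacterSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];

    NSArray *pieces = [bigString componentsSeparatedByCharactersInSet:separators];
    NSMutableArray *words = [NSMutableArray arrayWithCapacity:[pieces count]];
    for (NSString *piece in pieces) {
        // Adjacent separators produce empty strings; skip them.
        if ([piece length] > 0) {
            [words addObject:piece];
        }
    }
    return words;
}
```

Because the apostrophe belongs to the punctuation set, contractions such as "don't" are split into two pieces with this approach.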

+3

, , , ... FSA. , , , . , , .

, , 400 . , , ? , ? ? , , .

+2

:

  • FSA , ,
  • Regex , ,
  • 400 , .
  • , , "ASSume" .

, Inversoft, , . FSA, , (4000 ). 600 , , , , , ..

- , Clean Speak Inversoft. Obj-C XML WebService.

0

Source: https://habr.com/ru/post/1745036/

