What is the best way to check if a CharacterSet contains a Character in Swift 4?

I am looking for a way, in Swift 4, to check if a Character is a member of an arbitrary character set. I have a Scanner class that will be used for a little parsing. One of the functions in the class is to skip any characters at the current position that belong to a specific set of possible characters.

    import Foundation

    class MyScanner {
        let str: String
        var idx: String.Index

        init(_ string: String) {
            str = string
            idx = str.startIndex
        }

        var remains: String { return String(str[idx..<str.endIndex]) }

        func skip(charactersIn characters: CharacterSet) {
            // Does not compile: contains(_:) expects a Unicode.Scalar, not a Character
            while idx < str.endIndex && characters.contains(str[idx]) {
                idx = str.index(idx, offsetBy: 1)
            }
        }
    }

    let scanner = MyScanner("fizz buzz fizz")
    scanner.skip(charactersIn: CharacterSet.alphanumerics)
    scanner.skip(charactersIn: CharacterSet.whitespaces)
    print("what remains: \"\(scanner.remains)\"")

I would like to implement the skip(charactersIn:) function so that the code above prints buzz fizz.

The tricky part is characters.contains(str[idx]) in the while condition: contains() requires a Unicode.Scalar, and I am at a loss trying to figure out the next step.

I know that I could pass a String to the skip function, but I would like to find a way to make it work with CharacterSet because of all the convenient static members (alphanumerics, whitespaces, etc.).

How can I check whether a CharacterSet contains a Character?

+5
2 answers

I know that you wanted to use CharacterSet rather than String, but CharacterSet does not (yet, at least) support characters composed of more than one Unicode.Scalar. See the "family" character (👩‍👩‍👧‍👦) or the international flag characters (for example, 🇯🇵) that Apple demonstrated in the string discussion in the WWDC 2017 video What's New in Swift. Emoji with skin-tone modifiers show the same behavior (for example, 👩🏽).

As a result, I would be wary of using CharacterSet (which is "a set of Unicode character values for use in search operations"). Or, if you want to provide this method for convenience, be aware that it will not work correctly with characters represented by multiple Unicode scalars.
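To make the multi-scalar point concrete, here is a minimal sketch that counts the Unicode scalars behind a few sample strings. The `samples` array and its labels are illustrative names, not part of the answer's code, and the emoji are written as escapes so the scalar sequence is explicit.

```swift
// Each sample is a string that renders as one user-visible character.
let samples: [(name: String, text: String)] = [
    // woman + ZWJ + woman + ZWJ + girl + ZWJ + boy
    (name: "family", text: "\u{1F469}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}"),
    // regional indicators J + P (Japanese flag)
    (name: "flag", text: "\u{1F1EF}\u{1F1F5}"),
    // woman + medium skin tone modifier
    (name: "skin tone", text: "\u{1F469}\u{1F3FD}"),
    // plain ASCII letter
    (name: "letter", text: "a"),
]

for sample in samples {
    // unicodeScalars exposes the scalars that make up the string
    print(sample.name, sample.text.unicodeScalars.count)
}
// family 7
// flag 2
// skin tone 2
// letter 1
```

Only the last sample could be passed to CharacterSet.contains(_:) as a single Unicode.Scalar without losing information.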

So you might offer a scanner that provides both CharacterSet and String renditions of the skip method:

    import Foundation

    class MyScanner {
        let string: String
        var index: String.Index

        init(_ string: String) {
            self.string = string
            index = string.startIndex
        }

        var remains: String { return String(string[index...]) }

        /// Skip characters in a string
        ///
        /// This rendition is safe to use with strings that have characters
        /// represented by more than one unicode scalar.
        ///
        /// - Parameter skipString: A string with all of the characters to skip.

        func skip(charactersIn skipString: String) {
            while index < string.endIndex, skipString.contains(string[index]) {
                index = string.index(index, offsetBy: 1)
            }
        }

        /// Skip characters in character set
        ///
        /// Note, character sets cannot (yet) include characters that are represented by
        /// more than one unicode scalar (eg 👩‍👩‍👧‍👦 or 🇯🇵 or 👰🏻). If you want to test
        /// for these multi-unicode characters, you have to use the `String` rendition of
        /// this method.
        ///
        /// This will simply stop scanning if it encounters a multi-unicode character in
        /// the string being scanned (because it knows the `CharacterSet` can only represent
        /// single-unicode characters) and you want to avoid false positives (eg, mistaking
        /// the Jamaican flag, 🇯🇲, for the Japanese flag, 🇯🇵).
        ///
        /// - Parameter characterSet: The character set to check for membership.

        func skip(charactersIn characterSet: CharacterSet) {
            while index < string.endIndex,
                string[index].unicodeScalars.count == 1,
                let character = string[index].unicodeScalars.first,
                characterSet.contains(character) {
                    index = string.index(index, offsetBy: 1)
            }
        }
    }

So your simple example will work:

    let scanner = MyScanner("fizz buzz fizz")
    scanner.skip(charactersIn: CharacterSet.alphanumerics)
    scanner.skip(charactersIn: CharacterSet.whitespaces)
    print(scanner.remains)  // "buzz fizz"

But use the String rendition if the characters you want to skip may include multiple Unicode scalars:

    let family = "👩\u{200D}👩\u{200D}👧\u{200D}👦"  // 👩‍👩‍👧‍👦
    let boy = "👦"

    let charactersToSkip = family + boy

    let string = boy + family + "foobar"  // 👦👩‍👩‍👧‍👦foobar

    let scanner = MyScanner(string)
    scanner.skip(charactersIn: charactersToSkip)
    print(scanner.remains)                // foobar

As Michael Waterfall noted in the comments below, CharacterSet has a bug and does not even handle 32-bit Unicode.Scalar values correctly, which means that it does not even handle single-scalar characters properly if the value exceeds 0xffff (including emoji, among others). The String rendition, however, handles these correctly.
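The single-scalar policy from the CharacterSet rendition of skip can also be packaged as a standalone membership test. The sketch below assumes a hypothetical helper named `containsSingleScalar(_:)`; it is not a Foundation API, and it deliberately answers false for any character made of more than one scalar.

```swift
import Foundation

extension CharacterSet {
    /// Hypothetical helper, not part of Foundation.
    /// Returns true only when `character` is exactly one Unicode scalar
    /// and that scalar is a member of this set.
    func containsSingleScalar(_ character: Character) -> Bool {
        let scalars = character.unicodeScalars
        guard scalars.count == 1, let scalar = scalars.first else {
            return false  // multi-scalar characters are never reported as members
        }
        return contains(scalar)
    }
}

// Regional indicators J + P: one Character, two scalars
let flag: Character = "\u{1F1EF}\u{1F1F5}"

print(CharacterSet.alphanumerics.containsSingleScalar("f"))   // true
print(CharacterSet.whitespaces.containsSingleScalar(" "))     // true
print(CharacterSet.whitespaces.containsSingleScalar("f"))     // false
print(CharacterSet.alphanumerics.containsSingleScalar(flag))  // false
```

Rejecting multi-scalar characters outright trades completeness for the absence of false positives, which matches the behavior described in the doc comment of the scanner.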

+3

Not sure if this is the most efficient way, but you can create a new CharacterSet from the character and check whether it is a subset of the set you care about (set comparison is rather quick):

    let newSet = CharacterSet(charactersIn: "a")
    // let newSet = CharacterSet(charactersIn: "\(character)")
    print(newSet.isSubset(of: CharacterSet.decimalDigits)) // false
    print(newSet.isSubset(of: CharacterSet.alphanumerics)) // true
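The subset check above can be wrapped in a small function that takes a Character directly. This is only a sketch: `isMember(_:of:)` is a hypothetical name, and for a character made of several scalars it reports true whenever every individual scalar is in the set, which is not the same as the whole character being a member.

```swift
import Foundation

/// Hypothetical helper, not part of Foundation.
/// Builds a one-character set and tests it against `set` with isSubset(of:).
func isMember(_ character: Character, of set: CharacterSet) -> Bool {
    let single = CharacterSet(charactersIn: String(character))
    return single.isSubset(of: set)
}

print(isMember("a", of: .alphanumerics))  // true
print(isMember("a", of: .decimalDigits))  // false
print(isMember("7", of: .decimalDigits))  // true
```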
+3

Source: https://habr.com/ru/post/1271212/

