My goal: an arbitrary UTF-16 position is given in String
, find the corresponding String.Index
one that represents Character
(i.e. an expanded grapheme cluster) the specified UTF-16 block of code is part.
Example:
(I put the code in Gist for easy copying and pasting.)
This is my test line:
let str = "๐จ๐พโ๐"
(Note: to see a string as a single character, you need to read it in a fairly recent OS / browser combination that can handle the new emoji profession with skin tones introduced in Unicode 9.)
This is one cluster Character
(grapheme), which consists of four Unicode scanners or 7 UTF-16:
print(str.unicodeScalars.map { "0x\(String($0.value, radix: 16))" })
// โ ["0x1f468", "0x1f3fe", "0x200d", "0x1f692"]
print(str.utf16.map { "0x\(String($0, radix: 16))" })
// โ ["0xd83d", "0xdc68", "0xd83c", "0xdffe", "0x200d", "0xd83d", "0xde92"]
print(str.utf16.count)
// โ 7
UTF-16 (, 2), String.Index
:
let utf16Offset = 2
let utf16Index = String.Index(encodedOffset: utf16Offset)
, Character
, Character
, , :
let char = str[utf16Index]
print(char)
// โ ๐พโ๐
print(char.unicodeScalars.map { "0x\(String($0.value, radix: 16))" })
// โ ["0x1f3fe", "0x200d", "0x1f692"]
( , ):
let trappingIndex = String.Index(encodedOffset: 1)
str[trappingIndex]
, Character
:
extension String.Index {
func isOnCharacterBoundary(in str: String) -> Bool {
return String.Index(self, within: str) != nil
}
}
trappingIndex.isOnCharacterBoundary(in: str)
utf16Index.isOnCharacterBoundary(in: str)
:
, , true
. String.Index.init(_:within:)
:
, sourcePosition
, grapheme - - .
utf16Index
grapheme - 0, 2. .
encodedOffset
isOnCharacterBoundary
.
- ? , Character
? Swift?
: Swift 4.0/Xcode 9.0 macOS 10.13.
: Twitter .
: String.Index.init?(_:within:)
Swift 4.0 : SR-5992.