Finding "actual" characters (graphemes) in a QString

Let's say I have a QString that can contain any Unicode characters, and I want to iterate over its characters or access them individually. By "characters" I mean what the user perceives as a character (roughly equivalent to a "glyph"), not individual QChars (16-bit UTF-16 code units). Some "actual" characters are built from several QChars: surrogate pairs, or a base character followed by combining marks. For some combinations I can get away with normalizing the string so that precomposed characters are used, but that doesn't always help.

Am I missing a built-in function that breaks a QString into "actual" characters?

Or, if I have to split it up myself, is the structure the following (in EBNF), or am I missing something?

character = ((high_surrogate, low_surrogate) | base_character), {combining_mark} 

(where base_character is any QChar that is neither a surrogate nor a combining mark)
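To make the proposed grammar concrete, here is a minimal sketch in standard C++ (no Qt) that splits UTF-16 text exactly according to it. The function names are hypothetical, and the combining-mark test is a deliberate simplification that only recognizes the Combining Diacritical Marks block (U+0300 to U+036F); a real implementation would need the full Unicode character properties.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Simplification (assumption): only U+0300..U+036F counts as a combining mark.
// Unicode defines many more combining marks, some of them outside the BMP.
bool isCombiningMark(char16_t c) { return c >= 0x0300 && c <= 0x036F; }

bool isHighSurrogate(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }
bool isLowSurrogate(char16_t c) { return c >= 0xDC00 && c <= 0xDFFF; }

// Splits UTF-16 text according to:
//   character = ((high_surrogate, low_surrogate) | base_character), {combining_mark}
std::vector<std::u16string> splitByGrammar(const std::u16string &text)
{
    std::vector<std::u16string> result;
    std::size_t i = 0;
    while (i < text.size()) {
        const std::size_t start = i;
        if (isHighSurrogate(text[i]) && i + 1 < text.size() && isLowSurrogate(text[i + 1])) {
            i += 2; // a valid surrogate pair encodes one code point
        } else {
            i += 1; // base character (a stray surrogate or leading mark is kept on its own)
        }
        // Attach every combining mark that follows.
        while (i < text.size() && isCombiningMark(text[i])) {
            ++i;
        }
        result.push_back(text.substr(start, i - start));
    }
    return result;
}
```

Even with complete property data, this grammar is narrower than what Unicode calls a user-perceived character (extended grapheme clusters, UAX #29). For example, a flag such as U+1F1E9 U+1F1EA consists of two regional indicators with no combining marks, so `splitByGrammar(u"\U0001F1E9\U0001F1EA")` returns two pieces although it is displayed as a single character.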

2 answers

After further research, I found the proper term for an "actual character", grapheme, and with it the Qt class for finding grapheme boundaries: QTextBoundaryFinder.


I'm not sure about combining marks, but for surrogate pairs I think you can use QString::toUcs4(), which should return a UTF-32 (UCS-4) representation of your string.


Source: https://habr.com/ru/post/1379659/

