Search for "actual" characters (graphemes) in a QString

Question

Let's say I have a QString that can contain arbitrary Unicode characters, and I want to iterate over or read its characters. By "characters" I mean what the user perceives as such (roughly equivalent to "glyphs"), not just QChars (16-bit UTF-16 code units). Some "actual" characters are built from several QChars (surrogate pairs, or a base character followed by combining marks). For some combinations I can get away with normalizing the string to produce precomposed characters, but that doesn't always help.

Am I missing a built-in function that breaks a QString into "actual" characters?

Or, if I have to parse it myself, is this the right structure (in EBNF), or am I missing something?

character = ((high_surrogate, low_surrogate) | base_character), {combining_mark}

(where base_character is any QChar that is neither a surrogate nor a combining mark)

+4

qt unicode utf-16

Sebastian negraszus Nov 04 '11 at 14:58


2 answers

I'm not sure about combining marks, but for surrogate pairs I think you can use QString::toUcs4(), which should return a 32-bit (UCS-4) representation of your string.

+1

Steffen Nov 04 '11 at 15:11


Sebastian negraszus · Accepted Answer · 2011-11-04T19:25:30+0000

After further research, I found the proper term for an "actual character", grapheme, and with it the Qt class for finding grapheme boundaries: QTextBoundaryFinder.
