Are combining diacritics considered letters? As far as I know, they can only be combined with other letters in a well-formed Unicode string.
ICU's function for determining whether a Unicode code point is a letter takes only a single code point, so for any given code point it cannot know whether that code point was combined with diacritics or, if it is itself a diacritic, what it was combined with. I am trying to implement something like a Unicode-aware regex using a construct of the form
while(is_letter(codepoint))
However, I am quite concerned about what happens if the code point
is actually a diacritic, which would be combined with the previous code point, or another such mark.
Is it safe to do this? Or will I have to explicitly find and ignore diacritics and other marks?
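To make the concern concrete, here is a minimal sketch in Python. It uses the standard `unicodedata` module as a stand-in for ICU's per-code-point classification, so it only approximates what ICU would report. The helper names `is_letter`, `is_mark`, and `scan_word`, and the `allow_marks` policy, are hypothetical and exist only for illustration.

```python
import unicodedata


def is_letter(ch):
    # General categories Lu, Ll, Lt, Lm, Lo. Combining marks are M*, not L*.
    return unicodedata.category(ch).startswith("L")


def is_mark(ch):
    # General categories Mn, Mc, Me: combining marks, including diacritics.
    return unicodedata.category(ch).startswith("M")


def scan_word(text, start=0, allow_marks=False):
    """Return the index one past the run of letters beginning at `start`.

    With allow_marks=True, a combining mark is accepted only after at least
    one code point has already been consumed (an assumed policy, not ICU's).
    """
    i = start
    while i < len(text) and (
        is_letter(text[i]) or (allow_marks and i > start and is_mark(text[i]))
    ):
        i += 1
    return i


# "café" written in decomposed form: 'e' followed by U+0301 COMBINING ACUTE ACCENT
word = "cafe\u0301 au lait"

print(unicodedata.category("\u0301"))     # Mn
print(scan_word(word))                    # 4: the loop stops before the accent
print(scan_word(word, allow_marks=True))  # 5: the accent stays with the word
```

In this sketch a plain letter-only loop ends in the middle of the user-perceived character, which is exactly the behaviour I am worried about.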
Edit: I really need to iterate over characters, not code points.
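As a rough illustration of the difference, the sketch below groups each code point with the combining marks that follow it. The function name `approximate_characters` is hypothetical, and the grouping rule is a deliberate simplification: proper character iteration follows the grapheme cluster rules of UAX #29 (which ICU exposes through its character `BreakIterator`), and those rules cover cases this sketch ignores, such as Hangul jamo and emoji sequences.

```python
import unicodedata
from typing import List


def approximate_characters(text: str) -> List[str]:
    """Attach each combining mark (category M*) to the preceding code point.

    Simplified stand-in for grapheme cluster segmentation; not UAX #29.
    """
    clusters: List[str] = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            # A mark extends the cluster that is currently open.
            clusters[-1] += ch
        else:
            # Anything else (or a mark with nothing before it) starts a new cluster.
            clusters.append(ch)
    return clusters


decomposed = "cafe\u0301"
print(len(decomposed))                          # 5 code points
print(len(approximate_characters(decomposed)))  # 4 approximate characters
```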
This question is a victim of the XY problem. I need to ask a question about my real problem.
Puppy