Inverting Unicode String Collation Keys

Question

I have an index in which text strings are stored for search, both in their original form and in their collated form (the collated form is used to search the index; the original is displayed in the results).

The mapping is done using the ICU4C implementation, which works as defined in the Unicode Collation Algorithm. I use sort keys and usually keep only the primary strength (ignoring accents, lower/upper case, code pages, etc.).

For debugging purposes, is there a way to invert the collation and get a human-readable string similar to the original? This is obviously a lossy process, but converting the sort key of 'a' back to the ASCII character 'a' is good enough. Hopefully there is a standard way to do this, without having to translate from a binary sort key to printable Unicode characters by hand. Ideally the solution would be implemented in C/C++.

Thanks in advance.

+4

unicode collation icu uca

scooz May 12, '14 at 12:46

1 answer

AndreyS Scherbakov · Answer 1 · 2017-01-31T13:38:17+0000

You do not need a general algorithm for inverting collation. You only need to look up the sort keys of the strings you have already processed.

While indexing, keep each string in its UTF form in a dictionary keyed by its collation key:

allStrings[collation_key] = utf_string
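
The lookup table above can be sketched in C++ roughly as follows. This is only an illustration under stated assumptions: `toy_sort_key` is a hypothetical stand-in that merely folds ASCII case, and `KeyDebugTable`, `remember` and `describe` are made-up names. In a real index the key bytes would come from the collator itself (for example ICU's `ucol_getSortKey`), not from this function.

```cpp
#include <cctype>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using SortKey = std::vector<std::uint8_t>;

// Hypothetical stand-in for a primary-strength collation key.
// It only folds ASCII case; real keys would be produced by the collator.
SortKey toy_sort_key(const std::string& utf8) {
    SortKey key;
    key.reserve(utf8.size());
    for (unsigned char c : utf8) {
        key.push_back(static_cast<std::uint8_t>(std::tolower(c)));
    }
    return key;
}

// Reverse lookup table: collation key -> one original string seen with that key.
class KeyDebugTable {
public:
    // Record a string while indexing. emplace() keeps the first string
    // stored for a key, so later strings with the same key are ignored.
    void remember(const std::string& utf8) {
        table_.emplace(toy_sort_key(utf8), utf8);
    }

    // Return a readable representative for a key, or a hex dump of the
    // key bytes if no string with that key was ever recorded.
    std::string describe(const SortKey& key) const {
        auto it = table_.find(key);
        if (it != table_.end()) {
            return it->second;
        }
        static const char* digits = "0123456789abcdef";
        std::string hex = "<unknown key:";
        for (std::uint8_t b : key) {
            hex += ' ';
            hex += digits[b >> 4];
            hex += digits[b & 0x0f];
        }
        hex += '>';
        return hex;
    }

private:
    std::map<SortKey, std::string> table_;
};
```

Because several strings can share one key, the table keeps only the first string it saw for each key. That is lossy, but it matches the debugging goal of showing something similar to the original.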


If you already keep the mappings object → str and object → coll, you can fill the dictionary for every object: your_dictionary[collations[object]] = the string stored for that object.


char, "given_character long_constant_sequence" . , , change you given_character (-) _character.
. : → (, , primary_elements - > primary_elements_size; (, , , - ))
. , , - .
, primary_elements → . UTF .
