How can I case fold a string in Rust?

I am writing a simple full-text search library and need case folding to check whether two words are equal. For this use case, the existing .to_lowercase() and .to_uppercase() methods are not enough.
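To illustrate the problem, here is a minimal sketch using only the standard library. The helper names eq_lower and eq_upper are hypothetical and exist only for this example. It shows one pair of words that lowercasing fails to match and another pair that uppercasing fails to match, so neither method works as a general caseless comparison.

```rust
/// Hypothetical helper: compare two strings after lowercasing both.
pub fn eq_lower(a: &str, b: &str) -> bool {
    a.to_lowercase() == b.to_lowercase()
}

/// Hypothetical helper: compare two strings after uppercasing both.
pub fn eq_upper(a: &str, b: &str) -> bool {
    a.to_uppercase() == b.to_uppercase()
}

/// Returns true when each approach fails on at least one of the sample pairs.
pub fn both_approaches_have_gaps() -> bool {
    // Lowercasing keeps 'ß' as it is, so "straße" and "strasse" stay different.
    let lower_misses = !eq_lower("Straße", "STRASSE");
    // Uppercasing maps 'ß' to "SS" but leaves U+1E9E (capital sharp s) alone.
    let upper_misses = !eq_upper("\u{1E9E}", "ß");
    lower_misses && upper_misses
}
```

Full Unicode case folding maps both 'ß' and U+1E9E to "ss", which is why a dedicated case folding routine handles both pairs.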

From a quick search on crates.io, I can find libraries for normalization and word splitting, but not for case folding. regex-syntax does have case folding code, but it is not exposed in its API.

If there are no existing solutions, then I may have to write my own 😃

+4
2 answers

For my use case, I found the caseless crate to be the most useful.

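It can combine case folding with normalization. This matters when you want, for example, "㎒" (U+3392 SQUARE MHZ) and "mhz" to match. For details, see Chapter 3 of the Unicode Standard, under "Default Caseless Matching".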

Here is an example that compares two strings case-insensitively:

extern crate caseless;
use caseless::Caseless;

let a = "100 ㎒";
let b = "100 mhz";

// These strings don't match with just case folding,
// but do match after compatibility (NFKD) normalization
assert!(!caseless::default_caseless_match_str(a, b));
assert!(caseless::compatibility_caseless_match_str(a, b));
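To see why the compatibility step matters, here is a toy sketch that uses only the standard library. It is not how caseless works internally. The two-entry table in toy_compat_decompose is a hypothetical stand-in for real NFKD data, and to_lowercase() stands in for real case folding. Both function names are invented for this example.

```rust
/// Toy stand-in for compatibility decomposition. This is NOT real NFKD:
/// it only knows two squared unit symbols.
fn toy_compat_decompose(c: char) -> Option<&'static str> {
    match c {
        '\u{3392}' => Some("MHz"), // U+3392 SQUARE MHZ
        '\u{3393}' => Some("GHz"), // U+3393 SQUARE GHZ
        _ => None,
    }
}

/// Expand known compatibility characters, then lowercase the result.
/// Lowercasing is only an approximation of Unicode case folding.
pub fn toy_compat_fold(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match toy_compat_decompose(c) {
            Some(expanded) => out.push_str(expanded),
            None => out.push(c),
        }
    }
    out.to_lowercase()
}
```

Without the expansion step, "100 ㎒" keeps its single squared symbol after lowercasing and can never equal "100 mhz". With it, both strings reduce to the same text.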

To get the case-folded string itself, use default_case_fold_str:

let s = "Twilight Sparkle ちゃん";
assert_eq!(caseless::default_case_fold_str(s), "twilight sparkle ちゃん");

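For a case fold that also applies compatibility normalization, you can write a helper yourself with the unicode-normalization crate: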

extern crate unicode_normalization;
use caseless::Caseless;
use unicode_normalization::UnicodeNormalization;

fn compatibility_case_fold(s: &str) -> String {
    s.nfd().default_case_fold().nfkd().default_case_fold().nfkd().collect()
}

let a = "100 ㎒";
assert_eq!(compatibility_case_fold(a), "100 mhz");

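Note that normalization and case folding are applied in several rounds; a single pass is not enough to get a correct result.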

(Thanks to BurntSushi5 for pointing me to this library.)

+1

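The unicase crate takes a different approach: it provides a wrapper type that implements Eq, Ord and Hash in a case-insensitive way. It handles both ASCII case folding and Unicode case folding.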
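The wrapper-type idea can be sketched with the standard library alone. The AsciiCaseless type and the distinct_words helper below are hypothetical and are not the unicase API. This sketch only ignores ASCII case, but it shows how Eq, Ord and Hash can be kept consistent with one another so the wrapper works as a key in hashed and ordered collections.

```rust
use std::cmp::Ordering;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical wrapper (not from unicase): compares the wrapped string
/// while ignoring ASCII case only.
#[derive(Debug, Clone, Copy)]
pub struct AsciiCaseless<'a>(pub &'a str);

impl<'a> PartialEq for AsciiCaseless<'a> {
    fn eq(&self, other: &Self) -> bool {
        self.0.eq_ignore_ascii_case(other.0)
    }
}

impl<'a> Eq for AsciiCaseless<'a> {}

impl<'a> Hash for AsciiCaseless<'a> {
    // Hash the ASCII-lowercased bytes so that equal values hash equally.
    fn hash<H: Hasher>(&self, state: &mut H) {
        for b in self.0.bytes() {
            state.write_u8(b.to_ascii_lowercase());
        }
        state.write_u8(0xff); // terminator, never a valid UTF-8 byte
    }
}

impl<'a> Ord for AsciiCaseless<'a> {
    // Order by the same lowercased bytes that equality and hashing use.
    fn cmp(&self, other: &Self) -> Ordering {
        let a = self.0.bytes().map(|b| b.to_ascii_lowercase());
        let b = other.0.bytes().map(|b| b.to_ascii_lowercase());
        a.cmp(b)
    }
}

impl<'a> PartialOrd for AsciiCaseless<'a> {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

/// Count the distinct words in a slice, ignoring ASCII case.
pub fn distinct_words(words: &[&str]) -> usize {
    words.iter().map(|&w| AsciiCaseless(w)).collect::<HashSet<_>>().len()
}
```

For a search index, a wrapper like this lets the original spelling be stored while lookups stay case-insensitive. Non-ASCII letters would still need real Unicode case folding.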

+2

Source: https://habr.com/ru/post/1658876/

