Finding grapheme cluster size in Rust

I am working on a scanner (or tokenizer or lexer if you want). I have to iterate through a string slice. I found two methods for this:

First, I can create an iterator and iterate over each character. Here is a simplified example:

let s = "llo".chars();
for c in s {
    println!("{}", c);
}

However, if I want to look ahead, it gets a little less simple:

let mut s = "llo = ==".chars().peekable();
loop {
    match (s.next(), s.peek()) {
        (Some('='), Some(&'=')) => { s.next(); println!("==") },
        (Some('='), _        )  => println!("="),
        (Some(c)  , _        )  => println!("{}", c),
        (None, _) => break,
    }
}

Unfortunately, it seems that peek() only lets me look one character ahead. I cannot, for example, look at the character after the next one.

Therefore, I can use the second method instead: convert the string slice to a vector of characters. For instance:

// Return the character at index `pos`, or None when `pos` is past the end.
fn char_at(text: &[char], pos: usize) -> Option<char> {
    text.get(pos).copied()
}

// The explicit type tells collect() what to build.
let text: Vec<char> = "llo = ==".chars().collect();
let mut position: usize = 0;
loop {
    match (char_at(&text, position), char_at(&text, position + 1)) {
        // Two '=' in a row: skip the extra character here.
        (Some('='), Some('=')) => { position += 1; println!("==") },
        (Some('='), _        ) => println!("="),
        (Some(c)  , _        ) => println!("{}", c),
        (None     , _        ) => break,
    }
    position += 1;
}

, , - , . , , char_at, Rust, grapheme.

Ideally, instead of my vector of characters, I would work on the string slice directly. Something like this (pseudocode):

let mut text = "llo = ==";
let mut position: usize = 0;
loop {
    let next_char = text.char_at(position);
    let peek_char = text.char_at(position + next_char.len());
    match (next_char, peek_char) {
        (Some('='), Some('=')) => {
            position += peek_char.len();
            println!("==")
        },
        (Some('='), _        ) => println!("="),
        (Some(c)  , _        ) => println!("{}", c),
        (None     , _        ) => break,
    }
    position += next_char.len();
}

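Here next_char.len() and peek_char.len() stand for the size of each character within the string, so that position always moves to the start of the following character.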
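The pseudocode above can be approximated in today's Rust with byte offsets, as a rough sketch. The helper names char_at and tokenize_by_offset are hypothetical, not standard library functions; str::get, str::chars and char::len_utf8 are real. Note that this steps over Unicode scalar values (char), not full grapheme clusters; for those, an external crate such as unicode-segmentation would presumably be needed.

```rust
/// Hypothetical helper: the character starting at byte offset `pos`, if any.
/// `str::get` returns None when `pos` is out of range or not on a char boundary.
fn char_at(text: &str, pos: usize) -> Option<char> {
    text.get(pos..)?.chars().next()
}

/// Hypothetical tokenizer: yields "==" for two consecutive '=' and every
/// other character (including a lone '=') as its own token.
fn tokenize_by_offset(text: &str) -> Vec<String> {
    let mut tokens = Vec::new();
    let mut position = 0; // byte offset into `text`
    while let Some(c) = char_at(text, position) {
        // Peek at the character that starts right after `c`.
        let peek = char_at(text, position + c.len_utf8());
        position += c.len_utf8();
        match (c, peek) {
            ('=', Some('=')) => {
                // Consume the second '=' as well.
                position += '='.len_utf8();
                tokens.push("==".to_string());
            }
            _ => tokens.push(c.to_string()),
        }
    }
    tokens
}
```

Because position only ever advances by len_utf8() of a character that was just read, it stays on a character boundary, and each lookup is a slice plus one decode rather than an O(n) scan.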

:

  • .
  • ( O (n) ). , .
  • , , , , .

Rust. , :

  • ?
  • ?
  • , ?

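You are using str::chars(), which gives you a str::Chars, and that type implements Clone. So whenever you need to look ahead, you can .clone() the iterator and advance the copy instead of the original. This gives you as many lookaheads as you like, without consuming anything.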

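Internally, str::Chars wraps an iter::Slice<u8> (std::slice::Iter<u8> in current Rust), which is essentially a pair of pointers. Cloning a str::Chars therefore copies only those pointers, not the string data. So do not be afraid to clone a str::Chars, it is cheap!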

fn main() {
    let mut s = "llo = ==".chars();
    loop {
        let mut s2 = s.clone();
        let c1 = s2.next();
        let c2 = s2.next();
        match (c1, c2) {
            // Consume both '=' characters, otherwise the second one is reported again as "=".
            (Some('='), Some('=')) => { s.next(); s.next(); println!("=="); }
            (Some('='), _        ) => { s.next(); println!("="); }
            (Some(c)  , _        ) => { s.next(); println!("{}", c); }
            (None, _) => break,
        }
    }
}
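
The same cloning idea can be wrapped in a small helper to look several characters ahead. This is only a sketch: peek_nth and tokenize_with_lookahead are hypothetical names, and the "===" token is invented purely to show a two-character lookahead. Only Chars::clone and Iterator::nth come from the standard library.

```rust
use std::str::Chars;

/// Hypothetical helper: look `n` characters ahead without consuming anything.
/// `n = 0` is the character the next call to `next()` would return.
fn peek_nth(chars: &Chars<'_>, n: usize) -> Option<char> {
    // Cloning copies the iterator state only, not the string data.
    chars.clone().nth(n)
}

/// Hypothetical tokenizer recognising "===", "==" and single characters.
fn tokenize_with_lookahead(text: &str) -> Vec<String> {
    let mut chars = text.chars();
    let mut tokens = Vec::new();
    while let Some(c) = chars.next() {
        match (c, peek_nth(&chars, 0), peek_nth(&chars, 1)) {
            ('=', Some('='), Some('=')) => {
                // Consume the two extra '=' characters.
                chars.next();
                chars.next();
                tokens.push("===".to_string());
            }
            ('=', Some('='), _) => {
                // Consume the one extra '=' character.
                chars.next();
                tokens.push("==".to_string());
            }
            _ => tokens.push(c.to_string()),
        }
    }
    tokens
}
```

For example, tokenize_with_lookahead("a === b") yields the tokens "a", " ", "===", " ", "b". Each peek_nth call decodes up to n + 1 characters again, which is fine for the short, fixed lookaheads a scanner typically needs.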

Source: https://habr.com/ru/post/1670775/

