Why does the Rust borrow checker reject this code?

Question


I'm getting a Rust compile error from the borrow checker, and I don't understand why. There's probably something about lifetimes that I don't fully understand.

I've boiled it down to a short code sample. In main, I want to do this:

    fn main() {
        let codeToScan = "40 + 2";
        let mut scanner = Scanner::new(codeToScan);

        let first_token = scanner.consume_till(|c| { ! c.is_digit() });
        println!("first token is: {}", first_token);

        // scanner.consume_till(|c| { c.is_whitespace() }); // WHY DOES THIS LINE FAIL?
    }

Calling scanner.consume_till a second time gives me this error:

    example.rs:64:5: 64:12 error: cannot borrow `scanner` as mutable more than once at a time
    example.rs:64     scanner.consume_till(|c| { c.is_whitespace() }); // WHY DOES THIS LINE FAIL?
                      ^~~~~~~
    example.rs:62:23: 62:30 note: previous borrow of `scanner` occurs here; the mutable borrow prevents subsequent moves, borrows, or modification of `scanner` until the borrow ends
    example.rs:62     let first_token = scanner.consume_till(|c| { ! c.is_digit() });
                                        ^~~~~~~
    example.rs:65:2: 65:2 note: previous borrow ends here
    example.rs:59 fn main() {
    ...
    example.rs:65 }

Basically, I've written something like my own iterator, and its equivalent of the "next" method takes &mut self. Because of that, I can't call the method more than once in the same scope.

However, the Rust standard library has an iterator that can be used more than once in the same scope, and its next method also takes &mut self.

    let test = "this is a string";
    let mut iterator = test.chars();
    iterator.next();
    iterator.next(); // This is PERFECTLY LEGAL

So why does the Rust standard library code compile, but mine doesn't? (I'm sure the lifetime annotations are at the root of it, but my understanding of lifetimes doesn't lead me to expect a problem.)

Here's my full code (only 60 lines, shortened for this question):

    use std::str::{Chars};
    use std::iter::{Enumerate};

    #[deriving(Show)]
    struct ConsumeResult<'lt> {
        value: &'lt str,
        startIndex: uint,
        endIndex: uint,
    }

    struct Scanner<'lt> {
        code: &'lt str,
        char_iterator: Enumerate<Chars<'lt>>,
        isEof: bool,
    }

    impl<'lt> Scanner<'lt> {
        fn new<'lt>(code: &'lt str) -> Scanner<'lt> {
            Scanner{code: code, char_iterator: code.chars().enumerate(), isEof: false}
        }

        fn assert_not_eof<'lt>(&'lt self) {
            if self.isEof { fail!("Scanner is at EOF."); }
        }

        fn next(&mut self) -> Option<(uint, char)> {
            self.assert_not_eof();
            let result = self.char_iterator.next();
            if result == None { self.isEof = true; }
            return result;
        }

        fn consume_till<'lt>(&'lt mut self, quit: |char| -> bool) -> ConsumeResult<'lt> {
            self.assert_not_eof();
            let mut startIndex: Option<uint> = None;
            let mut endIndex: Option<uint> = None;

            loop {
                let should_quit = match self.next() {
                    None => {
                        endIndex = Some(endIndex.unwrap() + 1);
                        true
                    },
                    Some((i, ch)) => {
                        if startIndex == None { startIndex = Some(i); }
                        endIndex = Some(i);
                        quit(ch)
                    }
                };

                if should_quit {
                    return ConsumeResult{ value: self.code.slice(startIndex.unwrap(), endIndex.unwrap()),
                                          startIndex: startIndex.unwrap(), endIndex: endIndex.unwrap() };
                }
            }
        }
    }

    fn main() {
        let codeToScan = "40 + 2";
        let mut scanner = Scanner::new(codeToScan);

        let first_token = scanner.consume_till(|c| { ! c.is_digit() });
        println!("first token is: {}", first_token);

        // scanner.consume_till(|c| { c.is_whitespace() }); // WHY DOES THIS LINE FAIL?
    }

+6

rust

Charlie flowers Jul 20 '14 at 4:58


2 answers

Let's take a look at consume_till.

It takes &'lt mut self and returns ConsumeResult<'lt>. This means that the lifetime 'lt, the duration of the borrow of the input parameter self, will be the same as the lifetime of the output, the return value.

Put another way, after calling consume_till, you cannot use self again until its result goes out of scope.

That result is stored in first_token, and first_token is still in scope on your last line.

To get around this, you must make first_token go out of scope; inserting a new block around it will do this:

    fn main() {
        let code_to_scan = "40 + 2";
        let mut scanner = Scanner::new(code_to_scan);

        {
            let first_token = scanner.consume_till(|c| !c.is_digit());
            println!("first token is: {}", first_token);
        }

        scanner.consume_till(|c| c.is_whitespace());
    }

This all makes sense: while you hold a reference to something inside the Scanner, it is unsafe to let you modify it, lest that reference be invalidated. This is the memory safety that Rust provides.
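For readers on a current toolchain, here is a minimal sketch of the same scoping idea in present-day syntax. The Scanner below is a hypothetical, simplified stand-in for the one in the question (it tracks a byte position instead of wrapping an iterator, and it does not consume the character that stops the scan), and the lifetime is named 'b because current compilers reject shadowed lifetime names.

```rust
// Hypothetical, simplified stand-in for the question's Scanner (modern syntax).
struct Scanner<'a> {
    code: &'a str,
    pos: usize,
}

impl<'a> Scanner<'a> {
    fn new(code: &'a str) -> Self {
        Scanner { code, pos: 0 }
    }

    // As in the question, the result is tied to the borrow of `self` ('b),
    // not to the underlying text ('a).
    fn consume_till<'b>(&'b mut self, quit: impl Fn(char) -> bool) -> &'b str {
        let start = self.pos;
        let rest = &self.code[start..];
        // Stop at the first character for which `quit` returns true.
        let len = rest.find(quit).unwrap_or(rest.len());
        self.pos = start + len;
        &self.code[start..self.pos]
    }
}

fn main() {
    let mut scanner = Scanner::new("40 + 2");
    {
        let first_token = scanner.consume_till(|c| !c.is_ascii_digit());
        println!("first token is: {}", first_token);
    } // first_token goes out of scope here, ending the mutable borrow
    let spaces = scanner.consume_till(|c| !c.is_whitespace());
    println!("skipped: {:?}", spaces);
}
```

On current compilers the borrow already ends at the last use of first_token, so the explicit block mostly matters on older toolchains; it is kept here to mirror the fix described above.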

+3

Chris morgan Jul 20 '14 at 7:27


huon · Accepted Answer · 2014-07-21T09:18:49+0000

Here's a simpler example of the same thing:

    struct Scanner<'a> {
        s: &'a str
    }

    impl<'a> Scanner<'a> {
        fn step_by_3_bytes<'a>(&'a mut self) -> &'a str {
            let return_value = self.s.slice_to(3);
            self.s = self.s.slice_from(3);
            return_value
        }
    }

    fn main() {
        let mut scan = Scanner { s: "123456" };

        let a = scan.step_by_3_bytes();
        println!("{}", a);

        let b = scan.step_by_3_bytes();
        println!("{}", b);
    }

If you compile this, you get errors like those for the code in the question:

    <anon>:19:13: 19:17 error: cannot borrow `scan` as mutable more than once at a time
    <anon>:19     let b = scan.step_by_3_bytes();
                          ^~~~
    <anon>:16:13: 16:17 note: previous borrow of `scan` occurs here; the mutable borrow prevents subsequent moves, borrows, or modification of `scan` until the borrow ends
    <anon>:16     let a = scan.step_by_3_bytes();
                          ^~~~
    <anon>:21:2: 21:2 note: previous borrow ends here
    <anon>:13 fn main() {
    ...
    <anon>:21 }
              ^

So, the first thing to do is avoid shadowing lifetimes: this code has two lifetimes called 'a, and every 'a in step_by_3_bytes refers to the 'a declared there; none of them actually refer to the 'a in Scanner<'a>. I'll rename the inner one to make it crystal clear what's going on.

    impl<'a> Scanner<'a> {
        fn step_by_3_bytes<'b>(&'b mut self) -> &'b str {

The problem is that 'b connects the self object with the str return value. Looking at the definition of step_by_3_bytes from the outside, the compiler has to assume that calling it can make arbitrary modifications, including invalidating previous return values (this is how the compiler works: type checking is based purely on the type signatures of the things that are called, with no introspection). That is, it could be defined like this:

    struct Scanner<'a> {
        s: &'a str,
        other: String,
        count: uint
    }

    impl<'a> Scanner<'a> {
        fn step_by_3_bytes<'b>(&'b mut self) -> &'b str {
            self.other.push_str(self.s);
            // return a reference into data we own
            self.other.as_slice()
        }
    }

Now every call to step_by_3_bytes mutates the object that previous return values came from. For instance, it could cause the String to reallocate and thus move in memory, leaving any other &str return values as dangling pointers. Rust protects against this by tracking these references and disallowing mutation if it could lead to such catastrophic events. Going back to our actual code: the compiler type-checks main by looking only at the type signature of step_by_3_bytes/consume_till, so it can only assume the worst-case scenario (i.e. the example I just gave).
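To see that worst case on a current compiler, here is a small sketch in present-day syntax. The struct mirrors the hypothetical definition above, with the unused count field dropped and the method shortened to step (both names are illustrative only).

```rust
// Hypothetical scanner that owns a growing buffer, mirroring the worst case above.
struct Scanner<'a> {
    s: &'a str,
    other: String,
}

impl<'a> Scanner<'a> {
    // The returned slice points into `self.other`, so it really does have
    // to borrow from `self` for as long as the caller keeps it.
    fn step<'b>(&'b mut self) -> &'b str {
        self.other.push_str(self.s); // may reallocate and move the buffer
        &self.other
    }
}

fn main() {
    let mut scan = Scanner { s: "123456", other: String::new() };
    // Only a length is kept, so the first borrow ends with this statement.
    let first_len = scan.step().len();
    let second = scan.step();
    println!("{} {}", first_len, second);
    // Keeping the first `&str` alive across the second call would be rejected
    // (error E0499), because the second call could move the buffer under it.
}
```

With this definition the tied lifetime is exactly right: the signature tells callers that an earlier result cannot outlive the next mutation.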

How to solve this?

Let's take a step back: suppose we were just starting out and didn't know which lifetimes we wanted for the return values, so we leave them anonymous (the '_ is used here as a placeholder rather than as real Rust):

    impl<'a> Scanner<'a> {
        fn step_by_3_bytes<'b>(&'_ mut self) -> &'_ str {

Now we get to ask the fun question: which lifetimes do we want where?

It's almost always best to annotate the longest valid lifetimes, and we know our return value lives for 'a (since it comes straight from the s field, and that &str is valid for 'a). That is,

    impl<'a> Scanner<'a> {
        fn step_by_3_bytes<'b>(&'_ mut self) -> &'a str {

For the other '_, we don't really care: as API designers, we have no particular desire or need to connect the self borrow with any other references (unlike the return value, where we wanted/needed to express which memory it came from). So we might as well leave it off:

    impl<'a> Scanner<'a> {
        fn step_by_3_bytes<'b>(&mut self) -> &'a str {

'b is now unused, so it can be dropped, leaving

    impl<'a> Scanner<'a> {
        fn step_by_3_bytes(&mut self) -> &'a str {

This expresses that the Scanner refers to some memory that is valid for at least 'a, and then returns references into just that memory. The self object is essentially just a proxy for manipulating those views: once you have the reference it returns, you can discard the Scanner (or call more methods).

So the full working code is

    struct Scanner<'a> {
        s: &'a str
    }

    impl<'a> Scanner<'a> {
        fn step_by_3_bytes(&mut self) -> &'a str {
            let return_value = self.s.slice_to(3);
            self.s = self.s.slice_from(3);
            return_value
        }
    }

    fn main() {
        let mut scan = Scanner { s: "123456" };

        let a = scan.step_by_3_bytes();
        println!("{}", a);

        let b = scan.step_by_3_bytes();
        println!("{}", b);
    }
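The slice_to and slice_from methods used above no longer exist in current Rust, so here is a rough present-day equivalent. It swaps in str::split_at as a stand-in for the two slicing calls; the lifetime signature is the one derived above.

```rust
struct Scanner<'a> {
    s: &'a str,
}

impl<'a> Scanner<'a> {
    // The result borrows from the underlying text ('a), not from `self`.
    // Like the original, this panics if fewer than 3 bytes remain or if
    // byte 3 is not on a character boundary.
    fn step_by_3_bytes(&mut self) -> &'a str {
        let (head, tail) = self.s.split_at(3);
        self.s = tail;
        head
    }
}

fn main() {
    let mut scan = Scanner { s: "123456" };
    let a = scan.step_by_3_bytes();
    let b = scan.step_by_3_bytes();
    // Both results are alive at the same time, which the tied-lifetime
    // signature would not have allowed.
    println!("{} {}", a, b);
}
```

Holding a and b together is the point of the exercise: neither one keeps scan mutably borrowed.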

Applying this change to your code is just a matter of adjusting the definition of consume_till:

 fn consume_till(&mut self, quit: |char| -> bool) -> ConsumeResult<'lt> {
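For anyone trying this on a current compiler, below is a sketch of how the adjusted method might look in present-day syntax. It is a simplified adaptation, not the question's exact code: it assumes CharIndices in place of Enumerate<Chars> (so the indices are byte offsets that are safe to slice with), a generic FnMut closure in place of the old |char| -> bool type, snake_case field names, and no EOF assertion. At end of input it treats the token as running to the end of the text.

```rust
use std::str::CharIndices;

#[derive(Debug)]
struct ConsumeResult<'lt> {
    value: &'lt str,
    start_index: usize,
    end_index: usize,
}

struct Scanner<'lt> {
    code: &'lt str,
    char_iterator: CharIndices<'lt>,
}

impl<'lt> Scanner<'lt> {
    fn new(code: &'lt str) -> Scanner<'lt> {
        Scanner { code, char_iterator: code.char_indices() }
    }

    // `&mut self` gets its own short, elided lifetime; only the result uses 'lt.
    fn consume_till(&mut self, mut quit: impl FnMut(char) -> bool) -> ConsumeResult<'lt> {
        let mut start_index: Option<usize> = None;
        let end_index = loop {
            match self.char_iterator.next() {
                // Input exhausted: the token runs to the end of the text.
                None => break self.code.len(),
                Some((i, ch)) => {
                    if start_index.is_none() {
                        start_index = Some(i);
                    }
                    // The character that stops the scan is consumed but not
                    // included in the token, as in the question's code.
                    if quit(ch) {
                        break i;
                    }
                }
            }
        };
        let start_index = start_index.unwrap_or(end_index);
        ConsumeResult { value: &self.code[start_index..end_index], start_index, end_index }
    }
}

fn main() {
    let mut scanner = Scanner::new("40 + 2");
    let first_token = scanner.consume_till(|c| !c.is_ascii_digit());
    let second_token = scanner.consume_till(|c| c.is_whitespace());
    // Both tokens can be held at once because neither borrows `scanner`.
    println!("{:?} {:?}", first_token, second_token);
}
```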

So why does the Rust standard library code compile, but mine doesn't? (I'm sure the lifetime annotations are at the root of it, but my understanding of lifetimes doesn't lead me to expect a problem.)

There's a slight (but not huge) difference: the Chars iterator just returns a char, i.e. there is no lifetime in the return value. The next method (essentially) has the signature:

    impl<'a> Chars<'a> {
        fn next(&mut self) -> Option<char> {

(It's actually in an Iterator trait impl, but that doesn't matter.)

The situation you have here is like writing

    impl<'a> Chars<'a> {
        fn next(&'a mut self) -> Option<char> {

(Similar in the sense of "incorrectly linking lifetimes"; the details differ.)
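As a quick check of the Chars case, this small sketch calls next twice and keeps both results. The first_two helper is a hypothetical name used only for this illustration.

```rust
// Hypothetical helper: pull two characters from the same iterator.
fn first_two(text: &str) -> (Option<char>, Option<char>) {
    let mut iterator = text.chars();
    // Each call borrows `iterator` mutably only for the duration of the call,
    // because Option<char> carries no lifetime that could extend the borrow.
    let first = iterator.next();
    let second = iterator.next();
    (first, second)
}

fn main() {
    let (first, second) = first_two("this is a string");
    println!("{:?} {:?}", first, second);
}
```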
