Fighting closure and rust life

I am trying to port a small test from F # to Rust. The F # code is as follows:

let inline iterNeighbors f (i, j) = f (i-1, j) f (i+1, j) f (i, j-1) f (i, j+1) let rec nthLoop n (s1: HashSet<_>) (s2: HashSet<_>) = match n with | 0 -> s1 | n -> let s0 = HashSet(HashIdentity.Structural) let add p = if not(s1.Contains p || s2.Contains p) then ignore(s0.Add p) Seq.iter (fun p -> iterNeighbors add p) s1 nthLoop (n-1) s0 s1 let nth np = nthLoop n (HashSet([p], HashIdentity.Structural)) (HashSet(HashIdentity.Structural)) (nth 2000 (0, 0)).Count 

It computes the nth closest neighboring shells from the starting vertex in a potentially infinite graph. I used something similar during my PhD to study amorphous materials.

I spent many hours trying not to port this to Rust. I managed to get one version, but only by manually setting the closure and converting the recursion into a loop with local variables (yuk!).

 use std::collections::HashSet; 

I tried to write the iterNeighbors function as follows:

 fn iterNeighbors<F>(f: &F, (i, j): (i32, i32)) -> () where F : Fn((i32, i32)) -> () { f((i-1, j)); f((i+1, j)); f((i, j-1)); f((i, j+1)); } 

I think this is a function that accepts a closure (which itself takes a pair and returns one), and the pair returns one. It seems I need to double the brackets: is this correct?

I tried to write a recursive version as follows:

 fn nthLoop(n: i32, s1: HashSet<(i32, i32)>, s2: HashSet<(i32, i32)>) -> HashSet<(i32, i32)> { if n==0 { return &s1; } else { let mut s0 = HashSet::new(); for &p in s1 { if !(s1.contains(&p) || s2.contains(&p)) { s0.insert(p); } } return &nthLoop(n-1, s0, s1); } } 

Please note that I have not worried about calling iterNeighbors .

I think I'm trying my best to get the arguments back into life correctly, because they rotate in a recursive call. How should I annotate the lifetime if I want s2 be released just before return , and I want s1 withstand either on return or in a recursive call?

The caller will look something like this:

 fn nth<'a>(n: i32, p: (i32, i32)) -> &'a HashSet<(i32, i32)> { let s0 = HashSet::new(); let mut s1 = HashSet::new(); s1.insert(p); return &nthLoop(n, &s1, s0); } 

I discarded this and wrote it as a while with mutable locales:

 fn nth<'a>(n: i32, p: (i32, i32)) -> HashSet<(i32, i32)> { let mut n = n; let mut s0 = HashSet::new(); let mut s1 = HashSet::new(); let mut s2 = HashSet::new(); s1.insert(p); while n > 0 { for &p in &s1 { let add = &|p| { if !(s1.contains(&p) || s2.contains(&p)) { s0.insert(p); } }; iterNeighbors(&add, p); } std::mem::swap(&mut s0, &mut s1); std::mem::swap(&mut s0, &mut s2); s0.clear(); n -= 1; } return s1; } 

This works if I set the closure manually, but I cannot figure out how to cause the closure. Ideally, I would like to set up a static dispatch.

main function:

 fn main() { let s = nth(2000, (0, 0)); println!("{}", s.len()); } 

So ... what am I doing wrong ?:-)

Also, I used only HashSet in F # because I assume that Rust does not provide a purely functional Set efficient set-theoretic operations (union, intersection, and difference). Am I right in assuming that?

+5
source share
2 answers

I think this is a function that accepts a closure (which itself takes a pair and returns one), and the pair returns one. It seems I need to double the brackets: is this correct?

You need double brackets because you pass the 2-tuple to a closure that matches your F # source code.

I think I'm trying my best to get the arguments back into life correctly, because they rotate in a recursive call. How should I annotate the lifetime if I want s2 to be freed immediately before returning, and I want s1 to survive either on return or in a recursive call?

The problem is that you use HashSet links when you just need to use the HashSet directly. Your signature for nthLoop already correct; you just need to remove multiple occurrences & .

To free s2 , you can write drop(s2) . Note that Rust does not have guaranteed tail calls, so each recursive call will still take up some space on the stack (you can see how much with the mem::size_of ), but the drop call will clear the data on the heap.

The caller will look something like this:

Again, you just need to delete & here.

Notice that I didn’t even bother about calling iterNeighbors.


This works if I set the closure manually, but I cannot figure out how to cause the closure. Ideally, I would like to set up a static dispatch.

Rust has three types of closures: Fn , FnMut and FnOnce . They differ in the type of the self argument. The distinction is important because it sets limits on what closure is allowed to do, and on how the caller can use the closure. Rust has a chapter on closure that already explains this well.

Your close should mutate s0 . However, iterNeighbors defined as pending Fn closure. Your closure cannot implement Fn because Fn gets &self , but for mutation s0 you need &mut self . iterNeighbors cannot use FnOnce since it must cause closure more than once. Therefore you need to use FnMut .

In addition, there is no need to skip the closure with the iterNeighbors reference. You can simply pass it by value; each closure call will only take closure, not consume it.

In addition, I only used HashSet in F # because I assume that Rust does not provide a purely functional set with efficient set-theoretic operations (union, intersection, and difference). Am I right in assuming that?

There is no implementation of a purely functional set in the standard library (perhaps there is crates.io ?). While Rust uses functional programming, it also takes advantage of its ownership and borrowing system to make programming more secure. A functional set would probably impose the use of some form of reference counting or garbage collection to exchange elements between sets.

However, a HashSet performs set-theoretic operations. There are two ways to use them: iterators ( difference , symmetric_difference , intersection , union ), which generate the sequence lazily or operators ( | , & , ^ , - , as indicated in the implementations for HashSet ), which create new sets containing clones of values ​​from sets sources.


Here's the working code:

 use std::collections::HashSet; fn iterNeighbors<F>(mut f: F, (i, j): (i32, i32)) -> () where F: FnMut((i32, i32)) -> () { f((i-1, j)); f((i+1, j)); f((i, j-1)); f((i, j+1)); } fn nthLoop(n: i32, s1: HashSet<(i32, i32)>, s2: HashSet<(i32, i32)>) -> HashSet<(i32, i32)> { if n == 0 { return s1; } else { let mut s0 = HashSet::new(); for &p in &s1 { let add = |p| { if !(s1.contains(&p) || s2.contains(&p)) { s0.insert(p); } }; iterNeighbors(add, p); } drop(s2); return nthLoop(n-1, s0, s1); } } fn nth(n: i32, p: (i32, i32)) -> HashSet<(i32, i32)> { let mut s1 = HashSet::new(); s1.insert(p); let s2 = HashSet::new(); return nthLoop(n, s1, s2); } fn main() { let s = nth(2000, (0, 0)); println!("{}", s.len()); } 
+8
source

It seems I need to double the brackets: is this correct?

No: double brackets are that you chose to use tuples and calling a function that takes a tuple requires you to create a tuple first, but you can have locks that take several arguments, for example F: Fn(i32, i32) . That is, this function can be written like this:

 fn iterNeighbors<F>(i: i32, j: i32, f: F) where F: Fn(i32, i32) { f(i-1, j); f(i+1, j); f(i, j-1); f(i, j+1); } 

However, it seems that saving this set makes sense for this case.

I think I'm trying my best to get the arguments back into life correctly, because they rotate in a recursive call. How should I annotate the lifetime if I want s2 to be freed immediately before returning, and I want s1 to survive either on return or in a recursive call?

There is no need for links (and therefore no need for resources), just pass the data directly:

 fn nthLoop(n: i32, s1: HashSet<(i32, i32)>, s2: HashSet<(i32, i32)>) -> HashSet<(i32, i32)> { if n==0 { return s1; } else { let mut s0 = HashSet::new(); for &p in s1 { iterNeighbors(p, |p| { if !(s1.contains(&p) || s2.contains(&p)) { s0.insert(p); } }) } drop(s2); // guarantees timely deallocation return nthLoop(n-1, s0, s1); } } 

The key point here is that you can do everything by value, and everything that is passed by value will, of course, contain its values.

However, this does not compile:

 error: cannot borrow data mutably in a captured outer variable in an `Fn` closure [E0387] if !(s1.contains(&p) || s2.contains(&p)) { s0.insert(p); } ^~ help: see the detailed explanation for E0387 help: consider changing this closure to take self by mutable reference iterNeighbors(p, |p| { if !(s1.contains(&p) || s2.contains(&p)) { s0.insert(p); } }) 

That is, the closure tries to change the values ​​that it fixes ( s0 ), but the closure property Fn does not allow this. This trait can be invoked in a more flexible way (when used together), but it imposes more restrictions on what closure can do internally. (If you're interested, I wrote more about this: http://huonw.imtqy.com/blog/2015/05/finding-closure-in-rust/ )

Fortunately, there is an easy fix: using the FnMut attribute, which requires that a closure can only be called when it has unique access to it, but allows internal elements to mutate things.

 fn iterNeighbors<F>((i, j): (i32, i32), mut f: F) where F: FnMut((i32, i32)) { f((i-1, j)); f((i+1, j)); f((i, j-1)); f((i, j+1)); } 

The caller will look something like this:

The values ​​also work here: returning the link in this case will return a pointer to s0 , in which the stack frame will be stored, which will be destroyed as the function returns. That is, the link points to dead data.

The fix does not use links:

 fn nth(n: i32, p: (i32, i32)) -> HashSet<(i32, i32)> { let s0 = HashSet::new(); let mut s1 = HashSet::new(); s1.insert(p); return nthLoop(n, s1, s0); } 

This works if I set the closure manually, but I cannot figure out how to cause the closure. Ideally, I would like to set up a static dispatch.

(I don’t understand what this means, including the compiler error messages you came across helps us help you.)

In addition, I only used HashSet in F # because I assume that Rust does not provide a purely functional set with efficient set-theoretic operations (union, intersection, and difference). Am I right in assuming that?

Depending on what you want, no, for example. both HashSet and BTreeSet provide various set-theoretic operations as methods that return iterators .


Some small points:

  • The explicit / named lifetime allows the compiler to talk about the static validity of the data, they do not control it (i.e., allow the compiler to indicate when you are doing something wrong, but the language still has the same static usage / life guarantee resource as C ++)
  • the version with the loop is likely to be more efficient, as it is written, since it reuses memory directly (replacing sets plus s0.clear() , however, the same advantage can be realized with the recursive version by passing s2 for reuse, rather than to remove it.
  • while can be for _ in 0..n
  • there is no need to skip closures by reference, but with or without a link, there is a static dispatch (closing is a type parameter, not an object-object).
  • conditionally, the closing arguments are the last and are not accepted by reference, because it simplifies the determination and transmission of their embedded lines (for example, foo(x, |y| bar(y + 1)) instead of foo(&|y| bar(y + 1), x) )
  • The return keyword is not required for returned returns (if ; omitted):

     fn nth(n: i32, p: (i32, i32)) -> HashSet<(i32, i32)> { let s0 = HashSet::new(); let mut s1 = HashSet::new(); s1.insert(p); nthLoop(n, s1, s0) } 
+8
source

Source: https://habr.com/ru/post/1245597/


All Articles