An iterator adapter that implements SQL-like CORRECT INTERNAL WORK using HashMap

I am trying to extend bluss rust-itertools with iterators like SQL. I ran into a specific problem with RIGHT OUTER JOIN using a hash join strategy (the strategy itself is actually very simple).

The iterator structure struct accepts two input iterators, of which the second (right) is loaded into the HashMap. The iteration works as follows:

  • An element from the left iterator is mapped to a map - in case of a match, both elements are returned
  • When the left iterator is exhausted, return inconsistent values ​​from the map

The problem is related to the second part, where I tried to save the Map values ​​identifier with the map in order to save its iterative state. But, as I found out in this answer , this is not possible in rust. Unfortunately, I have no idea how to do this otherwise.

Here is the complete code for the INNER JOIN adapter, which executes the first part:

use std::collections::HashMap; use std::hash::Hash; pub struct HashJoinInner<I, K, V0, V1> where I: Iterator<Item=(K, V0)>, K: Hash + Eq, V1: Clone, { left: I, right: HashMap<K, V1>, } impl<I, K, V0, V1> HashJoinInner<I, K, V0, V1> where I: Iterator<Item=(K, V0)>, K: Hash + Eq, V1: Clone, { /// Create a `HashJoinInner` iterator. pub fn new<J>(l: I, r: J) -> Self where J: Iterator<Item=(K, V1)> { let mut hm: HashMap<K, V1> = HashMap::new(); for (k, v) in r { hm.insert(k, v); } HashJoinInner { left: l, right: hm, } } } impl<I, K, V0, V1> Iterator for HashJoinInner<I, K, V0, V1> where I: Iterator<Item=(K, V0)>, K: Hash + Eq, V1: Clone, { type Item = (V0, V1); fn next(&mut self) -> Option<Self::Item> { loop { match self.left.next() { Some((k0, v0)) => match self.right.get(&k0) { Some(v1) => return Some((v0, Clone::clone(v1))), None => continue, }, None => return None, } } } } 

I would be grateful for any idea.

+4
source share
2 answers

You cannot save the Values iterator because it contains references to the HashMap . These links may become invalid if you move the map. However, you can use the HashMap with the into_iter method. It owns all the HashMap values ​​and can be moved to a new structure.

Here is the setup of your code from an earlier question. This is not a connection left or right. There is the difficulty of switching from one iterator to the completion of another iterator.

 use std::collections::hash_map::{HashMap, IntoIter}; use std::hash::Hash; struct Foo<K, V> where K: Hash + Eq, V: Clone, { iter: IntoIter<K, (V, bool)>, } impl<K, V> Foo<K, V> where K: Hash + Eq, V: Clone, { fn new<I>(it: I) -> Self where I: Iterator<Item=(K, V)> { let mut map = HashMap::new(); for (k, v) in it { map.insert(k, (v, false)); } Foo { iter: map.into_iter() } } } impl<K, V> Iterator for Foo<K, V> where K: Hash + Eq, V: Clone, { type Item = V; fn next(&mut self) -> Option<Self::Item> { loop { match self.iter.next() { Some((_, (v, false))) => return Some(v.clone()), Some(_) => continue, None => return None, } } } } fn main() { let it = (0..).zip("AB".chars()); let foo = Foo::new(it); for v in foo { println!("{}", v); } } 

However, you do not need to do anything to get what you want. You can simply create a hash map and check it out when you iterate over another item. I accidentally created a left outer join, but just flipped the arguments to get the right outer join. ^ _ ^

 use std::collections::hash_map::HashMap; use std::hash::Hash; struct LeftOuterJoin<L, K, RV> { left: L, right: HashMap<K, RV>, } impl<L, K, RV> LeftOuterJoin<L, K, RV> where K: Hash + Eq { fn new<LI, RI>(left: LI, right: RI) -> Self where L: Iterator<Item=LI::Item>, LI: IntoIterator<IntoIter=L>, RI: IntoIterator<Item=(K, RV)> { LeftOuterJoin { left: left.into_iter(), right: right.into_iter().collect() } } } impl<L, K, LV, RV> Iterator for LeftOuterJoin<L, K, RV> where L: Iterator<Item=(K, LV)>, K: Hash + Eq, RV: Clone { type Item = (K, LV, Option<RV>); fn next(&mut self) -> Option<Self::Item> { match self.left.next() { Some((k, lv)) => { let rv = self.right.get(&k); Some((k, lv, rv.cloned())) }, None => None, } } } fn main() { let mut left = HashMap::new(); left.insert(1, "Alice"); left.insert(2, "Bob"); let mut right = HashMap::new(); right.insert(1, "Programmer"); for (id, name, job) in LeftOuterJoin::new(left.into_iter(), right) { println!("{} ({}) is a {:?}", name, id, job); } } 
+1
source

Thanks to Shepmaster's idea of ​​using std::collections::hash_map::IntoIter I was able to solve the problem.

Here is the complete solution for RIGHT OUTER JOIN using a hash join strategy:

 use std::collections::hash_map::{HashMap, IntoIter,}; use std::mem; use std::hash::Hash; #[must_use = "iterator adaptors are lazy and do nothing unless consumed"] pub struct HashJoinRightOuter<L, K, RV> { left: L, map: HashMap<K, (RV, bool)>, /// exclusion iterator - yields the unmatched values from the map. It is created once the left /// iterator is exhausted excl_iter: Option<IntoIter<K, (RV, bool)>>, } impl<L, K, RV> HashJoinRightOuter<L, K, RV> where K: Hash + Eq, { /// Create a `HashJoinRightOuter` iterator. pub fn new<LI, RI>(left: LI, right: RI) -> Self where L: Iterator<Item=LI::Item>, LI: IntoIterator<IntoIter=L>, RI: IntoIterator<Item=(K, RV)> { let mut map: HashMap<K, (RV, bool)> = HashMap::new(); for (k, v) in right.into_iter() { map.insert(k, (v, false)); } HashJoinRightOuter { left: left.into_iter(), map: map, excl_iter: None, } } /// Moves the map to `self.excl_iter` /// /// Once the left iterator is exhausted, the info about which keys were matched is complete. /// To be able to iterate over map values we need to move it into its `IntoIter`. fn set_excl_iter(&mut self) { let map = mem::replace(&mut self.map, HashMap::<K, (RV, bool)>::new()); self.excl_iter = Some(map.into_iter()); } } impl<L, K, LV, RV> Iterator for HashJoinRightOuter<L, K, RV> where L: Iterator<Item=(K, LV)>, K: Hash + Eq, RV: Clone, { type Item = (Option<LV>, RV); fn next(&mut self) -> Option<Self::Item> { loop { match self.excl_iter { // the left iterator is not yet exhausted None => match self.left.next() { Some((lk, lv)) => match self.map.get_mut(&lk) { Some(rt) => { rt.1 = true; // flag as matched return Some((Some(lv), Clone::clone(&rt.0))) }, None => continue, // not interested in unmatched left value }, // the left iterator is exhausted so move the map into `self.excl_iter`. None => self.set_excl_iter(), }, // iterate over unmatched values Some(ref mut r) => match r.next() { Some((_, (rv, matched))) => { if !matched { return Some((None, rv)); } else { continue; } }, None => return None, } } } } } fn main() { let a = (0..).zip("AB".chars()); let b = (1..).zip("XY".chars()); let mut it = HashJoinRightOuter::new(a, b); assert_eq!(it.next(), Some((Some('B'), 'X'))); assert_eq!(it.next(), Some((None, 'Y'))); assert_eq!(it.next(), None); } 

In the beginning, I failed because I tried to store both the data and the link in the same structure, which in any case does not matter. What I really wanted was to save the data first, do some magic with it and do it, move it to another field to work with its conversion.

This can be used to solve other problems related to self-reference.

0
source

Source: https://habr.com/ru/post/1235506/


All Articles