Stack and hash join

I am trying to write a data structure that is a combination of Stack and HashSet with fast push / pop / membership (I am looking for constant time operations). Think of the Python OrderedDict.

I tried several things and I came up with the following code: HashInt and SetInt . I need to add documentation to the source, but mostly I use a linear probing hash to store indexes in a key vector. Since linear sensing always puts the last element at the end of a continuous range of already filled cells, pop () can be implemented very easily without a complicated delete operation.

I have the following problems:

  • the data structure consumes a lot of memory (some improvements are obvious: stackKeys more than necessary).
  • some operations are slower than if I used fastutil (for example: pop (), even push () in some scenarios). I tried rewriting classes using fastutil and trove4j, but the overall speed of my application was halved.

What performance improvements would you suggest for my code? Which open source library / code do you know that I can try?

+4
source share
3 answers

You already have a pretty good implementation. The only improvement obvious to me is that you do more work than you need, looking for when you appear. You should not store the key itself on the stack, but the index into the key array. This gives you trivially fast pop-ups due to another pointer indirection when you want to look at the last element.

Just compare the stack size with LOAD_FACTOR * (the size of the heap array) and you should have the same fast implementation that you would expect with as little memory as you can manage, given your speed requirements.

+1
source

I think what you want is (almost) already available in libraries: LinkedHashSet is a hash set with a basic doubly linked list (which makes it iterable). LinkedHashMap even has removeEldestEntry, which sounds very similar to the pop method.

How a naive solution works like:

class HashStack<T> {

  private HashMap<T, Integer> counts = new HashMap<T, Integer>(); private Stack<T> stack = new Stack<T>(); public void push(T t) { stack.push(t); counts.put(t, 1 + getCount(t)); } public T pop() { T t = stack.pop(); counts.put(t, counts.get(t) - 1); return t; } private int getCount(T t) { return counts.containsKey(t) ? counts.get(t) : 0; } public boolean contains(T t) { return getCount(t) > 0; } public String toString() { return stack.toString(); } } 
+1
source

I would suggest using TreeSet <T> because it provides a guaranteed O (log n) value for adding, deleting and contains.

0
source

Source: https://habr.com/ru/post/1309098/


All Articles