Ranking a data structure other than std :: vector?

Question

Ranking a data structure other than std :: vector?

I came across an application in which I need to create a container with random access (or at least better than O (n)), have inexpensive (O (1)) insertion and deletion, and save the data in order (rank) ) specified during insertion.

For example, if I have the following array:

[2, 9, 10, 3, 4, 6]

I can call delete from index 2 to remove 10, and I can also call insert at index 1 by inserting 13.

After these two operations, I would:

 [2, 13, 9, 3, 4, 6]

The numbers are stored in sequence, and insert / delete operations require the index parameter to indicate where the number should be inserted or which number should be deleted.

My question is, which data structures besides Linked List and vector can support something like this? I lean toward Heap , which prioritizes the next available index. But I saw something about the Fusion Tree , useful (but more in a theoretical sense).

What data structures could give me the optimal runtime while maintaining memory consumption? I play with keeping the hash table while maintaining the position, but so far it has not been successful.

The reason I throw using std :: vector straight up is because I have to build something that teaches the vector in terms of these basic operations. The size of the container can grow up to hundreds of thousands of elements , so there is no question for hyphenation in std :: vector. The same problematic lines with the Linked List (even if Linked twice), passing it to this index, will take in the worst case O (n / 2), which is rounded to O (n).

I was thinking of a duplicate list that contained the Head, Tail, and Middle index, but I felt that it would not be much better.

+6

c ++ heap data-structures rank insertion-order

Dan hoynoski Apr 23 '15 at 7:33

source share

2 answers

Consider a skip list that can implement linear time rank operations in "indexable . "

For algorithms (pseudo-code), see Skip List Cookbook, by Pugh .

It may be that the implicit key binary search method described above by @Petr is easier to get and may even work better.

0

davidbak May 02, '16 at 19:40

source share

Petr · Accepted Answer · 2015-04-23T07:58:37+0000

In basic use, to be able to insert and delete at any position, you can use linked lists. They allow O (1) to insert / delete, but only on condition that you have already placed a position in the list where to insert. You can insert “after a specific element” (that is, by pointing to the element), but you cannot insert “at a given index” so efficiently.

To be able to insert and delete an item based on its index, you need a more complex data structure. There are at least two such structures that I know of.

One is a rope structure that is available in some C ++ extensions ( SGI STL or in GCC via #include <ext/rope> ). This allows O (log N) to insert / delete at any position.

Another structure to insert / delete O (log N) is an implicit treap (also an implicit Cartesian tree), you can find some information in http://codeforces.com/blog/entry/3767 , Treap with implicit keys or https : //codereview.stackexchange.com/questions/70456/treap-with-implicit-keys .

The implicit treap can also be changed so that you can find the minimum value in it (and also support much more operations). Not sure if this will handle the rope.

UPD In fact, I assume that you can adapt any O (log N) binary search tree (for example, AVL or red-black tree) for your query, converting it into an “implicit key” scheme. The general scheme is as follows.

Imagine a binary search tree that at any given moment stores consecutive numbers 1, 2, ..., N as its keys (N is the number of nodes in the tree). Each time we change the tree (insert or delete the node), we recount all the saved keys so that they are still from 1 to the new value of N. This will allow you to insert / delete at an arbitrary position, since the key is now a position, but for updating all keys will take too long.

To avoid this, we will not store the keys in the tree explicitly. Instead, for each node, we will store the number of nodes in its subtree. As a result, at any time, when we go down from the root of the tree, we can track the index (position) of the current node - we just need to sum the sizes of the subtrees that we have on our left. This allows us, given k, to find a node that has an index k (i.e., the kth in the standard order of the binary search tree), at O (log N) time. After that, we can insert or delete at this position using the standard binary tree procedure; we just need to update the subtree sizes of all nodes changed during the update, but this is easy to do for O (1) for each node, so the total insert or delete time will be O (log N) as in the original binary search tree.

Thus, this approach allows you to insert / delete / access nodes at a given position in O (log N), using any binary search tree O (log N) as the basis. Of course, you can save additional information ("values") that you need in the nodes, and you can even calculate the minimum of these values in the tree simply by observing the minimum value for each node subtree.

However, the aforementioned treap and rope are more advanced, as they also allow splitting and merging (by taking a substring / submatrix and combining two strings / arrays).

Ranking a data structure other than std :: vector?

More articles: