Data structure with efficient manipulation and search by keywords and indexes

I am looking for a data structure with functionality, for example. OrderedDictionary in .NET, that is, an associative set (i.e. one that binds a key to a value) that maintains the order of elements (as a regular List does).

It should have a quick search by index and key. It should also have a quick β€œadd” operation (insert a new element at the end) and quickly delete elements with any index (based on the index or key).

OrderedDictionary in .NET uses both a hash table and an array to store its elements, if I'm not mistaken. Thus, re-reading the index based on the key (or vice versa) is O (n), and, of course, removing an element from the middle of the array is O (n) to begin with, plus the added index search from if you delete the key.

My question is, is there a more efficient data structure that satisfies my conditions, or if this is really my best option here?

+6
source share
4 answers

I think you can do this with two red-black trees: a key search tree for storing keys ordered by the comparison function, and an index search tree with keys in random order, as in a list. Each node index search must have a "size" field - the Red-Black tree can search by index if the "size" field is included in each node. See, for example, the implementation of RedBlackTreeSet in the C5 Generic Collection Library .

Each entry in the key search tree requires a pointer to its corresponding entry in the index-lookup tree. Like the left and right-node pointers, the index search tree will require a parent pointer field to provide top-down navigation as well as top-down navigation.

For each key, six pointers are required: regular left and right pointers in both nodes, as well as a pointer from the-lookup-node to index-lookup-node, plus the parent pointer in each index-lookup node. You will also need a pointer in each node to indicate the stored value.

Operations:

Append - the add operation inserts a key into both trees - once in the key search tree at the position determined by the comparison function, and again in the far right direction of the index search tree. Insertion into a red-black tree is a logarithmic operation of time.

Key search - this is done in the key search tree using the comparison function to find the correct position - O (log (n))

Index search - this can be done in the index search field, as described above - O (log (n))

Get the index from the key - first find the key in the key search tree O (log (n)). Follow the pointer to the index search tree. Follow the parent's pointers to the root of the node, (O (log (n)) for a balanced tree). Use the size fields on the way up to determine the key index. - O (log (n)) as a whole.

Delete by index - search for an item in the index tree. Remove from index search tree. Find the key located in the key search tree. Remove from the key search tree. All operations: O (log (n)), therefore delete - O (log (n)) as a whole.

Delete key - use "Get Index from Key" to get the key index. Delete by index from the index search tree. Delete a key from the key search tree. O (log (n)) as a whole.

This structure also supports setting O (log (n)) at any arbitrary position, not just at the end.

The storage overhead is clearly significant, but O (n) remains. The complexity of the time meets all the requirements.

Unfortunately, I do not know about any implementation of this structure.

Update. It seems to me that you can combine a tree with a hash table to get a search on O (1). Instead of having two trees, as I suggest above, use a hash table for searching by key and a balanced statistics tree in order to search by position as above, but have hash table slots containing pointers to nodes of the balanced tree to search get-list-position-by-key. Currently, the search is by keywords O (1), while everything else remains O (ln (n)) on average. Of course, now you get a random O (n) re-hash penalty, as in any hash table.

+4
source

OrderedDictionary really fits your requirements.

Your OrderedDictionary analysis is incorrect. Actually it is O (1) for search by keywords and O (1) for an index according to this .

Even a simple analysis gives you an O (1) search, either by key or by index. Arrays provide O (1) access and hash tables provide efficient O (1) access.

Insertion / deletion is more complex, but assuming that the amortized analysis is still O (1)

The article claims that it is O (n) for insertion and deletion. This, at least, is not performed for insertion, since the amortized analysis allows you to simply "spend" on inserting this element from 1 to 2. When inserting an element that requires changing the size of the array, the second half of the cost is used to pay the cost of copying. The final insert will take longer, but it is still O (1) amortized, and the discrepancy appears only if you cause the array to resize, which is unlikely.

+3
source

You may find something interesting here in the C6 Generic Collection Library for C # (from page 233)

+2
source

You can use a balanced binary search tree as a link, only to define the TreeNode you have to add your keys, but the problem is that the element is not O (1), it is O (log (n)) both by keys and by index (in fact, the index is not part of the TreeNode, relatively can be found), but all operations are O (log (n)) and the fastest way based on comparison methods.

+1
source

Source: https://habr.com/ru/post/899066/


All Articles