Very fast insertion structure

I am looking for an ordered data structure that allows very fast insertion. That is the only hard requirement. Data will only ever be accessed and deleted at the top element.

To be more precise, I need 2 structures:

1) The first structure should allow ordered insertion keyed by an int value. At the end of the insert, it should report the rank of the inserted item.

2) The second structure must allow insertion at a specified rank.

The number of stored items is likely to be in the thousands or tens of thousands.

[edit] I must correct the volume estimate: although at any moment the size of the ordered structure is likely to be in the tens of thousands, the total number of insertions is likely to be in the tens of millions per run.

Insertion in O(1) would be nice, although O(log(log(log(n)))) is also very acceptable. Currently I have an interesting candidate for the first structure only, and it is either O(log(n)) or lacks the ability to report the insertion rank (which is mandatory).

+4
4 answers

How about some form of skip list, specifically the "indexable skip list" in the linked article? That gives O(lg N) insertion and search, and O(1) access to the first node, for both use cases.
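As a concrete illustration of the idea (my own sketch, not code from the linked article), here is a minimal indexable skip list in C. Each forward link carries a width, the number of bottom-level elements it skips, so the rank can be accumulated during the ordinary descent. Only insertion is shown; it returns the number of existing elements strictly smaller than the new key, and uses a fixed MAX_LEVEL with coin-flip level selection.

```c
#include <stdlib.h>

#define MAX_LEVEL 16

typedef struct Node {
    int key;
    struct Node *next[MAX_LEVEL];
    int width[MAX_LEVEL];   /* elements skipped by the link at each level */
} Node;

typedef struct {
    Node *head;   /* sentinel node */
    int level;    /* highest level currently in use */
    int size;
} SkipList;

static int random_level(void) {
    int lvl = 1;
    while (lvl < MAX_LEVEL && (rand() & 1))
        lvl++;
    return lvl;
}

SkipList *sl_create(void) {
    SkipList *sl = (SkipList*)calloc(1, sizeof(SkipList));
    sl->head = (Node*)calloc(1, sizeof(Node));
    sl->level = 1;
    return sl;
}

/* Insert key; returns the number of existing elements strictly smaller
 * than key, i.e. the 0-based rank at which the new element lands. */
int sl_insert(SkipList *sl, int key) {
    Node *update[MAX_LEVEL];
    int rank[MAX_LEVEL];        /* rank of update[i] at each level */
    Node *x = sl->head;
    int i, lvl, r = 0;

    for (i = sl->level - 1; i >= 0; i--) {
        while (x->next[i] && x->next[i]->key < key) {
            r += x->width[i];   /* accumulate rank while descending */
            x = x->next[i];
        }
        update[i] = x;
        rank[i] = r;
    }

    lvl = random_level();
    if (lvl > sl->level) {
        for (i = sl->level; i < lvl; i++) {
            update[i] = sl->head;
            rank[i] = 0;
            sl->head->width[i] = sl->size;   /* new level spans the whole list */
        }
        sl->level = lvl;
    }

    Node *n = (Node*)calloc(1, sizeof(Node));
    n->key = key;
    for (i = 0; i < lvl; i++) {
        n->next[i] = update[i]->next[i];
        update[i]->next[i] = n;
        n->width[i] = update[i]->width[i] - (r - rank[i]);
        update[i]->width[i] = (r - rank[i]) + 1;
    }
    for (i = lvl; i < sl->level; i++)
        update[i]->width[i]++;   /* links that now skip over the new node */
    sl->size++;
    return r;
}
```

The same width bookkeeping also supports the second structure (insert at a given rank): descend by width instead of by key.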

- Edit -

When I think of O(1) algorithms, radix-based methods come to mind. Here is an O(1) insert that returns the rank. The idea is to split the key into nibbles and, for each prefix, keep a count of the inserted elements that share it. Unfortunately, the constant is high (up to 64 array reads and additions), and the memory is O(2 x 2^INT_BITS), which is terrible. This is the 16-bit int version; extending it to 32 bits should be straightforward.

```c
int *p1; int *p2; int *p3; int *p4;
void **records;
unsigned int min = 0xFFFF;

int init(void) {
    p1 = (int*)calloc(16,    sizeof(int));
    p2 = (int*)calloc(256,   sizeof(int));
    p3 = (int*)calloc(4096,  sizeof(int));
    p4 = (int*)calloc(65536, sizeof(int));
    records = (void**)calloc(65536, sizeof(void*));
    return 0;
}

/* Records that we are storing one more item;
   returns the number of smaller existing items in this bucket. */
int Add1ReturnRank(int *p, int offset, int a) {
    int i, sum = 0;
    p += offset;
    for (i = 0; i < a; i++)
        sum += p[i];
    p[i]++;   /* i == a here: count the new item in its own slot */
    return sum;
}

int insert(int key, void *data) {
    unsigned int i4 = (unsigned int)key;
    unsigned int i3 = (i4 >> 4);
    unsigned int i2 = (i3 >> 4);
    unsigned int i1 = (i2 >> 4);
    int rank  = Add1ReturnRank(p1, 0,           i1 & 0xF);
    rank     += Add1ReturnRank(p2, i2 & 0xF0,   i2 & 0xF);
    rank     += Add1ReturnRank(p3, i3 & 0xFF0,  i3 & 0xF);
    rank     += Add1ReturnRank(p4, i4 & 0xFFF0, i4 & 0xF);
    if (min > (unsigned int)key) { min = key; }
    records[i4] = data;   /* the original called an unspecified store();
                             note that duplicate keys overwrite here */
    return rank;
}
```

This structure also supports O(1) GetMin and RemoveMin. (GetMin is immediate; RemoveMin's constant is similar to Insert's.)

```c
void* getMin(int *key) {
    *key = min;            /* fixed: the original indexed an undeclared data[] */
    return records[min];
}

void* removeMin(int *key) {
    void *data = records[min];
    unsigned int i4 = min;
    unsigned int i3 = (i4 >> 4);
    unsigned int i2 = (i3 >> 4);
    unsigned int i1 = (i2 >> 4);
    p4[i4]--; p3[i3]--; p2[i2]--; p1[i1]--;
    *key = min;
    /* scan forward for the next non-empty bucket, one level at a time */
    while (!p1[i1]) {
        if (i1 == 15) { min = 0xFFFF; return data; }  /* now empty; still return
                                                         the removed element */
        i2 = (++i1) << 4;
    }
    while (!p2[i2]) i3 = (++i2) << 4;
    while (!p3[i3]) i4 = (++i3) << 4;
    while (!p4[i4]) ++i4;
    min = i4;
    return data;
}
```

If your data is sparse and well distributed, you could drop the p4 counters and instead do an insertion sort at the p3 level. That cuts storage by a factor of 16, at the cost of a worse worst-case insertion time when there are many similar values.

Another idea for reducing storage is to combine this with something like extendible hashing. Use the integer key as the hash value and keep the counts of inserted nodes in the directory. Summing the relevant directory entries for a bucket (as above) should still be O(1) with a large constant, but storage drops to O(N).

+2

An order statistics tree seems to fit your O(log N) time requirement.

An order statistics tree is an augmented form (see AugmentedDataStructures) of a BinarySearchTree that supports two additional operations: Rank(x), which returns the rank of x (i.e. the number of elements with keys less than or equal to x), and FindByRank(k), which returns the k-th smallest element of the tree.
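A minimal sketch of the rank-reporting insert on a size-augmented BST (my own illustration, with rebalancing omitted for brevity; a production version would augment a red-black or AVL tree to keep the O(log N) bound). Each node stores the size of its subtree, and the rank is accumulated on the way down:

```c
#include <stdlib.h>

typedef struct OSNode {
    int key;
    int size;                    /* number of nodes in this subtree */
    struct OSNode *left, *right;
} OSNode;

static int os_size(OSNode *n) { return n ? n->size : 0; }

/* Insert key into the tree rooted at root; stores in *rank the number of
 * previously inserted keys <= key (the Rank(x) described above).
 * Returns the new root. */
OSNode *os_insert(OSNode *root, int key, int *rank) {
    OSNode **link = &root;
    int r = 0;
    while (*link) {
        (*link)->size++;                       /* new node lands in this subtree */
        if (key < (*link)->key) {
            link = &(*link)->left;
        } else {
            r += os_size((*link)->left) + 1;   /* everything <= this node */
            link = &(*link)->right;            /* duplicates go right */
        }
    }
    OSNode *n = (OSNode*)calloc(1, sizeof(OSNode));
    n->key = key;
    n->size = 1;
    *link = n;
    *rank = r;
    return root;
}
```

The same size fields support FindByRank(k) by descending left or right depending on the left subtree's size, which also gives the second structure (insert at a given rank).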

If you only have tens of thousands of elements, the difference between O(log N) and O(1) asymptotics is not as important as you might think. For example, with 100,000 elements the log N method is only about 16 times slower:

log2(100,000) = log(100,000) / log(2) ≈ 16.61

In that case the constant factor (implementation overhead) may be the real optimization target. Exotic data structures often have a much higher constant cost because of their inherent complexity (sometimes thousands of times slower), and they are also more likely to have less polished implementations, simply because they are used less.

You should benchmark (actually measure) several heap implementations to find the one with the best real-world performance.

+1

You say that you need an ordered data structure, which sounds as if you need something that can enumerate all contained elements in O(n) time.

But then you say that you will only ever access the top (smallest?) element, which suggests you really need something that can repeatedly produce the minimum - opening the door to something that maintains only a partial order (such as a heap).

Which is it?

+1

If I understood your question correctly, I would recommend using a dictionary whose keys are ranks and whose values are linked lists.

With the keys you get the ranks, and with a linked list as the value you get O(1) insertion. Likewise, deletion can be O(1). You can implement whichever stack or queue you need with a linked list.

Or you could simply use a doubly linked list, which gives O(1) insertion and deletion. For ranks, you can embed that information in the nodes.
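For what it's worth, here is a minimal doubly linked list sketch in C showing the O(1) splice and unlink (my illustration, not the answerer's code). Note the caveat: insertion is O(1) only once you already hold the neighbouring node; locating a position by rank still requires a scan, which is the gap in this approach for the asker's problem.

```c
#include <stdlib.h>

typedef struct DNode {
    int key;
    struct DNode *prev, *next;
} DNode;

typedef struct { DNode *head, *tail; } DList;

/* O(1): splice a new node in front of `at` (or at the tail if at == NULL). */
DNode *dlist_insert_before(DList *l, DNode *at, int key) {
    DNode *n = (DNode*)calloc(1, sizeof(DNode));
    n->key = key;
    n->next = at;
    n->prev = at ? at->prev : l->tail;
    if (n->prev) n->prev->next = n; else l->head = n;
    if (at) at->prev = n; else l->tail = n;
    return n;
}

/* O(1): unlink a node we already hold a pointer to. */
void dlist_remove(DList *l, DNode *n) {
    if (n->prev) n->prev->next = n->next; else l->head = n->next;
    if (n->next) n->next->prev = n->prev; else l->tail = n->prev;
    free(n);
}
```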

0

Source: https://habr.com/ru/post/1385029/
