Simple, space-efficient associative collection implementations in C?

Question


I am looking for an associative collection that supports both retrieving and inserting values by key (deletion is not important) in O(log N) time or better, with very low overhead both in code size and in memory used at run time.

This is for a small embedded application written in C, so I am trying to minimize both the amount of code required and the amount of memory consumed.

Google's sparsehash would be an option if it weren't written in C++ and were simpler.

Most hash table implementations I know of use a substantial amount of extra space: they either need at least twice as many slots as there are key-value pairs, or they need extra pointers per entry (for example, hash tables that chain buckets). The key-value pair structure itself is just two pointers.

I am currently using a sorted array of key/value pairs, but insertion is O(N). I can't help thinking there should be a clever way to improve the amortized insertion time, for example by inserting in batches, but I have not managed to find one.
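For reference, here is a minimal sketch of the sorted-array approach described in the question, assuming 32-bit integer keys and `void *` values (the names `entry`, `sorted_map`, `sorted_map_get` and `sorted_map_put` are hypothetical). Lookup is a binary search; insertion finds its slot the same way but then shifts the tail of the array with `memmove`, which is where the O(N) cost comes from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One entry: an integer key and a pointer-sized value. */
typedef struct {
    uint32_t key;
    void *value;
} entry;

/* Hypothetical fixed-capacity array of entries kept sorted by key. */
typedef struct {
    entry *items;
    size_t count;
    size_t capacity;
} sorted_map;

/* Index of the first entry whose key is >= key (binary search, O(log N)). */
static size_t lower_bound(const sorted_map *m, uint32_t key) {
    size_t lo = 0, hi = m->count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (m->items[mid].key < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* O(log N) lookup; returns NULL if the key is absent. */
void *sorted_map_get(const sorted_map *m, uint32_t key) {
    size_t i = lower_bound(m, key);
    return (i < m->count && m->items[i].key == key) ? m->items[i].value : NULL;
}

/* O(N) insert: locate the slot in O(log N), then shift the tail right by one.
   Overwrites the value of an existing key. Returns 0 on success, -1 if full. */
int sorted_map_put(sorted_map *m, uint32_t key, void *value) {
    assert(m->count <= m->capacity);
    size_t i = lower_bound(m, key);
    if (i < m->count && m->items[i].key == key) {
        m->items[i].value = value;
        return 0;
    }
    if (m->count == m->capacity)
        return -1;
    memmove(&m->items[i + 1], &m->items[i], (m->count - i) * sizeof(entry));
    m->items[i].key = key;
    m->items[i].value = value;
    m->count++;
    return 0;
}
```

The memory overhead is zero beyond the pairs themselves, which is why the question is about keeping this layout while making inserts cheaper.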

I imagine this is a fairly well-known problem in certain circles, so to keep the question from being too subjective: what is the most common solution to the problem outlined above?

[EDIT:]

Some additional information that may make a difference:

Keys are integers
The number of values can be anywhere from 1 to 2^32.
Usage patterns are unpredictable.
I want memory consumption to be as low as possible (for example, doubling the required memory would not be ideal).


c memory-management hashtable data-structures associative-array

cdiggins Mar 6 '11 at 20:37


3 answers

Look at binary search trees, and to avoid their worst case (O(n) for both search and insertion), use a balanced tree.


Ins Mar 6 '11 at 20:44


I would use a hash table with double hashing to resolve collisions. The general idea is to hash your key, and if that slot is taken, compute a second hash that gives the step size you use when walking through the array to find a free slot. This makes good use of memory, since there are no pointers to store, and it stays reasonably efficient at much higher load factors than linear probing.
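A minimal sketch of double hashing in C, assuming 32-bit integer keys, `void *` values, a power-of-two table size, and that key 0 is reserved to mark empty slots (all names such as `dh_put` and both hash constants are illustrative choices, not part of the answer). Forcing the step to be odd makes it coprime with the table size, so a probe sequence visits every slot before giving up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: key 0 is reserved to mark an empty slot and is never stored. */
#define EMPTY_KEY 0u

typedef struct {
    uint32_t key;
    void *value;
} slot;

typedef struct {
    slot *slots;    /* 1u << bits slots, every key initialised to EMPTY_KEY */
    unsigned bits;  /* log2(capacity), in 1..31 */
    size_t count;
} dh_table;

/* Primary hash: multiplicative hashing, keeping the top `bits` bits. */
static size_t hash1(uint32_t key, unsigned bits) {
    return (size_t)((uint32_t)(key * 2654435761u) >> (32 - bits));
}

/* Secondary hash: the probe step. Making it odd keeps it coprime with the
   power-of-two capacity, so the probe sequence reaches every slot. */
static size_t hash2(uint32_t key, unsigned bits) {
    return (size_t)(((uint32_t)(key * 0x85EBCA6Bu) >> (32 - bits)) | 1u);
}

/* Returns the slot holding `key`, or the first empty slot on its probe
   sequence, or NULL if the table is full and the key is absent. */
static slot *dh_find(const dh_table *t, uint32_t key) {
    size_t mask = ((size_t)1 << t->bits) - 1;
    size_t i = hash1(key, t->bits);
    size_t step = hash2(key, t->bits);
    for (size_t n = 0; n <= mask; n++) {
        slot *s = &t->slots[i];
        if (s->key == key || s->key == EMPTY_KEY)
            return s;
        i = (i + step) & mask;  /* jump by the key-specific step */
    }
    return NULL;
}

/* Returns the value for `key`, or NULL if absent. */
void *dh_get(const dh_table *t, uint32_t key) {
    assert(key != EMPTY_KEY);
    slot *s = dh_find(t, key);
    return (s != NULL && s->key == key) ? s->value : NULL;
}

/* Inserts or updates `key`. Returns 0 on success, -1 if the table is full. */
int dh_put(dh_table *t, uint32_t key, void *value) {
    assert(key != EMPTY_KEY);
    slot *s = dh_find(t, key);
    if (s == NULL)
        return -1;
    if (s->key == EMPTY_KEY) {
        s->key = key;
        t->count++;
    }
    s->value = value;
    return 0;
}
```

With 12 entries in 16 slots (a load factor of 0.75), the table uses about 1.33 times the memory of the bare pairs, which is the kind of overhead the answer is pointing at. In practice you would grow or rebuild the table before it gets close to full, since probe sequences lengthen sharply near a load factor of 1.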

Edit: if you want to stay closer to what you are doing now, one option is to process inserts in batches: keep the sorted array plus a separate collection of new inserts. When the collection of new inserts gets too large, merge its elements into the main array.

For the secondary collection, you have several options. You can simply use an unsorted array with a linear search, and limit its size to (say) log(M), where M is the size of the main array. In that case, the overall search remains O(log N), the extra memory is small, and most insertions stay fairly fast. When you merge the two collections, you (usually) want to sort the secondary collection first and then merge it with the primary one. This lets you amortize the linear merge over the number of elements in the secondary collection.
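A minimal sketch of this two-level scheme, assuming 32-bit integer keys, `void *` values and caller-provided storage (the names `buffered_map`, `bmap_get` and `bmap_put` are hypothetical). New keys go into a small unsorted buffer capped at roughly log2(M) entries; when it is full, it is sorted with `qsort` and merged into the sorted array from the back, so no scratch space is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct { uint32_t key; void *value; } entry;

/* Hypothetical two-level map. `sorted_cap` must be large enough for every
   distinct key; `pending_cap` bounds the buffer (32 covers log2 of 2^32). */
typedef struct {
    entry *sorted;  size_t sorted_count, sorted_cap;
    entry *pending; size_t pending_count, pending_cap;
} buffered_map;

static int cmp_entry(const void *a, const void *b) {
    uint32_t x = ((const entry *)a)->key, y = ((const entry *)b)->key;
    return (x > y) - (x < y);
}

static size_t floor_log2(size_t n) {
    size_t r = 0;
    while (n >>= 1) r++;
    return r;
}

/* Buffer limit: about log2(M), at least 1, never more than its storage. */
static size_t pending_limit(const buffered_map *m) {
    size_t lim = floor_log2(m->sorted_count + 1);
    if (lim < 1) lim = 1;
    return lim < m->pending_cap ? lim : m->pending_cap;
}

/* Binary search of the sorted array, then a linear scan of the short buffer. */
static entry *find(buffered_map *m, uint32_t key) {
    size_t lo = 0, hi = m->sorted_count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (m->sorted[mid].key < key) lo = mid + 1; else hi = mid;
    }
    if (lo < m->sorted_count && m->sorted[lo].key == key) return &m->sorted[lo];
    for (size_t i = 0; i < m->pending_count; i++)
        if (m->pending[i].key == key) return &m->pending[i];
    return NULL;
}

void *bmap_get(buffered_map *m, uint32_t key) {
    entry *e = find(m, key);
    return e ? e->value : NULL;
}

/* Sort the buffer, then merge it into the sorted array from the back. */
static void merge_pending(buffered_map *m) {
    qsort(m->pending, m->pending_count, sizeof(entry), cmp_entry);
    size_t i = m->sorted_count, j = m->pending_count, k = i + j;
    while (j > 0) {
        if (i > 0 && m->sorted[i - 1].key > m->pending[j - 1].key)
            m->sorted[--k] = m->sorted[--i];
        else
            m->sorted[--k] = m->pending[--j];
    }
    m->sorted_count += m->pending_count;
    m->pending_count = 0;
}

/* Inserts or updates `key`. Returns 0 on success, -1 if the sorted array
   has no room for the merge. */
int bmap_put(buffered_map *m, uint32_t key, void *value) {
    assert(m->pending_cap >= 1);
    entry *e = find(m, key);
    if (e) { e->value = value; return 0; }
    if (m->pending_count >= pending_limit(m)) {
        if (m->sorted_count + m->pending_count > m->sorted_cap) return -1;
        merge_pending(m);
    }
    m->pending[m->pending_count].key = key;
    m->pending[m->pending_count].value = value;
    m->pending_count++;
    return 0;
}
```

Because existing keys are updated in place, the buffer never holds duplicates, which keeps the merge a plain two-way merge. Each merge costs O(M) and happens roughly once every log M inserts, so the amortized insert cost is about O(M / log M): better than O(M), but still not logarithmic.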

Alternatively, you can use a tree for the secondary collection. Newly inserted elements then need extra storage for pointers, but (again) as long as you keep that collection small, the overhead stays limited.


Jerry Coffin Mar 6 '11 at 20:54


templatetypedef · Accepted Answer · 2011-03-06T20:53:29+0000

You can use a hash table that does not use chaining, such as one based on linear probing or cuckoo hashing. The backing storage is just an array, and with a load factor of about 0.5 the overhead is not too bad, while the implementation complexity (at least for linear or quadratic probing) is modest.
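A minimal linear-probing sketch, assuming 32-bit integer keys, `void *` values, a power-of-two table, and that `UINT32_MAX` is reserved as the empty-slot marker (the names `lp_table`, `lp_get`, `lp_put` are hypothetical; the mixing function is the MurmurHash3 32-bit finalizer, one common choice among many). On a collision it simply tries the next slot, and inserts are refused once the table would exceed the 0.5 load factor mentioned in the answer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: this key value marks an empty slot and is never stored. */
#define EMPTY_KEY UINT32_MAX

typedef struct { uint32_t key; void *value; } lp_slot;

typedef struct {
    lp_slot *slots;  /* power-of-two capacity, every key set to EMPTY_KEY */
    size_t mask;     /* capacity - 1 */
    size_t count;
} lp_table;

/* Integer mixer (MurmurHash3 finalizer) so nearby keys spread out. */
static size_t lp_hash(uint32_t key) {
    key ^= key >> 16;
    key *= 0x85EBCA6Bu;
    key ^= key >> 13;
    key *= 0xC2B2AE35u;
    key ^= key >> 16;
    return (size_t)key;
}

/* Returns the slot holding `key`, the first empty slot after its home
   position, or NULL if every slot is occupied by other keys. */
static lp_slot *lp_find(const lp_table *t, uint32_t key) {
    size_t i = lp_hash(key) & t->mask;
    for (size_t n = 0; n <= t->mask; n++) {
        lp_slot *s = &t->slots[i];
        if (s->key == key || s->key == EMPTY_KEY) return s;
        i = (i + 1) & t->mask;  /* linear probing: try the next slot */
    }
    return NULL;
}

void *lp_get(const lp_table *t, uint32_t key) {
    assert(key != EMPTY_KEY);
    lp_slot *s = lp_find(t, key);
    return (s && s->key == key) ? s->value : NULL;
}

/* Inserts or updates `key`; caps the load factor at 1/2.
   Returns 0 on success, -1 if the insert would exceed that cap. */
int lp_put(lp_table *t, uint32_t key, void *value) {
    assert(key != EMPTY_KEY);
    lp_slot *s = lp_find(t, key);
    if (s && s->key == key) { s->value = value; return 0; }
    if (s == NULL || 2 * (t->count + 1) > t->mask + 1) return -1;
    s->key = key;
    s->value = value;
    t->count++;
    return 0;
}
```

The whole structure is one array of pairs plus two counters; the price of the simple probe loop is that roughly half the slots stay empty at this load factor.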

If you want a binary search tree with excellent performance guarantees that is not too complicated to code, consider splay trees. They guarantee amortized O(log n) performance and need only two pointers per node. Their rebalancing step is also much simpler than in most balanced BSTs.
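A minimal splay tree sketch, assuming 32-bit integer keys, `void *` values and `malloc` for node storage (an embedded build would more likely hand out nodes from a static pool). The `splay` routine follows the classic top-down formulation by Sleator and Tarjan, which needs no parent pointers, so each node carries only its `left` and `right` links; the names `st_put`, `st_get` and `st_free` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct node {
    uint32_t key;
    void *value;
    struct node *left, *right;  /* only two pointers per node */
} node;

/* Top-down splay: moves the node with `key`, or the last node on its
   search path, to the root and returns the new root. */
static node *splay(node *t, uint32_t key) {
    node header, *l, *r, *y;
    if (t == NULL) return NULL;
    header.left = header.right = NULL;
    l = r = &header;
    for (;;) {
        if (key < t->key) {
            if (t->left == NULL) break;
            if (key < t->left->key) {        /* zig-zig: rotate right */
                y = t->left;
                t->left = y->right;
                y->right = t;
                t = y;
                if (t->left == NULL) break;
            }
            r->left = t;                      /* link into the right tree */
            r = t;
            t = t->left;
        } else if (key > t->key) {
            if (t->right == NULL) break;
            if (key > t->right->key) {       /* zag-zag: rotate left */
                y = t->right;
                t->right = y->left;
                y->left = t;
                t = y;
                if (t->right == NULL) break;
            }
            l->right = t;                     /* link into the left tree */
            l = t;
            t = t->right;
        } else {
            break;
        }
    }
    l->right = t->left;                       /* reassemble around t */
    r->left = t->right;
    t->left = header.right;
    t->right = header.left;
    return t;
}

/* Inserts or updates `key`. Returns 0 on success, -1 if malloc fails. */
int st_put(node **root, uint32_t key, void *value) {
    node *t = splay(*root, key);
    if (t != NULL && t->key == key) { t->value = value; *root = t; return 0; }
    node *n = malloc(sizeof *n);
    if (n == NULL) { *root = t; return -1; }
    n->key = key;
    n->value = value;
    if (t == NULL) {
        n->left = n->right = NULL;
    } else if (key < t->key) {   /* t and its right subtree are greater */
        n->left = t->left;
        n->right = t;
        t->left = NULL;
    } else {                     /* t and its left subtree are smaller */
        n->right = t->right;
        n->left = t;
        t->right = NULL;
    }
    *root = n;
    return 0;
}

/* Lookups also splay, so they take the root by pointer. */
void *st_get(node **root, uint32_t key) {
    *root = splay(*root, key);
    return (*root != NULL && (*root)->key == key) ? (*root)->value : NULL;
}

/* Frees all nodes iteratively (rotating left children up) to avoid deep recursion. */
void st_free(node *t) {
    while (t != NULL) {
        if (t->left != NULL) {
            node *l = t->left;
            t->left = l->right;
            l->right = t;
            t = l;
        } else {
            node *next = t->right;
            free(t);
            t = next;
        }
    }
}
```

Note that every lookup restructures the tree; the amortized O(log n) bound covers the whole sequence of operations, but any single operation can still take O(n).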

