Search for the first free index

Question

I have a large array/list of 1 million ids, and I need to find the first free id that I can use. Assume that several modules refer to this data structure: each takes an id (at which point it should be marked as used) and returns it later (at which point it should be marked as free). What data structures can be used for this, and which algorithms are efficient in time and in space (considered separately)? Apologies if this has been asked before; I did search before posting.

+4

c algorithm data-structures

Rishabh puri Jun 12 '13 at 18:04

6 answers

One initial idea that could work is to maintain a priority queue of all unused identifiers, ordered so that the lowest id is dequeued first. With a standard binary heap, returning an identifier to the pool of unused identifiers takes O(log n), and removing the next free identifier also takes O(log n). The disadvantage is that every free identifier must be stored explicitly, which can be space-inefficient if there is a huge number of identifiers.
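A minimal sketch of the binary-heap idea in C, assuming ids are the integers 0..n-1 and that all of them start out free. The `IdHeap` type and the `idheap_*` function names are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical pool of free ids kept in a binary min-heap. */
typedef struct {
    int *heap;     /* heap[0] is always the smallest free id */
    int size;      /* number of free ids currently stored */
    int capacity;  /* total number of ids in the pool */
} IdHeap;

static void swap_ints(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Create a pool in which ids 0..n-1 (n > 0) are all free. Returns 0 on success. */
int idheap_init(IdHeap *h, int n) {
    h->heap = malloc((size_t)n * sizeof *h->heap);
    if (h->heap == NULL) return -1;
    /* An ascending array already satisfies the min-heap property. */
    for (int i = 0; i < n; i++) h->heap[i] = i;
    h->size = n;
    h->capacity = n;
    return 0;
}

/* Remove and return the smallest free id, or -1 if none is free. O(log n). */
int idheap_acquire(IdHeap *h) {
    if (h->size == 0) return -1;
    int id = h->heap[0];
    h->heap[0] = h->heap[--h->size];   /* move last element to the root */
    int i = 0;
    for (;;) {                         /* sift the root down */
        int l = 2 * i + 1, r = l + 1, m = i;
        if (l < h->size && h->heap[l] < h->heap[m]) m = l;
        if (r < h->size && h->heap[r] < h->heap[m]) m = r;
        if (m == i) break;
        swap_ints(&h->heap[i], &h->heap[m]);
        i = m;
    }
    return id;
}

/* Return an id to the pool. O(log n). The caller must not release an id twice. */
void idheap_release(IdHeap *h, int id) {
    assert(h->size < h->capacity);
    int i = h->size++;
    h->heap[i] = id;
    while (i > 0 && h->heap[(i - 1) / 2] > h->heap[i]) {  /* sift up */
        swap_ints(&h->heap[i], &h->heap[(i - 1) / 2]);
        i = (i - 1) / 2;
    }
}

void idheap_destroy(IdHeap *h) {
    free(h->heap);
    h->heap = NULL;
    h->size = h->capacity = 0;
}
```

For 1 million ids this sketch needs one `int` per id (about 4 MB with 32-bit `int`), which is the space cost the answer mentions.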

One potential optimization to save space is to combine consecutive id values into ranges. For example, if the free identifiers are 1, 3, 4, 5, 6, 8, 9, 10, and 12, you can store just the ranges 1, 3-6, 8-10, and 12. This requires a slightly different underlying data structure: instead of a binary heap, you can use a balanced binary search tree that stores ranges. Since the ranges never overlap, any two ranges can be compared as smaller than, equal to, or larger than one another. Since a BST keeps its elements in sorted order, you can find the first free identifier by taking the minimum element of the tree (in O(log n)) and looking at the lower end of its range. You then shrink that range to exclude this first element, which may require removing the now-empty range from the tree. When an identifier is returned to the pool of unused identifiers, you can look up its predecessor and successor to find the ranges that come immediately before and after it. If one of them can be extended to include the identifier, you simply extend that range (and you may need to merge the two ranges into one). If neither can be extended, you insert a new single-element range. This also takes only O(log n) time.
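The range bookkeeping can be sketched as follows. To keep the example short, this sketch swaps the balanced BST for a sorted array of ranges: the predecessor/successor lookup is still a binary search, but inserting or removing a range costs O(k) for k stored ranges instead of O(log k). The names `Range`, `RangeSet`, `rs_*` and the `MAX_RANGES` capacity are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_RANGES 1024  /* hypothetical fixed capacity for this sketch */

typedef struct { int lo, hi; } Range;   /* inclusive range of free ids */
typedef struct {
    Range r[MAX_RANGES];                /* sorted, disjoint, never adjacent */
    int n;                              /* number of ranges stored */
} RangeSet;

/* Start with every id in [first, last] free. */
void rs_init(RangeSet *s, int first, int last) {
    s->r[0].lo = first;
    s->r[0].hi = last;
    s->n = 1;
}

/* Take the smallest free id, or -1 if none is free. */
int rs_acquire(RangeSet *s) {
    if (s->n == 0) return -1;
    int id = s->r[0].lo++;
    if (s->r[0].lo > s->r[0].hi) {      /* range became empty: drop it */
        memmove(&s->r[0], &s->r[1], (size_t)(s->n - 1) * sizeof(Range));
        s->n--;
    }
    return id;
}

/* Return id to the pool, merging with neighbouring ranges where possible.
   Returns 0 on success, -1 if the array is full.
   The caller guarantees that id is currently in use. */
int rs_release(RangeSet *s, int id) {
    /* Binary search for i = index of the first range with lo > id. */
    int lo = 0, hi = s->n;
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (s->r[mid].lo > id) hi = mid; else lo = mid + 1;
    }
    int i = lo;
    int joins_prev = (i > 0 && s->r[i - 1].hi + 1 == id);
    int joins_next = (i < s->n && s->r[i].lo - 1 == id);

    if (joins_prev && joins_next) {     /* id bridges two ranges: merge them */
        s->r[i - 1].hi = s->r[i].hi;
        memmove(&s->r[i], &s->r[i + 1], (size_t)(s->n - i - 1) * sizeof(Range));
        s->n--;
    } else if (joins_prev) {
        s->r[i - 1].hi = id;            /* extend predecessor upward */
    } else if (joins_next) {
        s->r[i].lo = id;                /* extend successor downward */
    } else {                            /* isolated id: insert a new range */
        if (s->n == MAX_RANGES) return -1;
        memmove(&s->r[i + 1], &s->r[i], (size_t)(s->n - i) * sizeof(Range));
        s->r[i].lo = s->r[i].hi = id;
        s->n++;
    }
    return 0;
}
```

Releasing 1, 3, 4, 5, 6, 8, 9, 10 and 12 into an empty set, in any order, leaves the four ranges from the example above: 1, 3-6, 8-10 and 12.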

Hope this helps!

+7

templatetypedef Jun 12 '13 at 18:12

Try using a linked list of identifiers. Link all the identifiers together, with the head pointing to the first free one (at initialization, every identifier is free). Whenever an identifier is marked as used, remove it from the head, append it to the end of the list, and make the next free identifier the new head. The list is thus ordered "free first, then used", and you can get a free identifier in O(1). When an identifier is marked as free again, move it to the front of the list, i.e. make it the new head, since it is now available for reuse. Hope this helps!
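A simplified sketch of this idea, assuming ids 0..NUM_IDS-1: the free identifiers are chained through an array of "next" indices, and used identifiers are simply left off the chain instead of being moved to the tail. All names are hypothetical. Note that this hands out some free id in O(1), not necessarily the lowest one.

```c
#include <assert.h>

#define NUM_IDS 1000000   /* pool size taken from the question */
#define NO_ID   (-1)

/* next_free[i] is the id that follows i in the free list (or NO_ID). */
static int next_free[NUM_IDS];
static int free_head = NO_ID;

/* Chain all ids together: 0 -> 1 -> ... -> NUM_IDS-1. */
void freelist_init(void) {
    for (int i = 0; i < NUM_IDS - 1; i++) next_free[i] = i + 1;
    next_free[NUM_IDS - 1] = NO_ID;
    free_head = 0;
}

/* Pop the head of the free list. Returns NO_ID if every id is in use. O(1). */
int freelist_acquire(void) {
    int id = free_head;
    if (id != NO_ID) free_head = next_free[id];
    return id;
}

/* Push a released id onto the head of the free list. O(1).
   The caller must not release an id that is already free. */
void freelist_release(int id) {
    assert(id >= 0 && id < NUM_IDS);
    next_free[id] = free_head;
    free_head = id;
}
```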

+2

rakib_ Jun 12 '13 at 18:16

Preamble: A binary heap seems to be the best answer. I will provide an alternative here that may have advantages in some scenarios.

One possible way is to use a Fenwick tree. Store in each position either 1 or 0, indicating whether that position is already used or not. You can then find the first empty position with a binary search: look for the first prefix range [1..n] whose sum is n-1, i.e. the shortest prefix that contains one unused position. The complexity of this operation is O(log^2 n), which is worse than the binary heap, but this approach has other advantages:

You can implement a Fenwick tree in less than 10 lines of code
You can also compute the density (number of used / total identifiers) of any range in O(log n)
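The Fenwick tree approach can be sketched as below, assuming ids 1..FW_N with 1 meaning "used". The `fw_*` names and the extra `fw_used` flag array (kept only to catch double releases) are hypothetical.

```c
#include <assert.h>

#define FW_N 1000000                    /* ids are 1..FW_N */

static int fw_tree[FW_N + 1];           /* Fenwick tree over the used flags */
static unsigned char fw_used[FW_N + 1]; /* 1 if the id is currently in use */

/* Add delta to position i. O(log n). */
static void fw_add(int i, int delta) {
    for (; i <= FW_N; i += i & -i) fw_tree[i] += delta;
}

/* Number of used ids in [1..i]. O(log n). */
static int fw_prefix(int i) {
    int s = 0;
    for (; i > 0; i -= i & -i) s += fw_tree[i];
    return s;
}

/* Smallest free id, or -1 if all are used. O(log^2 n):
   binary search for the shortest prefix [1..p] holding fewer than p used ids. */
int fw_first_free(void) {
    if (fw_prefix(FW_N) == FW_N) return -1;
    int lo = 1, hi = FW_N;
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (fw_prefix(mid) < mid) hi = mid; else lo = mid + 1;
    }
    return lo;
}

/* Mark the smallest free id as used and return it (-1 if none). */
int fw_acquire(void) {
    int id = fw_first_free();
    if (id != -1) { fw_used[id] = 1; fw_add(id, 1); }
    return id;
}

/* Mark an id as free again. */
void fw_release(int id) {
    assert(id >= 1 && id <= FW_N && fw_used[id]);
    fw_used[id] = 0;
    fw_add(id, -1);
}

/* Number of used ids in [a..b]; divide by (b - a + 1) for the density. */
int fw_used_in_range(int a, int b) {
    return fw_prefix(b) - fw_prefix(a - 1);
}
```

The binary search works because the number of free ids in [1..p] never decreases as p grows, so the condition `fw_prefix(p) < p` is false up to the first free id and true from there on.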

0

Juan lopes Jun 12 '13 at 20:37

If you do not need the minimum identifier, you can hand identifiers out to modules in batches of 1000. When identifiers are released, append them to the end of the list. From time to time, sort the list so that the identifiers you hand out come from the low end.

0

Adrian panasiuk Jun 12 '13 at 10:08

Well, an array is probably not the best structure. A hash table would be better, at least for speed. As for the structure of each "node", all you need, as far as I can see, is the identifier and a flag saying whether it is in use or not.

-1

Rui brito Jun 12 '13 at 18:10

log0 · Accepted Answer · 2013-06-12T18:11:12+0000

A naive but effective method is to store all your free identifiers in a stack. Obtaining an identifier is a constant-time operation: pop the top element. When a task is done with its id, just push the id back onto the stack.
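A minimal array-backed sketch of the stack approach, assuming ids 0..n-1. The `IdStack` type and `idstack_*` names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int *ids;      /* ids[0..top-1] are the free ids */
    int top;       /* number of free ids on the stack */
    int capacity;  /* total number of ids in the pool */
} IdStack;

/* Create a stack holding ids 0..n-1 (n > 0), with id 0 on top. Returns 0 on success. */
int idstack_init(IdStack *s, int n) {
    s->ids = malloc((size_t)n * sizeof *s->ids);
    if (s->ids == NULL) return -1;
    for (int i = 0; i < n; i++) s->ids[i] = n - 1 - i;
    s->top = n;
    s->capacity = n;
    return 0;
}

/* Pop a free id in O(1), or return -1 if none is left. */
int idstack_acquire(IdStack *s) {
    return s->top > 0 ? s->ids[--s->top] : -1;
}

/* Push a released id in O(1). The caller must not release an id twice. */
void idstack_release(IdStack *s, int id) {
    assert(s->top < s->capacity);
    s->ids[s->top++] = id;
}

void idstack_destroy(IdStack *s) {
    free(s->ids);
    s->ids = NULL;
    s->top = s->capacity = 0;
}
```

The id returned is the most recently released one, not the smallest.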

If the minimum free id must be returned (rather than just any free id), you can use a min-heap, with insertion and pop in O(log N).
