One of the initial ideas that could work was to maintain a priority queue for all unused identifiers, sorted so that queues are removed using low-id identifiers. Using the standard binary heap, this will return the identifier to an unused identifier pool in O (log n) and find the next free identifier in O (log n). This has the disadvantage that this requires explicitly storing all identifiers that can be inefficient in space if there are a huge number of identifiers.
One potential optimization to save space would be to try to combine sequential identification values โโinto identifier ranges. For example, if you have free identifiers 1, 3, 4, 5, 6, 8, 9, 10, and 12, you can just save ranges 1, 3-6, 8-10, and 12. This will require you to slightly change the base data structure. Instead of using a binary heap, you can use a balanced binary search tree that stores ranges. Since these ranges will not overlap, you can compare the ranges as smaller, equal, or larger than other ranges. Since BSTs are stored in sorted order, you can find the first free identifier by taking the minimum tree element (in O (log n)) and looking at the lower end of its range. Then you update the range to exclude this first element, which may require you to remove the empty range from the tree. When you return the identifier to the pool of unused identifiers, you can search for the predecessor and successor to determine the ranges that come immediately before and after the ID. If one of them can be extended to include this identifier, you can simply expand the range. (You may need to combine the two ranges). It also only takes O (log n) time.
Hope this helps!
source share