What data structure can I use to store and retrieve ranges of discrete values?

Question

What data structure can I use to store and retrieve ranges of discrete values?

I have a JavaScript program in which I will control a lot of integers. In this context, a range is simply the start and end value (or something equivalent, like the start and end value) with reference to another object. Ranges may overlap and may be identical (although the reference object will be different).

Possible start and end values range from 0 to 4294967295 (2 ³² - 1 or 0xFFFFFFFF ), although there are several large “holes” in the area that will not cover the range, even partially. Most ranges will be very small compared to the scope of possibilities: I expect that the vast majority will be less than 2000 in length.

My most important use case for this structure would be to search for all ranges containing a given integer value. In most cases, I expect the search to fail (the range containing the given value will not).

Otherwise, I obviously also need to add elements to it (often) and remove elements from it (rarely). From time to time, I also need to find all ranges that overlap a given range, and not all ranges that contain a single value.

What data structure can I use for this? A linear search in the range list is impractical because search queries do not work most of the time; and I very often need to do searches.

+6

javascript data-structures

zneak Apr 24 '12 at 23:16

source share

4 answers

cyber-monk · Answer 1 · 2012-04-24T23:55:51+0000

binary tree, where the key is the start (low) value. Once you find the key, you can easily and simply look (above and below).

Tom mallard · Answer 2 · 2012-04-24T23:23:00+0000

I like System.Tuple for something like this [or the F # list, but few know F #].

If the range is continuous, which makes it easy to assign start and end integers as tuples Tuple nums = (start, end), otherwise a tuple with a start end as the first entry in the tuple and list as the second may work for you, Tuple nums = (( beginning, end), list).

Ben · Answer 3 · 2012-04-24T23:29:38+0000

If you save the beginning and end of all ranges in one list as a map back to the range index, you can do this in n order. those. mylist = [{45: range1}, {47: range2}, {55: range1}, {57: range2}] Then you can view the list and set the boolean value to true when the tag first appears and false the second time you tag it will see. when you find a number higher than yours, you can determine which ranges you are in. You can use bisect to insert O (logn), and delete and insert O (n). Good luck ~ Ben

pkr298 · Answer 4 · 2012-04-25T00:16:26+0000

ATTEMPT 1:

Store 2 binary trees, one for the start values and one for the end values. Your nodes for both trees (or simply "end") have a property that refers to unique ranges with some id (the initial value of the range).

Do a binary search in the "start" tree to narrow the list down to ranges where the start is less than or equal to your search value. Do the same in the "end" tree, where the value is greater than or equal to the search value. Find the intersection of the nodes of both trees, and these ranges contain your search value.

You can find the intersection using a hash map / set for optimal performance.

ATTEMPT 2:

What if you saved a hash list for intervals when keys are the first but a large number of bits that are separated by both start and end values?

So, if start is "11001101" and end is "11010010", the key is "110". Each key will display a list of ranges (beginning and end) that share the key.

When searching for a value, to see what ranges they are in, for example, "00101111", you will have to look for a hash list for n different values or where n is the number of bits (32 in your case). In this case, you will search for "00101111", "0010111", "001011", etc. For each hit, you will need to check that the search value is in the range.

At first glance, it seems to me that on average for each hit you get half the false positives, but it doesn’t matter if the number of hits is small, and the more keys, the fewer hits it should receive.

There is a small problem, say, the beginning of “00101110” and the end of “01100111”, because the key will be “0”, which means a significant amount of “false positives”. It would be better if there were two different keys: "001" and "01", although I'm not sure about the specific algorithm that you need for coding for this optimization. If the ranges are small enough, and this problem can be solved or skipped, then you can get a very fast search, since most of the keys will be relatively long and will not match the search queries.

What data structure can I use to store and retrieve ranges of discrete values?

More articles: