Comparing Overlapping Ranges

Question

Comparing Overlapping Ranges

I will ask this question using Scala syntax, although the question is really language independent.

Suppose I have two lists

val groundtruth:List[Range] val testresult:List[Range]

And I want to find all testresult elements that overlap some element in groundtruth .

I can do it like this:

 def overlaps(x:Range,y:Range) = (x contains y.start) || (y contains x.start) val result = testresult.filter{ tr => groundtruth.exists{gt => overlaps(gt,tr)}}

But this takes O(testresult.size * groundtruth.size) time.

Is there a faster algorithm for calculating this result or a data structure that can make the exists test more efficient?

PS The algorithm should run on groundtruth and testresult generated with an expression similar to the one below. In other words, there are no guarantees regarding the relationship between the ranges within the list; Range have an average size of 100 or more.

 (1 to 1000).map{x => val midPt = r.nextInt(100000); ((midPt - r.nextInt(100)) to (midPt + r.nextInt(100))); }.toList

+4

algorithm data-structures nlp

Ken bloom Apr 14 '11 at 18:47

source share

3 answers

If you can sort the groundtruth list by the beginning of the range, then for each range in testresult you can perform a binary search to get a subset of ranges whose lower bound is less than or equal to the range of interest, then you need to search this subset for those with a high bound greater than or equal to the upper limit of the tested range.

In the worst case, it is still O (n ^ 2), since it is possible that all groundtruth ranges will have a low score that meets the criteria, but the execution time with actual data is likely to be much less.

0

Jim mischel Apr 14 '11 at 19:06

source share

If the earth truth was stored in a hashed set, checking for the presence of elements of the test result in it would be O (n).

Edit: I did not understand that you are working exclusively with ranges represented by their endpoints. D'o!

Some kind of tree structure should be your best choice.

0

Marcin Apr 14 '11 at 10:50

source share

Fred foo · Accepted Answer · 2011-04-14T19:04:45+0000

Try the spacing tree . Cormen, Leiserson, Rivest, and Stein discuss them in chapter 14 (IIRC).

Alternatively, if your interval lists are sorted and the intervals within the list do not overlap, then the following algorithm solves your problem in linear time and in a single pass through both lists:

 (define interval cons) (define lower car) (define upper cdr) (define (overlap ab) (cond ((or (null? a) (null? b)) '()) ((< (upper a) (lower b)) (overlap (cdr a) b)) ((> (lower a) (upper b)) (overlap a (cdr b))) (#t ;; (car a) and (car b) overlap ;; EDIT: there a bug in the following part. ;; The code shouldn't skip over both cars at once, ;; since they may also overlap with further intervals. ;; However, I'm too tired to fix this now. (cons (interval (max (lower a) (lower b)) (min (upper a) (upper b))) (overlap ab)))))

(I hope you can read the Scheme :)

Comparing Overlapping Ranges

More articles: