How to define a rectangle in a rectangle?

I have the coordinates of the upper left corner, and the lower right corner of the list of rectangles says (a, b) and (c, d). I want to detect and remove the rectangles that are inside the rectangle. Overlapping rectangles may remain.

I have a dataset of 10,000 rectangles and I want an efficient way to solve this problem.

I'm currently doing it this way

import pandas data = pd.read_csv('dataset.csv') l = list(range(len(data)-1)) for i in range(len(data)): length = len(l) if i >= length: break for j in range(i+1, length): if j >= len(l): break if (data.iloc[l[i]]['a'] >= data.iloc[l[j]]['a']) and (data.iloc[l[i]]['b'] <= data.iloc[l[j]]['b']) and (data.iloc[l[i]]['c'] <= data.iloc[l[j]]['c']) and (data.iloc[l[i]]['d'] >= data.iloc[l[j]]['d']): l.pop(j) 

I implemented this algorithm after sorting the data set in descending order of the rectangle area, because rectangles with large areas do not fit inside rectangles with smaller areas. Here I enter the index of the rectangle from the list l after finding if it is inside another rectangle. Each time an element is exposed, it reduces iterations.

It takes several hours to solve, and I need an effective way to solve it even for hundreds of thousands of samples.

Please, help!

+5
source share
3 answers

Here is a small separation and rest algorithm that you could try.

I assume that as soon as you can quickly list each pair of colliding rectangles, you can also check to see if it is completely contained in another in constant time.

So, we need to find oncoming rectangles.

First, summarize it as follows: suppose you have two sets of rectangles A and B , and that you want to find only pairs (a, b) , so that rectangle A is equal to A , B is equal to B , and A and B

First of all, an idea. Consider the following example of two groups A and B rectangles partially separated by a horizontal line L :

  +----+ +-----+ | A1 | | B1 | | | +-----+ +-----+ +----+ | A2 | +-----+ +-----+ | A3 | _____________________________|_____|________ L | | +-------------------+##+ | | |##| | | B2 +##|--+ | | +----------------------+ 

Line L divides the sets A and B into three subsets:

 A above L: {A1, A2} (short: A>L) A intersects L: {A3} (short: A=L) A below L: {} (short: A<L) B above L: {B1} (B>L) B intersects L: {} (B=L) B below L: {B2} (B<L) 

Please note that only rectangles from the following groups can collide:

  A<LA=L A>L B<L yy N B=L yyy B>LN yy 

That is, if we want to find all collisions between A and B , as soon as we find a suitable line L , we can ignore the collisions between A<L against B>L and A>L against B<L . Thus, we get the following division and rest algorithm: while A and B not empty, find a suitable line that (approximately) maximizes the number of remote collision checks, subdivides A and B into three groups each, recursively proceed to seven subgroup collisions, ignore two combinations of subgroups.

Assuming that if the rectangles are “small” and the groups A=L and B=L are mostly empty, this will (roughly) reduce the size of the sets in half at each step, and thus we get an algorithm that on average works somehow like O(n*log(n)) instead of O(n*n) .

Once you have a general case for arbitrary A and B , take the whole set of rectangles R and run the algorithm using A = R; B = R A = R; B = R

Here is an approximate sketch in Python:

 def isSubinterval(aStart, aEnd, bStart, bEnd): return aStart >= bStart and aEnd <= bEnd def intersects(aStart, aEnd, bStart, bEnd): return not (aEnd < bStart or aStart > bEnd) class Rectangle: def __init__(self, l, r, b, t): self.left = l self.right = r self.bottom = b self.top = t def isSubrectangle(self, other): return ( isSubinterval(self.left, self.right, other.left, other.right) and isSubinterval(self.bottom, self.top, other.bottom, other.top) ) def intersects(self, other): return ( intersects(self.left, self.right, other.left, other.right) and intersects(self.bottom, self.top, other.bottom, other.top) ) def __repr__(self): return ("[%f,%f]x[%f,%f]" % (self.left, self.right, self.bottom, self.top)) def boundingBox(rects): infty = float('inf') b = infty t = - infty l = infty r = - infty for rect in rects: b = min(b, rect.bottom) l = min(l, rect.left) r = max(r, rect.right) t = max(t, rect.top) return Rectangle(l, r, b, t) class DividingLine: def __init__(self, isHorizontal, position): self.isHorizontal = isHorizontal self.position = position def isAbove(self, rectangle): if self.isHorizontal: return rectangle.bottom > self.position else: return rectangle.left > self.position def isBelow(self, rectangle): if self.isHorizontal: return rectangle.top < self.position else: return rectangle.right < self.position def enumeratePossibleLines(boundingBox): NUM_TRIED_LINES = 5 for i in range(1, NUM_TRIED_LINES + 1): w = boundingBox.right - boundingBox.left yield DividingLine(False, boundingBox.left + w / float(NUM_TRIED_LINES + 1) * i) h = boundingBox.top - boundingBox.bottom yield DividingLine(True, boundingBox.bottom + h / float(NUM_TRIED_LINES + 1) * i) def findGoodDividingLine(rects_1, rects_2): bb = boundingBox(rects_1 + rects_2) bestLine = None bestGain = 0 for line in enumeratePossibleLines(bb): above_1 = len([r for r in rects_1 if line.isAbove(r)]) below_1 = len([r for r in rects_1 if line.isBelow(r)]) above_2 = len([r for r in rects_2 if line.isAbove(r)]) below_2 = len([r for r in rects_2 if line.isBelow(r)]) # These groups are separated by the line, no need to # perform all-vs-all collision checks on those groups! gain = above_1 * below_2 + above_2 * below_1 if gain > bestGain: bestGain = gain bestLine = line return bestLine # Collides all rectangles from list `rects_1` with # all rectangles from list `rects_2`, and invokes # `onCollision(a, b)` on every colliding `a` and `b`. def collideAllVsAll(rects_1, rects_2, onCollision): if rects_1 and rects_2: # if one list empty, no collisions line = findGoodDividingLine(rects_1, rects_2) if line: above_1 = [r for r in rects_1 if line.isAbove(r)] below_1 = [r for r in rects_1 if line.isBelow(r)] above_2 = [r for r in rects_2 if line.isAbove(r)] below_2 = [r for r in rects_2 if line.isBelow(r)] intersect_1 = [r for r in rects_1 if not (line.isAbove(r) or line.isBelow(r))] intersect_2 = [r for r in rects_2 if not (line.isAbove(r) or line.isBelow(r))] collideAllVsAll(above_1, above_2, onCollision) collideAllVsAll(above_1, intersect_2, onCollision) collideAllVsAll(intersect_1, above_2, onCollision) collideAllVsAll(intersect_1, intersect_2, onCollision) collideAllVsAll(intersect_1, below_2, onCollision) collideAllVsAll(below_1, intersect_2, onCollision) collideAllVsAll(below_1, below_2, onCollision) else: for r1 in rects_1: for r2 in rects_2: if r1.intersects(r2): onCollision(r1, r2) 

Here is a small demo:

 rects = [ Rectangle(1,6,9,10), Rectangle(4,7,6,10), Rectangle(1,5,6,7), Rectangle(8,9,8,10), Rectangle(6,9,5,7), Rectangle(8,9,1,6), Rectangle(7,9,2,4), Rectangle(2,8,2,3), Rectangle(1,3,1,4) ] def showInterestingCollision(a, b): if a is not b: if a.left < b.left: print("%r <-> %r collision" % (a, b)) collideAllVsAll(rects, rects, showInterestingCollision) 

At least in this case, it really detects all the interesting collisions:

 [1.000000,6.000000]x[9.000000,10.000000] <-> [4.000000,7.000000]x[6.000000,10.000000] collision [1.000000,5.000000]x[6.000000,7.000000] <-> [4.000000,7.000000]x[6.000000,10.000000] collision [4.000000,7.000000]x[6.000000,10.000000] <-> [6.000000,9.000000]x[5.000000,7.000000] collision [6.000000,9.000000]x[5.000000,7.000000] <-> [8.000000,9.000000]x[1.000000,6.000000] collision [7.000000,9.000000]x[2.000000,4.000000] <-> [8.000000,9.000000]x[1.000000,6.000000] collision [2.000000,8.000000]x[2.000000,3.000000] <-> [8.000000,9.000000]x[1.000000,6.000000] collision [2.000000,8.000000]x[2.000000,3.000000] <-> [7.000000,9.000000]x[2.000000,4.000000] collision [1.000000,3.000000]x[1.000000,4.000000] <-> [2.000000,8.000000]x[2.000000,3.000000] collision 

Here is a slightly more realistic demo:

 from random import random from matplotlib import pyplot as plt def randomRect(): w = random() * 0.1 h = random() * 0.1 centerX = random() * (1 - w) centerY = random() * (1 - h) return Rectangle( centerX - w/2, centerX + w/2, centerY - h/2, centerY + h/2 ) randomRects = [randomRect() for _ in range(0, 500)] for r in randomRects: plt.fill( [r.left, r.right, r.right, r.left], [r.bottom, r.bottom, r.top, r.top], 'b-', color = 'k', fill = False ) def markSubrectanglesRed(a, b): if a is not b: if a.isSubrectangle(b): plt.fill( [a.left, a.right, a.right, a.left], [a.bottom, a.bottom, a.top, a.top], 'b-', color = 'r', alpha = 0.4 ) plt.fill( [b.left, b.right, b.right, b.left], [b.bottom, b.bottom, b.top, b.top], 'b-', color = 'b', fill = False ) collideAllVsAll(randomRects, randomRects, markSubrectanglesRed) plt.show() 

The graph shows all the selected rectangles in red, and the rectangles enclosed in blue:

enter image description here

Here is a visualization of bounding rectangles (yellow) and selected dividing lines (blue) of a quasi-two-dimensional partition of space for a small example with one collision:

enter image description here

For 10,000 “reasonable sizes” of random rectangles (with an intersection speed of about the same as in the image), it calculates all collisions in 18 seconds, although the code is very far from optimization.

+3
source

Your problem is spatial proximity, so I would suggest that you want to spatially index your data. This is the repository or indexing of your rectangles so that querying spatial relationships is cheap. See wikipedia for the most common data structures.

I implemented a demo using the R-tree. The whole "algorithm" consists of the following function. This is not particularly elegant since each unique encounter is examined twice. This is mainly due to the limited access and query interfaces that the rtree library rtree .

 import rtree def findCollisions(rects, onCollision): idx = rtree.index.Index(interleaved=False) for rect in rects: idx.insert(rect.id, rect.coords) for rect in rects: ids = idx.intersection(rect.coords) for hit in [randomRects[j] for j in ids]: onCollision(rect, hit) 

The surrounding infrastructure, I shamelessly copied the @AndreyTyukin text with a few changes:

 from random import random def isSubinterval(aStart, aEnd, bStart, bEnd): return aStart >= bStart and aEnd <= bEnd def intersects(aStart, aEnd, bStart, bEnd): return not (aEnd < bStart or aStart > bEnd) class Rectangle: id = 0 def __init__(self, l, r, b, t): self.left = l self.right = r self.bottom = b self.top = t self.id = Rectangle.id Rectangle.id += 1 @property def coords(self): return (self.left, self.right, self.bottom, self.top) def isSubrectangle(self, other): return ( isSubinterval(self.left, self.right, other.left, other.right) and isSubinterval(self.bottom, self.top, other.bottom, other.top) ) def intersects(self, other): return ( intersects(self.left, self.right, other.left, other.right) and intersects(self.bottom, self.top, other.bottom, other.top) ) def __repr__(self): return ("[%f,%f]x[%f,%f]" % (self.left, self.right, self.bottom, self.top)) def randomRect(ratio=0.1, scale=100): w = random() * ratio h = random() * ratio centerX = random() * (1 - w) centerY = random() * (1 - h) return Rectangle( scale*(centerX - w/2), scale*(centerX + w/2), scale*(centerY - h/2), scale*(centerY + h/2), ) 

Comparison with @Andrey's solution gave an improvement of about an order of magnitude. This is most likely due to the fact that python rtree uses a basic C implementation.

+1
source

If the rectangles are distributed evenly enough, you can save time by viewing it as a one-dimensional problem by concentrating first on the X (or Y) axis.

Each rectangle has a minimum and maximum X-coordinate, X-coordinates of the upper left and lower right corners. Create two entries for each rectangle by specifying the minimum or maximum X-coordinate and a pointer to the rectangle. Sort these records in ascending X-coordinate order and execute them in that order.

Maintain an array of rectangles ordered by minimum X-coordinate by inserting a record into it when you see its minimum X coordinate and delete a record from it when you see its maximum X-coordinate. Just before deleting a record, you can perform a binary search in the array to find all records whose minimum X-coordinate is not greater than the minimum X-coordinate of the record you are about to delete, and whose maximum X coordinate is at least that record, which you are about to delete. Check them to see that their Y-coordinates also contain the record you are about to delete. If so, you have found a record that fully contains the record you are about to delete. This should find all the X-constraints, because the array contains entries for each rectangle that overlaps the current X-point in the X dimension - they were inserted and not yet deleted.

(in fact, you need to be more careful than this if there are links for the X-coordinate).

0
source

Source: https://habr.com/ru/post/1276171/


All Articles