Fast algorithm for finding all points inside a rectangle

Given a set of different points in 2D space and a rectangle (the coordinates of all four points, sides parallel to the xy axis), how can I quickly find which points are inside the rectangle?

I am not interested in the basic solution that goes through all the points and seeing which one is inside the rectangle. I am looking for an algorithm that will give me a quick query time for each rectangle.

I don't care how much time I spend in the preprocessing phase. What worries me is that after processing the data, I get a useful structure that gives me a quick query time for each rectangle.

I know, for example, I can calculate how many points I have inside a rectangle in O (logN). This works because I do a lot of heavy processing at the beginning, and then each time I process the processed data using a new rectangle and get a new score at logN time. I am looking for a similar algorithm to find the actual points, not just their score.

+5
source share
6 answers

The classic answer is a kD tree (a 2D tree in this case).

For a simple alternative, if your points are evenly distributed, you can try using a grid.

Select the cell size for the square grid (if the problem is anisotropic, use a rectangular grid). Assign each point to the grid cell that contains it saved in the linked list. When you run the query, find all the cells that overlap with the rectangle and look at them to cross their lists. For partially covered cells, you will need to perform a point-in-rectangle test.

The choice of size is important: too large can lead to too many points that need to be tested in any case; too small can lead to too many empty cells.

+7
source

If you sort the points along the X axis, then you must do a binary search to find the leftmost (smallest X value) that is in the rectangle.

Do another binary search to find the largest in the rectangle (the largest value of X).

All points in the sorted list between these two points will be within the left edge of the rectangle, even if you have not checked most of them!

Repeat the same process along the vertical / y-axis.


Two binary searches on the X axis should be equal to O (logN).
Same thing with two binary searches along the y axis.
O (4 logN) == O (log N)

In the end, there will still be some kind of merger, I'm not immediately sure.

+5
source

You are looking for a range of kd trees or a range query.

  • Quadtrices (or octtrees or 16-trees ...) are better when there is a changing set of points, but you expect the distribution to be uniform. No additional balancing steps are required since the tree structure is fixed.
  • kd-trees work better on a fixed set of points, even with uneven distribution. When the set of points changes, it is difficult (but not impossible) to complete the self-balancing steps.
  • AABB trees (or bold AABB trees) provide a quick way to test overlapping shapes, not just points. Sometimes AABB trees need to be balanced. When constantly moving objects are turned on, the general way is to use "live AABB-s", so you do not need to update the tree in every frame.
  • Sorting with only one axis and using binary search (something like abelenky suggested, but I think it makes no sense to do a second binary search) is a simple and elegant solution, but it will be slow when, for example, you sort by X, and that's it the points are on a line parallel to Y. You must perform linear filtering on points that are matched by a binary search on X. Time complexity is the worst case of O(n) , but this worst case happens quite often.

All these algorithms run queries on average O(log n + k) , where k is the number of matching points.

Gridding, as Yves suggested, can do a range search in O(k) time, but only when the size of the request rectangle is limited. This is what they often do in particle modeling . Gridding can be used even if the input data set is not limited - just make a fixed number of buckets based on the hash of the grid coordinates. But if the request rectangle can be of any size, then the grid is not-go.

+2
source

You can group points in sectors. If the sector is completely in or out of the given rectangle, then all points inside it are also located or exit. If the sector is partially included, you need to look for O (n) for points in this sector to check if they are in the rectangle. Find a kd tree search.

+1
source

Along with other answers, you can also see Morton codes (sorting a z-order curve).

In your case, this is static data, you can even represent all the point data as an array.

https://en.wikipedia.org/wiki/Z-order_curve

This article also has a rather complicated timeline for various "multidimensional access methods" - http://www.cc.gatech.edu/computing/Database/readinggroup/articles/p170-gaede.pdf

+1
source

I think you should save your points in quadtree . I have not developed the details, but it should allow you to basically do something similar to binary search, which directly gives points inside the rectangle.

If your points are clustered, that is, there are clusters that contain many points in a small area and other areas that do not contain or very few R-tree points can be even better.

The execution complexity should be O (logN), I think.

0
source

Source: https://habr.com/ru/post/1238146/


All Articles