An efficient algorithm for finding the areas of a geographical map with the highest point density

Say I have a geographical map where points are represented by latitude / longitude. I have several points on this map, and points can be added, deleted, or moved at any time.

I need to find the "hottest spots" - the areas that contain the most points per unit of area - in other words, the areas with the highest point density.

I need an efficient data structure, as well as an algorithm that recomputes the hottest spots whenever the points change. Both the computational complexity and the memory footprint should be minimal, since the number of points can be very large.

I want to know and maintain a list of the hottest spots in descending order - the densest area first, then the less crowded ones. A list of limited size is fine - for example, the 100 hottest spots.

Of course, to prevent a single isolated point from counting as 100% density, there is a minimum area (defined as a constant).

By "area" here I mean any perceivable region of the map that contains points. It could be the whole map, but of course the algorithm should not report that as a hot spot =)

Thanks! If you need any clarification, just ask...

+4
3 answers

What you are doing is known in statistics as "2-dimensional density estimation" (so now you know what to search for).

One common technique is kernel smoothing. Imagine a disc sitting on top of each of your data points. The kernel-smoothed density at any location is the number of discs overlapping there. This is kernel smoothing with a uniform kernel of fixed radius.

You don't have to use uniform kernels, and they don't have to be the same size at every point - that is then "adaptive bandwidth kernel smoothing".

This gives you a density grid that you can update quite easily, especially if your kernel has finite extent (you could use a Gaussian (aka Normal) kernel, which is infinite, but it would have to be truncated to your study area). Each time you add or remove a point, you add or remove its kernel's contribution to the grid.
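
A minimal sketch of that incremental update, assuming a flat (already projected) coordinate system and a uniform disc kernel; the class name and parameters (cell_size, kernel_radius) are made up for illustration:

    import math

    class DensityGrid:
        """Kernel-smoothed density on a fixed grid, using a uniform disc kernel.
        Points can be added or removed incrementally."""

        def __init__(self, width_cells, height_cells, cell_size, kernel_radius):
            self.w, self.h = width_cells, height_cells
            self.cell = cell_size        # map units per grid cell
            self.r = kernel_radius       # disc radius in map units
            self.grid = [[0.0] * width_cells for _ in range(height_cells)]

        def _apply(self, x, y, sign):
            # Add (+1) or remove (-1) this point's disc contribution
            # to every cell whose centre lies inside the disc.
            reach = int(math.ceil(self.r / self.cell))
            cx, cy = int(x / self.cell), int(y / self.cell)
            for gy in range(max(0, cy - reach), min(self.h, cy + reach + 1)):
                for gx in range(max(0, cx - reach), min(self.w, cx + reach + 1)):
                    px, py = (gx + 0.5) * self.cell, (gy + 0.5) * self.cell
                    if (px - x) ** 2 + (py - y) ** 2 <= self.r ** 2:
                        self.grid[gy][gx] += sign

        def add_point(self, x, y):
            self._apply(x, y, +1.0)

        def remove_point(self, x, y):
            self._apply(x, y, -1.0)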

So that's half the problem - you now have a density grid. You can then use clustering algorithms to partition it and find distinct peaks, according to whatever criteria (a) define a "peak" and (b) make it separate from an adjacent peak. Someone else has already suggested clustering algorithms. I did this (ten years ago) using the clustering functions of the statistical package "R". Speed is not its forte, though...
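
As a very rough stand-in for that clustering step, one could simply take the local maxima of the density grid and sort them; a real clustering algorithm would merge nearby peaks and apply proper separation criteria:

    def top_hot_spots(grid, k=100):
        """Return up to k (density, x, y) peaks, densest first.
        A 'peak' here is just a cell that is >= all of its 8 neighbours."""
        h, w = len(grid), len(grid[0])
        peaks = []
        for y in range(h):
            for x in range(w):
                v = grid[y][x]
                if v <= 0:
                    continue
                neighbours = [grid[ny][nx]
                              for ny in range(max(0, y - 1), min(h, y + 2))
                              for nx in range(max(0, x - 1), min(w, x + 2))
                              if (ny, nx) != (y, x)]
                if all(v >= n for n in neighbours):
                    peaks.append((v, x, y))
        peaks.sort(reverse=True)
        return peaks[:k]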

You might also want to ask this on http://stats.stackexchange.com

+5

This is too long for a comment, but here is just an idea of how I would "play" with the problem to see if I can come up with something interesting. One thing is certain: it can be made very fast.

Couldn't this be turned into a discrete problem? First you would snap all your coordinates to a coarse grid (you decide how big each square is, and every point falls into exactly one square). You would then get something like this:

    0000000000000000000000000000
    00XX000000000000000000X00000
    00X00000000000000X0000000000
    0000000000000000000000000000
    0000000000000000000000000000
    000000X00000000X000000000000
    0000000000000000000000000000
    000000000000X000000000X00000
    00000000000000000000000X0000
    0000000000000000000000000000

Then, for each occupied square, you count the occupied squares in its neighbourhood (including itself):

    0000000000000000000000000000
    0033000000000000000001000000
    0030000000000000010000000000
    0000000000000000000000000000
    0000000000000000000000000000
    0000001000001001000000000000
    0000000000000000000000000000
    0000000000001010000000200000
    0000000000000000000000020000
    0000000000000100000000000000
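
A small sketch of those two steps, assuming coordinates already projected onto a flat plane and a hypothetical cell_size parameter:

    def snap_to_grid(points, cell_size):
        """Map each (x, y) point to the grid cell it falls into."""
        return {(int(x / cell_size), int(y / cell_size)) for x, y in points}

    def neighbour_counts(cells):
        """For each occupied cell, count occupied cells in its 3x3 neighbourhood
        (the cell itself included, as in the example grids above)."""
        return {
            (cx, cy): sum((cx + dx, cy + dy) in cells
                          for dx in (-1, 0, 1) for dy in (-1, 0, 1))
            for cx, cy in cells
        }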

Then you could increase the size of your squares, say by a factor of two, thereby coarsening the map:

(the map is not accurate, it is only meant to give an idea of what I have in mind)

    09001001000000
    00000000000000
    00100001100000
    00000110002000
    00000002000000
    00000100000000

Then you count the neighbours again, and so on.
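
One way the coarsening step could look, merging fine cells into larger squares by summing their counts (this is only one interpretation of the idea; the factor of two is the one used in the example):

    def coarsen(counts, factor=2):
        """Merge cells into larger squares by summing their counts."""
        merged = {}
        for (cx, cy), n in counts.items():
            key = (cx // factor, cy // factor)
            merged[key] = merged.get(key, 0) + n
        return merged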

To me, this would let you find hot spots at whatever "resolution" you like: you would simply look for the largest numbers, and those would be your "hot spots".

Because, in a case like this:

    0000X00000
    0000XX0000
    0000000000
    0000000000
    0Y0Y000000
    0000000000
    0Y0Y000000

either "X" could be the hottest spot (three interesting points close to each other), or "Y" (four points close to each other, but not as close together as the "X" points).

Since you said you need speed, I would just turn this into a discrete problem and represent the grids as arrays. Then I would allow a variable "area" size.

Sounds like a fun problem to work on :)

+2

The kind of algorithm to use will depend on the distribution of the points. You may well run into the much harder problem of partitioning groups of points into "areas". Since you seem to need something to get you started, I recommend reading about convex hulls, computing the area of an arbitrary 2D polygon, and how to determine whether a point lies inside a polygon.
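
For reference, the two polygon primitives mentioned can be sketched roughly as follows, assuming flat 2D coordinates (this ignores the spherical geometry of real latitude/longitude):

    def polygon_area(vertices):
        """Area of a simple 2D polygon via the shoelace formula (vertices in order)."""
        area = 0.0
        n = len(vertices)
        for i in range(n):
            x1, y1 = vertices[i]
            x2, y2 = vertices[(i + 1) % n]
            area += x1 * y2 - x2 * y1
        return abs(area) / 2.0

    def point_in_polygon(px, py, vertices):
        """Ray-casting test: an odd number of edge crossings means the point is inside."""
        inside = False
        n = len(vertices)
        for i in range(n):
            x1, y1 = vertices[i]
            x2, y2 = vertices[(i + 1) % n]
            if (y1 > py) != (y2 > py):
                x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
                if px < x_cross:
                    inside = not inside
        return inside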

+1

Source: https://habr.com/ru/post/1343004/

