(I expanded this answer to describe another, and hopefully a much faster algorithm. As a result, some comments are a bit outdated.)
First I will try to clarify the problem:
We have a big grill. 100,000,000 times 100,000,000 nodes. This is a lot of nodes. A small number N of these nodes are given nodes. We have to find the loop in the lattice. The loop should not visit each node in the lattice, but it should visit each of these nodes exactly once. And he must visit these given nodes in that order.
Decision:
There are N shortest paths we need to find. We must find:
- The shortest path from
g_1 to g_2 , avoiding all other given nodes - The shortest path from
g_2 to g_3 , avoiding all other given nodes - ...
- The shortest path from
g_{N-1} to g_N , avoiding all other given nodes - The shortest path from
g_N to g_1 , avoiding all other given nodes
These problems are independent of each other. I will try to describe a fast algorithm later, but first a naive algorithm to search for a path from a given source to a given target node:
- Start with the whole grid ...
- For each of the N given nodes , except the current source node and target node , remove 4 edges above, below, left, and right.
- Use the standard shortest path algorithm to find the shortest path in the graph between the current source and the current target.
- replace the edges before searching for the next shortest path.
This is not a fast algorithm, but I hope it is correct. It will be much slower than it could be. We can improve the standard BFS shortest path algorithm, because we are not dealing with any arbitrary graph, we are dealing with a lattice.
Fast algorithm
(I think this is the standard A * search algorithm. Thanks to my colleagues at the university for this.)
Assuming you're still with me, the goal is to find the shortest path between two nodes on the grid, where the (small) number of nodes is unsuitable.
Imagine that each node used in the lattice has two variables associated with it, an “upper bound” and a “lower bound”. These boundaries describe the upper and lower boundaries of the distance to the source node. We initialize the lower bound with the “optimistic” distance in Manhattan to the source of the node (“optimistic” that we allow all nodes to be used). We initialize the upper bound infinite for all nodes except the source node, which is assigned the upper bound 0. For a single node, as soon as its upper bound converges to the lower boundary, then this is the correct distance to the source node. The algorithm continues, gradually increasing the lower bounds and decreasing the upper bounds, until the boundaries in the target node come closer.
At the beginning, the estimates already converged for the source node. The distance from the source to itself is 0.
(Obviously, it is not practical to store all these pairs of numbers for all nodes, but we will answer later).
How to update borders? For any "current" node, the upper bounds of its neighbors should be no more than 1 greater than the upper bound of the current node. This allows the node to "reset" the upper bounds of its neighbors.
In addition, if there is a large discrepancy in the lower boundaries of two adjacent nodes, they can influence each other. If node A has a lower bound of 5 and it is associated with node B, which has a lower bound of 3, then we can increase the lower bound of B to 4. Proof from the contrary: if B could be reached in just 3 steps, then it is clear that A can be reached by no more than 4 steps, passing through B. But this contradicts our knowledge that the lower boundary at point A is 5. Therefore, the lower boundary at B must be increased to 4.
This shows how nodes can pull or compress the boundaries of their neighbors. We must turn this idea into an algorithm.
We need to maintain the "border" of nodes. This boundary is a set of nodes that can affect the boundaries of their neighbors. Initially, a border is a singleton set containing the source node.
Scroll the following until the borders come close to the target node:
- If the border is now empty, the algorithm failed. The source node falls into the trap and cannot reach the target.
- Select the node boundary that is closest ("like a crow") to the node target. (You can select any node border, but this is an optimistic heuristic). Call it the "current" node.
- Check if this “current” node can influence the borders of its neighbors. (Obviously ignore unsuitable nodes).
- If he could not influence the borders of his neighbors, remove the current node flag from the border and return to step 1. (Note: this node can re-enter the border later).
- Go to change the borders of the neighbors. For each neighboring node that is affected, bring it to the border (if it was not already).
- If we just saw that the borders converge in the target node, then the algorithm is complete.
- Now that we’ve examined the neighbors, we’ll check to see if the estimates already converge in the current node. If so, we can (permanently) remove the current node from the border. We know the correct distance for this node and tried our best to include this information in the borders of our neighbors. We can no longer handle this.
Finally, we must develop an appropriate data structure to store these boundaries. It is impossible to explicitly save it for all points of this huge lattice. I would use a map from C ++
// mapping from (x,y) to (upper_bound, lower_bound) map< pair<int,int>, pair<int,int> > bounds;
Initially, this map will only contain the source node: (source_x, source_y) => (0,0) When there is an attempt to request this mapping for a node that is not currently present, we can simply initialize it, as described above, with an optimistic estimate Manhattan for the lower bound and infinity ( INT_MAX ) for the upper bound. Remember that the algorithm will only query this for nodes that are neighbors of border nodes, so in many cases it should not grow too much. In addition, if a node converges, and all its neighbors converge (or are unusable), then we can permanently remove it from the map.