Pathfinding when forcing unique node attributes - which algorithm should I use?

Update 2011-12-28: here is a blog post with a less vague description of the problem I was trying to solve, my work on it and my current solution: Monitoring each MLB team Play a game


I am trying to solve a somewhat unusual pathfinding problem. I have a directed acyclic graph, and each edge has a distance value. I want to find the shortest path. Simple, right? Well, there are a couple of reasons why I can't just use Dijkstra or A*.

  • I don't care what the starting node of my path is, nor the ending node. I just need a path that includes exactly 10 nodes. But:
  • Each node has an attribute, let's say its color. Each node has one of 20 different possible colors.
  • The path I'm trying to find is the shortest path with exactly 10 nodes, where each node is a different color. I do not want any of the nodes in my path to have the same color as any other node.
  • It would be nice to be able to require that my path contain a particular value of the attribute (for example, at least one node must be blue), but this is not necessary.

This is a simplified example. My complete dataset actually has three different attributes for each node, all of which must be unique, and I have 2k+ nodes, each of which has an average of 35 outgoing edges. Since finding the ideal "shortest path" may take exponential or factorial time, an exhaustive search is really not an option. What I'm really looking for is a way to approximate a "good path" that meets the criteria in the third bullet.

Can someone point me to an algorithm that I could use (even a modified one)?


Some statistics about my complete dataset:

  • Total Nodes: 2430
  • Total edges: 86524
  • Nodes without incoming edges: 19
  • Nodes without outgoing edges: 32
  • Most outgoing edges: 42
  • Average edges per node: 35.6 (in each direction)
  • Due to the nature of the data, I know that the graph is acyclic
  • And in the full dataset, I'm looking for a path length of 15, not 10
6 answers

This is the case when in fact the question contains most of the answer.

Do a breadth-first search starting from all root nodes. When the number of parallel search paths exceeds a certain limit, drop the longest paths. The path length can be weighted: the most recent edge can have a weight of 10, and the edge traversed 9 hops back a weight of 1. You can also assign a smaller weight to paths that have the preferred attribute or that pass through loosely connected nodes. Store the last 10 nodes of each path in a hash table to avoid duplicates. And keep track somewhere of the minimum sum of the last nine edge lengths, along with the corresponding shortest path.
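The pruned breadth-first search described above is essentially a beam search. Here is a minimal Python sketch of that idea, under some assumptions that are mine and not the answer's: the graph is a dict mapping each node to a list of `(neighbor, cost)` pairs, `color` maps nodes to colors, and pruning uses the plain sum of edge costs rather than the weighted scheme. The names `beam_search` and `beam_width` are hypothetical. It also only keeps paths that begin at the given start nodes; passing every node as a start is one way to approximate the "sliding window" of the last 10 nodes.

```python
import heapq


def beam_search(graph, color, starts, length, beam_width):
    """Level-by-level search that keeps only the cheapest partial paths.

    Returns (cost, path) for a path of `length` nodes with pairwise
    distinct colors, or None if the beam runs dry. The result is a
    heuristic answer, not a guaranteed optimum.
    """
    beam = [(0, [s]) for s in starts]
    if not beam:
        return None
    for _ in range(length - 1):
        candidates = []
        for cost, path in beam:
            used = {color[n] for n in path}
            for nxt, weight in graph.get(path[-1], []):
                if color[nxt] not in used:  # enforce unique colors
                    candidates.append((cost + weight, path + [nxt]))
        if not candidates:
            return None
        # Drop everything except the `beam_width` cheapest partial paths.
        beam = heapq.nsmallest(beam_width, candidates, key=lambda c: c[0])
    return min(beam, key=lambda c: c[0])


graph = {1: [(2, 4), (3, 1)], 2: [(4, 1)], 3: [(4, 10)], 4: []}
color = {1: 0, 2: 1, 3: 2, 4: 3}
wide = beam_search(graph, color, [1], 3, beam_width=2)
narrow = beam_search(graph, color, [1], 3, beam_width=1)
```

In this toy graph the wide beam finds the cheaper path 1, 2, 4, while the beam of width 1 commits early to the cheap first edge and ends up with the more expensive path 1, 3, 4. That is the trade-off of pruning: less work, but no optimality guarantee.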

+1
source

If the number of possible values is small, you can use Floyd's algorithm with a slight modification: for each path you store a bitmap that represents the different values that have already been visited. (In your case, the bitmap will be 20 bits wide per path.)

Then, when you perform the length comparison, you also AND the two bitmaps to check whether the combined path is valid (a zero result means no value is repeated); if it is, you OR them together and store the result as the bitmap of the new path.
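The AND/OR bookkeeping might look like the following Python sketch. The helper names `color_mask` and `try_join` are hypothetical, and colors are assumed to be integers in the range 0..19. Note one simplification: when two sub-paths are joined at a shared intermediate vertex, that vertex's color would appear in both bitmaps and would have to be excluded from one of them before the test; the sketch assumes the two bitmaps describe disjoint sets of vertices.

```python
def color_mask(colors):
    """Pack a collection of color indices (0..19) into an integer bitmap."""
    mask = 0
    for c in colors:
        mask |= 1 << c
    return mask


def try_join(mask_a, mask_b):
    """Return the merged bitmap if the two paths share no color, else None."""
    if mask_a & mask_b:  # a common bit means a repeated color
        return None
    return mask_a | mask_b


left = color_mask([0, 3, 7])
right = color_mask([1, 4])
clash = color_mask([3, 9])
joined = try_join(left, right)      # valid: no color in common
rejected = try_join(left, clash)    # invalid: color 3 appears twice
```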

+1
source

Have you tried the straightforward approach and failed? From your description of the problem, I don't see why a simple greedy algorithm such as depth-first search wouldn't be fine:

  • Select a start node.
  • Check its immediate neighbors: are there any nodes that can be added to the path? Extend the path with one of them and repeat the process from that node.
  • If you fail, backtrack to the last successful state and try a new neighbor.
  • If you have run out of neighbors to check, this node cannot be the start node of a path. Try a new one.
  • If you have 10 nodes, you're done.
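The steps above can be sketched in Python as follows. This is only an illustration under my own assumptions: the graph is a dict of node to `(neighbor, cost)` lists, `color` maps nodes to colors, and "greedy" is taken to mean trying the cheapest outgoing edge first. The function names are hypothetical. Because the graph is acyclic and colors must be distinct, no separate visited set is needed.

```python
def find_path(graph, color, start, length):
    """Depth-first search with backtracking from one start node.

    Returns a list of `length` nodes with pairwise distinct colors,
    or None if no such path begins at `start`.
    """
    path = [start]
    used = {color[start]}

    def extend():
        if len(path) == length:
            return True
        # Greedy choice: try the cheapest outgoing edges first.
        for nxt, _cost in sorted(graph.get(path[-1], []), key=lambda e: e[1]):
            if color[nxt] in used:
                continue
            path.append(nxt)
            used.add(color[nxt])
            if extend():
                return True
            # Dead end: backtrack and try the next neighbor.
            used.discard(color[nxt])
            path.pop()
        return False

    return path if extend() else None


def find_any_path(graph, color, length):
    """Try each node as a start until one yields a valid path."""
    for start in graph:
        result = find_path(graph, color, start, length)
        if result:
            return result
    return None


graph = {"a": [("b", 1), ("c", 5)], "b": [("d", 1)], "c": [("d", 1)], "d": []}
color = {"a": "red", "b": "red", "c": "blue", "d": "green"}
found = find_any_path(graph, color, 3)
```

Here the cheap edge from "a" to "b" is skipped because both nodes are red, so the search settles on a, c, d. The first path found is valid but not necessarily the shortest one.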

It is difficult to give a good heuristic for selecting the start node without any knowledge of the distribution of attributes, but it may help to start with nodes of high degree.

0
source

It seems that a greedy depth-first search will be your best bet. Given a reasonable distribution of attribute values, I think that finding a single valid sequence takes E[O(1)] time, that is, expected constant time. I could probably prove it, but it might take some time. The proof would use the assumption that there is a nonzero probability that a valid next segment of the sequence can be found at each step.

The greedy search backtracks whenever the unique attribute constraint is violated. The search stops when a 15-segment path is found. If we accept my hunch that each sequence can be found in E[O(1)], then it is a matter of determining how many parallel searches to run.

0
source

For those who want to experiment, here is a (postgres) sql script to generate some fake data.

SET search_path='tmp';

-- DROP TABLE nodes CASCADE;
CREATE TABLE nodes
    ( num INTEGER NOT NULL PRIMARY KEY
    , color INTEGER
    -- Redundant fields to flag {begin,end} of paths
    , is_root boolean DEFAULT false
    , is_terminal boolean DEFAULT false
    );

-- DROP TABLE edges CASCADE;
CREATE TABLE edges
    ( numfrom INTEGER NOT NULL REFERENCES nodes(num)
    , numto INTEGER NOT NULL REFERENCES nodes(num)
    , cost INTEGER NOT NULL DEFAULT 0
    );

-- Generate some nodes, set color randomly
INSERT INTO nodes (num)
SELECT n
FROM generate_series(1,2430) n
WHERE 1=1
    ;
UPDATE nodes SET color = 1+TRUNC(20*random() );

-- (partial) cartesian product nodes*nodes. The ordering guarantees a DAG.
INSERT INTO edges(numfrom,numto,cost)
SELECT n1.num ,n2.num, 0
FROM nodes n1 ,nodes n2
WHERE n1.num < n2.num
AND random() < 0.029
    ;
UPDATE edges SET cost = 1+ 1000 * random();

ALTER TABLE edges ADD PRIMARY KEY (numfrom,numto) ;
ALTER TABLE edges ADD UNIQUE (numto,numfrom) ;

-- A root has no incoming edges
UPDATE nodes no SET is_root = true
WHERE NOT EXISTS (
    SELECT * FROM edges ed
    WHERE ed.numto = no.num
    );
-- A terminal has no outgoing edges
UPDATE nodes no SET is_terminal = true
WHERE NOT EXISTS (
    SELECT * FROM edges ed
    WHERE ed.numfrom = no.num
    );

SELECT COUNT(*) AS nnode FROM nodes;
SELECT COUNT(*) AS nedge FROM edges;
SELECT color, COUNT(*) AS cnt FROM nodes GROUP BY color ORDER BY color;

SELECT COUNT(*) AS nterm FROM nodes no WHERE is_terminal = true;
SELECT COUNT(*) AS nroot FROM nodes no WHERE is_root = true;

-- Distribution of incoming edges per node
WITH zzz AS (
    SELECT numto, COUNT(*) AS fanin
    FROM edges
    GROUP BY numto
    )
SELECT zzz.fanin , COUNT(*) AS cnt
FROM zzz
GROUP BY zzz.fanin
ORDER BY zzz.fanin
    ;

-- Distribution of outgoing edges per node
WITH zzz AS (
    SELECT numfrom, COUNT(*) AS fanout
    FROM edges
    GROUP BY numfrom
    )
SELECT zzz.fanout , COUNT(*) AS cnt
FROM zzz
GROUP BY zzz.fanout
ORDER BY zzz.fanout
    ;

COPY nodes(num,color,is_root,is_terminal) TO '/tmp/nodes.dmp';
COPY edges(numfrom,numto, cost) TO '/tmp/edges.dmp';

0
source

The problem can be solved by dynamic programming as follows. Let us start with a formal definition of the solution.

Given a DAG G = (V, E), let C be the set of colors of the vertices visited so far, and let w[i, j] and c[i] be, respectively, the weight (distance) associated with the edge (i, j) and the color of vertex i. Note that w[i, j] is zero if the edge (i, j) does not belong to E. Now we define the distance d for the transition from vertex i to vertex j, taking C into account, as

 d[i, j, C] = w[i, j]    if i is not equal to j and c[j] does not belong to C
            = 0          if i = j
            = infinite   if i is not equal to j and c[j] belongs to C

Now we are ready to define our subtasks as follows:

A[i, j, k, C] = the shortest path from i to j that uses exactly k edges and respects the colors in C, so that no two vertices on the path have the same color (and none has one of the colors already in C)

Let m be the maximum number of edges allowed in the path, and suppose that the vertices are labeled 1, 2, ..., n. Let P[i,j,k] be the predecessor of vertex j on the shortest path from i to j that satisfies the constraints. The following algorithm solves the problem.

 for k = 1 to m
   for i = 1 to n
     for j = 1 to n
       A[i,j,k,C] = min over x belonging to V {d[i,x,C] + A[x,j,k-1,C union c[x]]}
       P[i,j,k] = the vertex x that minimized A[i,j,k,C] in the previous statement

Set the initial conditions as follows:

 A[i,j,k,C] = 0          for k = 0
 A[i,j,k,C] = 0          if i is equal to j
 A[i,j,k,C] = infinite   in all of the other cases

The total computational complexity of the algorithm is O(mn^3); taking into account that in your particular case m = 14 (since you want exactly 15 nodes), it follows that m = O(1), so the complexity is actually O(n^3). To represent the set C, use a hash table so that insertion and membership testing take O(1) on average. Note that in the algorithm, the operation C union c[x] is actually an insert operation in which you add the color of vertex x to the hash table for C. However, since you insert only one element, the set union operation produces exactly the same result (if the color is not in the set, it is added; otherwise it is simply discarded and the set does not change). Finally, to represent the DAG, use an adjacency matrix.

Once the algorithm has run, to find the shortest path among all possible vertices i and j, just take the minimum among the values of A[i,j,m,C]. Note that if this value is infinite, then there is no valid shortest path. If there is a valid shortest path, you can recover it from the values of P[i,j,k] by tracing back through the predecessor vertices. For example, starting with a = P[i,j,m], the last edge on the shortest path is (a,j); the previous edge is given by b = P[i,a,m-1] and is (b,a), and so on.
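To make the idea of carrying the color set C inside the subproblem concrete, here is a small memoized Python sketch. It is not the exact recurrence above and does not achieve the stated bound: the state is (node, nodes still to add, bitmap of used colors), and the number of distinct states can grow exponentially with the number of colors. The function name and the dict-based graph layout are my assumptions, and colors are assumed to be small non-negative integers.

```python
from functools import lru_cache


def shortest_colorful_path(graph, color, length):
    """Exact search for the cheapest path of `length` distinct-colored nodes.

    graph maps node -> list of (neighbor, cost); returns (cost, path)
    or None when no valid path exists.
    """
    INF = float("inf")

    @lru_cache(maxsize=None)
    def best(node, remaining, used_mask):
        # Cheapest way to append `remaining` more nodes after `node`.
        if remaining == 0:
            return 0, ()
        result = (INF, ())
        for nxt, weight in graph.get(node, []):
            bit = 1 << color[nxt]
            if used_mask & bit:  # color already on the path
                continue
            sub_cost, sub_path = best(nxt, remaining - 1, used_mask | bit)
            if weight + sub_cost < result[0]:
                result = (weight + sub_cost, (nxt,) + sub_path)
        return result

    answer = (INF, ())
    for start in graph:
        cost, tail = best(start, length - 1, 1 << color[start])
        if cost < answer[0]:
            answer = (cost, (start,) + tail)
    return answer if answer[0] < INF else None


graph = {1: [(2, 4), (3, 1)], 2: [(4, 1)], 3: [(4, 10)], 4: []}
distinct = shortest_colorful_path(graph, {1: 0, 2: 1, 3: 2, 4: 3}, 3)
repeated = shortest_colorful_path(graph, {1: 0, 2: 1, 3: 2, 4: 1}, 3)
```

With all colors distinct the cheapest 3-node path is 1, 2, 4. When nodes 2 and 4 share a color, that path becomes invalid and the search falls back to 1, 3, 4.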

0
source

Source: https://habr.com/ru/post/903477/

