Random walks in directed graphs / networks

I have a weighted graph with (in practice) up to 50,000 vertices. Given a vertex, I want to randomly select an adjacent vertex based on the relative weights of all adjacent edges.

How should I store this graph in memory so that making this choice is efficient, and what is the best algorithm? It may be as simple as storing a value per vertex, but that might not lend itself to the most efficient algorithm. I will also need to update the network.

Please note that I would like to take only one step at a time.


Formally: given a weighted, directed, and potentially complete graph, let W(a, b) be the weight of the edge a -> b and W_a the sum of the weights of all edges out of a. Given an input vertex v, I want to randomly select an adjacent vertex, where the probability of choosing vertex x is W(v, x) / W_v.

An example:

Say W(v, a) = 2, W(v, b) = 1, W(v, c) = 1.

For a given input v, the function should return a with probability 0.5, and b or c each with probability 0.25.
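For concreteness, the simplest layout I can think of is a plain adjacency list; below is a minimal sketch in C++ (the type names are just placeholders), where each vertex keeps its weighted out-edges and a cached total out-weight W_v:

    #include <cstddef>
    #include <vector>

    // out[v] holds the weighted out-edges of vertex v; total_weight[v] caches W_v
    // so the normalization does not have to be recomputed on every step.
    struct OutEdge {
        std::size_t to;   // target vertex x
        double weight;    // W(v, x)
    };

    struct Graph {
        std::vector<std::vector<OutEdge>> out;
        std::vector<double> total_weight;
    };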

2 answers

If you are concerned about the performance of generating random walks, you can use the alias method to build a data structure that matches your requirement of choosing a random outgoing edge well. The overhead is that you must assign each weighted edge a probability weight and a so-called alias edge.

So, for each node, you store a vector of outgoing edges, each with its probability weight and alias edge. Then you can select a random edge in constant time (only building the data structure takes time linear in the total number of edges, or in the number of edges per node). In the example below, an edge is denoted by ->[NODE], and node v corresponds to the example above:

    Node v
        ->a (p=1,   alias= ...)
        ->b (p=3/4, alias= ->a)
        ->c (p=3/4, alias= ->a)
    Node a
        ->c (p=1/2, alias= ->b)
        ->b (p=1,   alias= ...)
    ...
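As a minimal sketch of how such a table could be built per node (C++, with hypothetical names; this is Vose's O(N) construction and assumes the out-edge weights are known and positive when the table is built):

    #include <cstddef>
    #include <vector>

    struct AliasSlot {
        double p;            // keep this edge with probability p ...
        std::size_t alias;   // ... otherwise jump to the edge at this index
    };

    // Build the alias table for one node from its out-edge weights,
    // where weights[i] = W(v, x_i).
    std::vector<AliasSlot> build_alias_table(const std::vector<double>& weights) {
        const std::size_t n = weights.size();
        double total = 0.0;
        for (double w : weights) total += w;

        std::vector<AliasSlot> table(n);
        std::vector<double> scaled(n);            // weights rescaled so the mean is 1
        std::vector<std::size_t> small, large;    // slots below / at-or-above 1
        for (std::size_t i = 0; i < n; ++i) {
            scaled[i] = weights[i] * n / total;
            (scaled[i] < 1.0 ? small : large).push_back(i);
        }

        while (!small.empty() && !large.empty()) {
            const std::size_t s = small.back(); small.pop_back();
            const std::size_t l = large.back(); large.pop_back();
            table[s] = {scaled[s], l};            // slot s keeps p = scaled[s], the rest goes to l
            scaled[l] -= 1.0 - scaled[s];         // l has donated part of its probability mass
            (scaled[l] < 1.0 ? small : large).push_back(l);
        }
        // Remaining slots are "full": they keep their own edge with probability 1.
        for (std::size_t i : small) table[i] = {1.0, i};
        for (std::size_t i : large) table[i] = {1.0, i};
        return table;
    }

For the weights 2, 1, 1 of node v this produces exactly the table shown above: p=1 for ->a, and p=3/4 with alias ->a for ->b and ->c.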

If you want to select an outgoing edge (i.e. the next node), you just need to generate one random number r uniformly from the interval [0,1).

Then you compute no = floor(N[v] * r) and pv = frac(N[v] * r), where N[v] is the number of outgoing edges of v. That is, you first pick each edge with the same probability (namely 1/3 in the example of node v).

Then you compare the probability p assigned to this edge with the generated value pv. If pv is smaller, you keep the previously selected edge; otherwise you take its alias edge.

If, for example, our random number generator gives r = 0.6, we have

    no = floor(0.6*3) = 1
    pv = frac(0.6*3) = 0.8

Therefore, we select the second outgoing edge (note that indexing starts at zero), which is

    ->b (p=3/4, alias= ->a)

and switch to its alias edge ->a, since p = 3/4 < pv = 0.8.

For the example of node v, we therefore

  • select edge b with probability 1/3 · 3/4 = 1/4 (i.e. when no=1 and pv < 3/4),
  • select edge c with probability 1/3 · 3/4 = 1/4 (i.e. when no=2 and pv < 3/4),
  • select edge a with probability 1/3 + 1/3 · 1/4 + 1/3 · 1/4 = 1/2 (i.e. when no=0 or pv >= 3/4).
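Putting the selection step above into code, here is a minimal sketch in C++ that reuses the hypothetical AliasSlot table from the construction sketch above:

    #include <cstddef>
    #include <random>
    #include <vector>

    // O(1) selection of the next out-edge of v: pick a slot uniformly,
    // then either keep it or jump to its alias.
    std::size_t sample_out_edge(const std::vector<AliasSlot>& table, std::mt19937& rng) {
        std::uniform_real_distribution<double> uni(0.0, 1.0);
        const double scaled = uni(rng) * table.size();            // r * N[v]
        const std::size_t no = static_cast<std::size_t>(scaled);  // floor part: a uniform slot
        const double pv = scaled - static_cast<double>(no);       // fractional part
        return (pv < table[no].p) ? no : table[no].alias;
    }

With the table of node v and r = 0.6, this lands on slot 1 (->b) with pv = 0.8 >= 3/4 and therefore returns its alias ->a, matching the walkthrough above.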

In theory, the asymptotically most efficient approach is to store, for each node, the moral equivalent of a balanced binary tree (red-black tree, B-tree, or skip list) of all connected nodes, with their weights and the total weight on each side. Then you can pick a random number between 0 and 1, multiply it by the total weight of the connected nodes, and do a binary search to find the chosen edge.
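If the weights of a node do not change between queries, the same binary-search idea also works on a plain prefix-sum array instead of a balanced tree; a minimal sketch in C++ (hypothetical names, static weights assumed):

    #include <algorithm>
    #include <cstddef>
    #include <random>
    #include <vector>

    // prefix[i] = W(v, x_0) + ... + W(v, x_i), so prefix.back() == W_v.
    std::size_t sample_by_prefix_sums(const std::vector<double>& prefix,
                                      std::mt19937& rng) {
        std::uniform_real_distribution<double> uni(0.0, prefix.back());
        const double r = uni(rng);   // threshold in [0, W_v)
        // First entry strictly greater than r gives the chosen edge, O(log N).
        return static_cast<std::size_t>(
            std::upper_bound(prefix.begin(), prefix.end(), r) - prefix.begin());
    }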

However, traversing such a balanced tree involves many branches that tend to stall the CPU pipeline, which is very expensive. Therefore, in practice, if you program in an efficient language (for example, C++) and a node has fewer than about two hundred outgoing edges, a flat list of edges (with a precomputed total weight) that you walk in a loop may turn out to be faster.
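A minimal sketch of that flat-list variant in C++ (hypothetical names; assumes the total out-weight of the node is precomputed):

    #include <cstddef>
    #include <random>
    #include <vector>

    // O(N) per step, but a single tight loop with no pointer chasing:
    // subtract weights until the random threshold is crossed.
    std::size_t sample_by_linear_scan(const std::vector<double>& weights,
                                      double total, std::mt19937& rng) {
        std::uniform_real_distribution<double> uni(0.0, total);
        double r = uni(rng);                      // threshold in [0, W_v)
        for (std::size_t i = 0; i + 1 < weights.size(); ++i) {
            if (r < weights[i]) return i;         // threshold falls inside edge i
            r -= weights[i];
        }
        return weights.size() - 1;                // last edge (also absorbs rounding error)
    }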


Source: https://habr.com/ru/post/905338/

