Optimization of vectorized code for graph adjacency

I am writing machine learning code in MATLAB. I represent a graph by its adjacency matrix A and a clustering of that graph by a matrix Z, defined as follows.

A: a_ij is 1 if there is an edge between node i and node j, and 0 otherwise.
Z: z_ij is 1 if node j is in cluster i, and 0 otherwise.

I compute the matrix N, which is the number of edges between clusters, defined as follows:

N: n_ij is the number of edges between nodes in cluster i and nodes in cluster j. n_ii is the number of edges within cluster i.

N can be calculated as:

N = Z A Z'

where Z' is the transpose of Z.
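
For concreteness, here is a minimal MATLAB sketch of that computation; the 5-node graph and the 2-cluster assignment below are made up purely for illustration and are not part of the question:

% Toy example: symmetric adjacency matrix of an undirected graph, no self-loops
A = [0 1 1 0 0;
     1 0 1 0 0;
     1 1 0 1 0;
     0 0 1 0 1;
     0 0 0 1 0];

Z = [1 1 1 0 0;      % cluster 1 = nodes 1, 2, 3
     0 0 0 1 1];     % cluster 2 = nodes 4, 5

N = Z * A * Z';      % n_ij = edge count between clusters i and j
                     % (note: with a symmetric A, each within-cluster
                     %  edge contributes 2 to the diagonal entry n_ii)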

With many nodes the full calculation takes some time, but that by itself is not a problem. The problem is that I move nodes from cluster to cluster many times, and after every move I want to recalculate N.

So, the problem is this: given that I already know N, and I then move node i from cluster c_1 to cluster c_2, how can I update N efficiently?

1 answer

To go from Z to Z + U, update

N ← N + Z(AU') + (UA)Z' + UAU'
Z ← Z + U
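
In MATLAB, moving a single node i from cluster c1 to cluster c2 could look roughly like the sketch below; the variable names i, c1, c2 and the choice of sparse storage are my assumptions, not part of the answer:

% U is the change to Z: -1 at (c1, i), +1 at (c2, i), zero elsewhere
[k, n] = size(Z);                          % k clusters, n nodes
U = sparse([c1 c2], [i i], [-1 1], k, n);

% N_new = (Z + U) A (Z + U)' expanded around the known N = Z A Z'
N = N + Z*(A*U') + (U*A)*Z' + U*A*U';
Z = Z + U;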

Z and U (and A, if that makes sense for your graph) should have sparse representations. In theory, at least, this makes the operation more or less the compiled equivalent of what I would do in C: scan the neighbors of i, decrementing the edge counts to and from the old cluster and incrementing the edge counts to and from the new cluster. In practice, you may need to transpose Z so that it fits MATLAB's column-oriented sparse matrix representation, and perform the update Z ← Z + U by replacing two whole rows, so that entries that have become zero are not kept around as explicit non-zeros.
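
As a rough sketch of that neighbor-scanning version (it assumes an undirected graph with no self-loops, i.e. a symmetric A with zero diagonal, and a hypothetical vector cl with cl(v) = cluster of node v, maintained alongside Z):

nbrs = find(A(i, :));            % neighbors of node i
for v = nbrs
    c = cl(v);                   % cluster of neighbor v (before the move)
    N(c1, c) = N(c1, c) - 1;     % drop edge counts to/from the old cluster
    N(c, c1) = N(c, c1) - 1;
    N(c2, c) = N(c2, c) + 1;     % add edge counts to/from the new cluster
    N(c, c2) = N(c, c2) + 1;
end
cl(i) = c2;
Z(c1, i) = 0;                    % reflect the move in Z itself
Z(c2, i) = 1;

Both forms give the same N under these assumptions; the matrix form stays vectorized, while the loop only touches the entries that actually change, roughly O(deg(i)) of them.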


Source: https://habr.com/ru/post/909899/

