Algorithm for defining “fuzzy related” subgraphs

Question

Algorithm for defining “fuzzy related” subgraphs

I have a problem that looks like a connectet subgraph problem from a mile high, but is completely different in that it does not fall under strict definitions.

I came across a graph with several million nodes and links (manual analysis is not possible), among these millions of nodes, as you know, there are 2 or 3 “sets”.

Each of the “sets” consists of hundreds of thousands of units of nodes and tens of thousandths of subgraphs that are not strongly interconnected. Each of these sets theoretically should not be associated with other sets ... but there are (guesstimation) dozens of erroneous links that ultimately connect these sets.

The problem is to find these sets and erroneous links, or at least get a list of possible erroneous links that can be checked manually.

My current “best idea” is to randomly select two nodes, find the shortest path between them, and then mark the links on this shortest path. Rinse and repeat millions of times, and erroneous links will ultimately turn out to be most noticeable, as they are the “chokepoints” between sets.

However, it is rather slow, and when one set is much larger than the others and has internal chokepoints, it becomes dominant in the list of "most noticeable", making it meaningless.

Are there better algorithms / approaches for this?

edit: refinement of the marking of the path is to label in proportion to the length of the path, which helps with the problem of “internal chokepoints of a large set”, but does not completely eliminate it, as some sets may have remote “outliers”, while other sets have many closely connected nodes (short internal distances)

+5

algorithm graph

Eric grange Dec 31 '15 at 9:05

source share

1 answer

Adam stelmaszczyk · Answer 1 · 2014-12-31T09:41:45+0000

My idea is the ant colony algorithm . I am inspired by your approach to choosing two random nodes, but I thought it would be useful to do something more, and not just calculate the shortest path.

Run n ants at n random nodes. You will need to configure n by trial and error. Ants leave pheromone at the edges passed. Pheromone evaporates in time. Ants choose one of the various ribs to move according to probability. The more pheromones an edge has, the more likely it is that ant will select that edge.

At the beginning, ants move completely randomly, since pheromones and ribs do not have the same probability. However, over time, the most popular ribs, bridges between two "fuzzy connected" components will have more and more pheromones on them.

So, you throw n ants, simulate for m turns and return the edges with the most pheromones on them. You can visualize this process to clearly see what is happening.

Update: I realized that the sentence "However, over time, the most popular ribs, bridges between two" fuzzy connected "components will have more and more pheromones on them." I implemented it, and it looks like most time bridges do not necessarily attract ants:

There were n = 1000 ants and m = 1000 steps. Initially, each rib had 1 unit of pheromone, and it increased by 1 if ant traveled along it. No evaporation, however, I think this will not improve the situation. The bridge had 49,845 pheromone units, but there were three more edges that had more than 100 thousand.

As suggested by Peter de Rivaz in a comment, I tried ( source code ) by repeating min-cut between two random nodes, and this is much better:

_{Graphs created using the python-igraph library .}

Algorithm for defining “fuzzy related” subgraphs

More articles: