Find Islands in Oriented Graph

I work in an obscure language with poor dependency management. To help get 14,000 file codes, I wrote several parsing tools (in Java) and created a dependency graph.

I wrote my own schedule and BFS classes and they work fine. With them, I have methods like getParents() and getChildren() .

Now I am trying to find the "islands" in this graph; that is, I'm trying to find which sections of our code base are independent of each other, hoping to assemble them into isolated modules.

Later I also plan to analyze individual islands to see if there are any weaknesses where we can install a modular barrier and define this module interface, but this is on the way.

Now I think about it:

 Map<DependencyEntry, Set<DependencyEntry>> allChildren = new ...; for(DependencyEntry entry : allFiles) allChildren.put(entry,getAllChildren(entry)); Set<DependencyEntry> visited = new ...; Set<DependencyEntry> largest = new HashSet<DependencyEntry>(); // size 0 // slightly more expensive but more readable for(DependencyEntry entry : allChildren.keySet()) { Set<DependencyEntry> set = allChildren.get(key); if(set.size() > largest.size()) largest = set; } visited.addAll(largest); 

That should give me the biggest island. From there, I can go through and exclude any sets that contain any visited nodes, and then run it again to get the next largest island, etc.

Is this an accurate algorithm? Is there a better way to solve this problem that I do not see?

+6
source share
2 answers

It looks like you want to compile a collection of connected components found in the dependency graph. From there, you can iterate over the collection and identify the largest component (or any other interesting metric).

+2
source

it would be easier to use a graph library because such things are built-in. usually they allow you to store arbitrary information in nodes, edges, so you can use your own classes for data, but the library provides support and standard algorithms.

The algorithm you describe seems a little unclear what the root nodes are. if you do not start from the root, then you will not find the "whole island" - only the parts below (children) where you started. you can fix this by following the parents as well as with the children. besides, it sounds normal - it happens, as PaulF says, as far as I can tell, and find related components.

see also Good Java Graph Algorithm Library?

+1
source

Source: https://habr.com/ru/post/895899/


All Articles