Iterative Connected Component Algorithm

I have a bipartite graph, and I'm looking for the most efficient iterative way to split it into connected components. My recursive version began to overflow the stack with large data sets. I am ready to send from any language / pseudo-code, but for completeness I will code in C #.

My existing code is specialized for my data types. One septum is proteins, the other is spectra. Map and Set are C ++ stdlib workstations.

void recursivelyAssignProteinToCluster (long proteinId, long clusterId, Set<long> spectrumSet, Map<long, Set<long>> spectrumSetByProteinId, Map<long, Set<long>> proteinSetBySpectrumId, Map<long, long> clusterByProteinId) { // try to assign the protein to the current cluster var insertResult = clusterByProteinId.Insert(proteinId, clusterId); if (!insertResult.WasInserted) return; // recursively add all "cousin" proteins to the current cluster foreach (long spectrumId in spectrumSet) foreach (var cousinProteinId in proteinSetBySpectrumId[spectrumId]) { if (proteinId != cousinProteinId) { Set<long> cousinSpectrumSet = spectrumSetByProteinId[cousinProteinId]; recursivelyAssignProteinToCluster(cousinProteinId, clusterId, cousinSpectrumSet, spectrumSetByProteinId, proteinSetBySpectrumId, clusterByProteinId); } } } Map<long, long> calculateProteinClusters (NHibernate.ISession session) { var spectrumSetByProteinId = new Map<long, Set<long>>(); var proteinSetBySpectrumId = new Map<long, Set<long>>(); var query = session.CreateQuery("SELECT pi.Protein.id, psm.Spectrum.id " + GetFilteredQueryString(FromProtein, ProteinToPeptideSpectrumMatch)); foreach (var queryRow in query.List<object[]>()) { long proteinId = (long) queryRow[0]; long spectrumId = (long) queryRow[1]; spectrumSetByProteinId[proteinId].Add(spectrumId); proteinSetBySpectrumId[spectrumId].Add(proteinId); } var clusterByProteinId = new Map<long, long>(); int clusterId = 0; foreach (var pair in spectrumSetByProteinId) { long proteinId = pair.Key; // for each protein without a cluster assignment, make a new cluster if (!clusterByProteinId.Contains(proteinId)) { ++clusterId; recursivelyAssignProteinToCluster(proteinId, clusterId, pair.Value, spectrumSetByProteinId, proteinSetBySpectrumId, clusterByProteinId); } } return clusterByProteinId; } 

As ShinTakezou said, I reorganized to put the stack in a heap, and it works great. I used the DepthFirstSearch method from the digEmAll example.

 var clusterByProteinId = new Map<long, long>(); int clusterId = 0; var clusterStack = new Stack<KeyValuePair<long, Set<long>>>(); foreach (var pair in spectrumSetByProteinId) { long proteinId = pair.Key; if (clusterByProteinId.Contains(proteinId)) continue; // for each protein without a cluster assignment, make a new cluster ++clusterId; clusterStack.Push(new KeyValuePair<long, Set<long>>(proteinId, spectrumSetByProteinId[proteinId])); while (clusterStack.Count > 0) { var kvp = clusterStack.Pop(); // try to assign the protein to the current cluster var insertResult = clusterByProteinId.Insert(kvp.Key, clusterId); if (!insertResult.WasInserted) continue; // add all "cousin" proteins to the current cluster foreach (long spectrumId in kvp.Value) foreach (var cousinProteinId in proteinSetBySpectrumId[spectrumId]) if (!clusterByProteinId.Contains(cousinProteinId)) clusterStack.Push(new KeyValuePair<long, Set<long>>(cousinProteinId, spectrumSetByProteinId[cousinProteinId])); } } 
+6
source share
1 answer

Here is an example of a helper class that contains an undirected graph and allows you to get related components (iteratively):

 public class Graph<T> { public Dictionary<T, HashSet<T>> nodesNeighbors; public IEnumerable<T> Nodes { get { return nodesNeighbors.Keys; } } public Graph() { this.nodesNeighbors = new Dictionary<T, HashSet<T>>(); } public void AddNode(T node) { this.nodesNeighbors.Add(node, new HashSet<T>()); } public void AddNodes(IEnumerable<T> nodes) { foreach (var n in nodes) this.AddNode(n); } public void AddArc(T from, T to) { this.nodesNeighbors[from].Add(to); this.nodesNeighbors[to].Add(from); } public bool ContainsNode(T node) { return this.nodesNeighbors.ContainsKey(node); } public IEnumerable<T> GetNeighbors(T node) { return nodesNeighbors[node]; } public IEnumerable<T> DepthFirstSearch(T nodeStart) { var stack = new Stack<T>(); var visitedNodes = new HashSet<T>(); stack.Push(nodeStart); while (stack.Count > 0) { var curr = stack.Pop(); if (!visitedNodes.Contains(curr)) { visitedNodes.Add(curr); yield return curr; foreach (var next in this.GetNeighbors(curr)) { if (!visitedNodes.Contains(next)) stack.Push(next); } } } } public Graph<T> GetSubGraph(IEnumerable<T> nodes) { Graph<T> g = new Graph<T>(); g.AddNodes(nodes); foreach (var n in g.Nodes.ToList()) { foreach (var neigh in this.GetNeighbors(n)) g.AddArc(n, neigh); } return g; } public IEnumerable<Graph<T>> GetConnectedComponents() { var visitedNodes = new HashSet<T>(); var components = new List<Graph<T>>(); foreach (var node in this.Nodes) { if (!visitedNodes.Contains(node)) { var subGraph = GetSubGraph(this.DepthFirstSearch(node)); components.Add(subGraph); visitedNodes.UnionWith(subGraph.Nodes); } } return components; } } 

Using:

 static void Main(string[] args) { var g = new Graph<long>(); g.AddNodes(new long[] { 1, 2, 3, 4, 5, 6, 7, 8, 9 }); g.AddArc(1, 2); g.AddArc(1, 3); g.AddArc(9, 6); g.AddArc(6, 7); g.AddArc(6, 8); g.AddArc(4, 5); var subGraphs = g.GetConnectedComponents(); } 

You can use the Graph<> class instead of your cards, or if you want to stick to your cards, look at code that is pretty easy to understand ( Dictionary<T,HashSet<T>> used inside the class to store nodes and arcs, therefore it is very similar to your approach)

+5
source

Source: https://habr.com/ru/post/912522/


All Articles