Algorithm for finding a disabled graph from sets

Goal: Want to effectively find all disabled graphs from a large set of sets

For example, I have a data file, for example:

A, B, C
C, D, E
A, F, Z
G, J
...

Each entry is a collection of items. First entries A, B, C = {A, B, C} This also indicates that there is an edge between A and B, A and C, B and C.

I first developed an algorithm

1.parse all the entries into a list:
[
{A,B,C}
{C,D,E}
...
]
2.start with the first element/set of the list can called start_entry, {A,B,C} in this case
3.traverse other element in the list and do the following:
     if the intersection of the element and start_entry is not empty
          start_entry = start_entry union with the element
          remove element from the list
4.with the updated start_entry, traverse the list again until there is not new update

The algorithm above should return a list of vertices of a connected graph. However, I ran into a runtime issue due to the size of the dataset. There are ~ 100,000 entries. So I just wonder if anyone knows if there is a more efficient way to find a related graph.

The data structure can also be changed (if it is easier) A, B BEFORE ERA E, F ... and each record represents the edge of the graph.

+4
3

.

.

Python

from collections import defaultdict

data=["A","B","C"],["C","D","E"],["F","G"]

# Prepare mapping from data element to index
S = {}
for a in data:
    for x in a:
        if x not in S:
            S[x] = len(S)

N = len(S)
rank=[0]*N
parent=range(N)

def Find(x):
    """Find representative of connected component"""
    if  parent[x] != x:
        parent[x] = Find(parent[x])
    return parent[x]

def Union(x,y):
    """Merge sets containing elements x and y"""
    x = Find(x)
    y = Find(y)
    if x == y:
        return
    if rank[x]<rank[y]:
        parent[x] = y
    elif rank[x]>rank[y]:
        parent[y] = x
    else:
        parent[y] = x
        rank[x] += 1

# Merge all sets
for a in data:
    x = a[0]
    for y in a[1:]:
        Union(S[x],S[y])

# Report disconnected graphs
V=defaultdict(list)
for x in S:
    V[Find(S[x])].append(x)

print V.values()

[['A', 'C', 'B', 'E', 'D'], ['G', 'F']]
+5

networkx, , :

import networkx as nx
sets = [{'A','B','C'}, {'C','D','E'}, {'F','G','H'}, ...]

:

G = nx.Graph()
for s in sets:
    l = list(s)
    G.add_edges_from(zip(l, l[1:]))

( " " ):

print(list(nx.connected_components(G)))
# [{'D', 'C', 'E', 'B', 'A'}, {'F', 'H', 'G'}]
+2

Rosetta Code .

, , - , , :

, . , , .

Given N sets of elements, where N> 2, the result will be the same as repeatedly replacing all combinations of two sets of their consolidation until further consolidation between the established pairs is possible. If N <2, then consolidation does not have a strict value, and the input can be returned.

-2
source

Source: https://habr.com/ru/post/1679490/


All Articles