Union of all intersecting sets

Given a list of objects with multiple attributes, I need to find a list of sets created by joining all intersecting subsets.

In particular, these are Person objects, each with many attributes. I need to create a list of "core" sets based on several unique identifiers such as SSN, DLN, etc.

For example, if Person A and Person B have the same SSN, they form set i. Then, if Persons B and C have the same DLN, they form set ii. Persons D and E have the same SSN, but it (and all of their other identifiers) does not match any identifier of Persons A, B, or C. After merging all intersecting subsets, I would end up with one set containing Persons A, B, C and another set containing Persons D, E.

Here is the pseudocode for my solution. I am curious whether anyone has already come up with a more efficient way to merge all possible intersecting sets. Keep in mind that the links between sets could be X Persons long (that is, A matches B by SSN, B matches C by DLN, C matches D by SSN, and D matches E by some other identifier, which puts Persons A through E in one set). Also assume that the implementation language supports set operations.

bigSetList = array of all of the unique Sets
fullyTested = false
while (bigSetList.size() > 1) and (fullyTested is false)
    fullyTested = true  // assume this pass finds no intersecting pair
    foreach thisSet in bigSetList order by size desc
        if count(sets that intersect with thisSet) > 0
            fullyTested = false
            newThisSet = copy of thisSet
            intersectingSets = []
            bigSetList.delete(thisSet)
            foreach testSet in bigSetList
                if thisSet.intersects(testSet)
                    newThisSet.addAll(testSet)
                    intersectingSets.push(testSet)
                end if
            end foreach
            bigSetList.deleteAll(intersectingSets)
            bigSetList.push(newThisSet)
            break  // the list changed, so restart the scan
        end if
    end foreach
end while
// the loop ends once a full pass over the list finds 0 intersecting partners
+3
5 answers

, , .

, , , . O (N ^ 3) (N ^ 2 N ).

, node; . , .

+4

, Person ( Person, ). Person, Person, . , - . , . Person . , , , , , , .

0

For example, suppose the records look like this:

A { ss |-> 42, dl |-> 123 }
B { ss |-> 42, dl |-> 456 }
C { ss |-> 23, dl |-> 456 }
D { ss |-> 89, dl |-> 789 }
E { ss |-> 89, dl |-> 432 }

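Then, processing the records one at a time, the merged sets are built up as follows:

1. The first record becomes the only merged set:

{A} { ss |-> [42], dl |-> [123] }

2. Merge B into it, since its SSN is already present:

{A,B} { ss |-> [42], dl |-> [123,456] }

3. Merge C as well, since its DLN is already present:

{A,B,C} { ss |-> [23,42], dl |-> [123,456] }

4. Insert a new merged set for D, since nothing matches:

{A,B,C} { ss |-> [23,42], dl |-> [123,456] }
{D}     { ss |-> [89],    dl |-> [789]     }

5. Merge E into the second set, since its SSN is already present:

{A,B,C} { ss |-> [23,42], dl |-> [123,456] }
{D,E}   { ss |-> [89],    dl |-> [432,789] }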

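So in each iteration (one per record), find every merged set that shares a value with the current record, and merge them all together.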
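The steps above can be sketched in Java roughly as follows. This is only an illustration under assumptions: the `Person` and `Group` classes, the `merge` method, and the use of integer identifier values keyed by `"ss"` and `"dl"` are hypothetical names chosen to mirror the example data, not part of the original answer.

```java
import java.util.*;

class IncrementalMerge {
    // Hypothetical record type: a name plus identifier values keyed by type ("ss", "dl").
    static final class Person {
        final String name;
        final Map<String, Integer> ids;
        Person(String name, Map<String, Integer> ids) { this.name = name; this.ids = ids; }
    }

    // A merged set: its members and, per identifier type, every value seen so far.
    static final class Group {
        final Set<String> members = new TreeSet<>();
        final Map<String, Set<Integer>> values = new HashMap<>();

        // True if the person shares at least one identifier value with this group.
        boolean matches(Person p) {
            for (Map.Entry<String, Integer> e : p.ids.entrySet()) {
                Set<Integer> seen = values.get(e.getKey());
                if (seen != null && seen.contains(e.getValue())) return true;
            }
            return false;
        }

        void add(Person p) {
            members.add(p.name);
            p.ids.forEach((type, v) -> values.computeIfAbsent(type, k -> new TreeSet<>()).add(v));
        }

        void absorb(Group other) {
            members.addAll(other.members);
            other.values.forEach((type, vs) -> values.computeIfAbsent(type, k -> new TreeSet<>()).addAll(vs));
        }
    }

    // One iteration per person: merge every matching group, or start a new one.
    static List<Group> merge(List<Person> people) {
        List<Group> groups = new ArrayList<>();
        for (Person p : people) {
            Group target = null;
            for (Iterator<Group> it = groups.iterator(); it.hasNext(); ) {
                Group g = it.next();
                if (!g.matches(p)) continue;
                if (target == null) {
                    target = g;          // first matching group receives the person
                } else {
                    target.absorb(g);    // the person bridges two groups, so join them
                    it.remove();
                }
            }
            if (target == null) {
                target = new Group();
                groups.add(target);
            }
            target.add(p);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Person> people = List.of(
            new Person("A", Map.of("ss", 42, "dl", 123)),
            new Person("B", Map.of("ss", 42, "dl", 456)),
            new Person("C", Map.of("ss", 23, "dl", 456)),
            new Person("D", Map.of("ss", 89, "dl", 789)),
            new Person("E", Map.of("ss", 89, "dl", 432)));
        for (Group g : merge(people)) {
            System.out.println(g.members); // prints [A, B, C] then [D, E]
        }
    }
}
```

The inner loop scans every existing group for each person, which is where the quadratic behavior discussed next comes from.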

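In general, with n records and a constant number k of attributes each, this algorithm runs in O(n * n * k) = O(n^2) time. The worst case occurs when all attribute values are distinct. When values are shared more often, the cost of inserting into and testing membership in the attribute value sets (such as [23,42]) becomes the dominant factor, so those sets should be efficient.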

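With an efficient disjoint-set (union-find) structure, each find or merge operation runs in amortized O(α(n)) time.

In each iteration there are at most n merged sets (the case where nothing has been merged so far). To place a new record, a find is needed against the k value sets of each merged set, which is bounded by O(n k α(n)). Merging the at most k sets found this way takes O(k^2 α(n)).

So over all iterations, the time is bounded by O(n (n k α(n) + k^2 α(n))) = O(n (n k α(n))) = O(n^2 k α(n)) = O(n^2 α(n)), since k is a constant.

Because α(n) is effectively constant for all practical purposes, the total time is bounded by O(n^2).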

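0

List<Set<Person>> sets = new ArrayList<>();
// Pass 1: seed a set with the first remaining person, then pull in everyone related to a member.
while (!people.isEmpty()) {
    Set<Person> set = new HashSet<>();
    set.add(people.remove(0));
    // An explicit iterator lets us remove from people without a ConcurrentModificationException.
    for (Iterator<Person> it = people.iterator(); it.hasNext(); ) {
        Person person = it.next();
        if (set.stream().anyMatch(person::isRelatedTo)) {
            set.add(person);
            it.remove();
        }
    }
    sets.add(set);
}
// Pass 2: merge any two sets that contain related people; repeat until nothing merges.
boolean merged = true;
while (merged) {
    merged = false;
    for (int i = 0; i < sets.size(); i++) {
        // Walk backwards so that removing index j does not skip any set.
        for (int j = sets.size() - 1; j > i; j--) {
            Set<Person> a = sets.get(i), b = sets.get(j);
            if (a.stream().anyMatch(p -> b.stream().anyMatch(p::isRelatedTo))) {
                a.addAll(b);
                sets.remove(j);
                merged = true;
            }
        }
    }
}

0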

-, - , ? , A B SSN, B C DLN, C D SSN, A B SSN, , ?

, , , 57368 ( Google). Union-find. , , , , A-B, A B SSN. . (attribute type, attribute value) = attribute . , object s. , (object, attribute).

An important property of the union-find data structure is that the resulting structure represents the sets themselves, so it lets you query "Which set is A in?" If this is not enough, let us know and we can improve the result.

But the most important feature is that the algorithm takes close to constant amortized time for each union and find operation.
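As a rough illustration of the union-find idea, here is a minimal Java sketch. It assumes each person is given as a map from identifier type to value, and it treats each (attribute type, attribute value) pair as a key that links every person carrying it. The class name `PersonUnionFind`, the `group` method, and the input representation are hypothetical choices for this example.

```java
import java.util.*;

class PersonUnionFind {
    private final int[] parent;
    private final int[] rank;

    PersonUnionFind(int n) {
        parent = new int[n];
        rank = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i; // each person starts alone
    }

    // Find the representative of x, shortening the path as we go (path halving).
    int find(int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }

    // Join the sets containing a and b, attaching the shallower tree under the deeper one.
    void union(int a, int b) {
        int ra = find(a), rb = find(b);
        if (ra == rb) return;
        if (rank[ra] < rank[rb]) { int t = ra; ra = rb; rb = t; }
        parent[rb] = ra;
        if (rank[ra] == rank[rb]) rank[ra]++;
    }

    // ids.get(i) maps identifier type -> value for person i (assumed input shape).
    static List<List<Integer>> group(List<Map<String, String>> ids) {
        PersonUnionFind uf = new PersonUnionFind(ids.size());
        // First person seen for each (attribute type, attribute value) pair.
        Map<List<String>, Integer> firstOwner = new HashMap<>();
        for (int i = 0; i < ids.size(); i++) {
            for (Map.Entry<String, String> e : ids.get(i).entrySet()) {
                List<String> key = List.of(e.getKey(), e.getValue());
                Integer owner = firstOwner.putIfAbsent(key, i);
                if (owner != null) uf.union(owner, i); // shared identifier: same set
            }
        }
        // Collect people by representative, in order of first appearance.
        Map<Integer, List<Integer>> byRoot = new LinkedHashMap<>();
        for (int i = 0; i < ids.size(); i++) {
            byRoot.computeIfAbsent(uf.find(i), r -> new ArrayList<>()).add(i);
        }
        return new ArrayList<>(byRoot.values());
    }

    public static void main(String[] args) {
        // Persons A..E from the question, as indices 0..4.
        List<Map<String, String>> ids = List.of(
            Map.of("ss", "42", "dl", "123"),
            Map.of("ss", "42", "dl", "456"),
            Map.of("ss", "23", "dl", "456"),
            Map.of("ss", "89", "dl", "789"),
            Map.of("ss", "89", "dl", "432"));
        System.out.println(group(ids)); // prints [[0, 1, 2], [3, 4]]
    }
}
```

Each person is visited once per identifier, so the work is roughly proportional to the total number of identifiers, with the near-constant union and find cost on top.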

0

Source: https://habr.com/ru/post/1710028/

