The best algorithm for comparing a set with a set of sets

What is the best algorithm for finding sets in a finite set of sets that are a subset of a specific set?

For example, if

A = {1, 2} B = {2, 3, 4} C = {3, 5} D = {6} 

and X = {1, 2, 3, 5}

Then A and C are subsets of X.

Is there an algorithm I could do in linear time complexity?

Implementation note: Members of sets usually have a very limited range, so it would be nice to use the C ++ bit set to implement the algorithm. Can not?

Edit: The number of sets in a collection is usually much larger than the number of elements in X (in the example). Is there any way to make this linear in the number of elements in X? Perhaps using a hash or something else?

+4
source share
2 answers

Assume for a moment 64 possible elements.

Then, if you represent each element as a bit, you can use an integer of 64 bits to represent each set, and then: a & b is set the intersection of a and b . If (and only if) a is a subset of b , then a & b == a .

Of course, you can use bitset if you need more than 64 bits.

For a large range of elements, using a hash table to store (once) the add-in, and then iterate through potential subsets to see if all the elements in it can be.
It is linear in input size (average case).


EDIT: (answer to editable question)

If you have not previously saved some information about the data - this cannot be done by betetr, then O(|X| + n*min{m,|X|}) Where | X | is the size of the set X, n is the number of sets, and m is the average size of the sets. The reason is that this happens in the worst case, you need to read all the elements in all the sets (because the last element that you read for each set decides whether it is a subset or not), and therefore we cannot achieve better without previous knowledge of sets.

Suggested solutions:
Bitset: O(|X|*n)
Hash solution: O(|X| + min{m,|X|}*n) (middle case)

Although a hash solution provides better asymptotic complexity, constants are much better for bit set, and therefore a bit rate solution is likely to be faster for small |X|

+7
source

If you are not limited in time to create some additional structures, then the O (log (n)) solution will be to store bit sequences that are sets of faces in Trie .

You do not need to compare a set (aka bitstring) with all other sets, as Amit suggests. If you have a sorted collection of bit strings, then each comparison obviously reduces the number of options in half. Yes, of course, the time to build the trie bitter is something like O (n * log (n)), but this is preprocessing.

+1
source

Source: https://habr.com/ru/post/1435849/


All Articles