Maintaining the smallest subset

Question

Maintaining the smallest subset

Here are the operations that I would like to perform in a hypothetical data structure of a collection that contains many as its elements:

Insert the set into the data structure, but: (1) if the new set is a subset of any of the existing sets, do not add it (2) if the new set is a subset of any existing sets, delete them.
List all collections in the collection.

All considered sets are subsets of a known finite set, for example {0..10 ^ 4}.

Is there any way to do this efficiently?

+4

set algorithm data-structures subset

Jules Jan 24 '12 at 23:24

source share

6 answers

All considered sets are subsets of a known finite set, for example {0..10 ^ 4}.

Let me call it N = 10 ^ 4. It is small enough and it will prove useful. Say you have S.

Logically means that you have an N * S matrix.

You will already have a set of sets. There are S-sets in this primary structure.

10 ^ 4 is small enough so that you can save a secondary data structure that stores for each value N the list of sets in which it is located. This structure is similar to transposing the primary structure. This can be a vector of length N, allowing you to search for a constant search of time to find a list of sets in which a certain value is located.

Now, when you add a new set, you can use this secondary structure to find which other sets are in each of their values. For example, we add a new set with values of 2.5, 10

new_set = {2,5,10}

The secondary structure tells us which sets they are in:

  2 : {A,B,D} 5 : {B,D} 10 : {B}

We can combine and sort these three lists to get ABBBDD , which tells us not only what it imposes on it, but also about the size of the floors. Three nodes are shared with B, which means that our new set is a subset or equal to B. We share 1 node with A and two nodes with D. If it turns out that the total size of A is 1, then we now know that A is a subset our new multitude.

+1

Aaron mcdaid Jan 25 '12 at 1:39

source share

Listing sets in a collection is easy, O (n). However, checking a new candidate to see if he will be a subset of all existing sets will be somewhat expensive. There are well-known algorithms for testing if one set is a subset of the other, so simple

 for each subset s in S for each candidate set C test of C is a subset of s if it is, break if never found, add C to S.

It will be something like O (n ^ 2 log n). Is it considered "effective"?

0

Charlie martin Jan 24 '12 at 23:34

source share

Maintain a flowering filter for all saved sets. Create a color filter for the set you want to insert. If you are a bitwise AND filter for the set to be inserted (call it X) with a different flowering filter and get the value X back, then your set will be inserted. You can be a subset (possibly false positive, you need to check the slow path at this point). Otherwise, this is definitely not the case, and you can try with another.

There are many customizable options when creating a flowering filter that allows you to compromise between space efficiency and the likelihood of false positives.

http://en.wikipedia.org/wiki/Bloom_filter

0

z5h Jan 24 '12 at 23:35

source share

For space efficiency, you can use the bit specified to represent each subset of a known finite set. There are also methods for representing sparse bit sets (see, for example, this Java sample ) to obtain additional space savings.

The general structure may be a set of bit sets. In Java, BitSet does not have a subset testing method, but I do not think it would be too difficult to extend BitSet to include an effective subset testing method. (This avoids the hideous task of checking whether the added candidate is equal to its intersection with any of the existing subsets.)

0

Ted hopp Jan 24 '12 at 23:56

source share

Use some kind of tree structure.

Eg. Save the sorted existing sets in Trie. A flag is supported on each node if the path leading to this node is an existing set

1 To check if a given set is a superset of an existing set:

 def issuperset(node, set[N], setc, N): if node.is_set: return True for j = setc:N if set[j] is a child of node: if issuperset(node.child[set[j]], set, j+1, N): return True return False

2 Delete all supersets of the given set.

 def remsuperset(node, set[N], setc, N): if setc == N+1: remove_all_sets_on_or_below_this_node(node) return for ch in node.child: if ch< set[setc]: remsuperset(node.child[ch], set, setc, N) elif ch == set[setc]: remsuperset(node.child[ch], set, setc+1, N)

3 To enumerate sets, simply cross the tree and the print path - this is_set flag is set to True

0

Elkamina Jan 25 '12 at 0:26

source share

Falk hüffner · Accepted Answer · 2012-01-25T09:36:38+0000

Here is a recent article on this issue: http://research.google.com/pubs/pub36974.html

In short, you cannot do much better than squared time in the worst case; but there are some tricks to speed it up in practice.

Maintaining the smallest subset

More articles: