Find if any set is covered by sets of elements.

[Please let me know if this is consistent with a known issue]

I have n sets of different sizes. Each element in the set is unique. And each element can be found in two different sets.

I want to perform an operation on these sets, but avoid duplicating or missing an element. Problem: Find out which of these n sets should be deleted because they are covered by other sets.

eg. [A B C]; [A]; [B]. Delete [a], [b] since both are covered first.

eg. [A B C]; [A]; [B]; [Cd]. Delete [a, b, c], since all three elements are covered by the rest of the sets. Note: here [a], [b] one is invalid because duplicate "c". Similarly, [a], [b], [c, d] the wrong answer, since "d" will be skipped if it is deleted.

+6
source share
4 answers

I think this is a problem with the exact shell . Last limitation: each element has no more than two sets. It does not seem to me that I am fundamentally changing the problem (although I could easily be mistaken in this). The Wikipedia web page contains a good summary of the various algorithmic approaches. The selection algorithm seems to be Dancing Links .

+3
source

I think this is a case of the 2-sat problem, which can be solved in linear time with the help of the Tarjan algorithm .

  • Make an Ai variable for each set i. Ai is true if and only if i needs to be turned on.
  • For each element that appears in one set, add the sentence that Ai = 1
  • For each item that appears in 2 sets i and j, add sentences (Ai && ~ Aj) || (~ Ai & Aj). These positions meant that only one of Ai and Aj should appear.

Now you can solve this problem using the standard 2-sieve algorithm to determine if it is not possible to achieve this or to complete the task, if possible.

For the case with V sets and N elements, you will have V variables and sentences up to 2N, so the Tarjan algorithm will have O (V + 2N) complexity.

+3
source

Since an element in a set can be displayed in no more than two sets, between the sets there are quite simple connections that can be shown as a graph, two examples are shown below. One example uses red lines to represent edges, and another uses black lines to represent edges.

Graph Example

The above shows that sets can be divided into three groups.

  • Sets where all items are displayed twice. These sets can be deleted and / or sets containing these elements can be deleted.
  • Sets when one or more items are displayed twice. Items that appear twice can potentially associate with collections that can be deleted.
  • Sets where items are not displayed twice. These sets can be ignored.

It is not clear what will happen if all the sets are in group 1 or 3. However, there is a fairly simple criterion that allows you to quickly delete sets, and psudocode looks like this:

for each set in group2: for each element that appears twice in that set: if the other set that contains that element is in group1: remove the other set 

Productivity is then linear in the number of elements.

+1
source

I tried to find which settings to include, not delete. Something like that?

(1) List of elements and indexes of their sets
(2) Translate the list of answers with indexes of sets that have elements that appear only in them
(3) Combine the map from (1), and if the index of the set of elements is not listed in the answer list, add the index of the smallest number in which the element is located in the answer.

Haskell Code:

 import Data.List (nub, minimumBy, intersect) sets = [["a","b","c","e"],["a","b","d"],["a","c","e"]] lengths = map length sets --List elements and the indexes of sets they are in mapped = foldr map [] (nub . concat $ sets) where map ab = comb (a,[]) sets 0 : b comb result [] _ = result comb (a,list) (x:xs) index | elem ax = comb (a,index:list) xs (index + 1) | otherwise = comb (a,list) xs (index + 1) --List indexes of sets that have elements that appear only in them haveUnique = map (head . snd) . filter (\(element,list) -> null . drop 1 $ list) $ mapped --Comb the map and if an element set-index is not in the answer list, --add to the answer the index of the smallest set that element is in. answer = foldr comb haveUnique mapped where comb (a,list) b | not . null . intersect list $ b = b | otherwise = minimumBy (\setIndexA setIndexB -> compare (lengths!!setIndexA) (lengths!!setIndexB)) list : b 

OUTPUT:

 *Main> sets [["a","b","c","e"],["a","b","d"],["a","c","e"]] *Main> mapped [("a",[2,1,0]),("b",[1,0]),("c",[2,0]),("e",[2,0]),("d",[1])] *Main> haveUnique [1] *Main> answer [2,1] 
0
source

Source: https://habr.com/ru/post/946239/


All Articles