Simplified installation

Say I have two sets, set1 = {a,b,c,d,e,f} and set2 = {a,b,c,d,e,g} . Instead of expressing it explicitly, I want to create something like

 common = {a,b,c,d,e} set1 = common + f set2 = common + g 

If we wanted to represent {a,b,c,h} , we could represent it as common - d - e + h .

My goal is mainly to be able to generate the optimal common part to be used. With one general section, this is not too complicated, but I need to allow more than one (but not unlimited), or the benefits will be trivial.

By optimal, I mean "least expressed elements." Thus, in the above example, it "costs" 5 (the number of elements) to create a common variable. Then it sets 1 and 2 as the cost of 2 (one for the link is common, one for adding an additional item), the total amount is 7. Without substitution, they will cost 12 for storage (6 items each). Similarly, subtracting an element from a link will be "worth" 1.

Another example is {a,b,c,d}, {a,c,d,e}, {e,f,g,h} and {e,f}

may be

 common1 = {a,c,d} common2 = {e,f,g} set1 = common1 + b set2 = common1 + e set3 = common2 + h set4 = common2 - g 

By providing several common parts, this becomes much more complex. Is there a name for this type of problem or something similar? It looks like this might be due to compression, but I could not find too many resources, where to start.

Some other details that may be relevant:

  • Allowed to refer to several common parts to represent one set, may be valid, but not required.
  • For my use case, sets will usually contain about 20 elements and about 10 different sets.
+5
source share
1 answer

You can find all atomic sets, that is, all sets that are never visible separately.

 {a,b,c,d,e,f,g,h} | {a,b,c,d} = {a,b,c,d},{e,f,g,h} {a,b,c,d},{e,f,g,h} | {a,c,d,e} = {a,b,c,d},{e,f,g,h} {a,c,d},{b},{e},{f,g,h} | {e,f,g,h} = {a,c,d},{b},{e},{f,g,h} {a,c,d},{b},{e},{f,g,h} | {e,f} = {a,c,d},{b},{e},{f},{g,h} {a,b,c,d} = {a,c,d},{b} {a,c,d,e} = {a,c,d},{e} {e,f,g,h} = {e},{f},{g,h} {e,f} = {e},{f} 

This is a little closer, but it does not solve the minimum breakdown.

I don’t think you can find the minimum, because I suspect it is NP-Hard. If you look at the set S and create a graph where each possible subset of S is node G. Now give the weight of the node according to the length of the subset and draw an edge between each node that corresponds to the number of changes. {abc} β†’ {a} has a weight of 2. {bcd} β†’ {abe} has a weight of 4. Now, to find the minimum solution to the general set problem, you need to find the minimum spanning tree that spans each of the sets you are interested in. If you find that you can use this to create a minimal common set - that would be the same. Finding the minimum weight in a weighted graph of a node is called the Node-Weighed Steiner tree problem. A node weighted Steiner tree problem may be equivalent to the Steiner tree problem. Steiner Tree's problem may seem NP-Hard. Therefore, I strongly suspect that the problem you are trying to solve is NP-Hard.

http://theory.cs.uni-bonn.de/info5/steinerkompendium/node15.html

http://theory.cs.uni-bonn.de/info5/steinerkompendium/node17.html

+2
source

Source: https://habr.com/ru/post/1270720/


All Articles