Expanded Find of Unique Tuples in a Relationship Submitted by BDD

We consider {<1,2>, <1,3>, <1,7>, <0,4>} as a set of tuples of the relation R. Now we consider that R is represented (through its membership function) using BDD. That is, the BDD representing R depends on the variables {x1, x2, y1, y2, y3}, where {x1, x2} are used to represent the first element of each set, and {y1, y2, y3} are used to represent the second element.

Now let's look at the problem of finding a set of tuples that have unique values ​​in their first element. For relationships above this set, there will be {<0.4>}. All other elements are discarded because they contain more than one value having 1 in the first component.

As a second example, consider a relationship with many sets {<1,2>, <1,3>, <1,7>, <2,3>, <2,5> <0,4>}. In this case, the expected result is still {<0.4>}, since 2 appears more than once as the first element.

The problem can also be considered as an abstraction of the variables {y1, y2, y3}, so that they remain the only values ​​for {x1, x2}. Using this result, the expected ratio can be restored by calculating the conjunction of the resulting BDD with the input.

So the question is: what are the BDD operations that must be performed on the R representation in order to get BDDs with unique tuples only.

Please note that this is the genralization of this question

EDIT 1: The following code reflects the implementation I have so far. However, I am wondering if a more efficient version can be obtained. For simplicity, I deliberately omit the processing of the computed table (it is imperative to get the best time complexity). In addition, I use &, | and! to denote conjunction, disjunction and complementation operations on BDD.

BDD uniqueAbstract(BDD f, BDD cube) { if ((f.IsZero() || f.IsOne()) && !cube.IsOne()) return zero(); BDD T = high(f); BDD E = low(f); if(level(f) == level(c)) { // current var is abstracted BDD uniqueThen = uniqueAbstract(T, high(c)); BDD existElse = existAbstract(E, high(c)); BDD existThen = existAbstract(T, high(c)); BDD uniqueElse = uniqueAbstract(E, high(c)); return (uniqueThen & !existElse) | (uniqueElse & !existThen) } else { BDD uniqueThen = uniqueAbstract(T,c); BDD uniqueElse = uniqueAbstract(E,c); return ite(top(f), uniqueThen, uniqueElse); } } 

EDIT2: After three different implementations, there are still performance issues. Let me describe three of them.

  • Implementation C of my approach, let me name reference implementation 4 .
  • The implementation suggested by user meolic in accepted answer 3 .
  • A hybrid approach between the two and the available 2 .

The purpose of this update is to analyze a little the results of using the three approaches. Since interim measures are currently misleading to judge them, I decided to evaluate the implementation for a different set of measures.

  • Recursive challenges
  • Cache access
  • Abstract simple. The number of times a function call has been resolved without the need for existential abstraction.
  • Abstract complex: the number of times a function call has been resolved, requiring existential abstraction.
  • Exist abstract: the number of calls to existential abstraction.

Results for implementation 1: (21123 ms): Unique abstraction statistics: Recursive calls: 1728549.000000 Cached hits: 638745.000000 Not abstract: 67207.000000 Abstract is simple: 0.000000 Abstract complex: 0.000000 Exist abstract: 1593430.000000

Results for implementation 2: (execution time: 54727 ms) Unique abstraction statistics: Recursive calls: 191585.000000 Cached hits: 26494.000000 Abstract simple: 59788.000000 Abstract complex: 12011.000000 Exist abstract: 24022.000000

Results for implementation 3: (execution time: 20215 ms) Unique abstraction statistics: Recursive calls: 268044.000000 Cached hits: 30668.000000 Abstract simple: 78115.000000 Abstract complex: 46473.000000 Exist abstract: 92946.000000

EDIT 3: The following results were obtained after implementing each logical operation in ITE 5 terms.

  • uniqueAbstractRecRef (21831 ms) Unique abstraction statistics: Total calls: 1723239 Optimized calls: 0 Total abstract calls exist: 30955618 Unique abstract life calls Abstract: 2385915 Total calls: 3574555 Of the total time, uniqueAbstractRecRef takes 4001 ms (12.4%)

  • uniqueAbstractSERec (56761 ms) Unique abstraction statistics: Total calls: 193627 Optimized calls: 60632 Total abstract calls: 16475806 Unique abstract live calls Abstract: 24304 Total calls: 1271844 Of the total time, uniqueAbstractSERec takes 33918 ms (51.5%)

  • uniqueAbstractRec (20587 ms) Unique abstraction statistics: Total calls: 270205 Optimized calls: 78486 Total abstract calls: 13186348 Unique abstract existence calls Abstract: 93060 Total calls: 1256872 Of the total time, uniqueAbstractRec takes 3354 ms (10.6%)

+5
source share
3 answers

There is a simple and effective solution if the variables are ordered so that x1 and x2 are at the top of the BDD.

Consider the BDD for the second example.

You can go through (in first order) the first two layers to get four sub-BDDs. One for each possible combination of x1,x2 . Three of these sub-BDDs are embedded in y1 , and the fourth is empty (constant False).

bdd

Now you can count the number of elements in each sub-BDD (Knuth Algorithm C from Volume 4 Fascicle 1, Bitwise Tricks and Methods, Binary Decision Schemes ).

If the number of elements in sub-BDD is greater than 1, then drop it (the shortcut from the parent node directly to False ), otherwise leave it as it is.

You can run this algorithm in one pass by storing partial results when counting elements.

+2
source

Here is my implementation. I studied the solution proposed by the author, and it seems to me that this is the best, if not the only simple solution based on BDD for arbitrary ordering. However, if the algorithm is implemented in my way, there may be some improvements - PLEASE CHECK. I use my own wrapper over the BDD package, but you should not have any problems to figure this out.

EDITED: I simplified the solution, the Bdd_GetVariableChar () function is no longer used.

 /* TESTING SOLUTION FOR QUESTION ON Qaru */ /* bdd_termFalse,bdd_termTrue: Boolean constants */ /* Bdd_isTerminal(f): check if f is Boolean constant */ /* Bdd_Low(f),Bdd_High(f): 'else' and 'then' subfunction */ /* Bdd_Top(f): literal function representing topvar of f */ /* Bdd_IsSmaller(f,g): check if topvar of f is above topvar of g */ /* existentialAbstraction(f,cube): \exist vf for all v in cube */ Bdd_Edge specialAbstraction(Bdd_Edge f, Bdd_Edge cube) { if (Bdd_isTerminal(cube)) return f; if (Bdd_isTerminal(f)) return bdd_termFalse; if (Bdd_IsSmaller(f,cube)) { Bdd_Edge E,T; E = specialAbstraction(Bdd_Low(f),cube); T = specialAbstraction(Bdd_High(f),cube); return Bdd_ITE(Bdd_Top(f),T,E); } else if (Bdd_IsSmaller(cube,f)) { return bdd_termFalse; } else { Bdd_Edge E,T; cube = Bdd_High(cube); E = Bdd_Low(f); T = Bdd_High(f); if (Bdd_isEqv(E,bdd_termFalse)) { return specialAbstraction(T,cube); } else if (Bdd_isEqv(T,bdd_termFalse)) { return specialAbstraction(E,cube); } else { Bdd_Edge EX,TX,R; EX = existentialAbstraction(E,cube); TX = existentialAbstraction(T,cube); if (Bdd_isEqv(EX,TX)) return bdd_termFalse; R = Bdd_ITE(Bdd_ITE(EX,bdd_termFalse,T), bdd_termTrue, Bdd_ITE(TX,bdd_termFalse,E)); return specialAbstraction(R,cube); } } } 

And yes, if the variable ordering is fixed with x above y, the algorithm can be much more efficient - you can remove all the calculations from the most complex else block and just return 0.

Here are a few test runs:

 CUBE (JUST IN CASE YOU ARE NOT FAMILIAR WITH BDD ALGORITHMS) + y1 y2 y3 y4 y5 ORIGINAL (ORDERED WITH X ABOVE Y) + *x1 *x2 x3 *x4 x5 y1 *y2 y3 y4 y5 + *x1 x2 *x3 *x4 *x5 y1 y2 *y3 y4 y5 + *x1 x2 *x3 *x4 x5 *y1 y2 *y3 y4 y5 + *x1 x2 *x3 x4 *x5 y1 *y2 y3 *y4 *y5 + *x1 x2 x3 *x4 x5 *y1 *y2 *y3 *y4 y5 + *x1 x2 x3 *x4 x5 *y1 y2 y3 *y4 *y5 + x1 *x2 *x3 *x4 *x5 y1 y2 y3 y4 *y5 + x1 x2 *x3 x4 x5 *y1 *y2 *y4 *y5 + x1 x2 x3 *x4 *x5 *y1 *y2 *y3 y4 *y5 ABSTRACTION + *x1 *x2 x3 *x4 x5 + *x1 x2 *x3 *x4 + *x1 x2 *x3 x4 *x5 + x1 *x2 *x3 *x4 *x5 + x1 x2 x3 *x4 *x5 ORIGINAL (ORDERED WITH Y ABOVE X) + *y1 *y2 *y3 *y4 *y5 x1 x2 *x3 x4 x5 + *y1 *y2 *y3 *y4 y5 *x1 x2 x3 *x4 x5 + *y1 *y2 *y3 y4 *y5 x1 x2 x3 *x4 *x5 + *y1 *y2 y3 *y4 *y5 x1 x2 *x3 x4 x5 + *y1 y2 *y3 y4 y5 *x1 x2 *x3 *x4 x5 + *y1 y2 y3 *y4 *y5 *x1 x2 x3 *x4 x5 + y1 *y2 y3 *y4 *y5 *x1 x2 *x3 x4 *x5 + y1 *y2 y3 y4 y5 *x1 *x2 x3 *x4 x5 + y1 y2 *y3 y4 y5 *x1 x2 *x3 *x4 *x5 + y1 y2 y3 y4 *y5 x1 *x2 *x3 *x4 *x5 ABSTRACTION + *x1 *x2 x3 *x4 x5 + *x1 x2 *x3 *x4 + *x1 x2 *x3 x4 *x5 + x1 *x2 *x3 *x4 *x5 + x1 x2 x3 *x4 *x5 ORIGINAL (MIXED ORDER) + *x1 *x2 y1 *y2 y3 y4 y5 x3 *x4 x5 + *x1 x2 *y1 *y2 *y3 *y4 y5 x3 *x4 x5 + *x1 x2 *y1 y2 *y3 y4 y5 *x3 *x4 x5 + *x1 x2 *y1 y2 y3 *y4 *y5 x3 *x4 x5 + *x1 x2 y1 *y2 y3 *y4 *y5 *x3 x4 *x5 + *x1 x2 y1 y2 *y3 y4 y5 *x3 *x4 *x5 + x1 *x2 y1 y2 y3 y4 *y5 *x3 *x4 *x5 + x1 x2 *y1 *y2 *y3 *y4 *y5 *x3 x4 x5 + x1 x2 *y1 *y2 *y3 y4 *y5 x3 *x4 *x5 + x1 x2 *y1 *y2 y3 *y4 *y5 *x3 x4 x5 ABSTRACTION + *x1 *x2 x3 *x4 x5 + *x1 x2 *x3 *x4 + *x1 x2 *x3 x4 *x5 + x1 *x2 *x3 *x4 *x5 + x1 x2 x3 *x4 *x5 
+2
source

One approach involves the direct translation of the definition of uniqueness:

 R(x,y) and forall z . ~R(x,z) or y = z 

An implementation might look like this:

 def inspect(function, name, nvars): sys.stdout.write(name) # avoid print trailing space or newline function.summary(nvars) # minterm count needs number of variables function.printCover() import sys from cudd import Cudd m = Cudd() nx = 2 ny = 3 x = [m.bddVar() for i in range(nx)] y = [m.bddVar() for i in range(ny)] R = (~x[0] & x[1] & (~y[0] & y[1] | y[1] & y[2]) | x[0] & ~x[1] & (y[0] ^ y[1]) & y[2] | ~x[0] & ~x[1] & y[0] & ~y[1] & ~y[2]) # This approach is independent of variable order. We are free to enable # reordering or call it explicitly. m.reduceHeap() inspect(R, 'R', nx+ny) # Create auxiliary variables and selector function. z = [m.bddVar() for i in range(ny)] zcube = reduce(lambda a, b: a & b, z) P = ~m.xeqy(y,z) # A pair is in L iff: # - it is in R # - there is no other pair in R with the same x and different y L = R & ~(R.swapVariables(y,z).andAbstract(P,zcube)) inspect(L, 'L', nx+ny) 

The result of executing this code:

 R: 10 nodes 1 leaves 6 minterms 01-11 1 10101 1 10011 1 00100 1 0101- 1 L: 6 nodes 1 leaves 1 minterms 00100--- 1 

The first two variables encode the first element of the pair; the following three variables encode the second element of the pair; the last three variables are auxiliary variables.

The code applies DeMorgan to the above formula to use andAbstract .

+1
source

Source: https://habr.com/ru/post/1204700/


All Articles