Normalize boolean expression for cache reasons. is there a more efficient way than truth tables?

My current project is an extended tag database with boolean search functions. Records are requested with Boolean expressions like these (for example, in a music database):

funky-music and not (live or cover) 

which should give all the funky music to the music database but not live or cover songs.

When it comes to caching, the problem is that there are queries that are equivalent but different in structure. For example, applying the Morgan rule, the above query can be written as follows:

 funky-music and not live and not cover 

which would lead to the same entries, but to the caching of interruption reasons, when caching would be implemented by hashing the query string, for example.

Therefore, my first intention was to create a query truth table, which can then be used as a caching key, since equivalent expressions form the same truth table. Unfortunately, this is almost impossible, since the truth table grows exponentially with the number of inputs (tags), and I do not want to limit the number of tags used in one query.

Another approach would be to go through a syntax tree that applies the rules defined by Boolean algebra to form a (minimal) normalized representation that also seems complicated.

So the general question is: is there a practical way to implement the recognition of equivalent queries without the need to minimize a schema or truth table (edit: or any other algorithm that is NP-hard)?

Ne plus ultra will recognize already cached subqueries, but this is not the main goal.

+6
source share
3 answers

A general and effective algorithm for determining whether a query is equivalent to "False" can be used to effectively solve NP problems, so you are unlikely to find it.

You can try converting your queries to canonical form. Because of the foregoing, there will always be questions that are very expensive to convert to any form, but you may find that in practice some form works pretty well most of the time - and you can always give up half if it gets too heavy.

You can see http://en.wikipedia.org/wiki/Conjunctive_normal_form , http://en.wikipedia.org/wiki/Disjunctive_normal_form , http://en.wikipedia.org/wiki/Binary_decision_diagram .

+1
source

You can convert queries to conjunctive normal form (CNF). This is a canonical, simple representation of Boolean formulas, which is usually the basis for SAT solvers.

Most likely, β€œlarge” requests will have many unions (rather than many disjunctions), so CNF should work well.

+1
source

The Quine-McCluskey algorithm should achieve what you are looking for. This is similar to Carnot Maps, but easier to implement in software .

0
source

Source: https://habr.com/ru/post/887051/


All Articles