Finding the smallest set of uniqueness criteria

I have a set of objects with properties. I want to find the simplest set of criteria that will specify exactly one of these objects (I don't care which one).

For example, given {a = 1, b = 1, c = 1}, {a = 1, b = 2, c = 1}, {a = 1, b = 1, c = 2}, specifying b == 2 (or c == 2) will give me a unique object.

Similarly, given {a = 1, b = 1, c = 1}, {a = 1, b = 2, c = 2}, {a = 1, b = 2, c = 1}, indicating b == 2 and c == 2 (or b == 1 & c == 1 or b == 2 & c == 1) will give me a unique object.

This sounds like a known problem with a known solution, but I could not find the correct formulation of the problem to allow me to use it.

+6
source share
4 answers

Freedom to choose a goal is something unusual. If the goal is set, then this essentially sets the cover problem . There are two corresponding instances side by side.

A={1,2,3} B={2,4} C={3,4} D={4,5} 0: {a=0, b=0, c=0, d=0} # separate 0 from the others 1: {a=1, b=0, c=0, d=0} 2: {a=1, b=1, c=0, d=0} 3: {a=1, b=0, c=1, d=0} 4: {a=0, b=1, c=1, d=1} 5: {a=0, b=0, c=0, d=1} 

Although the installed cover is NP-hard, your problem has the O (m log n + O (1) poly (n)) algorithm, where m is the number of attributes and n is the number of elements (the optimal set of criteria has a size of no more than log n), making it unlikely that evidence of NP hardness is expected. I was reminded of the situation with the Junta problem (basically a theoretical formulation of function selection).

+2
source

This is a really well-known problem in choosing an AI function. There are many algorithms for this simple Google “artificial intelligence” feature selection.

The main problem is that when the sample set is large, you need to use some kind of heuristic in order to reach a solution in a reasonable amount of time.


Feature Selection in Data Mining

The main idea of ​​choosing a function is to select a subset of the input variables by excluding functions with little or no predictive information.

+4
source

I don’t know how easily this could be translated into an algorithm, but using SQL, which is already set up based, it might look like this:

  • build a table with all possible combinations of columns from the input table
  • select all combinations that look equal to the number of entries present in the input table as various combinations.

SQL script

 ;WITH q (a, b, c) AS ( SELECT '1', '1', '1' UNION ALL SELECT '1', '2', '2' UNION ALL SELECT '1', '2', '1' UNION ALL SELECT '1', '1', '2' ) SELECT col FROM ( SELECT val = a, col = 'a' FROM q UNION ALL SELECT b, 'b' FROM q UNION ALL SELECT c, 'c' FROM q UNION ALL SELECT a+b, 'a+b' FROM q UNION ALL SELECT a+c, 'a+c' FROM q UNION ALL SELECT b+c, 'b+c' FROM q UNION ALL SELECT a+b+c, 'a+b+c' FROM q ) f GROUP BY col HAVING COUNT(DISTINCT (val)) = (SELECT COUNT(*) FROM q) 
0
source

Your problem can be defined as follows:

 1 1 1 -> A 1 2 1 -> B 1 1 2 -> C . . 

where 1 1 1 is called a vector function, and A is the class of the object. Then you can use decision trees (with clipping) to find a set of rules for classifying objects. So, if your goal is to automatically select a set of criteria to identify object A , then you can observe the path in the decision tree that leads to A

If you have access to MATLAB , it’s very easy to get a decision tree for your data.

0
source

Source: https://habr.com/ru/post/903089/


All Articles