Algorithm: Is there a good way to solve the comparison?

I have a set of numbers to compare. Let's say I have to get this comparison from the user. I have a choice either to ask him a question consisting of two numbers or 3 numbers of 4 numbers. For example, I can ask one of the following questions:

  • Which of the numbers is greater? 2 OR 4
  • Which of the numbers is greater? 2 OR 3 OR 4
  • Which of the numbers, if more? 2 OR 3 OR 4 OR 5

My goal is to minimize the number of questions that I ask the user in some combination that will ultimately give me an ordering of all the numbers in the set ... For example, if there were only 4 numbers {2,3, 4,5} , I could ask him the third type of question, where I give him 4 numbers for comparison. But in the ecosystem that I am developing for this, the user is annoyed by long questions, so I would like to minimize the number of questions of this type that I would ask. Therefore, if each of the questions has a certain weight, I try to find a way to ultimately get an ordering of all numbers, but at the same time present minimal problems to the user.

Is there a good way to solve this problem? Does anyone see this as belonging to a common class of problems, or am I just making it too complicated? Any suggestions?

+4
source share
4 answers

Let the experiment, right.

1. Example

Let's look at {A,B,C,D} and sort it.

Solution 1: in sets

  • Greater than {A,B,C,D} β†’ B (thus B>A , B>C and B>D )
  • Greater than {A,C,D} β†’ D (thus D>A and D>C )
  • Greater than {A,C} β†’ A (thus A>C )

General order [B,D,A,C]

Solution 2: In pairs

  • More between A and C β†’ A (so A>C )
  • More between B and D β†’ B (thus B>D )
  • More between D and A β†’ D

General order [B,D,A,C]

What is the catch? Obviously, this is more complicated in pairs, here I chose them so that the merger would be simple (none).

2. Notes

a) General ordering

> works so well with full ordering: i.e. for two given elements A and B set, either A>B or B>A If none of these two relations holds, then A and B equivalent.

The problem with Solution 1's approach is that if you represent a user with {A,B,C,D} and A and B equivalent and larger than C and D ... then what should answer

b) transitivity

The relation > transitive, which means that if A>B and B>C , then A>C It is important to use this fact, since you can display the relationship between two elements without even asking the user.

c) What is the purpose?

Is the goal to minimize the number of questions for the user or to minimize the user's work? Because, obviously, it is more difficult for the user to answer questions from the first solution ...

3. Modeling

You can simulate the problem as a "graph" problem.

You start with a set of nodes {A,B,C,D} that represent the values ​​you want to check.

Sorting a set is equivalent to computing the minimum set of oriented edges that connect these nodes, so for any two nodes, the path leads from one to the other. I insist on the minimum.

For example, if I have 2 edges: B>A and B>C , then if I detect than A>C , I will remove B>C , because I can infer it by transitivity. Important properties are that if neither of the two nodes is equivalent, the cardinal of the resulting set of edges is the cardinal of the set of nodes minus 1. In my example (taking into account 4 nodes) this will be 3.

An oracle (or an extremely lucky guy) will thus be able to ask only 3 questions, but still build this schedule ... what should we strive for.

4. How to solve this problem?

Well, let us not have two equivalent elements. This means that if A>B is false, then I can conclude that B>A ...

To represent our little graph, we take an array:

  ABCDD . > . # . represent the unknown C . > B > # < and > have their usual meaning... A 

Because of symmetry, we only need a triangle.

Counting the number . , we can see the number of unknown relationships, the ideal solution is to get rid of all these . by asking as few questions as possible to the user.

So a good question is to remove as much as possible . in one fell swoop.

5. My question

And so I have another question:

 Select the elements lower than "D" in the following {A,B,C}: _ 

This question feels better than what's bigger ... because I'm clearly focused on the relationships I want to know ( D?A , D?B and D?C ), while there are great guarantees that I will get so many relationships but I don’t know what in advance.

I try to maximize the usefulness of the question

Of course, for laziness this does not mean: it is still important to take into account the transition when using the results of questions.

6. The study of large sets

With larger sets, you can group the elements and then combine them later, but merging is always a dirty operation: you will probably get answers that you already knew. However, this is a big practice for my little question:

For two ordered (disjoint) sets: [A,B,C,D,...] and [R,S,T,U,...] we will consider 3 questions on the toolbar:

  • Which is greater: A or R ? _
  • What is the largest element {A,B,R,S} ? _
  • Which elements are greater than A in {R,S,T} ? _

  • Gives 1 connection

  • Gives 2 relationships: 3 of which are already known
  • Gives 3 relationships

The third question requires a more complex answer, but it is much more suitable for merge situations. In the case of a merger, it is as effective as requesting a sort of elements, since it requests exactly for three relations that we do not know on the board.

7. Conclusion

You now have 4 questions:

  • Sort: Sort the following items from largest to lowest {A, B, C, D}? _
  • Pivot: Which elements are greater than A in this set {B, C, D}? _
  • Greater: Which element is greater in this set {A, B, C, D}? _
  • Couple: Which of the elements A and B is bigger? _

(A pair can be considered as a degenerate case of either the Big or the Pivot)

Given that each question gives an n-1 relationship for n nodes, you can try to determine how long it will take for the user to answer the T(n) question, and then find n that maximize T(n)/n ;)

+5
source

If you are sorting an array, you usually sort by time O (n log (n)). This means that you sometimes compare pairs of numbers more than once. If you constantly ask 4 questions, you get additional information that you can save.

When you get to the point at which you specify a> b, you may also ask if the largest of {b, c, d, e, ...} is. So you should never ask> c, a> d, etc. If you save these known comparisons in a table, you will never have to repeat them. As for the choice of what the caparison sets are asking the user, I'm not sure, and this is certainly a good question.

My intuitive idea of ​​which sets to pose a question to the user would be to see how quick-sort works. It splits the data into two parts, and numbers that are not in the same group are never compared, so you know that DO NOT enter into a set to ask the user. Use 4 numbers as long as possible until quicksort gets into groups with a number less than 4.

+2
source

You can simply do a merge sort or a quick sort algorithm where the comparison function asks the user for a version with two question numbers, and then most of the questions they need to answer are n log n.

The question then becomes: can you ask which of the three or four of them reduces the number of questions you ask. My intuition says no, but it would be difficult for me to prove it.

+1
source

You can submit questions using quicksort or mergesort algorithms. Minimize the number of comparisons and ensure the correct result.

0
source

Source: https://habr.com/ru/post/1301784/


All Articles