Based on the dictionary, find all possible letter options

Recently I was asked the following question:

You have a dictionary page written in a foreign language. Suppose the language is similar to English and is read / written correctly from left to right. In addition, the words are arranged in lexicographical order. For example, the page could be: ADG, ADH, BCD, BCF, FM, FN
You must give all lexicographic orders the possibility of the character installed on the page.

My approach is as follows: A has a higher priority than B, and G has a higher priority than H. Therefore, we have ordering information for some characters:

A->B, B->F, G->H, D->F, M->N 

Possible orders could be ABDFGNHMC, ACBDFGNHMC, ... My approach was to use the array as the owner of the position and generate all permutations to identify all valid orders. The worst time difficulty for this is N! where N is the size of the character set. Can we work better than brute force.

Thanks in advance.

+6
source share
4 answers

Donald Knuth wrote the article Structured Program for Creating All Topological Sorting Devices . This article was originally studied in 1974. The following quote from the article led me to a better understanding of the problem (in the text, the relation i & j means "i precedes j"):

The natural way to solve this problem is to allow x 1 to be an element that has no predecessors, then remove all relations from x 1 <j and let x 2 be the element & ne; x 1 without predecessors in the system, since it exists, then erase all relations from x 2 <j, etc. It is not difficult to verify that this method will always be successful, provided there is an oriented loop at the input. Moreover, in a sense, this is the only way, since x 1 must be an element without predecessors, and x 2 must be without predecessors when all relations x 1 <j are deleted, etc. This observation naturally leads to an algorithm that finds all solutions to the topological sorting problem; this is a typical example of the β€œreturn trip” procedure, where at each stage we consider the subtask from β€œFind all paths to complete the partial rearrangement of x 1 x 2 ... x k to topological sorting x 1 x 2 ... x n . The general method is in the division into all possible options x k + 1 .
The central problem in backtrack applications is finding a suitable way to organize data so that the sequence through the possible options x k + 1 ; in this case, we need an effective way to detect the set of all elements & ne; {x 1 , ..., x k } that have no predecessors other than x 1 , ..., x k and maintain this knowledge effectively when we move from one subtask to another.

The document contains pseudo-code for an efficient algorithm. The time complexity for each output is O (m + n), where m is the number of input relations, and n is the number of letters. I wrote a C ++ program that implements the algorithm described in the document that supports variables and function names - which takes the letters and relations from your question as input. I hope that no one complains that this answer will be provided to the program due to the language agnostic tag.

 #include <iostream> #include <deque> #include <vector> #include <iterator> #include <map> // Define Input static const char input[] = { 'A', 'D', 'G', 'H', 'B', 'C', 'F', 'M', 'N' }; static const char crel[][2] = {{'A', 'B'}, {'B', 'F'}, {'G', 'H'}, {'D', 'F'}, {'M', 'N'}}; static const int n = sizeof(input) / sizeof(char); static const int m = sizeof(crel) / sizeof(*crel); std::map<char, int> count; std::map<char, int> top; std::map<int, char> suc; std::map<int, int> next; std::deque<char> D; std::vector<char> buffer; void alltopsorts(int k) { if (D.empty()) return; char base = D.back(); do { char q = D.back(); D.pop_back(); buffer[k] = q; if (k == (n - 1)) { for (std::vector<char>::const_iterator cit = buffer.begin(); cit != buffer.end(); ++cit) std::cout << (*cit); std::cout << std::endl; } // erase relations beginning with q: int p = top[q]; while (p >= 0) { char j = suc[p]; count[j]--; if (!count[j]) D.push_back(j); p = next[p]; } alltopsorts(k + 1); // retrieve relations beginning with q: p = top[q]; while (p >= 0) { char j = suc[p]; if (!count[j]) D.pop_back(); count[j]++; p = next[p]; } D.push_front(q); } while (D.back() != base); } int main() { // Prepare std::fill_n(std::back_inserter(buffer), n, 0); for (int i = 0; i < n; i++) { count[input[i]] = 0; top[input[i]] = -1; } for (int i = 0; i < m; i++) { suc[i] = crel[i][1]; next[i] = top[crel[i][0]]; top[crel[i][0]] = i; count[crel[i][1]]++; } for (std::map<char, int>::const_iterator cit = count.begin(); cit != count.end(); ++cit) if (!(*cit).second) D.push_back((*cit).first); alltopsorts(0); } 
+3
source

There is no algorithm that can work better than O (N!) If there is N! the answers. But I think there is a better way to understand the problem:

You can build a directed graph in this way: if A appears before B, then there is an edge from A to B. After plotting, you just need to find all the possible topological sorting results . Still O (N!), But it's easier to code and better than your approach (no need to generate an invalid order).

+3
source

I would solve it as follows:

  • Look at the first letter: (A β†’ B β†’ F)
  • Look at the second letter, but consider only those who have the same first letter: (D), (C), (M β†’ N)
  • Look at the third letter, but consider only those who have the same letters 1. and 2.: (G β†’ H) , (D β†’ F)
  • And so on, while this is something else ... (Look at the N-th letter, the group by the previous letters)

In brackets contains all the information that you get from the set (all possible orderings). Ignore parentheses with just one letter, because they do not represent order. Then take everything in brackets and sort topologically.

0
source

ok, I immediately admit that I do not have an estimate of the time complexity for the average case, but perhaps the following two observations will help.

Firstly, this is an obvious candidate for a library of restrictions. if you did this in practice (for example, it was some kind of task at work), then you would get a constraint solver, provide it with various pair settings, and then ask for a list of all the results.

the second, which is usually implemented as a search. if you have N characters, consider a tree whose root node has N children (select the first character); the next node has N-1 children (select the second character); etc. obviously this is N! worst case for full intelligence.

even with a β€œdumb” search, you can see that you can often crop your search by checking your order at any point with respect to the pairs that you have.

but since you know that there is a general ordering, even if you (can) have only partial information, you can make the search more efficient. for example, you know that the first character should not appear to the right of <for any pair (assuming each character is given a numerical value, with the first character being the lowest). likewise, moving through the tree for the corresponding abbreviated data.

you can list possible solutions by exploring the tree, using incomplete ordering information to limit the possible options on each node.

hope this helps.

0
source

Source: https://habr.com/ru/post/896478/


All Articles