Separate the problem and see what happened. Starting from two lines. We need to check which suffix of one line is the longest prefix of another. This gives us an order of better concatenation.
Now, with a set of Russian words, how do we do it? We start by creating a trie containing each word (key for each word). If the word is a duplicate of another, we can easily designate it as such when constructing the prefix tree.
I quickly completed the regular Trie. You can find it here .
We have tools for constructing a directed graph linking different words, and the suffix of the first is the prefix of the second. Edge weight is the length of the suffix.
To do this, for each word w of the input data set, we need to see what words we can achieve with the suffix w:
- We go down the trie using the suffix. We will end up in node (or not).
- From this node, if it exists, we scan the remaining subtree to see which words are available.
- If the given suffix of length l gives a match with the prefix of the word w ', then we add the edge w → w', with the weight length (w ') - l.
- If such an edge already exists, we simply update the weight to maintain the lowest.
A graph is established from there, and we must find the shortest path that passes through each vertex (for example, a word) only once . If the schedule is completed, this is a problem with the seller . In most cases, the schedule will not be completed.
However, this remains an NP-hard problem. In more "technical" terms, the problem is to find the shortest Hamiltonian path of the digraph .
Note. If a Hamiltonian path is given (if it exists) with its value C and its starting vertex (word) W, then the length of the super-summation is determined as follows:
L super = L W + C
Note. If in a nutshell there is no suffix connecting them with another word, then the graph is not connected and there is no Hamiltonian path.
source share