Complexity
In the general case, as mentioned above, this problem will be NP-hard. There is a way to resolve the issue with a suffix structure, but you can also use a directed graph for this. I can’t say for sure if this is better in a sense, but there may be some advantage in difficult corner cases.
Graphical solution
To understand how you can use a chart, you just need to build it correctly. Let the letters of the lines be the vertices, and the order of the letters will determine the edges. This means that ab
will mean the vertices a
and b
and the connection between a
and b
. Of course, the connections are directed (i.e., if a
connected to b
, this does not mean that b
connected to a
, so ab
is a --> b
)
After these definitions, you can build the entire schedule. For your sample, this is:
![Simple regular case](https://fooobar.com/undefined)
- simple enough. But abcd
can also be represented by strings of two lengths like ab, ac, ad, bc, bd, cd
- so I will also show a graph for this (this is just for clarity):
![Complex regular case](https://fooobar.com/undefined)
Then, to find your solution, you need to find the path with the maximum length on this graph. Obviously, there is a place from which NP-hardness is “derived”. In both cases, the maximum length is 4
, which will be achieved if we start from the vertex a
and the movement graph to the found path a->b->c->d
.
Corner cases
Unequal solution : in fact, you may encounter sets of strings that cannot strictly define a superset. Example: ab, ad, bd, ac, cd
. Both abcd
and acbd
correspond to these substrings. But, in fact, this is not so bad. Look at the chart:
![Ambiguous way case](https://fooobar.com/undefined)
(I chose this for a reason: it looks like a second graph, but without one connection, so the result is mixed). The maximum path length is now 3
- but this can be achieved using two paths: abd
and acd
. So, how to restore at least one solution? Simple: since the lines of results will be the same length (which comes from the definition of the way we found them), we can simply go from the beginning of the first line and check the characters in the second line. If they match, then this is the strict position of the character. But if they don’t, then we can choose any order between the current characters. So this will be:
[a] matches [a]
, so the current result is a
[b]
does not match [c]
, so we can place either bc
or cb
. Let it be the first. Then the result is abc
[d]
matches [d]
, so the current result is abc
+ d
, that is, abcd
This is a kind of "merger", where we can choose any result. However, this is a slightly distorted case, because now we can’t only use any path found - we must find all paths with the maximum length (otherwise we will not be able to restore full supersymmetry)
Further, a nonexistent solution : there may be cases when the order of characters in some lines cannot be used to reproduce supersymmetry. Obviously, this means that one order violates another order, so there will be two lines in which some two characters have different orders. Again, the simplest case is: abc, cbd
. On the chart:
![Non-existent solution](https://fooobar.com/undefined)
- the inevitable consequence will be the cycle of the schedule - it may not be as simple (as in the chart above), but it will always be if the order is disturbed. Thus, all you need to understand is to find a graph schedule . In fact. this is not something you should do separately. You simply add loop detection in your search for a long graph path.
Third: duplicate characters : this is the most difficult case. If the string contains duplicate characters, then creating a chart is more difficult, but you can still do it. For example, suppose three conditions: aac, abc, bbc
. The solution for this is aabbc
. How to build a schedule? Now we can’t just add links (at least in loops). But I suggest the following way:
- Let's process all our lines by assigning indexes to characters. The reset pointer to a string. If a character appears only once, the index will be
1
. For our lines, this means: 1 a 2 c 1 , a 1 b 1 c 1 and b 1 b 2 c 1 . After that, save the maximum index for each character found. For the sample above, which will be 2
for a
, 2
for b
and 1
for c
- If two linked indexed characters have the same original character, then the connection is performed “as is”. For example, 1 a 2 will create only one 1 in 2 connection
- If two related indexed characters have different source characters, then any first indexed character can be associated with any second indexed character. For example, b 1 c 1 will result in the compound (b 1 to c 1 ) and (b 2 to c 1 ). How do we know how many indexed characters can be? We did this in the first step (i.e., found maximum indices)
Thus, the chart for the example above will look like this:
+ ------------ +
| |
| |
+ ------> [a2] ------ + |
| | | |
| VV |
[a1] ----> [c1] <---- [b2] |
| ^ ^ |
| | | |
+ ------> [b1] ------ + |
^ |
| |
+ ------------ +
- we will have a maximum length of 5
for paths a 1 a 2 b 2 b 1 c 1 and 1 a 2 b 1 b 2 c <sub> 1 sub>. We can select any of them, ignoring the indices in the aabbc
result aabbc
.