Edit: on clarified issue:
- Row sorting
- Find the longest common prefix of each adjacent pair
- Sort and deduplicate common prefixes, and then remove any string prefix of another.
Actually, step (3) only requires removing any duplicate / prefix of the other, which you could do with trie or something else instead of sorting. In fact, it could be that all this can be done faster with a suitable annotated trie - if you turn on the โcountโ in each node, then you are definitely looking at nodes with a score of 2+ that do not have children with a score of 2+.
But sorting is built-in, and once you are sorted, you can detect prefixes by looking at adjacent elements, so there may be less effort.
[Original answer:
Just a one-time operation, find the longest common prefix between all lines?
I would do this in terms of the length of the prefix. In pseudocode and in assuming nul-terminated strings:
prefixlen = strlen(first_string); foreach string in the list { for (i = 0; i < prefixlen; ++i) { if (string[i] != first_string[i]) { prefixlen = i; break; } } if (prefixlen == 0) break; } common_prefix = substring(firststring, 0, prefixlen);
]
source share