Reduce list based on item substrings

I am looking for the most efficient way to reduce a given list based on substrings already in the list.

for instance

mylist = ['abcd','abcde','abcdef','qrs','qrst','qrstu']

will be reduced to:

mylist = ['abcd','qrs']

because both "abcd" and "qrs" are the smallest substring of the other elements in this list. I was able to do this with about 30 lines of code, but I suspect there is a tricky one-liner there.

+4
source share
4 answers

it seems to work (but not as effective, I suppose)

def reduce_prefixes(strings):
    sorted_strings = sorted(strings)
    return [element
            for index, element in enumerate(sorted_strings)
            if all(not previous.startswith(element) and
                   not element.startswith(previous)
                   for previous in sorted_strings[:index])]

Tests:

>>>reduce_prefixes(['abcd', 'abcde', 'abcdef',
                    'qrs', 'qrst', 'qrstu'])
['abcd', 'qrs']
>>>reduce_prefixes(['abcd', 'abcde', 'abcdef',
                    'qrs', 'qrst', 'qrstu',
                    'gabcd', 'gab', 'ab'])
['ab', 'gab', 'qrs']
+3
source

One solution is to iterate through all the lines and split them based on whether they have different characters and recursively apply this function.

def reduce_substrings(strings):
    return list(_reduce_substrings(map(iter, strings)))

def _reduce_substrings(strings):
    # A dictionary of characters to a list of strings that begin with that character
    nexts = {}
    for string in strings:
        try:
            nexts.setdefault(next(string), []).append(string)
        except StopIteration:
            # Reached the end of this string. It is the only shortest substring.
            yield ''
            return
    for next_char, next_strings in nexts.items():
        for next_substrings in _reduce_substrings(next_strings):
            yield next_char + next_substrings

, .

, - - .

0

Probably not the most efficient, but at least short:

mylist = ['abcd','abcde','abcdef','qrs','qrst','qrstu']

outlist = []
for l in mylist:
    if any(o.startswith(l) for o in outlist):
        # l is a prefix of some elements in outlist, so it replaces them
        outlist = [ o for o in outlist if not o.startswith(l) ] + [ l ]
    if not any(l.startswith(o) for o in outlist):
        # l has no prefix in outlist yet, so it becomes a prefix candidate
        outlist.append(l)

print(outlist)
0
source

Try the following:

import re
mylist = ['abcd','abcde','abcdef','qrs','qrst','qrstu']
new_list=[]
for i in mylist:
    if re.match("^abcd$",i):
        new_list.append(i)
    elif re.match("^qrs$",i):
        new_list.append(i)
print(new_list)
#['abcd', 'qrs']
-1
source

Source: https://habr.com/ru/post/1679170/


All Articles