Removing all extensions of a string from a list

I have dictionaries, for example:

    '1' : ['GAA', 'GAAA', 'GAAAA', 'GAAAAA', 'GAAAAAG', 'GAAAAAGU', 'GAAAAAGUA',
           'GAAAAAGUAU', 'GAAAAAGUAUG', 'GAAAAAGUAUGC', 'GAAAAAGUAUGCA',
           'GAAAAAGUAUGCAA', 'GAAAAAGUAUGCAAG', 'GAAAAAGUAUGCAAGA',
           'GAAAAAGUAUGCAAGAA', 'GAAAAAGUAUGCAAGAAC']
    '2' : ['GAG', 'GAGA', 'GAGAG', 'GAGAGA', 'GAGAGAG', 'GAGAGAGA', 'GAGAGAGAC',
           'GAGAGAGACA', 'GAGAGAGACAU', 'GAGAGAGACAUA', 'GAGAGAGACAUAG',
           'GAGAGAGACAUAGA', 'GAGAGAGACAUAGAG', 'GAGAGAGACAUAGAGG']
    '3' : ['GUC', 'GUCU', 'GUCUU', 'GUCUUU', 'GUCUUUG', 'GUCUUUGU', 'GUCUUUGU"',
           'GUCUUUGU"G', 'GUCUUUGU"GU', 'GUCUUUGU"GUA', 'GUCUUUGU"GUAC',
           'GUCUUUGU"GUACA', 'GUCUUUGU"GUACAU', 'GUCUUUGU"GUACAUC']

I want the program to find the shortest string in each list (for example, GAA in the first) and then delete every other entry that is merely an extension of it (a string that starts with GAA and just adds extra letters).

I know there are many questions about removing items from a list, but none of them helps me with this problem.

+5
2 answers
    >>> dictionary = {
    ...     '1': ['GAA', 'GAAA', 'GAAAA', 'GAAAAA', 'GAAAAAG', 'GAAAAAGU',
    ...           'GAAAAAGUA', 'GAAAAAGUAU', 'GAAAAAGUAUG', 'GAAAAAGUAUGC',
    ...           'GAAAAAGUAUGCA', 'GAAAAAGUAUGCAA', 'GAAAAAGUAUGCAAG',
    ...           'GAAAAAGUAUGCAAGA', 'GAAAAAGUAUGCAAGAA', 'GAAAAAGUAUGCAAGAAC',
    ...           'RTRSRS', 'GAG', 'GAGA', 'GAGAG', 'GAGAGA', 'GAGAGAG',
    ...           'GAGAGAGA', 'GAGAGAGAC', 'GAGAGAGACA', 'GAGAGAGACAU',
    ...           'GAGAGAGACAUA', 'GAGAGAGACAUAG', 'GAGAGAGACAUAGA',
    ...           'GAGAGAGACAUAGAG', 'GAGAGAGACAUAGAGG']
    ... }
    >>> new_dict = {}
    >>> for i in dictionary:
    ...     # Length of the shortest string in this list
    ...     l = len(min(dictionary[i], key=len))
    ...     # All strings of that minimal length (there may be several)
    ...     m = [x for x in dictionary[i] if len(x) == l]
    ...     temp = []
    ...     temp.extend(m)
    ...     # Keep only the entries that start with none of the shortest strings
    ...     for k in dictionary[i]:
    ...         if not any(map(lambda j: k.startswith(j), m)):
    ...             temp.append(k)
    ...     new_dict[i] = temp
    ...
    >>> print(new_dict)
    {'1': ['GAA', 'GAG', 'RTRSRS']}
+4

Your sample data is not a good test case: every other entry begins with the shortest string, so everything except the shortest string itself will be deleted. Here's a shorter version with one different entry in each list:

    data = {'1' : ['GAA', 'xxxxxxx', 'GAAA', 'GAAAA', 'GAAAAA'],
            '2' : ['GAG', 'yyyyyyyy', 'GAGA', 'GAGAG', 'GAGAGA'],
            '3' : ['GUC', 'zzzzzz', 'GUCU', 'GUCUU', 'GUCUUU']}

Now:

    res = {}
    for key, value in data.items():
        shortest = min(value, key=len)
        res[key] = [entry for entry in value
                    if not entry.startswith(shortest) or entry == shortest]

    >>> res
    {'1': ['GAA', 'xxxxxxx'], '2': ['GAG', 'yyyyyyyy'], '3': ['GUC', 'zzzzzz']}

Note: this also preserves the position of the shortest string relative to the other entries, in case that matters.
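If a list can contain several strings of the same minimal length (as in the other answer's data, where both GAA and GAG are shortest), `min` returns only the first of them. The following is a minimal sketch of one way to cover that case while still preserving order. The helper name `drop_extensions` is hypothetical, and it assumes that every string of minimal length should be treated as a prefix to filter on. It relies on `str.startswith` accepting a tuple of prefixes.

```python
def drop_extensions(entries):
    """Keep the shortest strings; drop entries that merely extend one of them."""
    if not entries:
        return []
    min_len = len(min(entries, key=len))
    # Every string of minimal length counts as a prefix (assumption, see above).
    shortest = tuple(e for e in entries if len(e) == min_len)
    # str.startswith accepts a tuple and matches if any prefix matches.
    return [e for e in entries if e in shortest or not e.startswith(shortest)]


data = {"1": ["GAA", "xxxxxxx", "GAAA", "GAG", "GAGA"]}
result = {key: drop_extensions(value) for key, value in data.items()}
print(result)  # {'1': ['GAA', 'xxxxxxx', 'GAG']}
```

The original order of the surviving entries is kept, as in the list comprehension above.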

+2

Source: https://habr.com/ru/post/1237627/

