Removing all extensions of the shortest string in a list

Question


I have a dictionary of lists, for example:

    '1' : ['GAA', 'GAAA', 'GAAAA', 'GAAAAA', 'GAAAAAG', 'GAAAAAGU', 'GAAAAAGUA', 'GAAAAAGUAU', 'GAAAAAGUAUG', 'GAAAAAGUAUGC', 'GAAAAAGUAUGCA', 'GAAAAAGUAUGCAA', 'GAAAAAGUAUGCAAG', 'GAAAAAGUAUGCAAGA', 'GAAAAAGUAUGCAAGAA', 'GAAAAAGUAUGCAAGAAC']
    '2' : ['GAG', 'GAGA', 'GAGAG', 'GAGAGA', 'GAGAGAG', 'GAGAGAGA', 'GAGAGAGAC', 'GAGAGAGACA', 'GAGAGAGACAU', 'GAGAGAGACAUA', 'GAGAGAGACAUAG', 'GAGAGAGACAUAGA', 'GAGAGAGACAUAGAG', 'GAGAGAGACAUAGAGG']
    '3' : ['GUC', 'GUCU', 'GUCUU', 'GUCUUU', 'GUCUUUG', 'GUCUUUGU', 'GUCUUUGU"', 'GUCUUUGU"G', 'GUCUUUGU"GU', 'GUCUUUGU"GUA', 'GUCUUUGU"GUAC', 'GUCUUUGU"GUACA', 'GUCUUUGU"GUACAU', 'GUCUUUGU"GUACAUC']

I want the program to find the shortest string in each list (for example, GAA in the first one) and use it to find all other entries that are merely extensions of it (strings that start with GAA and only add extra letters), then delete those entries.

I know there are many questions about removing items from a list, but none of them helps with this problem.

+5

python dictionary python-3.x

lamazibiji Dec 08 '15 at 5:30


2 answers

Your example data is not ideal: every other entry begins with the shortest string, so everything except the shortest string itself would be deleted. Here is a shorter version with one extra, unrelated entry in each list:

    data = {'1': ['GAA', 'xxxxxxx', 'GAAA', 'GAAAA', 'GAAAAA'],
            '2': ['GAG', 'yyyyyyyy', 'GAGA', 'GAGAG', 'GAGAGA'],
            '3': ['GUC', 'zzzzzz', 'GUCU', 'GUCUU', 'GUCUUU']}

Now:

    res = {}
    for key, value in data.items():
        shortest = min(value, key=len)
        res[key] = [entry for entry in value
                    if not entry.startswith(shortest) or entry == shortest]

    >>> res
    {'1': ['GAA', 'xxxxxxx'], '2': ['GAG', 'yyyyyyyy'], '3': ['GUC', 'zzzzzz']}

Note: this also preserves the position of the shortest string relative to the other entries, in case that matters.
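To illustrate the point about position, here is a minimal sketch of the same idea wrapped in a function. The name `drop_extensions` is hypothetical (it is not part of the answer), and the sketch assumes every list is non-empty, since `min` raises `ValueError` on an empty list.

```python
def drop_extensions(data):
    """Return a new dict in which, for each list, entries that merely
    extend the shortest string are removed.

    Assumption: every list in `data` is non-empty.
    """
    result = {}
    for key, entries in data.items():
        # min() returns the first shortest entry if several share the length.
        shortest = min(entries, key=len)
        result[key] = [e for e in entries
                       if e == shortest or not e.startswith(shortest)]
    return result


# The shortest string is deliberately not the first entry here.
sample = {'1': ['xxxxxxx', 'GAAA', 'GAA', 'GAAAA']}
print(drop_extensions(sample))  # {'1': ['xxxxxxx', 'GAA']}
```

The shortest string stays where it was in the list, after `'xxxxxxx'`, rather than being moved to the front.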

+2

Mike Müller Dec 08 '15 at 6:28


Ayush Shanker · Accepted Answer · 2015-12-08T05:41:12+0000

    >>> dictionary = {'1': ['GAA', 'GAAA', 'GAAAA', 'GAAAAA', 'GAAAAAG', 'GAAAAAGU',
    ...                     'GAAAAAGUA', 'GAAAAAGUAU', 'GAAAAAGUAUG', 'GAAAAAGUAUGC',
    ...                     'GAAAAAGUAUGCA', 'GAAAAAGUAUGCAA', 'GAAAAAGUAUGCAAG',
    ...                     'GAAAAAGUAUGCAAGA', 'GAAAAAGUAUGCAAGAA', 'GAAAAAGUAUGCAAGAAC',
    ...                     'RTRSRS', 'GAG', 'GAGA', 'GAGAG', 'GAGAGA', 'GAGAGAG',
    ...                     'GAGAGAGA', 'GAGAGAGAC', 'GAGAGAGACA', 'GAGAGAGACAU',
    ...                     'GAGAGAGACAUA', 'GAGAGAGACAUAG', 'GAGAGAGACAUAGA',
    ...                     'GAGAGAGACAUAGAG', 'GAGAGAGACAUAGAGG']}
    >>> new_dict = {}
    >>> for i in dictionary:
    ...     l = len(min(dictionary[i], key=len))
    ...     m = [x for x in dictionary[i] if len(x) == l]
    ...     temp = []
    ...     temp.extend(m)
    ...     for k in dictionary[i]:
    ...         if not any(map(lambda j: k.startswith(j), m)):
    ...             temp.append(k)
    ...     new_dict[i] = temp
    ...
    >>> print(new_dict)
    {'1': ['GAA', 'GAG', 'RTRSRS']}
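The loop above handles several shortest strings of the same length (here `GAA` and `GAG`), but it moves them to the front of the result. If the original order matters, one possible variant is sketched below. It relies on `str.startswith` accepting a tuple of prefixes. The function name `drop_extensions_of_all_shortest` is hypothetical, and the sketch assumes a non-empty list.

```python
def drop_extensions_of_all_shortest(entries):
    """Keep every shortest string and drop longer entries that start
    with any of them, preserving the original order.

    Assumption: `entries` is a non-empty list of strings.
    """
    min_len = min(len(e) for e in entries)
    # All entries of minimal length act as prefixes.
    shortest = tuple(e for e in entries if len(e) == min_len)
    return [e for e in entries
            if len(e) == min_len or not e.startswith(shortest)]


entries = ['GAA', 'GAAA', 'RTRSRS', 'GAG', 'GAGA']
print(drop_extensions_of_all_shortest(entries))  # ['GAA', 'RTRSRS', 'GAG']
```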
