How to remove a list from a list (i.e. a sublist) if any element of that sublist is in another list?

I have a list containing several sublists. For instance:

full_list = [[1, 1, 3, 4], [3, 99, 5, 2],[2, 4, 4], [3, 4, 5, 2, 60]] 

I also have another list called omit. For instance:

 omit = [99, 60, 98] 

I want to remove sublists from full_list if any item in the sublist is in the omit list. For example, I would like the resulting list to be:

 reduced_list = [[1, 1, 3, 4], [2, 4, 4]] 

because only these sublists do not have an item that is in the omit list.

I assume there is an easy way to do this with a list comprehension, but I cannot get it to work. I tried a bunch of things, for example:

 reduced_list = [sublist for sublist in full_list if item for sublist not in omit] 
  • this code leads to an error (invalid syntax), but I think I am missing more.

Any help would be greatly appreciated!

PS. The above task is simplified. My ultimate goal is to remove sublists from a very long list (for example, 500,000 sublists) of strings if any element (a string) of these sublists is in an "omit" list that contains more than 2000 strings.

3 answers

Use a set and all():

 >>> omit = {99, 60, 98}
 >>> full_list = [[1, 1, 3, 4], [3, 99, 5, 2],[2, 4, 4], [3, 4, 5, 2, 60]]
 >>> [item for item in full_list if all(x not in omit for x in item)]
 [[1, 1, 3, 4], [2, 4, 4]]

The main difference between this method and the solutions by @alecxe and @Óscar López is that all() short-circuits and does not create any set or list in memory, whereas set intersection returns a new set containing all the elements in common with the omit set, and its length is then checked to determine whether any element was shared. (Set intersection happens internally at C speed, so it can be faster than the ordinary Python-level loop used with all().)
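To make the short-circuit behaviour concrete, here is a small sketch that records which elements actually get examined. The helper name has_no_omitted and the checked list are hypothetical, added only for illustration; they are not part of any answer here.

```python
def has_no_omitted(sublist, omit, checked):
    """Return True if no element of sublist is in omit.

    Every element that gets examined is appended to `checked`
    (an illustrative bookkeeping list, not needed in real code).
    """
    def tests():
        for x in sublist:
            checked.append(x)
            yield x not in omit
    # all() stops consuming the generator at the first False it sees
    return all(tests())


omit = {99, 60, 98}
checked = []
result = has_no_omitted([3, 99, 5, 2], omit, checked)

# 5 and 2 are never examined because 99 already decided the outcome
print(result, checked)  # False [3, 99]
```

A set intersection, by contrast, always walks the whole sublist before the result can be tested.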

Time comparison:

 >>> import random 

Elements do not intersect:

 >>> omit = set(random.randrange(1, 10**18) for _ in xrange(100000))
 >>> full_list = [[random.randrange(10**19, 10**100) for _ in xrange(100)] for _ in xrange(1000)]
 >>> %timeit [item for item in full_list if not omit & set(item)]
 10 loops, best of 3: 43.3 ms per loop
 >>> %timeit [x for x in full_list if not omit.intersection(x)]
 10 loops, best of 3: 28 ms per loop
 >>> %timeit [item for item in full_list if all(x not in omit for x in item)]
 10 loops, best of 3: 65.3 ms per loop

All elements intersect:

 >>> full_list = [range(10**3) for _ in xrange(1000)]
 >>> omit = set(xrange(10**3))
 >>> %timeit [item for item in full_list if not omit & set(item)]
 1 loops, best of 3: 148 ms per loop
 >>> %timeit [x for x in full_list if not omit.intersection(x)]
 1 loops, best of 3: 108 ms per loop
 >>> %timeit [item for item in full_list if all(x not in omit for x in item)]
 100 loops, best of 3: 1.62 ms per loop

Some elements intersect:

 >>> omit = set(xrange(1000, 10000))
 >>> full_list = [range(2000) for _ in xrange(1000)]
 >>> %timeit [item for item in full_list if not omit & set(item)]
 1 loops, best of 3: 282 ms per loop
 >>> %timeit [x for x in full_list if not omit.intersection(x)]
 1 loops, best of 3: 159 ms per loop
 >>> %timeit [item for item in full_list if all(x not in omit for x in item)]
 1 loops, best of 3: 227 ms per loop
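The sessions above rely on IPython's %timeit and on Python 2 (xrange). As a rough sketch, the same three variants could be compared in a plain Python 3 script with the standard timeit module. The function names with_and, with_intersection and with_all, the seed, and the data sizes are assumptions chosen to keep the run short; absolute numbers will differ from those quoted above and from machine to machine.

```python
import random
import timeit


def with_and(full_list, omit):
    # builds a set from every sublist, then intersects with omit
    return [item for item in full_list if not omit & set(item)]


def with_intersection(full_list, omit):
    # intersection() accepts any iterable, so no set(item) is needed
    return [item for item in full_list if not omit.intersection(item)]


def with_all(full_list, omit):
    # stops scanning a sublist at the first omitted element
    return [item for item in full_list if all(x not in omit for x in item)]


random.seed(0)  # fixed seed so repeated runs use the same data
omit = set(range(1000, 2000))
full_list = [[random.randrange(0, 3000) for _ in range(50)] for _ in range(200)]

for func in (with_and, with_intersection, with_all):
    # best of 3 repeats, 10 calls per repeat
    best = min(timeit.repeat(lambda: func(full_list, omit), number=10, repeat=3))
    print(f"{func.__name__}: {best / 10 * 1e3:.3f} ms per call")
```

Which variant wins depends on how early a match tends to appear in each sublist, as the three scenarios above suggest.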

Try the following:

 full_list = [[1, 1, 3, 4], [3, 99, 5, 2], [2, 4, 4], [3, 4, 5, 2, 60]]
 omit = frozenset([99, 60, 98])
 reduced_list = [x for x in full_list if not omit.intersection(x)]

The only change I made to the input is that omit is now a set for efficiency reasons, since that allows a fast intersection (it is frozen because we will not change it). Note that x does not have to be a set. Now the reduced_list variable will contain the expected value:

 reduced_list => [[1, 1, 3, 4], [2, 4, 4]] 
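The point that the sublist does not have to be a set applies to the intersection() method only. A small sketch with strings, as in the asker's real use case (the sample values are made up for illustration): the method accepts any iterable, while the & operator requires a set on both sides.

```python
omit = frozenset(["error", "debug"])   # illustrative values only
sublist = ["info", "debug", "trace"]   # a plain list, not a set

# The method form accepts any iterable
common = omit.intersection(sublist)
print(common)  # frozenset({'debug'})

# The operator form raises TypeError when the right-hand side is a list
try:
    omit & sublist
    operator_failed = False
except TypeError:
    operator_failed = True
print(operator_failed)  # True
```

This is why the variant using & has to wrap each sublist in set(item) first.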

Make omit a set and check the intersection at each step of the iteration:

 >>> full_list = [[1, 1, 3, 4], [3, 99, 5, 2],[2, 4, 4], [3, 4, 5, 2, 60]]
 >>> omit = [99, 60, 98]
 >>> omit = set(omit)  # or just omit = {99, 60, 98} for python >= 2.7
 >>> [item for item in full_list if not omit & set(item)]
 [[1, 1, 3, 4], [2, 4, 4]]

FYI, it is better to use frozenset instead of set, as @Óscar López suggested. With frozenset it works a little faster:

 import timeit

 def omit_it(full_list, omit):
     return [item for item in full_list if not omit & set(item)]

 print timeit.Timer('omit_it([[1, 1, 3, 4], [3, 99, 5, 2],[2, 4, 4], [3, 4, 5, 2, 60]], {99, 60, 98})', 'from __main__ import omit_it').timeit(10000)
 print timeit.Timer('omit_it([[1, 1, 3, 4], [3, 99, 5, 2],[2, 4, 4], [3, 4, 5, 2, 60]], frozenset([99, 60, 98]))', 'from __main__ import omit_it').timeit(10000)

prints:

 0.0334849357605
 0.0319349765778

Source: https://habr.com/ru/post/1500807/

