Use a list to filter another list in Python

I have a list:

data_list = ['a.1','b.2','c.3'] 

And I want to get only the items that start with one of the prefixes in another list:

 test_list = ['a.','c.'] 

a.1 and c.3 must be returned.

I suppose I could use a double for loop:

 for data in data_list:
     for test in test_list:
         if data.startswith(test):
             pass  # do something with data

I was wondering if there was something more elegant and perhaps more performant.

+4
5 answers

str.startswith can also accept a tuple (but not a list) of prefixes:

 test_tuple = tuple(test_list)
 for data in data_list:
     if data.startswith(test_tuple):
         ...

which means that a simple list comprehension will give you a filtered list:

 matching_strings = [ x for x in data_list if x.startswith(test_tuple) ] 

or a filter call:

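 import operator

 # methodcaller builds a callable equivalent to x.startswith(<tuple of prefixes>)
 f = operator.methodcaller('startswith', tuple(test_list))
 # filter the data, not the prefixes; list() is needed on Python 3, where filter is lazy
 matching_strings = list(filter(f, data_list))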
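
As a quick check of the tuple-versus-list point, here is a minimal self-contained sketch using the question's data. The helper name `filter_by_prefixes` is made up for illustration and is not part of any library.

```python
def filter_by_prefixes(items, prefixes):
    """Return the items that start with any of the given prefixes."""
    # str.startswith accepts a str or a tuple of str, so convert first.
    prefix_tuple = tuple(prefixes)
    return [item for item in items if item.startswith(prefix_tuple)]


data_list = ['a.1', 'b.2', 'c.3']
test_list = ['a.', 'c.']

matching_strings = filter_by_prefixes(data_list, test_list)
print(matching_strings)

# Passing the list itself raises TypeError instead of matching.
try:
    'a.1'.startswith(test_list)
    list_accepted = True
except TypeError:
    list_accepted = False
print(list_accepted)
```

Converting inside the helper means callers can pass any iterable of prefixes, at the cost of rebuilding the tuple on every call.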
+12

Just use filter with a lambda function and startswith:

 data_list = ['a.1','b.2','c.3']
 test_list = ('a.','c.')
 result = filter(lambda x: x.startswith(test_list), data_list)
 print(list(result))

Output:

 ['a.1', 'c.3'] 
+3

Try the following:

 for data in data_list:
     if any(data.startswith(test) for test in test_list):
         pass  # do something

any() is a built-in that takes an iterable and returns True as soon as it finds a value in the iterable that is truthy; otherwise it returns False. In my example, I use a generator expression instead of building a list (which would be wasteful).
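
To see the short-circuiting in action, here is a small sketch that logs every comparison. The wrapper `starts_with` and the `calls` list are hypothetical names added only to make the evaluation order visible.

```python
data_list = ['a.1', 'b.2', 'c.3']
test_list = ['a.', 'c.']

calls = []  # records every prefix comparison that actually runs


def starts_with(data, prefix):
    # Hypothetical helper: wraps str.startswith so each call can be logged.
    calls.append((data, prefix))
    return data.startswith(prefix)


matches = [
    data for data in data_list
    if any(starts_with(data, test) for test in test_list)
]

print(matches)
print(calls)
```

Because the generator is consumed lazily, 'a.1' is never compared against 'c.': any() stops at the first prefix that matches.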

+2

Check out filter and any in the Python docs.

 >>> data_list = ['a.1','b.2','c.3']
 >>> test_list = ['a.','c.']
 >>> new_list = list(filter(lambda x: any(x.startswith(t) for t in test_list), data_list))
 >>> new_list
 ['a.1', 'c.3']

Then you can do whatever you want with new_list.

As @Chepner points out, you can also pass a tuple of strings to startswith, so the above can also be written as:

 >>> data_list = ['a.1','b.2','c.3']
 >>> test_tuple = ('a.','c.')
 >>> new_list = list(filter(lambda x: x.startswith(test_tuple), data_list))
 >>> new_list
 ['a.1', 'c.3']
+1

Alternatively, use regular expressions:

 import re

 # build a pattern that matches any of the strings we are interested in
 pattern = re.compile('|'.join(map(re.escape, test_list)))
 # filter by matches (pattern.match only matches at the start of each string)
 print(list(filter(pattern.match, data_list)))

This approach probably pushes the most work into C and may be more efficient than the other solutions. It can be a little difficult for the uninitiated to follow, though.
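
Whether the regular expression actually wins depends on the data and the Python version, so it is worth measuring rather than assuming. A rough timing sketch with timeit follows; the enlarged data set and the function names are assumptions made for the comparison, not part of the question.

```python
import re
import timeit

# Enlarged copy of the question's data so the timings are measurable (assumption).
data_list = ['a.1', 'b.2', 'c.3'] * 1000
test_list = ['a.', 'c.']

test_tuple = tuple(test_list)
pattern = re.compile('|'.join(map(re.escape, test_list)))


def with_tuple():
    return [x for x in data_list if x.startswith(test_tuple)]


def with_any():
    return [x for x in data_list if any(x.startswith(t) for t in test_list)]


def with_regex():
    # pattern.match only matches at the start of the string, so this is a prefix test.
    return list(filter(pattern.match, data_list))


for func in (with_tuple, with_any, with_regex):
    seconds = timeit.timeit(func, number=100)
    print(f'{func.__name__}: {seconds:.4f} s')
```

All three functions return the same list, so the comparison is only about speed; the numbers will vary from machine to machine.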

+1

Source: https://habr.com/ru/post/1485240/

