Using a list comprehension to label entries common to two lists

I have two lists, A and B. I want to create a third list that holds 1 if the corresponding entry in A ends with an entry from list B, and 0 otherwise.

A = ['Mary Sue', 'John Doe', 'Alice Stella', 'James May', 'Susie May']
B = ['Smith', 'Stirling', 'Doe']

I want a list comprehension that produces the result

[0, 1, 0, 0, 0]

Keep in mind that this is a specific case of a more general problem. Elements in A can have arbitrary whitespace and contain an arbitrary number of words. Similarly, elements in B can have an arbitrary number of words. For instance,

A = ['  Tom Barry Stirling Adam', 'Maddox Smith', 'George Washington Howard Smith']
B = ['Washington Howard Smith', 'Stirling Adam']

must return

[1, 0, 1]

So far I have the following

[1 if y.endswith(x) else 0 for x in B for y in A]

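However, this produces a 0 or 1 for every pair A[i], B[j], so the result has len(A) * len(B) entries instead of one entry per element of A.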
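To see why the attempt is longer than A, here is a minimal sketch using the first example from the question. The name `pairwise` is only illustrative.

```python
A = ['Mary Sue', 'John Doe', 'Alice Stella', 'James May', 'Susie May']
B = ['Smith', 'Stirling', 'Doe']

# Two `for` clauses visit every (x, y) pair: all of A for the first x,
# then all of A for the second x, and so on.
pairwise = [1 if y.endswith(x) else 0 for x in B for y in A]

print(len(pairwise))   # 15, i.e. len(A) * len(B)
print(pairwise[10:])   # [0, 1, 0, 0, 0], the slice that belongs to 'Doe'
```

The wanted result has one entry per element of A, so the check against B has to be folded into a single value per name rather than written as a second `for` clause.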


str.endswith accepts a tuple of suffixes, which is faster than looping over B:

In [8]: A = ['Mary Sue', 'John Doe', 'Alice Stella', 'James May', 'Susie May']

In [9]: B = ['Smith', 'Stirling', 'Doe']            

In [10]: A *= 1000

In [11]: %%timeit                                                          
t = tuple(B)
[int(s.endswith(t)) for s in A]
   ....: 
100 loops, best of 3: 5.02 ms per loop

In [12]: timeit [int(any(full.endswith(last) for last in B)) for full in A]
100 loops, best of 3: 21.3 ms per loop

With A repeated 1000 times and B unchanged, the tuple version is roughly four times faster.
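As a quick check that the tuple form also covers the general case from the question, here is a small sketch. The names `suffixes` and `labels` are illustrative, not from the original answer.

```python
A = ['  Tom Barry Stirling Adam', 'Maddox Smith', 'George Washington Howard Smith']
B = ['Washington Howard Smith', 'Stirling Adam']

# str.endswith takes a single string or a tuple of strings; a list raises TypeError.
suffixes = tuple(B)

# endswith returns a bool, and int() turns it into 0 or 1.
labels = [int(s.endswith(suffixes)) for s in A]
print(labels)  # [1, 0, 1]
```

Note that this is a plain character-level suffix test. It ignores word boundaries, and trailing whitespace in an element of A would prevent a match unless the string is stripped first.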

With larger inputs, for example two dictionary word lists, the gap is wider:

In [2]: from random import sample

In [6]: A = [s.strip() for s in open("/usr/share/dict/american-english")][:20000]

In [7]: B = sample([s.strip() for s in open("/usr/share/dict/british-english")], 2000)

In [8]: %%timeit                                                                      
t = tuple(B)
[int(s.endswith(t)) for s in A]
   ...: 
1 loop, best of 3: 2.16 s per loop

In [9]: timeit [int(any(full.endswith(last) for last in B)) for full in A]
1 loop, best of 3: 26.6 s per loop

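If B is large, you can sort the reversed suffixes once and then use bisect to find candidates in about log n steps per lookup: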

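from bisect import bisect_left, bisect_right
from os.path import commonprefix


def compress(l1, l2):
    srt1 = sorted(s[::-1] for s in l2)  # reversed, so suffixes become prefixes
    hi = len(l2)
    for ele in l1:
        rev = ele[::-1]
        ind = bisect_left(srt1, rev, hi=hi)
        # The source omits the rest of the loop; this part is a reconstruction.
        found = ind < hi and srt1[ind] == rev
        while not found and ind > 0:
            prev = srt1[ind - 1]
            found = rev.startswith(prev)
            # A shorter match must be a prefix of what rev and prev share.
            ind = bisect_right(srt1, commonprefix([rev, prev]), hi=ind - 1)
        yield int(found)

print(list(compress(A, B)))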

Sorting costs O(N log N) once; after that, each lookup is a binary search.


You need to test each name against all of B with any. Your version emits a 0 or 1 for every pair (A, B).

[1 if any(full.endswith(last) for last in B) else 0 for full in A]

Or convert the bool to int directly:

[int(any(full.endswith(last) for last in B)) for full in A]

If only the last word has to match, make B a set and test with in:

B = {'Smith', 'Stirling', 'Doe'} # set for a more efficient `in`
[int(full.split()[-1] in B) for full in A]
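The set lookup above only looks at the last word, so it does not cover multi-word entries in B. One possible extension, sketched here under the assumption that matches should respect word boundaries, is to test every trailing run of words against the set. The function name `label_by_word_suffix` is hypothetical.

```python
def label_by_word_suffix(names, suffixes):
    # Normalise whitespace so extra spaces do not prevent a match.
    wanted = {" ".join(s.split()) for s in suffixes}
    labels = []
    for name in names:
        words = name.split()
        # Try each trailing run of words: 'a b c', then 'b c', then 'c'.
        hit = any(" ".join(words[i:]) in wanted for i in range(len(words)))
        labels.append(int(hit))
    return labels


A = ['  Tom Barry Stirling Adam', 'Maddox Smith', 'George Washington Howard Smith']
B = ['Washington Howard Smith', 'Stirling Adam']
print(label_by_word_suffix(A, B))  # [1, 0, 1]
```

Each name costs one set lookup per word, independent of the size of B, at the price of building the joined strings.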
>>> [[0, 1][name.split()[-1] in set(B)] for name in A]
[0, 1, 0, 0, 0]

edit: for names with more than two words, str.split accepts a maxsplit argument that limits the number of splits:

>>> 'Mary Pat Sue'.split(maxsplit=1)
['Mary', 'Pat Sue']

Then check whether any element of B occurs in the remainder after the split:

>>> B = ['Pat', 'Sue']
>>> any(name in 'Pat Sue' for name in B)
True

Putting the two together:

>>> [[0, 1][any(surname in fullname.split(maxsplit=1)[-1] for surname in B)]
     for fullname in A]
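One thing to keep in mind with this approach: `in` on strings is a substring test, so it also matches a name that sits in the middle of the remainder rather than at its end. A small sketch of the difference, using a made-up name that is not in the question's data:

```python
B = ['Pat', 'Sue']

# Everything after the first word: 'Pat Sue Ellen'
rest = 'Mary Pat Sue Ellen'.split(maxsplit=1)[-1]

contains_any = any(name in rest for name in B)   # substring test
ends_with_any = rest.endswith(tuple(B))          # suffix test

print(contains_any)   # True
print(ends_with_any)  # False
```

If the requirement is strictly "ends with an entry of B", the suffix test is the closer fit.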

[1 if a.split(' ')[1] in B else 0 for a in A]


How about a regular expression?

A = ['Mary Sue', 'John Doe', 'Alice Stella', 'James May', 'Susie May']
B = ['Smith', 'Stirling', 'Doe']

Turn B into the pattern ".*(?:Smith|Stirling|Doe)$", then compile it into a regex:

import re
# re.escape keeps any regex metacharacters in B literal
ends_with_b = re.compile(".*(?:{})$".format("|".join(map(re.escape, B))))

a_matches = [1 if ends_with_b.match(a) else 0 for a in A]

Or create your own filter function

def my_filter(a):
    return 1 if any(a.endswith(b) for b in B) else 0

a_matches = [my_filter(a) for a in A]
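As a sanity check, the two approaches can be run side by side on the general example from the question. This is a sketch, not the answer's exact code: it uses `re.search` with a `\Z` anchor instead of `.*` with `match`, and it adds `re.escape` so that entries of B are treated literally. The names `pattern`, `by_regex` and `by_any` are illustrative.

```python
import re

A = ['  Tom Barry Stirling Adam', 'Maddox Smith', 'George Washington Howard Smith']
B = ['Washington Howard Smith', 'Stirling Adam']

# One alternation of all suffixes, anchored at the very end of the string.
pattern = re.compile(r"(?:{})\Z".format("|".join(re.escape(b) for b in B)))

by_regex = [1 if pattern.search(a) else 0 for a in A]
by_any = [1 if any(a.endswith(b) for b in B) else 0 for a in A]

print(by_regex)  # [1, 0, 1]
print(by_any)    # [1, 0, 1]
```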

Source: https://habr.com/ru/post/1654065/

