Python: count the strings in a list that contain a substring from another list, without counting duplicates

I have two lists:

main_list = ['Smith', 'Smith', 'Roger', 'Roger-Smith', '42']
master_list = ['Smith', 'Roger']

I want to count how many strings in main_list contain a string from master_list, without counting the same element twice.

Example: for these two lists, the result of my function should be 4. "Smith" is found 3 times in main_list. "Roger" is found 2 times, but since 'Roger-Smith' has already been counted for "Smith", it is not counted again, so "Roger" adds only 1, for a total of 4.

The function I wrote is below, but I think there is a faster way to do this:

def string_detection(master_list, main_list):
    count = 0
    for substring in master_list:
        temp = list(main_list)
        for string in temp:
            if substring in string:
                main_list.remove(string)
                count += 1
    return count

>>> sum(any(m in L for m in master_list) for L in main_list)
4

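This iterates over each element of main_list and checks whether any string from master_list is contained in it, which yields one bool per element. any stops at the first match, so each element is counted at most once. sum then adds up the True values, which gives the count.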
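
As a minimal sketch, the one-liner above can be wrapped in a function. Unlike the function in the question, which removes matched items from the list it is given, this version leaves main_list untouched. The name count_matches is an illustrative assumption, not part of the original answer.

```python
def count_matches(master_list, main_list):
    """Count the strings in main_list that contain at least one master string."""
    # any() stops at the first matching substring, so each string
    # contributes at most 1; sum() adds up the True values (True == 1).
    return sum(any(m in s for m in master_list) for s in main_list)


main_list = ['Smith', 'Smith', 'Roger', 'Roger-Smith', '42']
master_list = ['Smith', 'Roger']

print(count_matches(master_list, main_list))  # 4
print(len(main_list))  # 5, the input list is not modified
```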


With pandas, you can use str.contains and sum():

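import re
import pandas as pd

main_list = pd.Series(['Smith', 'Smith', 'Roger', 'Roger-Smith', '42'])
master_list = ['Smith', 'Roger']
# str.contains treats the pattern as a regex, so escape each master string
pattern = '|'.join(map(re.escape, master_list))
count = main_list.str.contains(pattern).sum()  # counts the True values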

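Use a list comprehension. Keep every string from main_list that contains any substring from master_list:

temp_list = [string for string in main_list if any(substring in string for substring in master_list)]

temp_list is now:

['Smith', 'Smith', 'Roger', 'Roger-Smith']

So the length of temp_list is the answer.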


main_list = ['Smith', 'Smith', 'Roger', 'Roger-Smith', '42']
master_list = ['Smith', 'Roger']

print(len([word for word in main_list if any(mw in word for mw in master_list)]))

Another option:

main_list = ['Smith', 'Smith', 'Roger', 'Roger-Smith', '42']
master_list = ['Smith', 'Roger']

i = 0
for elem in main_list:
    if elem in master_list:
        i += 1
        continue
    for master_elem in master_list:
        if master_elem in elem:
            i += 1
            break

print(i) # i = 4

The code above counts 'Roger-Smith' once. If you want it to count once for each master string it contains, delete the break.
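
To make the difference between the two counting rules concrete, here is a short sketch on the same lists. The variable names once_per_string and once_per_match are illustrative assumptions. For this data, the second value matches what the loop above would print with the break removed.

```python
main_list = ['Smith', 'Smith', 'Roger', 'Roger-Smith', '42']
master_list = ['Smith', 'Roger']

# Each string counts at most once, no matter how many master strings it contains.
once_per_string = sum(any(m in s for m in master_list) for s in main_list)

# Every (string, master string) match counts separately,
# so 'Roger-Smith' contributes 2.
once_per_match = sum(m in s for s in main_list for m in master_list)

print(once_per_string)  # 4
print(once_per_match)   # 5
```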


If your master_list is not expected to be huge, one way to do this is with regex:

import re

def string_detection(master_list, main_list):
    count = 0
    # Escape each master string so regex metacharacters are matched literally.
    master = re.compile("|".join(map(re.escape, master_list)))
    for entry in main_list:
        if master.search(entry):
            count += 1
    return count
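
Because "|".join builds a regular expression, any regex metacharacters in the master strings are interpreted as pattern syntax unless they are escaped with re.escape. The sketch below shows the difference on made-up data. The function name count_with_regex, its escape parameter, and the sample strings are assumptions for illustration only.

```python
import re


def count_with_regex(master_list, main_list, escape=True):
    """Count entries of main_list matched by any master string."""
    parts = map(re.escape, master_list) if escape else master_list
    pattern = re.compile("|".join(parts))
    return sum(1 for entry in main_list if pattern.search(entry))


# Hypothetical data: "." is a regex wildcard when it is not escaped.
samples = ["a.b", "axb", "ab"]

print(count_with_regex(["a.b"], samples))                # 1, only the literal "a.b"
print(count_with_regex(["a.b"], samples, escape=False))  # 2, "axb" matches as well
```

For master strings made only of letters, such as 'Smith' and 'Roger', both variants give the same count.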

Source: https://habr.com/ru/post/1669915/

