Python 3 - counting matches in two lists (including duplicates)

First of all, I am new to programming and Python. I have looked around here but cannot find a solution, so please forgive me if this is a stupid question!

I have two lists, and I'm trying to determine how many times the items in the second list appear in the first list.

I have the following solution:

    list1 = ['black','red','yellow']
    list2 = ['the','big','black','dog']
    list3 = ['the','black','black','dog']
    p = set(list1)&set(list2)
    print(len(p))

It works fine unless the second list contains duplicates.

i.e. list1 and list2 above return 1, but so do list1 and list3, when ideally that should return 2.

Can anyone suggest a solution? Any help would be appreciated!

Thanks,

Adam

+5
4 answers

You see this problem because you are using sets as your collection type. Sets have two characteristics: they are unordered (which is not important here), and their elements are unique. Thus, you lose the duplicates in your lists when you convert them to sets, even before you find their intersection:

    >>> p = ['1', '2', '3', '3', '3', '3', '3']
    >>> set(p)  # element order may vary
    {'1', '2', '3'}

There are several ways to do what you want to do here, but you'll want to start by looking at the count method of a list. I would do something like this:

    >>> list1 = ['a', 'b', 'c']
    >>> list2 = ['a', 'b', 'c', 'c', 'c']
    >>> results = {}
    >>> for i in list1:
    ...     results[i] = list2.count(i)
    ...
    >>> results
    {'a': 1, 'b': 1, 'c': 3}

This approach creates a dictionary (results). For each element of list1 it creates a key in results, counts how many times that element appears in list2, and stores the count as the key's value.

Edit: As Lattyware points out, this approach answers a slightly different question than the one you asked. A really basic solution would look like this:

    >>> words = ['red', 'blue', 'yellow', 'black']
    >>> list1 = ['the', 'black', 'dog']
    >>> list2 = ['the', 'blue', 'blue', 'dog']
    >>> results1 = 0
    >>> results2 = 0
    >>> for w in words:
    ...     results1 += list1.count(w)
    ...     results2 += list2.count(w)
    ...
    >>> results1
    1
    >>> results2
    2

This works much like my first suggestion: it iterates through each word in the main list (here I use words), and adds the number of times that word appears in list1 to the running total results1, and the number of times it appears in list2 to results2.
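The loop described above can be wrapped in a small helper so that it works for any pair of lists. This is only a sketch: the name count_matches is made up for illustration and is not part of the original answer, and it assumes the word list itself has no repeated entries.

```python
def count_matches(words, items):
    """Return how many elements of `items` equal some entry of `words`.

    Hypothetical helper for illustration. Assumes `words` has no
    duplicates; a repeated word would be counted once per repetition.
    """
    total = 0
    for w in words:
        # list.count scans all of `items` once for each word
        total += items.count(w)
    return total


list1 = ['black', 'red', 'yellow']
list2 = ['the', 'big', 'black', 'dog']
list3 = ['the', 'black', 'black', 'dog']

print(count_matches(list1, list2))  # prints 1
print(count_matches(list1, list3))  # prints 2
```

With the lists from the question, this gives 1 for list2 and 2 for list3, which is the behaviour the question asks for.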

If you need more information than just the number of duplicates, you will want to use a dictionary or, even better, the specialized Counter type in the collections module. Counter is built to simplify everything I did in the examples above.

    >>> from collections import Counter
    >>> results3 = Counter()
    >>> for w in words:
    ...     results3[w] = list2.count(w)
    ...
    >>> results3
    Counter({'blue': 2, 'red': 0, 'yellow': 0, 'black': 0})
    >>> sum(results3.values())
    2
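As a possible variation (not what the answer above does), a Counter can also be built directly from the list, so the list is scanned once instead of once per word. The names counts, per_word and total below are chosen only for this sketch.

```python
from collections import Counter

words = ['red', 'blue', 'yellow', 'black']
list2 = ['the', 'blue', 'blue', 'dog']

# Count every element of list2 in a single pass.
counts = Counter(list2)

# Looking up a missing key in a Counter returns 0 instead of raising
# KeyError, so words that never occur need no special handling.
per_word = {w: counts[w] for w in words}
total = sum(per_word.values())

print(per_word)
print(total)  # prints 2
```

This may be worth considering when the lists are long, since each call to list.count walks the whole list again.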
+5

Wouldn't list1 and list2 return 0? Or did you mean

 list1 = ['black', 'red', 'yellow'] 

What you want, I think, is

 print(len([w for w in list2 if w in list1])) 

The problem with using sets is that a set has no duplicates. In fact, a common reason for using a set is to eliminate duplicates. Of course, that is exactly what you do not want here.
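Sets can still help on the lookup side. In the sketch below (an assumption on my part, not part of the answer above), only list1 is turned into a set for fast membership tests, while the list being counted stays a list so its duplicates survive. The names wanted and matches are illustrative.

```python
list1 = ['black', 'red', 'yellow']
list3 = ['the', 'black', 'black', 'dog']

# Only the lookup list becomes a set; list3 is left as a list,
# so both occurrences of 'black' are still visited.
wanted = set(list1)
matches = sum(1 for w in list3 if w in wanted)

print(matches)  # prints 2
```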

+3

If you want to calculate the frequency of the elements of list1 in list2, perhaps this solution might work for you:

    list1 = ['black', 'red', 'yellow']
    list2 = ['the', 'big', 'black', 'dog']
    list3 = ['the', 'black', 'black', 'dog']

First, we can count the frequency of every element in list2 and build a dict from it. Then we can build a sub-dict from that dict using the elements of list1 as keys. To get the total frequency, sum the values of sub_dct:

    from collections import Counter

    # count the frequency of the elements of lst1 in lst2
    def cntFrequency(lst1, lst2):
        dct = dict(Counter(lst2))
        sub_dct = {k: dct.get(k, 0) for k in lst1}
        return sub_dct

and the result will look like this:

    cnt_dct = cntFrequency(list1, list2)
    print(cnt_dct)
    print(sum(cnt_dct.values()))

    # Output
    # {'black': 1, 'red': 0, 'yellow': 0}
    # 1
0

I know this is an old question, but in case someone is wondering how to get the matches, or the number of matches, across two or more lists, you can also do it like this.

    a = [1,2,3]
    b = [2,3,4]
    c = [2,4,5]

To get matches on the two lists, say a and b:

 d = [value for value in a if value in b] # 2,3 

For three lists, it would be:

    d = [value for value in a if value in b and value in c] # 2
    len(d) # to get the number of matches

Also, if you need to handle duplicates, it is a matter of converting each list to a set beforehand, for example:

 a = set(a) # and so on 
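To make the set conversion mentioned above concrete, here is a small sketch. The input lists are variants of the ones in this answer with a few repeated values added, which is my own assumption for illustration. Converting to sets means every common value is reported once, however often it occurs.

```python
a = [1, 2, 2, 3]
b = [2, 3, 3, 4]
c = [2, 4, 5]

# Distinct values that appear in both a and b
common_ab = set(a) & set(b)

# Distinct values that appear in all three lists;
# set.intersection accepts any iterables as arguments
common_abc = set(a).intersection(b, c)

print(sorted(common_ab))  # prints [2, 3]
print(len(common_abc))    # prints 1
```

Note that this gives the number of distinct matches, so it is the opposite of what the original question wanted (counting 'black' twice).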
0

Source: https://habr.com/ru/post/1445225/