How to reduce a list of tuples in Python

I have an array and I want to count the occurrences of each element in the array.

I managed to use the map function to create a list of tuples.

    def mapper(a):
        return (a, 1)

    r = list(map(lambda a: mapper(a), arr))
    # output example:
    # (11817685, 1), (2014036792, 1), (2014047115, 1), (11817685, 1)

I expected the reduce function to help me group the counts by the first number (the id) in each tuple. For instance:

 (11817685, 2), (2014036792, 1), (2014047115, 1) 

I tried

 cnt = reduce(lambda a, b: a + b, r); 

and some other ways, but none of them do the trick.

Note: Thank you for all the tips on other ways to solve the problem, but I'm just learning Python and how to implement map-reduce in it. I simplified my real business problem to make it easier to understand, so please kindly show me the correct way to do map-reduce.

4 answers

You can use Counter:

    from collections import Counter

    arr = [11817685, 2014036792, 2014047115, 11817685]
    counter = Counter(arr)
    # zip() is lazy on Python 3, so wrap it in list() before printing
    print(list(zip(counter.keys(), counter.values())))

EDIT:

As pointed out by @ShadowRanger, Counter has an items() method:

    from collections import Counter

    arr = [11817685, 2014036792, 2014047115, 11817685]
    print(list(Counter(arr).items()))
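If the pairs should come out ordered by count, Counter also offers most_common(). A minimal sketch on the same array, assuming Python 3 (the names pairs and ranked are only for illustration):

```python
from collections import Counter

arr = [11817685, 2014036792, 2014047115, 11817685]

counter = Counter(arr)

# items() returns a view on Python 3; list() turns it into a list of tuples
pairs = list(counter.items())

# most_common() returns (element, count) pairs, highest count first
ranked = counter.most_common()

print(pairs)
print(ranked[0])  # (11817685, 2)
```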

If all you need is cnt, a dict is probably a better fit than a list of tuples here (if you do need that format, just use dict.items).

The collections module has a useful data structure for this, a defaultdict.

    from collections import defaultdict

    cnt = defaultdict(int)  # create a default dict where the default value is
                            # the result of calling int
    for key in arr:
        cnt[key] += 1  # if key is not in cnt, it will put in the default

    # cnt_list = list(cnt.items())

Instead of importing a module, you can do it with a plain dict and a little logic:

    track = {}
    if intr not in track:
        track[intr] = 1
    else:
        track[intr] += 1

Code example:
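A minimal sketch of that pattern applied to the array from the question, assuming intr is the loop variable that walks over arr:

```python
arr = [11817685, 2014036792, 2014047115, 11817685]

track = {}
for intr in arr:
    if intr not in track:
        # first time this element is seen
        track[intr] = 1
    else:
        # seen before: bump its count
        track[intr] += 1

print(list(track.items()))
```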

There is a general pattern for this kind of list task:

So you have a list:

 a=[(2006,1),(2007,4),(2008,9),(2006,5)] 

And you want to convert it to a dict, with the first element of each tuple as the key and the second element as the value, something like:

 {2008: [9], 2006: [5], 2007: [4]} 

But there is a catch: some tuples share the same key but carry different values, such as (2006,1) and (2006,5). You want those values collected under a single key, so the expected output is:

 {2008: [9], 2006: [1, 5], 2007: [4]} 

For this type of task, we do something like this:

First create a new dict, then follow this pattern:

    if item[0] not in new_dict:
        new_dict[item[0]] = [item[1]]
    else:
        new_dict[item[0]].append(item[1])

So first we check whether the key is already in the new dict. If it is not, we create a list holding its value; if it is, we append the duplicate key's value to the existing list.

Full code:

    a = [(2006, 1), (2007, 4), (2008, 9), (2006, 5)]

    new_dict = {}
    for item in a:
        if item[0] not in new_dict:
            new_dict[item[0]] = [item[1]]
        else:
            new_dict[item[0]].append(item[1])

    print(new_dict)

Output:

 {2008: [9], 2006: [1, 5], 2007: [4]} 
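The same grouping can also be sketched with dict.setdefault, which folds the if/else into a single call. This is an alternative to the pattern above, not part of the original answer:

```python
a = [(2006, 1), (2007, 4), (2008, 9), (2006, 5)]

new_dict = {}
for key, value in a:
    # setdefault returns the list already stored for key,
    # or stores a new empty list and returns that
    new_dict.setdefault(key, []).append(value)

print(new_dict)
```

Unpacking each tuple into key and value avoids the repeated item[0] and item[1] indexing.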

After writing my answer to another question, I remembered this post and thought it would be useful to write a similar answer here.

Here is a way to use reduce on your list to get the desired result.

    from functools import reduce  # required on Python 3

    arr = [11817685, 2014036792, 2014047115, 11817685]

    def mapper(a):
        return (a, 1)

    def reducer(x, y):
        if isinstance(x, dict):
            # x is the running dict of counts
            ykey, yval = y
            if ykey not in x:
                x[ykey] = yval
            else:
                x[ykey] += yval
            return x
        else:
            # first call: x and y are both (key, count) tuples
            xkey, xval = x
            ykey, yval = y
            a = {xkey: xval}
            if ykey in a:
                a[ykey] += yval
            else:
                a[ykey] = yval
            return a

    mapred = reduce(reducer, map(mapper, arr))
    print(list(mapred.items()))

Which prints (on Python 3.7+, where dicts keep insertion order):

    [(11817685, 2), (2014036792, 1), (2014047115, 1)]

See that other answer for more details.
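For comparison, here is a shorter sketch of the same idea. The attempt in the question, reduce(lambda a, b: a + b, r), only glues the tuples into one long tuple, because + on tuples means concatenation. Passing an empty dict as the third argument to reduce gives the accumulator a fixed type from the start, so the isinstance check is no longer needed. The name count_pair is made up for this example, and Python 3 is assumed:

```python
from functools import reduce

arr = [11817685, 2014036792, 2014047115, 11817685]

def mapper(a):
    return (a, 1)

def count_pair(acc, pair):
    # acc is always a dict because reduce starts from the {} initializer
    key, val = pair
    acc[key] = acc.get(key, 0) + val
    return acc

mapred = reduce(count_pair, map(mapper, arr), {})
print(list(mapred.items()))
```

With the initializer in place, an empty or single-element list also works: reduce returns the dict rather than raising an error or handing back a bare tuple.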


Source: https://habr.com/ru/post/986199/

