Python generator expression to sum dictionary values

A generator expression yields a large number of (key, value) tuples, shown here as a list for illustration:

pairs = [(3, 47), (6, 47), (9, 47), (6, 27), (11, 27), (23, 27), (41, 27), (4, 67), (9, 67), (11, 67), (33, 67)] 

For each pair in pairs, with key = pair[0] and value = pair[1], I want to feed this stream of pairs into a dictionary that sums the values for each key. The obvious solution:

    dict_k_v = {}
    for pair in pairs:
        try:
            dict_k_v[pair[0]] += pair[1]
        except KeyError:
            # first time this key is seen
            dict_k_v[pair[0]] = pair[1]

    >>> dict_k_v
    {33: 67, 3: 47, 4: 67, 6: 74, 9: 114, 11: 94, 41: 27, 23: 27}

However, can this be achieved with a generator expression or some similar construct that does not use a for loop?

EDIT

To clarify, the generator expression yields a large number of tuples:

(6, 47), (9, 47), (6, 27), (11, 27), (23, 27), (41, 27), (4, 67), (9, 67), (11, 67), (33, 67) ...

and I want to accumulate each key-value pair into a dictionary (see Paul McGuire's answer) as each pair is generated. The pairs = [...] list assignment was unnecessary; sorry about that. For each pair (x, y), x is an integer, and y can be an integer or a decimal/float.

My generator expression is:

 ((x,y) for y in something() for x in somethingelse()) 

and I want to accumulate each (x, y) pair into a defaultdict. Hope that helps.

+6
8 answers

For discussion, here is a simple generator function to give us some data:

    from random import randint

    def generator1():
        for i in range(10000):
            yield (randint(1, 10), randint(1, 100))

And here is the basic solution, which uses a Python for-loop to consume the generator and tally the values for each key:

    from collections import defaultdict

    tally = defaultdict(int)
    for k, v in generator1():
        tally[k] += v

    for k in sorted(tally):
        print(k, tally[k])

It will output something like:

    1 49030
    2 51963
    3 51396
    4 49292
    5 51908
    6 49481
    7 49645
    8 49149
    9 48523
    10 50722

But we can create a coroutine that accepts each key-value pair sent to it and accumulates them all into the defaultdict passed to it:

    # define coroutine to update defaultdict for every
    # key,value pair sent to it
    def tallyAccumulator(t):
        try:
            while True:
                k, v = (yield)
                t[k] += v
        except GeneratorExit:
            pass

We initialize the coroutine with a defaultdict and prime it, so that it is ready to accept values, by sending it None:

    # init coroutine
    tally = defaultdict(int)
    c = tallyAccumulator(tally)
    c.send(None)

We could use a for loop or a list comprehension to send all the generator's values to the coroutine:

    for val in generator1():
        c.send(val)

or

 [c.send(val) for val in generator1()] 

But instead, we will use a zero-length deque to consume all the values of the generator expression without building an unnecessary temporary list of Nones:

    # create generator expression consumer
    from collections import deque
    do_all = deque(maxlen=0).extend

    # loop thru generator at C speed, instead of Python for-loop speed
    do_all(c.send(val) for val in generator1())

Now we look at the values again:

    for k in sorted(tally):
        print(k, tally[k])

And we get another list, similar to the first:

    1 52236
    2 49139
    3 51848
    4 51194
    5 51275
    6 50012
    7 51875
    8 46013
    9 50955
    10 52192

More information on coroutines is on David Beazley's page: http://www.dabeaz.com/coroutines/
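
For reference, the pieces above can be assembled into one self-contained Python 3 sketch. It assumes the fixed pairs list from the question instead of random data, so the result is reproducible; the snake_case names are illustrative only.

```python
from collections import defaultdict, deque


def tally_accumulator(totals):
    """Coroutine: add each (key, value) pair sent to it into totals."""
    try:
        while True:
            key, value = yield
            totals[key] += value
    except GeneratorExit:
        pass  # raised by close(); nothing to clean up


pairs = [(3, 47), (6, 47), (9, 47), (6, 27), (11, 27), (23, 27),
         (41, 27), (4, 67), (9, 67), (11, 67), (33, 67)]

tally = defaultdict(int)
acc = tally_accumulator(tally)
next(acc)  # prime the coroutine; equivalent to acc.send(None)

# A zero-length deque discards every item it is given, so the generator
# expression is exhausted without storing the None results of send().
deque((acc.send(pair) for pair in pairs), maxlen=0)
acc.close()

for key in sorted(tally):
    print(key, tally[key])
```

The sums match the dictionary shown in the question.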

+6

You can use tuple destructuring and defaultdict to shorten this loop:

    from collections import defaultdict

    d = defaultdict(int)
    for k, v in pairs:
        d[k] += v

This still uses the for loop, but you don't need to handle the case where the key has not been seen before. I think this is probably the best solution, both in terms of readability and in terms of performance.
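
As a rough sketch of how this fits the question's setup, the same loop can consume a generator expression directly, and defaultdict(int) also copes with float values because 0 + 27.5 is simply 27.5. The bodies of something() and somethingelse() below are hypothetical stand-ins, since the question does not show them.

```python
from collections import defaultdict


# Hypothetical stand-ins for the question's something() and somethingelse().
def something():
    return [47, 27.5]


def somethingelse():
    return [3, 6]


# Same shape as the generator expression in the question.
stream = ((x, y) for y in something() for x in somethingelse())

totals = defaultdict(int)
for key, value in stream:
    totals[key] += value  # missing keys start at int() == 0

print(dict(totals))
```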

Proof of concept using groupby

However, you could do it using itertools.groupby, although it is a bit hacky:

    import itertools

    # lambda kv: kv[0] replaces the Python 2-only form lambda (k,v): k
    dict((k, sum(v for _, v in group))
         for k, group in itertools.groupby(sorted(pairs), lambda kv: kv[0]))

In addition, this should be less efficient than the first approach, because sorting requires building a list of all the pairs first.
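
The sort is not optional: groupby only merges adjacent items with equal keys. A small sketch of the difference, using a shortened pairs list for illustration:

```python
from itertools import groupby
from operator import itemgetter

pairs = [(6, 47), (9, 47), (6, 27)]
first = itemgetter(0)

# Without sorting, key 6 appears in two separate groups, and the
# second group overwrites the first when the dict is built.
unsorted_totals = {k: sum(v for _, v in g) for k, g in groupby(pairs, first)}

# With sorting, all pairs for a key are adjacent and summed together.
sorted_totals = {k: sum(v for _, v in g)
                 for k, g in groupby(sorted(pairs, key=first), first)}

print(unsorted_totals)
print(sorted_totals)
```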

+4
    >>> import itertools, operator
    >>> dict((x[0], sum(y[1] for y in x[1]))
    ...      for x in itertools.groupby(sorted(pairs, key=operator.itemgetter(0)),
    ...                                 key=operator.itemgetter(0)))
    {33: 67, 3: 47, 4: 67, 6: 74, 9: 114, 11: 94, 41: 27, 23: 27}
+3

No, you cannot do this without using some form of loop. And a for loop really is the most sensible choice, because you are modifying something in the body of the loop (rather than, say, creating a new iterable or list). However, you can simplify the code with collections.defaultdict, for example:

    import collections

    dict_k_v = collections.defaultdict(int)
    for k, v in pairs:
        dict_k_v[k] += v
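
If the aim is only to avoid writing a for statement, the loop can be pushed into functools.reduce. This is a sketch of an alternative spelling, not a loop-free solution: reduce still iterates internally, and add_pair is a hypothetical helper name.

```python
from collections import defaultdict
from functools import reduce

pairs = [(3, 47), (6, 47), (9, 47), (6, 27), (11, 27), (23, 27),
         (41, 27), (4, 67), (9, 67), (11, 67), (33, 67)]


def add_pair(totals, pair):
    """Fold one (key, value) pair into totals and return the same dict."""
    key, value = pair
    totals[key] += value
    return totals


# reduce() threads the accumulator dict through every pair in turn.
dict_k_v = reduce(add_pair, pairs, defaultdict(int))
print(dict(dict_k_v))
```

The accumulator is still mutated in place, which is the reason given above for preferring a plain loop.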
+1

Haskell has a very good general helper for this: Data.Map's fromListWith.

fromListWith is similar to Python's dict constructor, but it also accepts an additional combining function that merges the values of duplicate keys. Translating it into Python:

    def dict_fromitems(items, combine):
        d = dict()
        for (k, v) in items:
            if k in d:
                d[k] = combine(d[k], v)
            else:
                d[k] = v
        return d

Using this helper, it is easy to express many combinations:

    >>> import operator
    >>> dict_fromitems(pairs, combine=operator.add)
    {33: 67, 3: 47, 4: 67, 6: 74, 9: 114, 11: 94, 41: 27, 23: 27}
    >>> dict_fromitems(pairs, combine=min)
    {33: 67, 3: 47, 4: 67, 6: 27, 9: 47, 11: 27, 41: 27, 23: 27}
    >>> dict_fromitems(pairs, combine=max)
    {33: 67, 3: 47, 4: 67, 6: 47, 9: 67, 11: 67, 41: 27, 23: 27}
    >>> dict_fromitems(((k, [v]) for (k, v) in pairs), combine=operator.add)
    {33: [67], 3: [47], 4: [67], 6: [47, 27], 9: [47, 67], 11: [27, 67], 41: [27], 23: [27]}

Note that unlike solutions using defaultdict(int), this approach is not limited to numeric values, as the list example above shows. (More generally, any monoid is a useful candidate: sets with union/intersection, Booleans with and/or, strings with concatenation, etc.)
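
A few of those monoids, sketched with the same helper (repeated here in compact form so the snippet runs on its own) and a shortened pairs list chosen for illustration:

```python
import operator


def dict_fromitems(items, combine):
    """Build a dict, merging values of duplicate keys with combine."""
    d = {}
    for k, v in items:
        d[k] = combine(d[k], v) if k in d else v
    return d


pairs = [(6, 47), (9, 47), (6, 27), (9, 67)]

# Sets with union: collect the distinct values seen for each key.
as_sets = dict_fromitems(((k, {v}) for k, v in pairs), combine=operator.or_)

# Strings with concatenation: join the values in arrival order.
as_strings = dict_fromitems(((k, str(v)) for k, v in pairs), combine=operator.add)

# Booleans with "or": was any value for this key greater than 50?
any_big = dict_fromitems(((k, v > 50) for k, v in pairs), combine=operator.or_)

print(as_sets, as_strings, any_big)
```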

Addendum

As noted in other comments, there is nothing wrong with using a loop for this: it is an appropriate low-level solution. However, it is always nice when you can wrap low-level code in reusable higher-level abstractions.

+1

You could implement this as a recursive call, but Python does not optimize tail recursion, so you will pay a speed penalty and risk hitting a "maximum recursion depth exceeded" error.

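    import operator as o

    def dict_sum(pairs, totals=None):
        # note: consumes the pairs list via pop(); pairs must be non-empty
        if totals is None:
            totals = {}
        k, v = pairs.pop()
        o.setitem(totals, k, totals.get(k, 0) + v)
        if not pairs:
            return totals
        else:
            return dict_sum(pairs, totals)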

I would do it in a for loop:

    import operator as o

    totals = {}
    for k, v in pairs:
        o.setitem(totals, k, totals.get(k, 0) + v)
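
To make the recursion-depth risk concrete, here is a small sketch. It assumes Python 3.5 or later (for RecursionError) and uses an index-based variant of dict_sum, written only for this illustration, so that the input list is not consumed.

```python
import sys


def dict_sum(pairs, totals=None, i=0):
    """Recursive sum by key: one stack frame per pair."""
    if totals is None:
        totals = {}
    if i == len(pairs):
        return totals
    k, v = pairs[i]
    totals[k] = totals.get(k, 0) + v
    return dict_sum(pairs, totals, i + 1)


# Slightly more pairs than the interpreter allows nested calls.
many_pairs = [(n % 10, 1) for n in range(sys.getrecursionlimit() + 100)]

try:
    dict_sum(many_pairs)
    hit_limit = False
except RecursionError:
    hit_limit = True

print("recursion limit hit:", hit_limit)
```

The for-loop version handles the same input without any depth limit.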
0

Why don't you use a for loop?

    pairs = [(3, 47), (6, 47), (9, 47), (6, 27), (11, 27), (23, 27), (41, 27), (4, 67), (9, 67), (11, 67), (33, 67)]
    result = {}

    def add(pair):
        k, v = pair
        result[k] = result.get(k, 0) + v

    # map() is lazy in Python 3, so wrap it in list() to force evaluation
    list(map(add, pairs))
    print(result)
0

Sort of:

 dict_k_v = dict(pairs) 
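
A quick check of what dict(pairs) does with duplicate keys, using a shortened pairs list for illustration: a later value replaces an earlier one instead of being added to it.

```python
pairs = [(6, 47), (9, 47), (6, 27), (9, 67)]

# dict() keeps only the last value seen for each key.
last_wins = dict(pairs)

# Summing per key, for comparison.
summed = {}
for k, v in pairs:
    summed[k] = summed.get(k, 0) + v

print(last_wins)
print(summed)
```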
-2

Source: https://habr.com/ru/post/908479/

