The fastest way to calculate the sum of specific areas of an array

Question

Given the following data (in python 2.7):

import numpy as np
a = np.array([1,2,3,4,5,6,7,8,9,10,11,12,14])
b = np.array([8,2,3])

I want the sum of the first 8 elements of a, then the sum of the next 2 elements (the 9th and 10th), and finally the sum of the last 3. The segment lengths are given in b. Desired result:

[36, 19, 37]

I can do this with loops, but there must be a more Pythonic and more efficient way to do it!

+4

performance python python-2.7 numpy sum

Diogo santos Aug 3 '17 at 10:25

5 answers

jdehesa · Answer 1 · 2017-08-03T10:32:11+0000

It is easy with np.split:

result = [part.sum() for part in np.split(a, np.cumsum(b))[:-1]]
print(result)
>>> [36, 19, 37]
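To see why the trailing [:-1] is needed, here is a small sketch using the arrays from the question. The names cuts, parts and sizes are only illustrative. Because the last cut position equals the length of a, np.split returns one extra, empty array at the end.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14])
b = np.array([8, 2, 3])

# Cumulative lengths are the positions at which a is cut: [8, 10, 13]
cuts = np.cumsum(b)

# Cutting at 13 (the length of a) leaves an empty array as the last piece
parts = np.split(a, cuts)
sizes = [len(p) for p in parts]  # [8, 2, 3, 0]

# Drop the empty tail before summing each piece
result = [int(p.sum()) for p in parts[:-1]]
print(sizes, result)
```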

Daniel F · Answer 2 · 2017-08-03T11:03:02+0000

Much faster than np.split:

np.add.reduceat(a, np.r_[0, np.cumsum(b)[:-1]])

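What it does:

b holds the lengths of the slices, but we need their start indices, so c = np.r_[0, np.cumsum(b)[:-1]], which is array([0, 8, 10]): the first slice starts at 0, and the last entry of np.cumsum(b) -> array([8, 10, 13]) is not needed (np.ufunc.reduceat runs the last slice to the end of a, index 13, anyway).
np.ufunc.reduceat(a, c) applies the reduce of a ufunc (here add) to the slices of a given by c[i]:c[i+1]. When i+1 runs past the end of c, it reduces over c[i]: up to the end of a.
reduce applies the ufunc along the array. For example, np.add.reduce(a) is (essentially) the same as np.sum(a) (which, in turn, is what a.sum() calls). reduceat is faster than the for loop in @jdehesa's answer because it runs in NumPy's C code rather than in Python.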
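The description above can be checked with a short sketch that compares reduceat against explicit slicing on the question's data. The names starts, ends, fast and slow are illustrative only. It also shows one documented caveat worth knowing: when two consecutive start indices are equal (a zero-length segment), reduceat returns the single element at that index instead of 0.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14])
b = np.array([8, 2, 3])

# Start index of every segment: array([0, 8, 10])
starts = np.r_[0, np.cumsum(b)[:-1]]
fast = np.add.reduceat(a, starts)

# The same sums spelled out with slices; the last slice runs to the end of a
ends = np.r_[starts[1:], len(a)]
slow = np.array([a[s:e].sum() for s, e in zip(starts, ends)])

# Caveat: a zero-length segment does not yield 0
b0 = np.array([8, 0, 5])
starts0 = np.r_[0, np.cumsum(b0)[:-1]]   # array([0, 8, 8])
with_zero = np.add.reduceat(a, starts0)  # middle entry is a[8], not 0
print(fast, slow, with_zero)
```

If b can contain zeros, the affected entries would have to be set to 0 afterwards, for example with a mask on b == 0.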

Timing:

b = np.random.randint(1,10,(10000,))
a = np.random.randint(1,10,(np.sum(b),))

%timeit np.add.reduceat(a, np.r_[0, np.cumsum(b)[:-1]])
1000 loops, best of 3: 293 µs per loop
%timeit [part.sum() for part in np.split(a, np.cumsum(b))[:-1]]
10 loops, best of 3: 44.6 ms per loop

On this input, reduceat is roughly 150 times faster than the np.split approach.

MSeifert · Answer 3 · 2017-08-03T11:08:18+0000

You can use the reduceat method of the np.add ufunc:

>>> import numpy as np
>>> a = np.array([1,2,3,4,5,6,7,8,9,10,11,12,14])
>>> b = np.array([8,2,3])
>>> np.add.reduceat(a, np.append([0], np.cumsum(b)[:-1]))
array([36, 19, 37], dtype=int32)

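The [:-1] drops the last cumulative index, which is just the total length, and np.append([0], ...) prepends the start index 0 of the first segment.

This is essentially the same approach as in DanielF's answer.

If you want to avoid the append, you can write the cumulative sum directly into a preallocated array: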

>>> b_sum = np.zeros_like(b)
>>> np.cumsum(b[:-1], out=b_sum[1:])  # insert the cumsum in the b_sum array directly
>>> np.add.reduceat(a, b_sum)
array([36, 19, 37], dtype=int32)
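As a minimal sketch, the preallocation variant can be wrapped in a small function. The name segment_sums is hypothetical and not part of NumPy, and casting the lengths to np.intp is an assumption made here so that the out= array always has the dtype cumsum produces.

```python
import numpy as np

def segment_sums(a, lengths):
    """Sum consecutive segments of a whose sizes are given by lengths.

    Hypothetical helper; assumes every length is at least 1.
    """
    a = np.asarray(a)
    lengths = np.asarray(lengths, dtype=np.intp)
    # starts[0] stays 0; the remaining start indices are the cumulative lengths
    starts = np.zeros_like(lengths)
    np.cumsum(lengths[:-1], out=starts[1:])
    return np.add.reduceat(a, starts)

sums = segment_sums([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14], [8, 2, 3])
print(sums)
```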

ABonnet · Answer 4 · 2017-08-03T10:31:46+0000

A direct way, slicing a with the values of b:

import numpy as np
a = np.array([1,2,3,4,5,6,7,8,9,10,11,12,14])
b = np.array([8,2,3])

c = np.array([np.sum(a[:b[0]]),np.sum(a[b[0]:b[0]+b[1]]),np.sum(a[-b[2]:])])

max9111 · Answer 5 · 2017-08-03T20:40:50+0000

numba

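@Daniel F. already gave a vectorized NumPy solution. If you would rather write the plain loops in Python, they are normally slow. Numba can compile such loops, which makes them fast.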

import numba as nb
import numpy as np
import time
def main():
    b = np.random.randint(1,10,(10000,))
    a = np.random.randint(1,10,(np.sum(b),))

    nb_splitsum = nb.njit(nb.int32[:](nb.int32[:], nb.int32[:]),nogil=True)(splitsum)

    t1=time.time()
    for i in xrange(0,1000):
        c=nb_splitsum(a,b)

    print("Numba Solution")
    print(time.time()-t1)

    t1=time.time()
    for i in xrange(0,1000):
        c=np.add.reduceat(a, np.r_[0, np.cumsum(b)[:-1]])
    print("Numpy Solution")
    print(time.time()-t1)

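def splitsum(a,b):
    # np.zeros, not np.empty: the accumulator must start at 0 before += is applied
    out=np.zeros(b.shape[0],dtype=np.int32)
    ii=0  # running position in a
    for i in range(0,b.shape[0]):
        for j in range(0,b[i]):
            out[i]+=a[ii]
            ii+=1
    return out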

if __name__ == "__main__":
    main()


#Output
Numba Solution
0.125
Numpy Solution
0.280999898911

In this run the Numba version is about 0.15 s faster than the NumPy version over 1000 iterations.
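One thing to watch with the explicit nb.int32[:] signature: np.random.randint returns int64 arrays on many platforms, and an eagerly compiled int32 signature will then reject them. A hedged alternative, sketched below, is to let Numba infer the types on the first call. The fallback to a no-op decorator when Numba is not importable, and the choice of an int64 accumulator, are assumptions made for this sketch and not part of the original answer.

```python
import numpy as np

try:
    import numba as nb  # optional; the sketch still runs without it
    jit = nb.njit       # lazy compilation: types are inferred on first call
except ImportError:
    def jit(func):      # fallback: run the loop as plain (slow) Python
        return func

@jit
def splitsum(a, b):
    # Zero-initialised accumulator, one entry per segment
    out = np.zeros(b.shape[0], dtype=np.int64)
    ii = 0  # running position in a
    for i in range(b.shape[0]):
        for j in range(b[i]):
            out[i] += a[ii]
            ii += 1
    return out

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14])
b = np.array([8, 2, 3])
result = splitsum(a, b)
print(result)
```

Unlike reduceat, this loop also returns 0 for a segment of length 0.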
