The fastest way to calculate the sum of specific areas of an array

Given the following data (in Python 2.7):

import numpy as np
a = np.array([1,2,3,4,5,6,7,8,9,10,11,12,14])
b = np.array([8,2,3])

I want the sum of the first 8 elements of a, then the sum of the next 2 elements (the 9th and 10th), and finally the sum of the last 3; the segment lengths are given in b. Desired result:

[36, 19, 37]

I can do this with loops, but there must be a more pythonic and more efficient way to do it!

+4
5 answers

It is easy with np.split:

result = [part.sum() for part in np.split(a, np.cumsum(b))[:-1]]
print(result)
>>> [36, 19, 37]
+8

Much faster than np.split:

np.add.reduceat(a, np.r_[0, np.cumsum(b)[:-1]]) 

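What it does:

  • b holds the lengths of the segments, but reduceat needs the index at which each one starts: c = np.r_[0, np.cumsum(b)[:-1]], which gives array([0, 8, 10]). It begins at 0 and drops the last element of np.cumsum(b) -> array([8, 10, 13]) (np.ufunc.reduceat runs the last segment to the end of the array, so the 13 is not needed).
  • np.ufunc.reduceat(a, c) applies the reduce of a ufunc (here add) to the slices a[c[i]:c[i+1]]. When i+1 runs past the end of c, it reduces a[c[i]:].
  • reduce applies the ufunc along an axis. For example, np.add.reduce(a) is equivalent to np.sum(a) (or a.sum()). reduceat is faster than the for loop in @jdehsa's answer because the looping happens in numpy's C code.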
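The bullets above can be checked with a short sketch. It assumes the a and b from the question; the names bounds and manual are introduced here only for illustration and are not part of the original answer.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14])
b = np.array([8, 2, 3])

# Start index of each segment: 0, then the running total of the lengths
# without its last entry (which equals len(a)).
c = np.r_[0, np.cumsum(b)[:-1]]

# What reduceat computes for these indices, written out slice by slice.
bounds = list(c) + [len(a)]
manual = [int(a[bounds[i]:bounds[i + 1]].sum()) for i in range(len(c))]

result = np.add.reduceat(a, c)
print(c.tolist(), manual, result.tolist())
```

Both manual and result hold the three segment sums from the question, which shows that reduceat is doing the slicing and summing in a single call.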

Timings:

b = np.random.randint(1,10,(10000,))
a = np.random.randint(1,10,(np.sum(b),))

%timeit np.add.reduceat(a, np.r_[0, np.cumsum(b)[:-1]])
1000 loops, best of 3: 293 µs per loop
%timeit [part.sum() for part in np.split(a, np.cumsum(b))[:-1]]
10 loops, best of 3: 44.6 ms per loop

On this input the np.split version is roughly 150 times slower.
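%timeit is an IPython magic, so the comparison cannot be pasted into a plain script. A rough equivalent with the standard timeit module might look like the sketch below; the fixed seed, the repeat count of 20 and the helper names with_reduceat and with_split are assumptions made for illustration, and the absolute numbers will differ from machine to machine.

```python
import timeit
import numpy as np

rng = np.random.RandomState(0)  # fixed seed: an assumption, for repeatability
b = rng.randint(1, 10, (10000,))
a = rng.randint(1, 10, (int(b.sum()),))

def with_reduceat():
    return np.add.reduceat(a, np.r_[0, np.cumsum(b)[:-1]])

def with_split():
    return [part.sum() for part in np.split(a, np.cumsum(b))[:-1]]

# Both approaches must agree before their timings are compared.
same = np.array_equal(with_reduceat(), np.array(with_split()))

# Average seconds per call over 20 calls each.
t_reduceat = timeit.timeit(with_reduceat, number=20) / 20
t_split = timeit.timeit(with_split, number=20) / 20
print(same, t_reduceat, t_split)
```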

+6

You can use the reduceat method of the np.add ufunc:

>>> import numpy as np
>>> a = np.array([1,2,3,4,5,6,7,8,9,10,11,12,14])
>>> b = np.array([8,2,3])
>>> np.add.reduceat(a, np.append([0], np.cumsum(b)[:-1]))
array([36, 19, 37], dtype=int32)

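The [:-1] drops the last cumulative sum, and np.append([0], ...) puts the start index 0 in front.

This is essentially the same approach as in DanielF's answer.

To avoid the append, the cumulative sum can be written straight into a preallocated array: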

>>> b_sum = np.zeros_like(b)
>>> np.cumsum(b[:-1], out=b_sum[1:])  # insert the cumsum in the b_sum array directly
>>> np.add.reduceat(a, b_sum)
array([36, 19, 37], dtype=int32)
+2

A straightforward way that indexes directly with b:

import numpy as np
a = np.array([1,2,3,4,5,6,7,8,9,10,11,12,14])
b = np.array([8,2,3])

c = np.array([np.sum(a[:b[0]]),np.sum(a[b[0]:b[0]+b[1]]),np.sum(a[-b[2]:])])
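The line above hard-codes three segments. A minimal sketch of the same slicing idea for any number of segments could look as follows; segment_sums is a hypothetical helper name, not something from the original answer, and it still loops in Python, so it will not match the speed of reduceat.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14])
b = np.array([8, 2, 3])

def segment_sums(values, lengths):
    """Sum consecutive runs of `values` whose sizes are given by `lengths`."""
    ends = np.cumsum(lengths)   # one past the last index of each run
    starts = ends - lengths     # first index of each run
    return np.array([values[s:e].sum() for s, e in zip(starts, ends)])

c = segment_sums(a, b)
print(c.tolist())  # [36, 19, 37]
```

Because each run is sliced explicitly, a length of 0 simply produces a sum of 0 for that segment.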
+1

numba

@Daniel F.'s solution is a good one. If you would rather write the loops in plain Python, Numba can compile them to fast code.

import numba as nb
import numpy as np
import time
def main():
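    # Cast to int32 so the arrays match the explicit numba signature below
    # (the default integer type is 64-bit on some platforms).
    b = np.random.randint(1,10,(10000,)).astype(np.int32)
    a = np.random.randint(1,10,(np.sum(b),)).astype(np.int32)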

    nb_splitsum = nb.njit(nb.int32[:](nb.int32[:], nb.int32[:]),nogil=True)(splitsum)

    t1=time.time()
    for i in xrange(0,1000):
        c=nb_splitsum(a,b)

    print("Numba Solution")
    print(time.time()-t1)

    t1=time.time()
    for i in xrange(0,1000):
        c=np.add.reduceat(a, np.r_[0, np.cumsum(b)[:-1]])
    print("Numpy Solution")
    print(time.time()-t1)

def splitsum(a,b):
    # Start from zeros: np.empty leaves arbitrary values that += would add to
    out=np.zeros(b.shape[0],dtype=np.int32)
    ii=0  # running position in a
    for i in range(0,b.shape[0]):
        for j in range(0,b[i]):
            out[i]+=a[ii]
            ii+=1
    return out

if __name__ == "__main__":
    main()


#Output
Numba Solution
0.125
Numpy Solution
0.280999898911

Over 1000 calls the difference from the numpy solution is about 0.15 s.

0

Source: https://habr.com/ru/post/1682928/

