Adding NumPy arrays like Counters

Since collections.Counter is so slow, I am looking for a faster way of summing mapped values in Python 2.7. It seems like a simple concept, and I'm a little disappointed with the built-in Counter method.

Basically, I need to have arrays like this:

array([[ 0.,  2.],
       [ 2.,  2.],
       [ 3.,  1.]])

array([[ 0.,  3.],
       [ 1.,  1.],
       [ 2.,  5.]])

And then add them so that they look like this:

array([[ 0.,  5.],
       [ 1.,  1.],
       [ 2.,  7.],
       [ 3.,  1.]])
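
For reference, the Counter-style behavior described above can be sketched as follows. This is only an illustration of the desired result, not the asker's actual code; `to_counter` is a hypothetical helper name.

```python
from collections import Counter

import numpy as np

a = np.array([[0., 2.], [2., 2.], [3., 1.]])
b = np.array([[0., 3.], [1., 1.], [2., 5.]])

def to_counter(arr):
    # Hypothetical helper: column 0 holds the keys, column 1 the values.
    return Counter(dict(zip(arr[:, 0], arr[:, 1])))

# Counter addition sums values per key (and drops totals that are <= 0).
total = to_counter(a) + to_counter(b)

# Back to a two-column array, ordered by key.
expected = np.array(sorted(total.items()))
print(expected)
```

Any faster approach should produce the same rows as `expected` for these inputs.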

If you have a good way to do this quickly and efficiently, I would like to hear it. I am also open to any other ideas that would let me do something similar, and to modules other than NumPy.

Thanks!

Edit: Ready for some timings? Intel machine, 64-bit Windows. All of the following values are in seconds, for 20,000 loops.

collections.Counter results: 2.131000, 2.125000, 2.125000

Divakar union1d + masking: 1.641000, 1.633000, 1.625000

Divakar union1d + indexing: 0.625000, 0.625000, 0.641000

[...]: 1.844000, 1.938000, 1.858000

Pandas : 16.659000, 16.686000, 16.885000
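
Timings like these could be collected with the standard `timeit` module. The sketch below is an assumption about the method, not the asker's actual benchmark script; `time_it` and `union1d_masking` are hypothetical names, and `np.isin` is used as the current equivalent of `np.in1d`.

```python
import timeit

import numpy as np

a = np.array([[0., 2.], [2., 2.], [3., 1.]])
b = np.array([[0., 3.], [1., 1.], [2., 5.]])

def union1d_masking(a, b):
    # Same idea as the union1d + masking answer.
    keys = np.union1d(a[:, 0], b[:, 0])
    out = np.zeros((len(keys), 2))
    out[:, 0] = keys
    out[np.isin(keys, a[:, 0]), 1] = a[:, 1]
    out[np.isin(keys, b[:, 0]), 1] += b[:, 1]
    return out

def time_it(func, loops=20000, repeats=3):
    # Returns one total time in seconds per repeat, each covering `loops` calls.
    timer = timeit.Timer(lambda: func(a, b))
    return timer.repeat(repeat=repeats, number=loops)

# A small loop count keeps this demo quick; the question used 20,000 loops.
timings = time_it(union1d_masking, loops=200)
print(", ".join("%.6f" % t for t in timings))
```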

Conclusion: the faster union1d variant wins by a clear margin, and Pandas is much slower than the rest!

Note: I also tried Counter1.update(Counter2.elements()), which was far slower (65.671000 seconds).



Here is one approach using np.union1d and masking:

def app1(a,b):
    c0 = np.union1d(a[:,0],b[:,0])

    out = np.zeros((len(c0),2))
    out[:,0] = c0

    mask1 = np.in1d(c0,a[:,0])
    out[mask1,1] = a[:,1]

    mask2 = np.in1d(c0,b[:,0])
    out[mask2,1] += b[:,1]
    return out

Sample run:

In [174]: a
Out[174]: 
array([[  0.,   2.],
       [ 12.,   2.],
       [ 23.,   1.]])

In [175]: b
Out[175]: 
array([[  0.,   3.],
       [  1.,   1.],
       [ 12.,   5.]])

In [176]: app1(a,b)
Out[176]: 
array([[  0.,   5.],
       [  1.,   1.],
       [ 12.,   7.],
       [ 23.,   1.]])
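
Because the masked assignment writes `a[:,1]` and `b[:,1]` in the order of the sorted union, the approach above implicitly assumes that each input is already sorted by its first column and has no repeated keys. If the inputs are not sorted, a pre-sorting step along these lines could be applied first (`sort_by_key` is a hypothetical helper, not part of the answer):

```python
import numpy as np

def sort_by_key(arr):
    # Reorder rows so the key column (column 0) is ascending.
    order = np.argsort(arr[:, 0], kind="stable")
    return arr[order]

shuffled = np.array([[23., 1.],
                     [ 0., 2.],
                     [12., 2.]])
ordered = sort_by_key(shuffled)
print(ordered)
```

Repeated keys within one array are not handled by this step; they would need to be summed beforehand.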

Here is another approach using np.union1d and indexing:

def app2(a,b):
    n = np.maximum(a[:,0].max(), b[:,0].max())+1
    c0 = np.union1d(a[:,0],b[:,0])
    out0 = np.zeros((int(n), 2))
    out0[a[:,0].astype(int),1] = a[:,1]

    out0[b[:,0].astype(int),1] += b[:,1]

    out = out0[c0.astype(int)]
    out[:,0] = c0
    return out

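If the first columns of a and b together cover every integer from 0 up to the maximum key, this simplifies to: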

def app2_specific(a,b):
    c0 = np.union1d(a[:,0],b[:,0])
    n = c0[-1]+1
    out0 = np.zeros((int(n), 2))
    out0[a[:,0].astype(int),1] = a[:,1]        
    out0[b[:,0].astype(int),1] += b[:,1]
    out0[:,0] = c0
    return out0

Sample run:

In [234]: a
Out[234]: 
array([[ 0.,  2.],
       [ 2.,  2.],
       [ 3.,  1.]])

In [235]: b
Out[235]: 
array([[ 0.,  3.],
       [ 1.,  1.],
       [ 2.,  5.]])

In [236]: app2_specific(a,b)
Out[236]: 
array([[ 0.,  5.],
       [ 1.,  1.],
       [ 2.,  7.],
       [ 3.,  1.]])

If you know the number of fields (numFields), you can use np.bincount.

c = np.vstack([a, b])
# np.bincount needs integer bin indices, so cast the float keys first
counts = np.bincount(c[:, 0].astype(int), weights = c[:, 1], minlength = numFields)
out = np.vstack([np.arange(numFields), counts]).T

Note that vstack copies both arrays. To avoid the copy, you can accumulate with np.add.at instead.

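out = np.zeros((numFields, 2))  # shape must be passed as a tuple: one row per field
out[:, 0] = np.arange(numFields)
# np.add.at requires integer indices, so cast the float keys
np.add.at(out[:, 1], a[:, 0].astype(int), a[:, 1])
np.add.at(out[:, 1], b[:, 0].astype(int), b[:, 1])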
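
When the number of fields is not known in advance, or the keys are sparse or not integers, one possible variation is to map the keys to dense indices with `np.unique(..., return_inverse=True)` and pass those to `np.bincount`. This is a sketch of a swapped-in technique rather than the code from the answer above; `sum_by_key` is a hypothetical name.

```python
import numpy as np

def sum_by_key(*arrays):
    # Stack all (key, value) rows into one array.
    stacked = np.vstack(arrays)
    # keys: sorted unique keys; inverse: index of each row's key within keys.
    keys, inverse = np.unique(stacked[:, 0], return_inverse=True)
    # Sum the values that share the same key index.
    sums = np.bincount(inverse.ravel(), weights=stacked[:, 1],
                       minlength=len(keys))
    return np.column_stack([keys, sums])

a = np.array([[0., 2.], [12., 2.], [23., 1.]])
b = np.array([[0., 3.], [1., 1.], [12., 5.]])
result = sum_by_key(a, b)
print(result)
```

Unlike the fixed-size versions, this only produces rows for keys that actually occur, and it also tolerates repeated keys within a single input.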

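You can use a weighted histogram. This also handles gaps in the keys, and zero-count entries can be filtered out afterwards.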

import numpy as np

x = np.array([[ 0.,  2.],
              [ 2.,  2.],
              [ 3.,  1.]])

y = np.array([[ 0.,  3.],
              [ 1.,  1.],
              [ 2.,  5.],
              [ 5.,  3.]])

c, w = np.vstack((x,y)).T
h, b = np.histogram(c, weights=w, 
                    bins=np.arange(c.min(),c.max()+2))
r = np.vstack((b[:-1], h)).T
print(r)
# [[ 0.  5.]
#  [ 1.  1.]
#  [ 2.  7.]
#  [ 3.  1.]
#  [ 4.  0.]
#  [ 5.  3.]]
r_nonzero = r[r[:,1]!=0]

With Pandas, you can do this as follows:

import pandas as pd
pda = pd.DataFrame(a).set_index(0)
pdb = pd.DataFrame(b).set_index(0)
result = pd.concat([pda, pdb], axis=1).fillna(0).sum(axis=1)

To convert the result back to a NumPy array:

array_res = result.reset_index(name=1).values
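
An alternative Pandas formulation, shown here only as a sketch, stacks both arrays and groups by the key column. The column names `key` and `value` are assumptions introduced for readability.

```python
import numpy as np
import pandas as pd

a = np.array([[0., 2.], [2., 2.], [3., 1.]])
b = np.array([[0., 3.], [1., 1.], [2., 5.]])

# One long table of (key, value) rows from both inputs.
stacked = pd.DataFrame(np.vstack([a, b]), columns=["key", "value"])

# groupby sorts the group keys by default, so the rows come out ordered by key.
summed = stacked.groupby("key")["value"].sum()

# Back to a two-column NumPy array.
array_res = summed.reset_index().values
print(array_res)
```

This avoids the fillna step because no missing values are created, although it is not necessarily any faster than the concat version.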

Alternatively, the numpy_indexed package handles this as a grouped sum:

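import numpy as np
import numpy_indexed as npi

# A and B are the two input arrays: column 0 holds the labels, column 1 the values
C = np.concatenate([A, B], axis=0)
labels, sums = npi.group_by(C[:, 0]).sum(C[:, 1])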

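Note: it would be better to keep the labels in a separate int array; floats are a poor fit for labels. Use ints for them.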


Source: https://habr.com/ru/post/1681150/

