I think the following is the standard numpy approach for this kind of computation. The call to np.unique could be skipped if the entries of A[0] are small integers, but keeping it makes the whole operation more robust and independent of the actual data.
>>> import numpy as np
>>> A = [[1,1,1,2,3,1,2,3],[0.1,0.2,0.2,0.1,0.3,0.2,0.2,0.1]]
>>> unq, unq_idx = np.unique(A[0], return_inverse=True)
>>> unq_sum = np.bincount(unq_idx, weights=A[1])
>>> unq_counts = np.bincount(unq_idx)
>>> unq_avg = unq_sum / unq_counts
>>> unq
array([1, 2, 3])
>>> unq_avg
array([ 0.175, 0.15 , 0.2 ])
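For reference, here is a minimal sketch of the shortcut without np.unique. It assumes the keys are small non-negative integers, so they can serve directly as bin indices. The names keys, vals, direct_unq and direct_avg are illustrative only.

```python
import numpy as np

# Same sample data as above; the keys must be small non-negative integers
keys = np.array([1, 1, 1, 2, 3, 1, 2, 3])
vals = np.array([0.1, 0.2, 0.2, 0.1, 0.3, 0.2, 0.2, 0.1])

# Use the keys themselves as bin indices instead of the inverse from np.unique
sums = np.bincount(keys, weights=vals)
counts = np.bincount(keys)

# Integers that never occur (here 0) get an empty bin; drop those bins
present = counts > 0
direct_unq = np.flatnonzero(present)
direct_avg = sums[present] / counts[present]
```

This allocates one bin for every integer up to the largest key, which is why it only pays off when the keys are small. With negative, very large, or non-integer keys, np.bincount either fails or wastes memory, so the np.unique version is the safer default.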
Of course, you can stack both arrays into one, although this will convert unq to a float dtype:
>>> np.vstack((unq, unq_avg))
array([[ 1.   ,  2.   ,  3.   ],
       [ 0.175,  0.15 ,  0.2  ]])
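If the integer dtype of the keys matters, one possible alternative is a structured array, where each field keeps its own dtype. This is only a sketch: the field names "key" and "avg" are made up for illustration, and the two input arrays are retyped here so the snippet runs on its own.

```python
import numpy as np

# Results of the grouping step, retyped so this snippet is self-contained
unq = np.array([1, 2, 3])
unq_avg = np.array([0.175, 0.15, 0.2])

# One record per unique key; each field keeps its own dtype
result = np.empty(len(unq), dtype=[("key", unq.dtype), ("avg", unq_avg.dtype)])
result["key"] = unq
result["avg"] = unq_avg

# A plain dict is another option when you only need lookups by key
avg_by_key = dict(zip(unq.tolist(), unq_avg.tolist()))
```

Here result["key"] stays an integer array and result["avg"] stays a float array, while avg_by_key[2] looks up the average for key 2.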
Jaime source