Efficiently find the indices of all values in an array

I have a very large array consisting of integers from 0 to N, where each value occurs at least once.

For each value k, I would like to know all the indices at which the array equals k.

For instance:

arr = np.array([0,1,2,3,2,1,0])
desired_output = {
    0: np.array([0,6]),
    1: np.array([1,5]),
    2: np.array([2,4]),
    3: np.array([3]),
    }

Right now I do this by looping over range(N+1) and calling np.where once per value.

indices = {}
for value in range(max(arr)+1):
    indices[value] = np.where(arr == value)[0]

This loop is by far the slowest part of my code. (Both evaluating arr == value and calling np.where take significant chunks of time.) Is there a more efficient way to do this?

I also tried playing with np.unique(arr, return_index=True), but this only tells me the very first index, not all.

4 answers

A Pythonic way, using collections.defaultdict():

>>> from collections import defaultdict
>>> 
>>> d = defaultdict(list)
>>> 
>>> for i, j in enumerate(arr):
...     d[j].append(i)
... 
>>> d
defaultdict(<type 'list'>, {0: [0, 6], 1: [1, 5], 2: [2, 4], 3: [3]})

A Numpythonic way, using numpy.where() inside a dictionary comprehension:

>>> {i: np.where(arr == i)[0] for i in np.unique(arr)}
{0: array([0, 6]), 1: array([1, 5]), 2: array([2, 4]), 3: array([3])}

A Numpythonic way without an explicit Python loop, comparing every unique value against the whole array at once:

>>> uniq = np.unique(arr)
>>> args, indices = np.where((np.tile(arr, len(uniq)).reshape(len(uniq), len(arr)) == np.vstack(uniq)))
>>> np.split(indices, np.where(np.diff(args))[0] + 1)
[array([0, 6]), array([1, 5]), array([2, 4]), array([3])]
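
The same comparison can also be written with broadcasting instead of np.tile and np.vstack. This is only a sketch of that variant, not part of the original answer; the names mask, rows, cols and groups are made up for illustration. Note that the boolean mask has len(uniq) * len(arr) elements, so for a very large array it may not fit in memory.

```python
import numpy as np

arr = np.array([0, 1, 2, 3, 2, 1, 0])
uniq = np.unique(arr)

# Compare each unique value (as a column) against the whole array (as a row).
# The mask has shape (len(uniq), len(arr)), so memory grows with both sizes.
mask = arr[np.newaxis, :] == uniq[:, np.newaxis]

# np.nonzero scans the mask row by row, so column indices come out grouped by value.
rows, cols = np.nonzero(mask)

# Split wherever the row number (i.e. the unique value) changes.
groups = np.split(cols, np.flatnonzero(np.diff(rows)) + 1)
```

groups[i] then holds the indices at which arr equals uniq[i].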

Approach #1

Here is a vectorized approach that returns the indices as a list of arrays:

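sidx = arr.argsort()  # indices that sort arr, so equal values become adjacent
unq, cut_idx = np.unique(arr[sidx], return_index=True)  # start of each run of equal values
indices = np.split(sidx, cut_idx)[1:]  # cut_idx[0] is 0, so drop the empty first chunk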

If you need a dictionary that maps each value to its indices, build it with a dictionary comprehension:

dict_out = {unq[i]:iterID for i,iterID in enumerate(indices)}
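
Put together as one function, Approach #1 might look like the sketch below. The function name indices_by_value is hypothetical. Passing kind="stable" to argsort is my addition: the default sort does not promise to keep equal values in their original order, so without it the indices inside each group are not guaranteed to be ascending.

```python
import numpy as np

def indices_by_value(arr):
    """Map each distinct value in a 1-D array to the indices where it occurs."""
    arr = np.asarray(arr)
    # A stable sort keeps equal values in their original order,
    # so the indices inside each group come out ascending.
    order = np.argsort(arr, kind="stable")
    values, starts = np.unique(arr[order], return_index=True)
    # starts[0] is always 0; skipping it avoids an empty leading chunk.
    groups = np.split(order, starts[1:])
    return dict(zip(values.tolist(), groups))

result = indices_by_value([0, 1, 2, 3, 2, 1, 0])
```

For the example array this gives the same mapping as desired_output in the question.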

Approach #2

If you only need the list of arrays, here is an alternative aimed at performance:

sidx = arr.argsort()
indices = np.split(sidx,np.flatnonzero(np.diff(arr[sidx])>0)+1)
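
Since the question says the values are integers from 0 to N, the group boundaries can also be taken from the counts rather than from np.diff. This is a different technique from the one above (np.bincount plus a cumulative sum), shown only as a sketch under that assumption; it would fail for negative or non-integer values.

```python
import numpy as np

arr = np.array([0, 1, 2, 3, 2, 1, 0])

# Stable sort so that indices within each group stay in ascending order.
sidx = np.argsort(arr, kind="stable")

# counts[k] is how often k occurs; the running total marks where each group ends.
counts = np.bincount(arr)
boundaries = np.cumsum(counts)[:-1]

# indices[k] holds the positions where arr == k (empty if k never occurs).
indices = np.split(sidx, boundaries)
```

Because position k in the list corresponds to value k, no dictionary is needed: indices[2] is the array of positions where arr equals 2.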

Without numpy, using a defaultdict:

from collections import defaultdict

indices = defaultdict(list)
for i, val in enumerate(arr):
    indices[val].append(i)

A vectorized solution using the numpy_indexed package:

import numpy_indexed as npi
k, idx = npi.group_by(arr, np.arange(len(arr)))

At a higher level: why do you need these indices? Subsequent grouped operations can usually be computed much more efficiently with the group_by functionality, for example npi.group_by(arr).mean(someotherarray), without explicitly computing the indices for each key.
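
To illustrate the point without numpy_indexed, here is a plain NumPy sketch of a grouped mean that never builds the index lists. It uses np.bincount, which is not what the answer above uses, and it relies on the question's guarantee that the keys are integers 0..N with every value present (otherwise some counts would be zero). The array other is made-up data.

```python
import numpy as np

arr = np.array([0, 1, 2, 3, 2, 1, 0])
other = np.array([1.0, 2.0, 3.0, 4.0, 7.0, 4.0, 3.0])  # hypothetical values to average

# counts[k] is the number of elements with key k.
counts = np.bincount(arr)
# sums[k] is the sum of other over the elements with key k.
sums = np.bincount(arr, weights=other)

# Per-key mean; safe here because every key occurs at least once.
means = sums / counts
```

means[k] is then the average of other over all positions where arr equals k.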


Source: https://habr.com/ru/post/1651647/

