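First, you can work with data.reshape(N,-1), where N is data.shape[0], since you are interested in sorting the last two dimensions together.
A simple way to get the number of unique values in each row is to put each row into a set: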
[len(set(i)) for i in data.reshape(data.shape[0],-1)]
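This is still an iteration, though probably a fast one.
The problem with "vectorizing" this is that the set of unique values in each row will differ in length. "Rows of differing length" is a red flag when it comes to "vectorizing". You no longer have the "rectangular" data layout that makes most vectorization possible.
You could sort each row: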
np.sort(data.reshape(N,-1))
array([[1, 2, 2, 3, 3, 5, 5, 5, 6, 6],
[1, 1, 1, 2, 2, 2, 3, 3, 5, 7],
[0, 0, 2, 3, 4, 4, 4, 5, 5, 9],
[2, 2, 3, 3, 4, 4, 5, 7, 8, 9],
[0, 2, 2, 2, 2, 5, 5, 5, 7, 9]])
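But how do you identify the unique values in each row without iterating? Counting the nonzero differences between adjacent sorted values does it: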
In [530]: data=np.random.randint(10,size=(5,10))
In [531]: [len(set(i)) for i in data.reshape(data.shape[0],-1)]
Out[531]: [7, 6, 6, 8, 6]
In [532]: sdata=np.sort(data,axis=1)
In [533]: (np.diff(sdata)>0).sum(axis=1)+1
Out[533]: array([7, 6, 6, 8, 6])
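The session above can be wrapped into a small function. This is only a sketch: the name count_unique_per_row is made up for illustration, and it assumes every row has at least one element and contains no NaN values.

```python
import numpy as np

def count_unique_per_row(data):
    """Count distinct values in each row, flattening all trailing axes.

    Hypothetical helper; assumes non-empty rows without NaN.
    """
    flat = data.reshape(data.shape[0], -1)
    sdata = np.sort(flat, axis=1)
    # After sorting, every nonzero step between neighbours starts a new value;
    # the +1 accounts for the first value in the row.
    return (np.diff(sdata, axis=1) != 0).sum(axis=1) + 1

rng = np.random.default_rng(0)
data = rng.integers(10, size=(5, 2, 5))   # 3-D input, as in the reshape(N,-1) case
counts = count_unique_per_row(data)

# Cross-check against the plain-Python set version.
expected = [len(set(row)) for row in data.reshape(data.shape[0], -1).tolist()]
```

Here counts should agree with expected element by element, since both count the distinct values of the same flattened rows.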
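I was going to add a caveat about floats, but if np.unique works for your data, this approach should work just as well.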
[(np.bincount(i)>0).sum() for i in data]
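This bincount version is an iterative solution, but it is clearly faster than my len(set(i)) version, and competitive with the diff...sort.
In [585]: data.shape
Out[585]: (10000, 400)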
In [586]: timeit [(np.bincount(i)>0).sum() for i in data]
1 loops, best of 3: 248 ms per loop
In [587]: %%timeit
sdata=np.sort(data,axis=1)
(np.diff(sdata)>0).sum(axis=1)+1
.....:
1 loops, best of 3: 280 ms per loop
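To see why the bincount expression counts unique values, here is a minimal sketch on one hand-made row. The row values are illustrative only. Note that np.bincount accepts only non-negative integers and allocates max(row)+1 bins, so this route assumes small non-negative integer data like the randint example.

```python
import numpy as np

row = np.array([3, 0, 3, 7, 0, 3])

# counts[v] is the number of times the value v occurs in the row,
# so there is one bin per integer from 0 up to row.max().
counts = np.bincount(row)

# A value is present exactly when its bin is nonzero.
n_unique = int((counts > 0).sum())
```

In this row the values 0, 3 and 7 are present, so three bins are nonzero.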
I just found a faster way to use bincount, with np.count_nonzero:
In [715]: timeit np.array([np.count_nonzero(np.bincount(i)) for i in data])
10 loops, best of 3: 59.6 ms per loop
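I was surprised by the speedup, roughly 4x over the (np.bincount(i)>0).sum() version. The likely reason is that count_nonzero (like np.nonzero) is compiled code that counts the nonzero entries directly, without first building the boolean array that >0 creates and then summing it. (The diff...sort version could probably benefit from the same substitution.)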
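The same count_nonzero substitution can be tried in the diff...sort version. The sketch below is an assumption on my part rather than something timed here: it relies on the axis argument of np.count_nonzero, which needs NumPy 1.12 or later, and it only checks that both routes give the same counts.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.integers(10, size=(1000, 40))   # small non-negative ints, as bincount requires

# Iterative route: one bincount per row, nonzero bins counted in compiled code.
via_bincount = np.array([np.count_nonzero(np.bincount(row)) for row in data])

# Vectorized route: sort each row, then count the nonzero neighbour differences.
sdata = np.sort(data, axis=1)
via_sort = np.count_nonzero(np.diff(sdata, axis=1), axis=1) + 1
```

Which of the two is faster will depend on the array shape and the range of values, so it is worth running timeit on your own data.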