IIUC, you want to find identical rows. The way to do this is to view each row as a single raw block (one `np.void` element):
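    import numpy as np
    S, N = 12, 2
    a = np.random.randint(0, 3, (S, N))
    samples = np.ascontiguousarray(a).view(np.dtype((np.void, a.dtype.itemsize * N)))  # one raw block per row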
`samples.shape` is then `(S, 1)`.
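To see what the raw-block view does, here is a minimal sketch on a small hypothetical array (not from the question) in which rows 0 and 2 are identical. Each row's bytes become one `np.void` element, so identical rows compare equal as single items:

```python
import numpy as np

# Hypothetical small input: rows 0 and 2 are identical.
a = np.array([[0, 0], [0, 1], [0, 0], [2, 1]])
S, N = a.shape

# Reinterpret the N items of each contiguous row as one np.void element
# of N * itemsize bytes.
samples = np.ascontiguousarray(a).view(np.dtype((np.void, a.dtype.itemsize * N)))

print(samples.shape)                     # (4, 1): one raw block per row
print((samples[0] == samples[2]).all())  # True: identical rows, identical bytes
print((samples[0] == samples[1]).all())  # False
```

Because the comparison is byte-wise, this is safe for integer data; for floats, byte equality differs from numeric equality (for example `0.0` and `-0.0` have different bytes).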
Now you can inventory your samples with `np.unique` and use a pandas DataFrame to report the result:
    import pandas as pd
    _, inds, invs = np.unique(samples, return_index=True, return_inverse=True)
    df = pd.DataFrame(invs)  # column 0: group number of each sample
    result = df.reset_index().groupby(0)['index'].apply(list).to_frame()
    result['sample'] = a[inds].tolist()  # one representative row per group
This gives:
              index  sample
    0
    0    [3, 9, 11]  [0, 0]
    1  [4, 6, 7, 8]  [0, 1]
    2           [5]  [1, 1]
    3           [2]  [1, 2]
    4           [1]  [2, 0]
    5       [0, 10]  [2, 2]
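On NumPy 1.13 or later, `np.unique` also accepts `axis=0`, which treats each row as one item, so the void view is not needed. The sketch below is an alternative to the code above, not a replacement for it: it rebuilds the 12 samples implied by the table above and groups the sample indices with a stable `argsort` plus `np.split` instead of pandas.

```python
import numpy as np

# The 12 samples implied by the result table above.
a = np.array([[2, 2], [2, 0], [1, 2], [0, 0], [0, 1], [1, 1],
              [0, 1], [0, 1], [0, 1], [0, 0], [2, 2], [0, 0]])

# axis=0: each row is one item; rows come back sorted lexicographically.
uniq, invs, counts = np.unique(a, axis=0, return_inverse=True, return_counts=True)
invs = invs.ravel()  # the inverse's shape has changed between NumPy versions

# Stable sort of the group numbers lists sample indices group by group,
# ascending within each group; cut at the cumulative group sizes.
order = np.argsort(invs, kind='stable')
groups = np.split(order, np.cumsum(counts)[:-1])

for idx, row in zip(groups, uniq):
    print(idx.tolist(), row.tolist())
```

This prints the same six groups as the table, from `[3, 9, 11] [0, 0]` to `[0, 10] [2, 2]`.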
This should be O(S log S), dominated by the sort inside `np.unique` (assuming each row comparison is cheap, i.e. N is small), whereas your approach is O(N²S).
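If the sort itself becomes the bottleneck, one alternative (a swapped-in technique, not the approach above) is to hash each row as a tuple in a dict, which runs in expected O(N·S) time with no sorting. A minimal sketch; the helper name `group_identical_rows` is hypothetical, and the input is the array implied by the table above:

```python
from collections import defaultdict

import numpy as np


def group_identical_rows(a):
    """Hypothetical helper: map each distinct row (as a tuple) to the
    indices of the rows equal to it.

    Hash-based, so expected O(N*S) for S rows of length N; groups appear
    in order of first occurrence, not in sorted order.
    """
    groups = defaultdict(list)
    for i, row in enumerate(a.tolist()):  # tolist() gives hashable Python ints
        groups[tuple(row)].append(i)
    return dict(groups)


a = np.array([[2, 2], [2, 0], [1, 2], [0, 0], [0, 1], [1, 1],
              [0, 1], [0, 1], [0, 1], [0, 0], [2, 2], [0, 0]])
print(group_identical_rows(a))
```

The loop runs in Python, so for large S its constant factor may still make it slower in practice than the vectorized `np.unique` version; it is worth timing both on real data.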