Comparing a string in a numpy array

Question


I have a 2D NumPy array of bools, and I would like to know how many unique rows my data set contains and how often each row occurs. The only way I have found to solve this is to convert each row to a string and then compare the strings, but there should certainly be a better way. Any help is appreciated.

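def getUniqueHaplotypes(self, data):
    # Count how often each row (haplotype) occurs by turning it into a string key
    nHap = data.shape[0]
    unique = dict()
    for i in range(nHap):
        s = "".join([str(j) for j in data[i]])
        if s in unique:  # dict.has_key() was removed in Python 3
            unique[s] += 1
        else:
            unique[s] = 1

    return unique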


python numpy scipy

Benjamin peter Oct 13 '10 at 1:14


2 answers

Use each row, converted to a tuple, as the dictionary key:

def unique_rows(data):
    unique = dict()
    for row in data:
        row = tuple(row)
        if row in unique:
            unique[row] += 1
        else:
            unique[row] = 1
    return unique

. : , , dict()? . Giuseppe


Giuseppe 08 . '12 16:03

Joe Kington · Accepted Answer · 2010-10-13T01:28:11+0000

Take a look at numpy.unique and numpy.bincount.

For example:

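import numpy as np
# 100 random integers in 0..4 (np.int was removed in NumPy 1.24; use the builtin int)
x = (np.random.random(100) * 5).astype(int)
unique_vals, indices = np.unique(x, return_inverse=True)
counts = np.bincount(indices)  # how often each unique value occurs

print(unique_vals, counts)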
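A small deterministic run makes the pairing explicit: np.unique returns the sorted distinct values and, with return_inverse=True, the position of each element's value in that sorted list; np.bincount then counts how often each position occurs. The input values below are made up for illustration. On NumPy 1.9 and later, return_counts=True gives the same counts in a single call.

```python
import numpy as np

# Illustrative input (hypothetical data, not from the question)
x = np.array([3, 0, 2, 0, 3, 3])

# unique_vals is sorted; inverse[i] is the position of x[i] in unique_vals
unique_vals, inverse = np.unique(x, return_inverse=True)

# Count how many elements map to each unique value
counts = np.bincount(inverse)

# return_counts=True (NumPy >= 1.9) yields the same counts directly
_, counts_direct = np.unique(x, return_counts=True)

print(unique_vals, counts)  # [0 2 3] [2 1 3]
```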

Edit: Sorry, I misunderstood your question ...

One way to get the unique rows is to view the array as a structured array, so that each row becomes a single element.

In your case you have a 2D array of bools, so maybe something like this?

import numpy as np
numrows, numcols = 10, 3
x = np.random.random((numrows, numcols)) > 0.5
# View each row as one record of numcols one-byte ints; the view has shape
# (numrows, 1), so flatten it to a 1D structured array before counting
x = np.ascontiguousarray(x).view(','.join(numcols * ['i1'])).ravel()

unique_vals, indices = np.unique(x, return_inverse=True)
counts = np.bincount(indices)

print(unique_vals, counts)
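On NumPy 1.13 and later, np.unique accepts an axis argument that treats each row as one item, which makes the structured-view trick unnecessary. A minimal sketch; the helper name unique_row_counts and the sample data are hypothetical, chosen only for illustration. The last step turns the result into a dict keyed by row tuples, similar in shape to what the question's function returns (the question keys on strings instead).

```python
import numpy as np

def unique_row_counts(data):
    """Return the distinct rows of a 2D array and how often each occurs.

    Hypothetical helper; relies on np.unique's axis argument (NumPy >= 1.13).
    """
    data = np.asarray(data)
    # axis=0 compares whole rows; the unique rows come back sorted
    rows, counts = np.unique(data, axis=0, return_counts=True)
    return rows, counts

# Illustrative boolean data (made up for this example)
data = np.array([[True, False, True],
                 [False, False, True],
                 [True, False, True],
                 [True, False, True]])

rows, counts = unique_row_counts(data)
# Tuple-keyed dict of row -> frequency
frequencies = {tuple(row): int(n) for row, n in zip(rows.tolist(), counts)}
print(frequencies)  # {(False, False, True): 1, (True, False, True): 3}
```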

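Alternatively, instead of converting each row to a string, you can use the row itself (as a tuple) as the dictionary key: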

def unique_rows(data):
    unique = dict()
    for row in data:
        row = tuple(row)
        if row in unique:
            unique[row] += 1
        else:
            unique[row] = 1
    return unique

Or, using collections.defaultdict:

from collections import defaultdict
def unique_rows(data):
    unique = defaultdict(int)
    for row in data:
        unique[tuple(row)] += 1
    return unique
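The standard library's collections.Counter (Python 2.7 and later) packages the same tuple-counting pattern and adds helpers such as most_common. A sketch reusing the answer's function name unique_rows; calling tolist() first is a choice made here so the keys are plain Python bools rather than NumPy scalars, and the sample array is illustrative only.

```python
from collections import Counter

import numpy as np

def unique_rows(data):
    """Count identical rows, using each row (as a tuple) as a Counter key."""
    # A NumPy row is not hashable, but a tuple of Python bools is
    return Counter(tuple(row) for row in np.asarray(data).tolist())

# Illustrative data: the row (True, False) appears twice
sample = np.array([[1, 0], [1, 0], [0, 1]], dtype=bool)
row_counts = unique_rows(sample)
print(row_counts.most_common(1))  # [((True, False), 2)]
```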

These dictionary-based versions are less "numpy-thonic" than the structured-array approach above.
