Converting a NumPy vector to a binary matrix

Question

I am looking for a clean way to convert a vector of integers into a 2D array of binary values, where each row has a 1 in the column given by the corresponding value of the vector, taken as an index.

For example:

v = np.array([1, 5, 3])
C = np.zeros((v.shape[0], v.max()))

What I'm looking for is a way to turn C into this:

array([[ 1.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.],
       [ 0.,  0.,  1.,  0.,  0.]])

I came up with this:

C[np.arange(v.shape[0]), v.T-1] = 1

but I wonder if there is a less verbose / more elegant approach?

Thanks!

UPDATE

Thanks for your comments! There was an error in my code: if v contains a 0, it puts the 1 in the wrong place (the last column). Instead, I need to widen the matrix so that category 0 gets its own column.
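The wraparound described above can be reproduced with a small sketch. The sample vector is made up for illustration; the two indexing expressions are the ones from the question and from the corrected version.

```python
import numpy as np

# Illustrative vector that contains a 0 (not from the original post).
v = np.array([0, 2, 1])
rows = np.arange(v.shape[0])

# Original approach: v - 1 turns the value 0 into index -1,
# which NumPy interprets as "last column".
C = np.zeros((v.shape[0], v.max()))
C[rows, v - 1] = 1

# Corrected approach: one extra column, index with v directly.
D = np.zeros((v.shape[0], v.max() + 1))
D[rows, v] = 1

print(C)  # rows 0 and 1 both end up with their 1 in the last column
print(D)  # each row has its 1 in column v[i]
```

In C the values 0 and 2 become indistinguishable, which is why the corrected version allocates `v.max() + 1` columns.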

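jrennie's answer is much faster as long as the result stays sparse, but in my case I need a dense array, and the conversion cancels out the advantage. Here are both solutions: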

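    import numpy as np
    from scipy import sparse

    def permute_array(vector):
        # Dense version: one row per element, one column per value 0..max
        permut = np.zeros((vector.shape[0], vector.max()+1))
        permut[np.arange(vector.shape[0]), vector] = 1
        return permut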

    def permute_matrix(vector):
        indptr = range(vector.shape[0]+1)
        ones = np.ones(vector.shape[0])
        permut = sparse.csr_matrix((ones, vector, indptr))
        return permut

    In [193]: vec = np.random.randint(1000, size=1000)
    In [194]: np.all(permute_matrix(vec) == permute_array(vec))
    Out[194]: True

    In [195]: %timeit permute_array(vec)
    100 loops, best of 3: 3.49 ms per loop

    In [196]: %timeit permute_matrix(vec)
    1000 loops, best of 3: 422 µs per loop

With the conversion to a dense array included:

    def permute_matrix(vector):
        indptr = range(vector.shape[0]+1)
        ones = np.ones(vector.shape[0])
        permut = sparse.csr_matrix((ones, vector, indptr))
        return permut.toarray()

    In [198]: %timeit permute_matrix(vec)
    100 loops, best of 3: 4.1 ms per loop

python numpy

funkifunki 25 . '14 18:43

jrennie · Accepted Answer · 2014-04-25T21:20:33+0000

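Your solution is fine for dense matrices. If there are many categories, though, you may want to use a scipy sparse matrix, for example: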

import scipy.sparse
import numpy

indices = [1, 5, 3]
indptr = range(len(indices)+1)
data = numpy.ones(len(indices))
matrix = scipy.sparse.csr_matrix((data, indices, indptr))

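See the Yale format and the scipy csr_matrix documentation to better understand the three arrays (data, indices, indptr) and how they are used.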
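As a rough illustration of how the three CSR arrays fit together, the sketch below rebuilds the matrix from the answer and converts it to a dense array for inspection. The comments describe the standard CSR layout; the variable names `m` and `dense` are only for this example.

```python
import numpy as np
from scipy import sparse

indices = np.array([1, 5, 3])          # column of the single 1 in each row
indptr = np.arange(len(indices) + 1)   # row i owns data[indptr[i]:indptr[i+1]]
data = np.ones(len(indices))           # the stored values, all ones here

m = sparse.csr_matrix((data, indices, indptr))

# With no explicit shape, the shape is inferred from the index arrays:
# len(indptr) - 1 rows and max(indices) + 1 columns.
print(m.shape)

dense = m.toarray()
print(dense)
```

Because every row holds exactly one entry, `indptr` is simply 0, 1, 2, ..., n.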

Note that the code above does not subtract 1 from the indices. Use indices = numpy.array([1, 5, 3])-1 if that is what you want.
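To reproduce the 3x5 output shown in the question with the sparse approach, one option is to shift the indices by one and pass the shape explicitly. This is a minimal sketch: the explicit `shape` argument is an addition not present in the answer, and it assumes that every value in the vector is at least 1.

```python
import numpy as np
from scipy import sparse

v = np.array([1, 5, 3])
indices = v - 1                        # 1-based values -> 0-based columns
indptr = np.arange(len(v) + 1)
data = np.ones(len(v))

# shape is given explicitly so the column count does not depend on inference
m = sparse.csr_matrix((data, indices, indptr), shape=(len(v), v.max()))
dense = m.toarray()
print(dense)
```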
