How to convert a 2d numpy array to a binary indicator matrix for maximum value

Question

Suppose I have a 2d numpy array holding the probabilities of m samples for n classes (the probabilities sum to 1 for each sample).

Assuming that each sample can belong to only one class, I want to create a new array with the same shape as the original, but containing only binary values that indicate which class had the highest probability.

Example:

[[0.2, 0.3, 0.5], [0.7, 0.1, 0.1]]

should be converted to:

 [[0, 0, 1], [1, 0, 0]]

It seems that argmax already does almost what I want, but instead of indices I need a matrix of indicators, as described above.

It seems simple, but somehow I can't figure it out using the standard numpy functions. Of course, I could use regular Python loops, but it seems like there should be an easier way.

If multiple classes have the same probability, I would prefer a solution that selects only one of them (I don't care which one in this case).

Thanks!


python python-2.7 numpy machine-learning

aKzenT Mar 22 '16 at 11:50


2 answers

In the case of ties (two or more elements share the highest value in a row) where you want to select only one, here is one approach that uses np.argmax and broadcasting -

 (A.argmax(1)[:,None] == np.arange(A.shape[1])).astype(int)

Sample run -

 In [296]: A
 Out[296]:
 array([[ 0.2,  0.3,  0.5],
        [ 0.5,  0.5,  0. ]])

 In [297]: (A.argmax(1)[:,None] == np.arange(A.shape[1])).astype(int)
 Out[297]:
 array([[0, 0, 1],
        [1, 0, 0]])
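To make the broadcasting step easier to follow, here is a minimal, self-contained sketch of the same idea. The function name `argmax_indicator` and the variable `winners` are hypothetical names chosen for illustration; they are not part of the original answer.

```python
import numpy as np

def argmax_indicator(probs):
    """Return a 0/1 matrix marking the first maximum of each row.

    `argmax_indicator` is an illustrative name, not a NumPy function.
    """
    probs = np.asarray(probs)
    # Column index of the first maximum in each row, shape (m,).
    winners = probs.argmax(axis=1)
    # Compare the (m, 1) column of winners with the (n,) row of column
    # indices; broadcasting produces an (m, n) boolean matrix.
    return (winners[:, None] == np.arange(probs.shape[1])).astype(int)

A = np.array([[0.2, 0.3, 0.5],
              [0.5, 0.5, 0.0]])
result = argmax_indicator(A)
print(result)
```

Because argmax reports only the first index at which the maximum occurs, each row of the result contains exactly one 1, even when a row has ties.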


Divakar Mar 22 '16 at 12:14


Warren Weckesser · Accepted Answer · 2016-03-22T11:56:21+0000

Here is one way:

 In [112]: a
 Out[112]:
 array([[ 0.2,  0.3,  0.5],
        [ 0.7,  0.1,  0.1]])

 In [113]: a == a.max(axis=1, keepdims=True)
 Out[113]:
 array([[False, False,  True],
        [ True, False, False]], dtype=bool)

 In [114]: (a == a.max(axis=1, keepdims=True)).astype(int)
 Out[114]:
 array([[0, 0, 1],
        [1, 0, 0]])

(But this will give a True value for each occurrence of the maximum in the row. See Divakar's answer for a good way to select only the first occurrence of the maximum.)
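The difference between the two behaviours shows up only when a row contains a tie. The sketch below is an illustration under assumed inputs: the array `A` and the names `all_max` and `first_max` are made up for this example, and the index-assignment variant is an alternative way to mark the first maximum rather than the code from either answer.

```python
import numpy as np

# Second row has a tie between columns 0 and 1.
A = np.array([[0.2, 0.3, 0.5],
              [0.5, 0.5, 0.0]])

# keepdims=True keeps the row maxima as shape (m, 1), so they broadcast
# against A. Every element equal to its row maximum is marked.
all_max = (A == A.max(axis=1, keepdims=True)).astype(int)

# Mark only the first maximum per row: start from zeros and set one
# entry per row using integer (fancy) indexing.
first_max = np.zeros(A.shape, dtype=int)
first_max[np.arange(A.shape[0]), A.argmax(axis=1)] = 1

print(all_max)
print(first_max)
```

For rows without ties the two results agree, so the choice matters only if duplicated maxima are possible in the data.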
