How numpy indexing works in this scenario

Question


How does numpy logical indexing get data from the "data" variable in the code snippet below? I understand that the first parameter is the x coordinate and the second parameter is the y coordinate, but I'm not sure how that maps to data points in the variable.

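    from numpy import vstack, array
    from numpy.random import rand
    from scipy.cluster.vq import kmeans, vq
    from matplotlib.pyplot import plot, show

    data = vstack((rand(150, 2) + array([.5, .5]), rand(150, 2)))
    # centroids is not defined in the snippet; assumed here to come from k-means with 2 clusters
    centroids, _ = kmeans(data, 2)
    # assign each sample to a cluster
    idx, _ = vq(data, centroids)
    # some plotting using numpy logical indexing
    plot(data[idx == 0, 0], data[idx == 0, 1], 'ob',
         data[idx == 1, 0], data[idx == 1, 1], 'or')
    plot(centroids[:, 0], centroids[:, 1], 'sg', markersize=8)
    show()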


python numpy matplotlib

DaTaBomB Jul 27 '13 at 23:54


1 answer

unutbu · Accepted Answer · 2013-07-28T00:19:43+0000

It all comes down to the shapes:

    In [89]: data.shape
    Out[89]: (300, 2)    # data has 300 rows and 2 columns

    In [93]: idx.shape
    Out[93]: (300,)      # idx is a 1D array with 300 elements
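The same shapes can be reproduced without SciPy. This is a minimal sketch under stated assumptions: it builds data the same way as the question, picks two hypothetical centroids by hand, and replaces vq with a plain nearest-centroid argmin that only mimics what vq returns as its first value. The number of points per cluster depends on the random data and the chosen centroids, but the shapes match.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded so the run is repeatable

# Same construction as the question: two 150x2 blocks stacked vertically,
# the first one shifted by (0.5, 0.5).
data = np.vstack((rng.random((150, 2)) + np.array([0.5, 0.5]),
                  rng.random((150, 2))))

# Hypothetical centroids chosen by hand for illustration only.
centroids = np.array([[1.0, 1.0], [0.5, 0.5]])

# Nearest-centroid assignment (a stand-in for vq's code output):
# distances has shape (300, 2); argmin over axis 1 picks one label per row.
distances = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
idx = distances.argmin(axis=1)

print(data.shape)  # (300, 2)
print(idx.shape)   # (300,)
```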

idx == 0 is a boolean array with the same shape as idx. It is True wherever the corresponding element of idx is 0:

    In [97]: (idx==0).shape
    Out[97]: (300,)
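A tiny example with a hypothetical five-element idx shows the mask directly: the comparison is applied element by element and produces a boolean array of the same shape.

```python
import numpy as np

# A small stand-in for idx: cluster labels for five points (made-up values).
idx = np.array([0, 1, 0, 1, 1])

mask = idx == 0    # elementwise comparison -> boolean array
print(mask)        # [ True False  True False False]
print(mask.shape)  # (5,), same shape as idx
print(mask.dtype)  # bool
```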

When you index data with idx==0, you get all rows of data where idx==0 is True:

    In [98]: data[idx==0].shape
    Out[98]: (178, 2)
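A minimal sketch with five made-up points shows that a 1-D boolean mask on a 2-D array keeps whole rows, and that it selects the same rows as the integer positions returned by np.nonzero:

```python
import numpy as np

# Five 2-D points (x, y), one per row, plus a cluster label for each (made-up values).
data = np.array([[0.0, 10.0],
                 [1.0, 11.0],
                 [2.0, 12.0],
                 [3.0, 13.0],
                 [4.0, 14.0]])
idx = np.array([0, 1, 0, 1, 1])

# A 1-D boolean mask on the first axis keeps the rows where the mask is True.
rows = data[idx == 0]
print(rows)
# [[ 0. 10.]
#  [ 2. 12.]]
print(rows.shape)  # (2, 2)

# The same selection written with explicit integer row positions.
same_rows = data[np.nonzero(idx == 0)[0]]
print(np.array_equal(rows, same_rows))  # True
```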

When you index with a tuple, as in data[idx==0, 0], the first axis of data is indexed with the boolean array idx==0 and the second axis is indexed with 0:

    In [99]: data[idx==0, 0].shape
    Out[99]: (178,)
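Using a small made-up five-point array (defined here so the block runs on its own), the tuple form filters the rows and picks column 0 in one step. It gives the same result as selecting the rows first and then slicing column 0:

```python
import numpy as np

data = np.array([[0.0, 10.0],
                 [1.0, 11.0],
                 [2.0, 12.0],
                 [3.0, 13.0],
                 [4.0, 14.0]])
idx = np.array([0, 1, 0, 1, 1])

# Axis 0 (rows) is filtered by the mask, axis 1 (columns) is fixed at 0.
x0 = data[idx == 0, 0]
print(x0)        # [0. 2.]
print(x0.shape)  # (2,)

# Equivalent two-step form: select the rows first, then take column 0.
print(np.array_equal(x0, data[idx == 0][:, 0]))  # True
```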

The first axis of data corresponds to rows and the second to columns, so data[idx==0, 0] gives only the first column of data[idx==0]. Since the first column of data holds the x values, this gives the x values of the points in data where idx==0. Likewise, data[idx==0, 1] gives the matching y values.
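The question's plot call writes this pattern out by hand for clusters 0 and 1. As a sketch that is not part of the original code, the loop below collects the (x, y) arrays for every label present in idx, so it works for any number of clusters. The name points_by_cluster is hypothetical and the data are made up.

```python
import numpy as np

data = np.array([[0.0, 10.0],
                 [1.0, 11.0],
                 [2.0, 12.0],
                 [3.0, 13.0],
                 [4.0, 14.0]])
idx = np.array([0, 1, 0, 1, 1])

# For each cluster label k, gather the x values (column 0) and the
# y values (column 1) of the rows assigned to that cluster.
points_by_cluster = {}
for k in np.unique(idx):
    mask = idx == k
    points_by_cluster[int(k)] = (data[mask, 0], data[mask, 1])

x, y = points_by_cluster[1]
print(x)  # [1. 3. 4.]
print(y)  # [11. 13. 14.]
```

Each (x, y) pair could then be passed to matplotlib as plot(x, y, 'o'), which is what the question's code does explicitly for the two clusters.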

