Removing rows in a numpy array

Question

I have an array that might look like this:

ANOVAInputMatrixValuesArray = [[ 0.96488889, 0.73641667, 0.67521429, 0.592875, 0.53172222], [ 0.78008333, 0.5938125, 0.481, 0.39883333, 0.]]

Note that one of the rows ends with a zero. I want to delete any row that contains a zero, while keeping every row whose cells are all non-zero.

But the array will have different numbers of rows each time it is filled, and zeros will be located on different rows every time.

I get the number of non-zero elements in each row with the following line of code:

 NumNonzeroElementsInRows = (ANOVAInputMatrixValuesArray != 0).sum(1)

For the array above, NumNonzeroElementsInRows contains: [5 4]

The 5 indicates that all values in row 0 are nonzero, and the 4 indicates that one of the values in row 1 is zero.

Therefore, I am trying to use the following lines of code to find and delete the rows that contain zeros.

 for q in range(len(NumNonzeroElementsInRows)):
     if NumNonzeroElementsInRows[q] < NumNonzeroElementsInRows.max():
         p.delete(ANOVAInputMatrixValuesArray, q, axis=0)

But for some reason, this code does not seem to do anything, even though a large number of print statements show that all the variables are populated correctly leading up to it.

There should be an easy way to simply "delete any row containing a zero."

Can someone show me what code to write to execute this?

python numpy delete-row

MedicalMath Oct 06 '10 at

4 answers

Jaidev Deshpande · Answer 1 · 2012-07-26 05:48

The easiest way to remove rows and columns from arrays is with the numpy.delete function.

Suppose I have the following array x:

 import numpy
 x = numpy.array([[1,2,3], [4,5,6], [7,8,9]])

To delete the first row, do the following:

 x = numpy.delete(x, (0), axis=0)

To remove the third column, do the following:

 x = numpy.delete(x,(2), axis=1)

So, you can find the indices of the rows that have 0 in them, put them in a list or tuple and pass this as the second argument to the function.
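As a minimal sketch of that suggestion, applied to the array from the question: the names rows_with_zero and cleaned are illustrative and do not come from the answer.

```python
import numpy as np

arr = np.array([[0.96488889, 0.73641667, 0.67521429, 0.592875, 0.53172222],
                [0.78008333, 0.5938125, 0.481, 0.39883333, 0.0]])

# Indices of the rows that contain at least one zero.
rows_with_zero = np.where((arr == 0).any(axis=1))[0]

# np.delete returns a new array; arr itself is left unchanged.
cleaned = np.delete(arr, rows_with_zero, axis=0)

print(rows_with_zero)  # [1]
print(cleaned.shape)   # (1, 5)
```

Because numpy.delete accepts a whole sequence of indices, all offending rows can be removed in a single call instead of one at a time in a loop.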

Justin Peel · Answer 2 · 2010-10-07 03:33

Here is a one-liner (yes, it is similar to user333700's, but a bit simpler):

 >>> import numpy as np
 >>> arr = np.array([[ 0.96488889, 0.73641667, 0.67521429, 0.592875, 0.53172222],
 ...                 [ 0.78008333, 0.5938125, 0.481, 0.39883333, 0.]])
 >>> arr[arr.all(1)]
 array([[ 0.96488889, 0.73641667, 0.67521429, 0.592875 , 0.53172222]])

By the way, this method is much, much faster than the masked array method for large matrices. For a 2048 x 5 matrix, this method is about 1000 times faster.

By the way, user333700's method (from his comment) was a little faster in my tests, although it boggles my mind why.
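To make the mechanism behind arr.all(1) explicit, here is a small sketch with made-up sample values. The names keep and filtered are illustrative only.

```python
import numpy as np

arr = np.array([[0.5, 0.2, 0.1],
                [0.4, 0.0, 0.3],
                [0.7, 0.8, 0.9]])

# A nonzero number counts as True and zero counts as False, so
# all(axis=1) is True only for rows that contain no zeros.
keep = arr.all(axis=1)
print(keep.tolist())   # [True, False, True]

# Indexing with the boolean mask keeps only the rows marked True.
filtered = arr[keep]
print(filtered.shape)  # (2, 3)
```

One caveat worth keeping in mind: NaN also counts as True, so this mask only filters out exact zeros.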

mtrw · Answer 3 · 2010-10-06 23:14

This is similar to your original approach and will use less space than unutbu's answer, but I suspect it will be slower.

 >>> import numpy as np
 >>> p = np.array([[1.5, 0], [1.4,1.5], [1.6, 0], [1.7, 1.8]])
 >>> p
 array([[ 1.5,  0. ],
        [ 1.4,  1.5],
        [ 1.6,  0. ],
        [ 1.7,  1.8]])
 >>> nz = (p == 0).sum(1)
 >>> q = p[nz == 0, :]
 >>> q
 array([[ 1.4,  1.5],
        [ 1.7,  1.8]])

By the way, your p.delete() does not work for me - ndarray does not have a .delete attribute.
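A likely reason the loop in the question appears to do nothing, assuming p there is an alias for the numpy module: numpy.delete returns a new array rather than modifying its argument, so a call whose result is not assigned has no visible effect. A short sketch with illustrative data:

```python
import numpy as np

data = np.array([[1.5, 0.0], [1.4, 1.5], [1.6, 0.0], [1.7, 1.8]])

# The result is discarded here, so data still has all four rows.
np.delete(data, 0, axis=0)
print(data.shape)  # (4, 2)

# Rebinding the name to the returned array keeps the deletion.
data = np.delete(data, 0, axis=0)
print(data.shape)  # (3, 2)
```

Even with the assignment added, deleting inside a loop over the original indices is fragile, because every deletion shifts the indices of the rows that follow it.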

jeps · Answer 4 · 2011-04-21 12:12

numpy provides a simple function to accomplish the same thing: if you have a masked array a, calling numpy.ma.compress_rows(a) will delete the rows containing a masked value. I think it's much faster ...
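A minimal sketch of this approach on the array from the question. Building the mask with numpy.ma.masked_equal is an assumption here, since the answer does not say how the masked array is created.

```python
import numpy as np

arr = np.array([[0.96488889, 0.73641667, 0.67521429, 0.592875, 0.53172222],
                [0.78008333, 0.5938125, 0.481, 0.39883333, 0.0]])

# Mask every element equal to zero (assumed way of creating the masked array).
masked = np.ma.masked_equal(arr, 0)

# Drop each row that contains at least one masked element.
cleaned = np.ma.compress_rows(masked)

print(cleaned.shape)  # (1, 5)
```

The result is a regular ndarray holding only the rows without masked values. Whether this is faster than boolean indexing depends on the data, so it is worth timing on your own arrays.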