How to remove rows from a numpy array based on several conditions?

I have a file with 4 columns and thousands of rows. I want to delete the rows whose value in the first column lies in a certain range. For example, if the data in my file is as follows:

    18 6.215 0.025
    19 6.203 0.025
    20 6.200 0.025
    21 6.205 0.025
    22 6.201 0.026
    23 6.197 0.026
    24 6.188 0.024
    25 6.187 0.023
    26 6.189 0.021
    27 6.188 0.020
    28 6.192 0.019
    29 6.185 0.020
    30 6.189 0.019
    31 6.191 0.018
    32 6.188 0.019
    33 6.187 0.019
    34 6.194 0.021
    35 6.192 0.024
    36 6.193 0.024
    37 6.187 0.026
    38 6.184 0.026
    39 6.183 0.027
    40 6.189 0.027

I want to delete rows whose first element is between 20 and 25 or between 30 and 35. This means that the output I expect is the following:

    18 6.215 0.025
    19 6.203 0.025
    26 6.189 0.021
    27 6.188 0.020
    28 6.192 0.019
    29 6.185 0.020
    36 6.193 0.024
    37 6.187 0.026
    38 6.184 0.026
    39 6.183 0.027
    40 6.189 0.027

How can I do this?

+5
4 answers

If you want to keep using numpy, the solution is not complicated.

    # Inclusive bounds (>=, <=) so that rows 20, 25, 30 and 35 are removed too,
    # matching the expected output in the question.
    data = data[np.logical_not(np.logical_and(data[:,0] >= 20, data[:,0] <= 25))]
    data = data[np.logical_not(np.logical_and(data[:,0] >= 30, data[:,0] <= 35))]

Or, if you want to combine all of this into a single statement:

    data = data[
        np.logical_not(np.logical_or(
            np.logical_and(data[:,0] >= 20, data[:,0] <= 25),
            np.logical_and(data[:,0] >= 30, data[:,0] <= 35)
        ))
    ]

To explain: a comparison such as data[:,0] < 25 creates a boolean array that records, element by element, where the condition is true or false. In this case, it tells you where the first column of data is less than 25.

You can also index numpy arrays with these boolean arrays. An expression like data[data[:,0] > 30] retrieves all rows where data[:,0] > 30 is true, that is, all rows whose first element is greater than 30. This kind of conditional indexing is how you retrieve the rows (or columns, or items) that you want.

Finally, we need logical tools that combine boolean arrays element by element. The regular Python operators and, or and not do not work, because they try to treat each array as a single truth value. Fortunately, numpy provides element-wise versions in the form of np.logical_and, np.logical_or and np.logical_not. With their help, we can combine our boolean arrays element by element to find the rows that satisfy more complex conditions.
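To make the element-wise combination concrete, here is a minimal sketch on five rows taken from the question. It uses inclusive bounds so that the boundary rows are dropped, as in the expected output. The `&`, `|` and `~` operators are not part of the answer above; they are shown as an assumption-free shorthand that numpy defines for boolean arrays and that is equivalent to the np.logical_* functions here.

```python
import numpy as np

# Five rows from the question (position, value).
data = np.array([[18, 6.215], [22, 6.201], [27, 6.188], [32, 6.188], [38, 6.184]])
col = data[:, 0]

# Plain Python "and" tries to reduce each array to one truth value and fails.
try:
    col > 20 and col < 25
    plain_and_failed = False
except ValueError:
    plain_and_failed = True

# Element-wise combination with the np.logical_* functions...
mask_funcs = np.logical_or(
    np.logical_and(col >= 20, col <= 25),
    np.logical_and(col >= 30, col <= 35),
)

# ...and the same mask with operators. The parentheses are required,
# because & and | bind more tightly than the comparisons.
mask_ops = ((col >= 20) & (col <= 25)) | ((col >= 30) & (col <= 35))

# Keep only the rows where the mask is False.
kept = data[~mask_ops]
print(kept[:, 0])
```

Both masks mark the same rows, so either spelling can be used inside the index brackets.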

+7

In the special but frequent case where the selection criterion is a value falling within an interval, I use the absolute difference from the midpoint of the interval, especially if midInterval has a physical meaning:

 data = data[abs(data[:,0] - midInterval) < deviation] # '<' for keeping the interval 

If the data type is integer and the midpoint is not (as in Jun's question), you can double the values instead of converting to float (rounding errors exceed 1 for huge integers):

 data = data[abs(2*data[:,0] - sumOfLimits) > deltaOfLimits] 

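Repeat to delete two intervals. With the limits from Jun's question:

    # sumOfLimits is 20 + 25 = 45 and 30 + 35 = 65; deltaOfLimits is 5 in both cases,
    # so the closed intervals [20, 25] and [30, 35] are removed.
    data = data[abs(2*data[:,0] - 45) > 5]
    data = data[abs(2*data[:,0] - 65) > 5]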
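The doubled-midpoint test can be checked against explicit bounds. The sketch below assumes integer positions 18 to 40 as in the question; the snake_case names are illustrative stand-ins for sumOfLimits and deltaOfLimits.

```python
import numpy as np

first_col = np.arange(18, 41)  # integer positions 18..40

low, high = 20, 25
sum_of_limits = low + high     # 45, twice the midpoint 22.5
delta_of_limits = high - low   # 5, twice the half-width 2.5

# Doubled form: everything stays in integer arithmetic.
keep_doubled = np.abs(2 * first_col - sum_of_limits) > delta_of_limits

# Direct form with explicit bounds, for comparison.
keep_direct = (first_col < low) | (first_col > high)

# A smaller threshold removes only the interior of the interval.
keep_interior_only = np.abs(2 * first_col - sum_of_limits) > 3

removed_closed = first_col[~keep_doubled]
removed_interior = first_col[~keep_interior_only]
print(removed_closed, removed_interior)
```

With a threshold equal to the interval width the endpoints 20 and 25 are removed as well; with a threshold of 3 on integer positions only 21 to 24 are removed.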
+2

Below is my solution to the problem of deleting specific rows from a numpy array. The solution is a single line:

    # Remove the rows whose first item is between 20 and 25
    A = np.delete(A, np.where(np.bitwise_and(A[:,0] >= 20, A[:,0] <= 25))[0], 0)

and is based on pure numpy functions (np.bitwise_and, np.where, np.delete).

    import numpy as np

    A = np.array([
        [18, 6.215, 0.025],
        [19, 6.203, 0.025],
        [20, 6.200, 0.025],
        [21, 6.205, 0.025],
        [22, 6.201, 0.026],
        [23, 6.197, 0.026],
        [24, 6.188, 0.024],
        [25, 6.187, 0.023],
        [26, 6.189, 0.021],
        [27, 6.188, 0.020],
        [28, 6.192, 0.019],
        [29, 6.185, 0.020],
        [30, 6.189, 0.019],
        [31, 6.191, 0.018],
        [32, 6.188, 0.019],
        [33, 6.187, 0.019],
        [34, 6.194, 0.021],
        [35, 6.192, 0.024],
        [36, 6.193, 0.024],
        [37, 6.187, 0.026],
        [38, 6.184, 0.026],
        [39, 6.183, 0.027],
        [40, 6.189, 0.027],
    ])

    # Remove the rows whose first item is between 20 and 25
    A = np.delete(A, np.where(np.bitwise_and(A[:,0] >= 20, A[:,0] <= 25))[0], 0)

    # Remove the rows whose first item is between 30 and 35
    A = np.delete(A, np.where(np.bitwise_and(A[:,0] >= 30, A[:,0] <= 35))[0], 0)

    >>> A
    array([[ 1.80000000e+01, 6.21500000e+00, 2.50000000e-02],
           [ 1.90000000e+01, 6.20300000e+00, 2.50000000e-02],
           [ 2.60000000e+01, 6.18900000e+00, 2.10000000e-02],
           [ 2.70000000e+01, 6.18800000e+00, 2.00000000e-02],
           [ 2.80000000e+01, 6.19200000e+00, 1.90000000e-02],
           [ 2.90000000e+01, 6.18500000e+00, 2.00000000e-02],
           [ 3.60000000e+01, 6.19300000e+00, 2.40000000e-02],
           [ 3.70000000e+01, 6.18700000e+00, 2.60000000e-02],
           [ 3.80000000e+01, 6.18400000e+00, 2.60000000e-02],
           [ 3.90000000e+01, 6.18300000e+00, 2.70000000e-02],
           [ 4.00000000e+01, 6.18900000e+00, 2.70000000e-02]])
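When there are several intervals, the repeated np.delete calls can be folded into one pass. The following is only a sketch: `remove_intervals` is a hypothetical helper name, not a numpy function, and the second column holds placeholder zeros rather than the measured values.

```python
import numpy as np


def remove_intervals(a, intervals, col=0):
    """Return a copy of `a` without the rows whose `col` value lies in any closed interval."""
    drop = np.zeros(a.shape[0], dtype=bool)
    for low, high in intervals:
        # Accumulate the rows to drop, element by element.
        drop |= (a[:, col] >= low) & (a[:, col] <= high)
    # np.where turns the boolean mask into row indices for np.delete.
    return np.delete(a, np.where(drop)[0], axis=0)


# Positions 18..40 in the first column, placeholder zeros in the second.
A = np.column_stack([np.arange(18, 41), np.zeros(23)])
B = remove_intervals(A, [(20, 25), (30, 35)])
print(B[:, 0])
```

Because np.delete returns a new array, `A` itself is left unchanged and the result has to be assigned, as in the answer above.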
0

You do not need the added complexity of numpy for this. I assume that you have read your file into a list of lists (each line of the file is one row in the overall data list, like this: [[18, 6.215, 0.025], [19, 6.203, 0.025], ...]). In that case, the following works:

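    # Build a new list instead of calling data.remove(row) inside the loop:
    # removing items from a list while iterating over it skips elements.
    # Inclusive bounds match the expected output in the question.
    data = [row for row in data
            if not (20 <= row[0] <= 25 or 30 <= row[0] <= 35)]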
-1

Source: https://habr.com/ru/post/1200658/

