Randomly delete rows from dataframe based on condition

Given a data frame with numerical values in a certain column, I want to randomly delete a certain percentage of the rows for which the value in that column lies in a certain range.

For example, for the following data frame:

    import pandas as pd

    df = pd.DataFrame({'col1': [1,2,3,4,5,6,7,8,9,10]})
    df

       col1
    0     1
    1     2
    2     3
    3     4
    4     5
    5     6
    6     7
    7     8
    8     9
    9    10

2/5 of the rows where col1 is less than 6 should be deleted at random.

What is the most concise way to do this?

1 answer

Use sample + drop:

    df.drop(df.query('col1 < 6').sample(frac=.4).index)

       col1
    1     2
    3     4
    4     5
    5     6
    6     7
    7     8
    8     9
    9    10
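
Because sample draws at random, the rows that get dropped will differ from run to run, so the output above is only one possible result. A minimal sketch of the same one-liner split into steps, assuming you want a reproducible draw: the random_state argument and the seed value 0 are additions for illustration, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})

# Rows eligible for deletion: col1 < 6 (5 rows in this example).
candidates = df.query('col1 < 6')

# frac=.4 selects 40% of the candidates (2 of 5).
# random_state is an assumed addition that fixes the draw; 0 is arbitrary.
to_drop = candidates.sample(frac=.4, random_state=0).index

# Drop the sampled rows by index label; rows with col1 >= 6 are untouched.
result = df.drop(to_drop)
print(result)
```

Leaving out random_state gives back the original behaviour, with a fresh random selection on every call.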

For a range:

    df.drop(df.query('2 < col1 < 8').sample(frac=.4).index)

       col1
    0     1
    1     2
    3     4
    4     5
    5     6
    7     8
    8     9
    9    10
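
If the condition changes often, the same pattern could be wrapped in a small helper that takes a boolean mask instead of a query string. This is only a sketch: the function name drop_random_fraction and its parameters are hypothetical, and it assumes the index labels are unique, since drop removes every row that carries a given label.

```python
import pandas as pd


def drop_random_fraction(df, mask, frac, random_state=None):
    """Return a copy of df with a random `frac` of the rows where mask is True removed.

    Hypothetical helper for illustration; assumes df has unique index labels.
    """
    # Sample only among the rows selected by the mask, then drop them by label.
    to_drop = df[mask].sample(frac=frac, random_state=random_state).index
    return df.drop(to_drop)


df = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})

# Same condition as the query string '2 < col1 < 8', written as a boolean mask.
mask = (df['col1'] > 2) & (df['col1'] < 8)

out = drop_random_fraction(df, mask, frac=.4, random_state=0)
print(out)
```

Here 5 rows match the mask, so frac=.4 removes 2 of them and leaves 8 rows in total.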

Source: https://habr.com/ru/post/1263509/

