How to filter data from a data frame when the number of columns is dynamic?

I have a data frame as shown below:

    A_Name  B_Detail  Value_B  Value_C   Value_D ......
0   AA      X1        1.2      0.5       -1.3    ......
1   BB      Y1        0.76     -0.7      0.8     ......
2   CC      Z1        0.7      -1.3      2.5     ......
3   DD      L1        0.9      -0.5      0.4     ......
4   EE      M1        1.3      1.8       -1.3    ......
5   FF      N1        0.7      -0.8      0.9     ......
6   GG      K1        -2.4     -1.9      2.1     ......

This is just a sample data frame; I can have n columns such as (Value_A, Value_B, Value_C, ..., Value_N).

Now I want to keep only the rows where the absolute value in every Value column (Value_A, Value_B, Value_C, ...) is less than 1.

With a small, fixed number of columns, you can filter the data by simply combining the per-column conditions with "and", but I cannot figure out what to do in this case.

I do not know how many such columns there will be; the only thing I know is that their names will start with the prefix "Value".

In the case above, the output should look like this:

    A_Name  B_Detail  Value_B  Value_C   Value_D ......
1   BB      Y1        0.76     -0.7      0.8     ......
3   DD      L1        0.9      -0.5      0.4     ......
5   FF      N1        0.7      -0.8      0.9     ......
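The chained "and" approach from the question works when the columns are known in advance. One way to generalize it (a sketch, not one of the answers below) is to build one condition per Value column and combine them with functools.reduce and operator.and_. The DataFrame literal reproduces the sample above with only the three Value columns it shows.

```python
import operator
from functools import reduce

import pandas as pd

# Sample data from the question (only the three Value columns shown there)
df = pd.DataFrame({
    "A_Name": ["AA", "BB", "CC", "DD", "EE", "FF", "GG"],
    "B_Detail": ["X1", "Y1", "Z1", "L1", "M1", "N1", "K1"],
    "Value_B": [1.2, 0.76, 0.7, 0.9, 1.3, 0.7, -2.4],
    "Value_C": [0.5, -0.7, -1.3, -0.5, 1.8, -0.8, -1.9],
    "Value_D": [-1.3, 0.8, 2.5, 0.4, -1.3, 0.9, 2.1],
})

# Fixed set of columns: chain the conditions with & by hand
fixed = df[(df["Value_B"].abs() < 1) & (df["Value_C"].abs() < 1) & (df["Value_D"].abs() < 1)]

# Dynamic set of columns: one boolean Series per "Value" column, ANDed together
value_cols = [c for c in df.columns if c.startswith("Value")]
conditions = [df[c].abs() < 1 for c in value_cols]
dynamic = df[reduce(operator.and_, conditions)]

print(dynamic)
```

Both versions keep rows 1, 3 and 5; the dynamic one picks up however many Value columns the frame has.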

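Use filter to select the Value columns, abs to take absolute values, compare with 1, and all to build a row mask, then apply it with boolean indexing: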

mask = (df.filter(like='Value').abs() < 1).all(axis=1)
print (mask)
0    False
1     True
2    False
3     True
4    False
5     True
6    False
dtype: bool

print (df[mask])
  A_Name B_Detail  Value_B  Value_C  Value_D
1     BB       Y1     0.76     -0.7      0.8
3     DD       L1     0.90     -0.5      0.4
5     FF       N1     0.70     -0.8      0.9
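Since the question says the columns start with the prefix "Value", it may be worth anchoring the match: filter(like='Value') matches "Value" anywhere in a column name, whereas filter(regex=r'^Value') only matches names that start with it. The extra column Total_Value below is hypothetical, added only to show the difference.

```python
import pandas as pd

df = pd.DataFrame({
    "A_Name": ["AA", "BB", "CC", "DD", "EE", "FF", "GG"],
    "B_Detail": ["X1", "Y1", "Z1", "L1", "M1", "N1", "K1"],
    "Value_B": [1.2, 0.76, 0.7, 0.9, 1.3, 0.7, -2.4],
    "Value_C": [0.5, -0.7, -1.3, -0.5, 1.8, -0.8, -1.9],
    "Value_D": [-1.3, 0.8, 2.5, 0.4, -1.3, 0.9, 2.1],
    # Hypothetical column that contains "Value" but does not start with it
    "Total_Value": [5.0] * 7,
})

# like='Value' also selects Total_Value, so every row fails the check
loose = df[df.filter(like="Value").abs().lt(1).all(axis=1)]

# regex='^Value' only selects names that start with the prefix
strict = df[df.filter(regex=r"^Value").abs().lt(1).all(axis=1)]

print(len(loose), len(strict))
```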

Timings:

#len df = 70k, 5 columns
df = pd.concat([df]*10000).reset_index(drop=True)

In [47]: %timeit (df[(df.filter(like='Value').abs() < 1).all(axis=1)])
100 loops, best of 3: 7.48 ms per loop

In [48]: %timeit (df[df.filter(regex=r'Value').abs().lt(1).all(1)])
100 loops, best of 3: 7.02 ms per loop

In [49]: %timeit (df[df.filter(like='Value').abs().lt(1).all(1)])
100 loops, best of 3: 7.02 ms per loop

In [50]: %timeit (df[(df.filter(regex=r'Value').abs() < 1).all(axis=1)])
100 loops, best of 3: 7.3 ms per loop

#len df = 70k, 5k columns
df = pd.concat([df]*10000).reset_index(drop=True)
df = pd.concat([df]*1000, axis=1)
#only for testing, create unique column names
df.columns = df.columns.str[:-1] + [str(col) for col in list(range(df.shape[1]))]
print (df)

In [75]: %timeit ((df[(df.filter(like='Value').abs() < 1).all(axis=1)]))
1 loop, best of 3: 10.3 s per loop

In [76]: %timeit ((df[(df.filter(regex=r'Value').abs() < 1).all(axis=1)]))
1 loop, best of 3: 10.3 s per loop

In [77]: %timeit (df[df.filter(regex=r'Value').abs().lt(1).all(1)])
1 loop, best of 3: 10.4 s per loop

In [78]: %timeit (df[df.filter(like='Value').abs().lt(1).all(1)])
1 loop, best of 3: 10.1 s per loop
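All four variants above take roughly the same time. A variant that was not benchmarked here does the comparison on a plain NumPy array instead of a DataFrame; whether it is faster on wide frames is an assumption to measure, not a result. The helper name filter_abs_below and its prefix/limit parameters are hypothetical.

```python
import numpy as np
import pandas as pd


def filter_abs_below(df: pd.DataFrame, prefix: str = "Value", limit: float = 1.0) -> pd.DataFrame:
    """Keep rows where every column whose name starts with `prefix` has |value| < limit."""
    cols = df.columns[df.columns.str.startswith(prefix)]
    # Convert the selected columns to one float array and compare elementwise
    values = df[cols].to_numpy(dtype=float)
    mask = (np.abs(values) < limit).all(axis=1)
    # A boolean ndarray of length len(df) selects rows
    return df[mask]


df = pd.DataFrame({
    "A_Name": ["AA", "BB", "CC", "DD", "EE", "FF", "GG"],
    "B_Detail": ["X1", "Y1", "Z1", "L1", "M1", "N1", "K1"],
    "Value_B": [1.2, 0.76, 0.7, 0.9, 1.3, 0.7, -2.4],
    "Value_C": [0.5, -0.7, -1.3, -0.5, 1.8, -0.8, -1.9],
    "Value_D": [-1.3, 0.8, 2.5, 0.4, -1.3, 0.9, 2.1],
})

result = filter_abs_below(df)
print(result)
```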
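
  • filter selects the columns whose names contain Value.
  • abs().lt(1) checks whether each absolute value is less than 1.
  • all(1) checks, per row, that every Value column is less than 1 in absolute value.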

df[df.filter(regex=r'Value').abs().lt(1).all(1)]
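The one-liner can be split into the three steps from the list, with an intermediate name for each, which makes it easier to inspect what every stage produces. The intermediate variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "A_Name": ["AA", "BB", "CC", "DD", "EE", "FF", "GG"],
    "B_Detail": ["X1", "Y1", "Z1", "L1", "M1", "N1", "K1"],
    "Value_B": [1.2, 0.76, 0.7, 0.9, 1.3, 0.7, -2.4],
    "Value_C": [0.5, -0.7, -1.3, -0.5, 1.8, -0.8, -1.9],
    "Value_D": [-1.3, 0.8, 2.5, 0.4, -1.3, 0.9, 2.1],
})

# Step 1: keep only the columns whose name contains "Value"
values = df.filter(regex=r"Value")
# Step 2: elementwise |x| < 1, giving a DataFrame of booleans
below_one = values.abs().lt(1)
# Step 3: a row passes only if every Value column is True
mask = below_one.all(axis=1)

print(values.columns.tolist())
print(mask.tolist())
print(df[mask])
```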

Timing

The timing compared like against regex and lt against <; the version using like and lt is:

df[df.filter(like='Value').abs().lt(1).all(1)]

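If the Value columns always start at the third column, you can select them by position with iloc (the older ix indexer was deprecated and removed in pandas 1.0):

>>> df[df.iloc[:, 2:].abs().lt(1).all(1)]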
      A_Name B_Detail  Value_B  Value_C  Value_D
1     BB       Y1     0.76     -0.7      0.8
3     DD       L1     0.90     -0.5      0.4
5     FF       N1     0.70     -0.8      0.9

, "" "".

Timing comparison (measured with ix on an older pandas version):

In [7]: %timeit df[df.ix[:,2:].abs().lt(1).all(1)]
1000 loops, best of 3: 803 µs per loop

In [8]: %timeit df[df.filter(like='Value').abs().lt(1).all(1)]
1000 loops, best of 3: 939 µs per loop
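If the position of the Value columns is not guaranteed, a label-based middle ground is a boolean mask over the column names passed to loc. This sketch compares it with the positional iloc form on the sample data; it is an alternative, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    "A_Name": ["AA", "BB", "CC", "DD", "EE", "FF", "GG"],
    "B_Detail": ["X1", "Y1", "Z1", "L1", "M1", "N1", "K1"],
    "Value_B": [1.2, 0.76, 0.7, 0.9, 1.3, 0.7, -2.4],
    "Value_C": [0.5, -0.7, -1.3, -0.5, 1.8, -0.8, -1.9],
    "Value_D": [-1.3, 0.8, 2.5, 0.4, -1.3, 0.9, 2.1],
})

# Positional: assumes the Value columns start at position 2
by_position = df[df.iloc[:, 2:].abs().lt(1).all(axis=1)]

# Label-based: True for every column name that starts with "Value"
is_value_col = df.columns.str.startswith("Value")
by_name = df[df.loc[:, is_value_col].abs().lt(1).all(axis=1)]

print(by_position.equals(by_name))
```

On this sample both give the same rows; only the label-based form keeps working if the column order changes.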

You can select the matching columns with str.contains, then use apply() with a lambda:

cols = df.columns[df.columns.str.contains('Value')]
df[df[cols].apply(lambda x: abs(x) < 1).sum(axis=1) == len(cols)]
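Counting the passing columns per row and comparing with len(cols) is the same test as all(axis=1) on the boolean frame. This sketch checks that both forms select the same rows on the sample data.

```python
import pandas as pd

df = pd.DataFrame({
    "A_Name": ["AA", "BB", "CC", "DD", "EE", "FF", "GG"],
    "B_Detail": ["X1", "Y1", "Z1", "L1", "M1", "N1", "K1"],
    "Value_B": [1.2, 0.76, 0.7, 0.9, 1.3, 0.7, -2.4],
    "Value_C": [0.5, -0.7, -1.3, -0.5, 1.8, -0.8, -1.9],
    "Value_D": [-1.3, 0.8, 2.5, 0.4, -1.3, 0.9, 2.1],
})

cols = df.columns[df.columns.str.contains("Value")]

# apply + sum: count the passing Value columns per row, keep rows where all pass
via_apply = df[df[cols].apply(lambda x: abs(x) < 1).sum(axis=1) == len(cols)]

# Vectorized equivalent: all() over the same boolean frame
via_all = df[df[cols].abs().lt(1).all(axis=1)]

print(via_apply.equals(via_all))  # True
```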

Output:

        A_Name  B_Detail    Value_B     Value_C     Value_D
1       BB      Y1          0.76        -0.7        0.8
3       DD      L1          0.90        -0.5        0.4
5       FF      N1          0.70        -0.8        0.9

Source: https://habr.com/ru/post/1649112/