I have the following pandas DataFrame.
import pandas as pd
df1 = pd.DataFrame(columns=['bar', 'foo'])
df1['bar'] = ['001', '001', '001', '001', '002', '002', '003', '003', '003']
df1['foo'] = [-4, -3, 2, 3, -3, -2, 0, 1, 2]
>>> print(df1)
   bar  foo
0  001   -4
1  001   -3
2  001    2
3  001    3
4  002   -3
5  002   -2
6  003    0
7  003    1
8  003    2
Consider the following threshold and parameters.
threshold = 0
n_below = 2
n_above = 2
I would like to create a DataFrame that filters out certain values of bar. A bar value should be filtered out if it does not have at least n_below values of foo less than threshold and at least n_above values of foo greater than threshold.
In the above example:
- The group bar = 001 will not be filtered out, since for bar = 001 there are at least n_below = 2 entries with foo less than threshold = 0 and at least n_above = 2 entries with foo greater than threshold = 0.
- The group bar = 002 will be filtered out, since for bar = 002 there are not at least n_above = 2 entries with foo greater than threshold = 0.
- The group bar = 003 will be filtered out, since for bar = 003 there are not at least n_below = 2 entries with foo less than threshold = 0.
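To make the three cases concrete, the per-group counts can be tabulated. This is only a sketch for checking the example by hand; the `counts` frame and its `below`/`above` column names are made up for illustration and are not part of the original data.

```python
import pandas as pd

df1 = pd.DataFrame({
    'bar': ['001', '001', '001', '001', '002', '002', '003', '003', '003'],
    'foo': [-4, -3, 2, 3, -3, -2, 0, 1, 2],
})
threshold = 0

# For each bar group, count the foo values strictly below and strictly
# above the threshold (values equal to the threshold count for neither).
counts = pd.DataFrame({
    'below': (df1['foo'] < threshold).groupby(df1['bar']).sum(),
    'above': (df1['foo'] > threshold).groupby(df1['bar']).sum(),
})
print(counts)
```

Here 001 has 2 below and 2 above, 002 has 2 below and 0 above, and 003 has 0 below and 2 above, so only 001 meets both minimums.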
The desired output is:
   bar  foo
0  001   -4
1  001   -3
2  001    2
3  001    3
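I think this can be done with GroupBy and .count(), but I have not found a working solution. Ideally, the check for each group would have two parts: 1) whether it has at least n_below values below the threshold; 2) whether it has at least n_above values above the threshold.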
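One way the filtering rule might be expressed is with `groupby().filter()`, which keeps only the groups for which a function returns True. This is a minimal sketch, not necessarily the cleanest solution; the helper name `has_enough_on_both_sides` is hypothetical, and it assumes "less than" and "greater than" are strict comparisons.

```python
import pandas as pd

def has_enough_on_both_sides(group, threshold, n_below, n_above):
    """Hypothetical helper: True if the group passes both count checks."""
    # 1) at least n_below values of foo strictly below the threshold
    enough_below = (group['foo'] < threshold).sum() >= n_below
    # 2) at least n_above values of foo strictly above the threshold
    enough_above = (group['foo'] > threshold).sum() >= n_above
    return bool(enough_below and enough_above)

df1 = pd.DataFrame({
    'bar': ['001', '001', '001', '001', '002', '002', '003', '003', '003'],
    'foo': [-4, -3, 2, 3, -3, -2, 0, 1, 2],
})
threshold = 0
n_below = 2
n_above = 2

# Keep only the rows whose bar group satisfies both conditions.
result = df1.groupby('bar').filter(
    lambda g: has_enough_on_both_sides(g, threshold, n_below, n_above)
)
print(result)
```

On the sample data this leaves only the four rows with bar = 001, with their original index labels 0 to 3.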