Pandas: filter pivot table rows where the count is less than a specified value

I have a pandas pivot table that looks something like this:

    C             bar       foo
    A     B
    one   A -1.154627 -0.243234
    three A -1.327977  0.243234
          B  1.327977 -0.079051
          C -0.832506  1.327977
    two   A  1.327977 -0.128534
          B  0.835120  1.327977
          C  1.327977  0.838040

I would like to filter out the rows whose value in index level A has fewer than 2 rows in level B. In the table above, that means dropping A = one:

    C             bar       foo
    A     B
    three A -1.327977  0.243234
          B  1.327977 -0.079051
          C -0.832506  1.327977
    two   A  1.327977 -0.128534
          B  0.835120  1.327977
          C  1.327977  0.838040

How can I do this?
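To experiment with the answers, the pivot table above can be rebuilt directly from the printed values. This is a sketch that assumes the two index levels are named A and B and the column axis is named C, as the printed header suggests. It then counts the rows under each A value, which is the number the filter has to compare against 2.

```python
import pandas as pd

# Rebuild the question's pivot table from the printed values.
# Assumption: index levels are named "A" and "B"; the column axis is named "C".
index = pd.MultiIndex.from_tuples(
    [
        ("one", "A"),
        ("three", "A"), ("three", "B"), ("three", "C"),
        ("two", "A"), ("two", "B"), ("two", "C"),
    ],
    names=["A", "B"],
)
df1 = pd.DataFrame(
    {
        "bar": [-1.154627, -1.327977, 1.327977, -0.832506, 1.327977, 0.835120, 1.327977],
        "foo": [-0.243234, 0.243234, -0.079051, 1.327977, -0.128534, 1.327977, 0.838040],
    },
    index=index,
)
df1.columns.name = "C"

# Number of rows under each value of level "A"; "one" is the only value below 2.
sizes = df1.groupby(level="A").size()
print(sizes)
```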

2 answers

In one line:

    In [64]: df[df.groupby(level=0).bar.transform(lambda x: len(x) >= 2).astype('bool')]
    Out[64]:
                  bar       foo
    two   A  0.944908  0.701687
          B -0.204075  0.713141
          C  0.730844 -0.022302
    three A  1.263489 -1.382653
          B  0.124444  0.907667
          C -2.407691 -0.773040
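The same boolean-mask idea can be sketched without the lambda and the astype('bool') cast, assuming a reasonably recent pandas where SeriesGroupBy.transform accepts the string "size". The frame below is hypothetical stand-in data with the same index shape as the question's table.

```python
import pandas as pd

# Hypothetical stand-in data: group "one" has a single row, the others three.
index = pd.MultiIndex.from_tuples(
    [("one", "A"), ("three", "A"), ("three", "B"), ("three", "C"),
     ("two", "A"), ("two", "B"), ("two", "C")],
    names=["A", "B"],
)
df = pd.DataFrame({"bar": range(7), "foo": range(7, 14)}, index=index)

# transform("size") broadcasts each group's row count back onto every row,
# so comparing it with the threshold yields a boolean mask aligned with df.
counts = df.groupby(level=0)["bar"].transform("size")
filtered = df[counts >= 2]
print(filtered)
```

Because the mask is aligned with the original index, the rows keep their original order.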

In the upcoming release of pandas (0.11.1), the new filter method does this faster and more intuitively:

    In [65]: df.groupby(level=0).filter(lambda x: len(x['bar']) >= 2)
    Out[65]:
                  bar       foo
    three A  1.263489 -1.382653
          B  0.124444  0.907667
          C -2.407691 -0.773040
    two   A  0.944908  0.701687
          B -0.204075  0.713141
          C  0.730844 -0.022302
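If the threshold needs to vary, the filter call can be wrapped in a small function. This is a sketch: drop_small_groups is a hypothetical helper name, and the data is stand-in data shaped like the question's table. It groups by the level name rather than its position.

```python
import pandas as pd

def drop_small_groups(frame, level, min_rows):
    # Hypothetical helper: keep only the rows whose value in the given index
    # level occurs at least min_rows times. GroupBy.filter calls the lambda
    # once per group and keeps the whole group when it returns True.
    return frame.groupby(level=level).filter(lambda g: len(g) >= min_rows)

# Hypothetical stand-in data with the same index shape as the question's table.
index = pd.MultiIndex.from_tuples(
    [("one", "A"), ("three", "A"), ("three", "B"), ("three", "C"),
     ("two", "A"), ("two", "B"), ("two", "C")],
    names=["A", "B"],
)
df = pd.DataFrame({"bar": range(7), "foo": range(7, 14)}, index=index)

result = drop_small_groups(df, level="A", min_rows=2)
print(result)
```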

One way is to group by level “A” and keep only the groups with at least 3 rows:

    In [11]: g = df1.groupby(level='A')

    In [12]: g.size()
    Out[12]:
    A
    one      1
    three    3
    two      3
    dtype: int64

    In [13]: size = g.size()

    In [13]: big_size = size[size>=3]

    In [14]: big_size
    Out[14]:
    A
    three    3
    two      3
    dtype: int64

Then you can find which rows have a “good” A value and select only those:

    In [15]: good_A = df1.index.get_level_values('A').isin(big_size.index)

    In [16]: good_A
    Out[16]: array([False, True, True, True, True, True, True], dtype=bool)

    In [17]: df1[good_A]
    Out[17]:
    C             bar       foo
    A     B
    three A -1.327977  0.243234
          B  1.327977 -0.079051
          C -0.832506  1.327977
    two   A  1.327977 -0.128534
          B  0.835120  1.327977
          C  1.327977  0.838040
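The size-then-isin steps can be put together in one self-contained sketch. It uses hypothetical stand-in values with the question's index shape, and the question's threshold of 2 rather than 3; either threshold gives the same result for this data.

```python
import pandas as pd

# Hypothetical stand-in data shaped like the question's pivot table.
index = pd.MultiIndex.from_tuples(
    [("one", "A"), ("three", "A"), ("three", "B"), ("three", "C"),
     ("two", "A"), ("two", "B"), ("two", "C")],
    names=["A", "B"],
)
df1 = pd.DataFrame({"bar": range(7), "foo": range(7, 14)}, index=index)

# 1. Count rows per value of level "A".
sizes = df1.groupby(level="A").size()
# 2. Keep the "A" values that meet the threshold.
keep = sizes[sizes >= 2].index
# 3. Build a row mask from the index level and select those rows.
mask = df1.index.get_level_values("A").isin(keep)
result = df1[mask]
print(result)
```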

Source: https://habr.com/ru/post/1486251/

