How to apply Pandas Groupby with several conditions for separation and apply multiple calculations?

I have a data frame that I need to split based on two conditions, and these conditions apply to 3 different columns, and then the use of these new splits is performed according to some calculations on other columns in pairs.

So I have df A

A = pd.DataFrame([[1, 5, 2, 3, 4, 1], [2, 4, 4, 5, 5, 1], [3, 3, 1, 2, 5, 4], [4, 2, 2, 3, 5, 5], [5, 1, 4, 2, 3, 1]], columns=['A', 'B', 'C', 'D', 'E', 'F'], index=[1, 2, 3, 4, 5]) 

I need to split the data frame based on two conditions and apply to the first three columns, for example, this

 condition1_colA = A[A['A'] > 1] condition2_colA = A[A['A'] < 1] condition1_colB = A[A['B'] > 1] condition2_colB = A[A['B'] < 1] condition1_colC = A[A['C'] > 1] condition2_colC = A[A['C'] < 1] 

Now I need to do 3 calculations on each of 6 new cut data frames, however these calculations are soared as follows:

  • condition1_colA and condition2_colA with column ['D']
  • condition1_colB and condition2_colB with column ['E']
  • condition1_colC and condition2_colC with column ['F']

Here is an example for the first pair:

 x = condition1_colA['D'].mean() y = condition1_colA['D'].std() z = condition1_colA['D'][condition1_colA['D'] > 0].count() x1 = condition2_colA['D'].mean() y1 = condition2_colA['D'].std() z1 = condition2_colA['D'][condition1_colA['D'] < 0].count() 

Then we would do the same for the other two couples. And we will organize 18 calculations on a datframe. I know how to do this manually, but I hope for some help with this using a group or perhaps a pivot table.

thanks

+5
source share
1 answer
 import pandas as pd A = pd.DataFrame( [[1, 5, 2, 3, 4, 1], [2, 4, 4, 5, 5, 1], [3, 3, 1, 2, 5, 4], [4, 2, 2, 3, 5, 5], [5, 1, 4, 2, 3, 1]], columns=['A', 'B', 'C', 'D', 'E', 'F'], index=[1, 2, 3, 4, 5]) result = [] pairs = [] for col1, col2 in zip(list('ABC'), list('DEF')): pairs.append(col1+col2) mask = A[col1] > 1 mask2 = mask & (A[col2] > 0) ser = A.loc[mask, col2] result.append((ser.mean(), ser.std(), mask2.sum())) mask = A[col1] < 1 mask2 = mask & (A[col2] < 0) ser = A.loc[mask, col2] result.append((ser.mean(), ser.std(), mask2.sum())) index = pd.MultiIndex.from_product([pairs, [1, 2]], names=['pair', 'cond']) df = pd.DataFrame(result, index=index, columns=['mean', 'sum', 'count']) print(df) 

gives

  mean sum count pair cond AD 1 3.00 1.414214 4 2 NaN NaN 0 BE 1 4.75 0.500000 4 2 NaN NaN 0 CF 1 2.00 2.000000 4 2 NaN NaN 0 
+4
source

Source: https://habr.com/ru/post/1202511/


All Articles