Pandas: split a DataFrame into several DataFrames wherever a condition is true

I have a dataframe like df below. I want to create a new dataframe for each run of consecutive rows where the condition is True, so this will return df_1, df_2, ..., df_n.

|  df               |  | df_1  |   | df_2  |
| Value | Condition |  | Value |   | Value |
|-------|-----------|  |-------|---|-------|
| 2     | True      |  | 2     |   | 0     |
| 5     | True      |  | 5     |   | 5     |
| 4     | True      |  | 4     |   |       |
| 4     | False     |  |       |   |       |
| 2     | False     |  |       |   |       |
| 0     | True      |  |       |   |       |
| 5     | True      |  |       |   |       |
| 7     | False     |  |       |   |       |
| 8     | False     |  |       |   |       |
| 9     | False     |  |       |   |       |

My only idea is to loop through the dataframe, returning a start and end index for each run of True values, and then to create the new dataframes in a second loop over the returned indexes, doing something like this for each start/end pair:

 newdf = df.iloc[start:end] 

But doing it this way seems inefficient.
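For reference, the start/end idea described above can be sketched without an explicit Python loop over the rows. This is only a rough sketch: it assumes the frame from the question (columns `Value` and `Condition`), and the helper name `true_runs` is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Value": [2, 5, 4, 4, 2, 0, 5, 7, 8, 9],
    "Condition": [True, True, True, False, False,
                  True, True, False, False, False],
})

def true_runs(mask):
    """Return (start, end) position pairs, end exclusive, for each run of True.

    Hypothetical helper, not part of pandas.
    """
    m = np.asarray(mask, dtype=bool)
    # Pad with False on both sides so runs touching either edge are detected.
    padded = np.concatenate(([False], m, [False])).astype(np.int8)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)   # False -> True transitions
    ends = np.flatnonzero(edges == -1)    # True -> False transitions
    return [(int(s), int(e)) for s, e in zip(starts, ends)]

runs = true_runs(df["Condition"])
pieces = [df.iloc[start:end] for start, end in runs]
```

Here `runs` holds one pair per run of True values, and each pair can be passed straight to `df.iloc[start:end]` as in the question.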

3 answers

This is an alternative solution. It uses the consecutive_groups recipe from the more_itertools library, reproduced here so that the library does not have to be installed.

    from itertools import groupby
    from operator import itemgetter

    def consecutive_groups(iterable, ordering=lambda x: x):
        for k, g in groupby(enumerate(iterable), key=lambda x: x[0] - ordering(x[1])):
            yield map(itemgetter(1), g)

    grps = consecutive_groups(df[df.Condition].index)

    dfs = {i: df.iloc[list(j)] for i, j in enumerate(grps, 1)}

    # {1:    Value  Condition
    # 0      2       True
    # 1      5       True
    # 2      4       True,
    # 2:    Value  Condition
    # 5      0       True
    # 6      5       True}
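One thing to watch with this approach: it passes index labels to `df.iloc`, which works here because the frame has the default 0..n-1 index. A possible variation, sketched below on the assumption that the index may hold arbitrary labels, groups integer positions instead. The letter index is invented purely for illustration.

```python
from itertools import groupby

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Value": [2, 5, 4, 4, 2, 0, 5, 7, 8, 9],
        "Condition": [True, True, True, False, False,
                      True, True, False, False, False],
    },
    index=list("abcdefghij"),  # non-default labels, for illustration only
)

# Integer positions (not labels) of the rows where Condition is True.
positions = np.flatnonzero(df["Condition"].to_numpy())

# Inside a run of consecutive positions, (position - rank) stays constant,
# so it can serve as the grouping key.
dfs = {}
grouped = groupby(enumerate(positions), key=lambda t: t[1] - t[0])
for n, (_, run) in enumerate(grouped, 1):
    dfs[n] = df.iloc[[pos for _, pos in run]]
```

The resulting dictionary has the same shape as in the answer above (keys 1, 2, ...), but the row selection no longer depends on what the index labels are.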

Create a dictionary of DataFrames by grouping on a Series built from the cumsum of the inverted boolean column, using where to set NaN on the rows that belong to no group:

    g = (~df['Condition']).cumsum().where(df['Condition'])
    print (g)
    0    0.0
    1    0.0
    2    0.0
    3    NaN
    4    NaN
    5    2.0
    6    2.0
    7    NaN
    8    NaN
    9    NaN
    Name: Condition, dtype: float64

    #enumerate for starting groups from 1, 2, N
    dfs = {i+1:v for i, (k, v) in enumerate(df[['Value']].groupby(g))}
    print (dfs)
    {1:    Value
    0      2
    1      5
    2      4, 2:    Value
    5      0
    6      5}

    print (dfs[1])
       Value
    0      2
    1      5
    2      4

    print (dfs[2])
       Value
    5      0
    6      5
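The group keys produced by this trick (0.0 and 2.0 here) are not consecutive, which is why the answer renumbers them with enumerate. A slightly different labelling, shown below as a sketch rather than as the answerer's method, counts run starts instead, so the keys come out as 1, 2, ... directly. The variable names `starts` and `run_id` are my own.

```python
import pandas as pd

df = pd.DataFrame({
    "Value": [2, 5, 4, 4, 2, 0, 5, 7, 8, 9],
    "Condition": [True, True, True, False, False,
                  True, True, False, False, False],
})

cond = df["Condition"]

# Previous row's condition; the first row has no predecessor, so use False.
prev = cond.shift(fill_value=False).astype(bool)

# A run starts where the current row is True and the previous row is not.
starts = cond & ~prev

# Counting run starts cumulatively labels each row with its run number.
run_id = starts.cumsum()

# Keep only the True rows and split them by run number.
dfs = {int(k): part for k, part in df.loc[cond, ["Value"]].groupby(run_id[cond])}
```

The rows where the condition is False are dropped by the `df.loc[cond, ...]` selection before grouping, so no NaN keys are needed.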

I decided to provide an answer that puts the 'Value' entries of each group in their own column.

    m = df.Condition.values
    g = (~m).cumsum()

    d = df.loc[m, 'Value']
    g = g[m]
    c = d.groupby(g).cumcount()

    # set_axis returns a new object by default on pandas 1.0+;
    # the inplace keyword was removed in pandas 2.0
    d.set_axis([c, g]).unstack()

         0    2
    0  2.0  0.0
    1  5.0  5.0
    2  4.0  NaN
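If the MultiIndex step is hard to follow, the same wide layout can probably be reached with an explicit pivot. The sketch below reuses the answer's `m`, `g`, `c` and `d`; the column names `run` and `pos` are made up for illustration, and this is a swapped-in technique, not the answerer's code.

```python
import pandas as pd

df = pd.DataFrame({
    "Value": [2, 5, 4, 4, 2, 0, 5, 7, 8, 9],
    "Condition": [True, True, True, False, False,
                  True, True, False, False, False],
})

m = df["Condition"].to_numpy()
g = (~m).cumsum()[m]          # group key for each True row
d = df.loc[m, "Value"]        # values of the True rows
c = d.groupby(g).cumcount()   # position of each value inside its group

# Long table: one row per value, tagged with its group and its position.
long = pd.DataFrame({
    "run": g,
    "pos": c.to_numpy(),
    "Value": d.to_numpy(),
})

# One column per group, one row per position; short groups are padded with NaN.
wide = long.pivot(index="pos", columns="run", values="Value")
```

As in the answer, the column labels are the raw group keys (0 and 2), and the values become floats because the shorter group is padded with NaN.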

Source: https://habr.com/ru/post/1275342/

